CEDS User Guide

Table of Contents

1. Input Data
2. CEDS System Code
- 2.1. Miscellaneous Coding Notes
- 2.2. R Issues
3. How to Include Supplemental Combustion Energy Activity in CEDS
Modules

1. Input Data

1.1. General Assumptions

1.1.1. Updating The Year Range

To extend the system to run to a later year, the key input data file to change is the BP energy statistics. Overwrite the current file with a more recent version. The file is located in /input/energy/. Note that this file needs to be in .xlsx format.

Then update the parameters BP_years and end_year in the file /code/parameters/common_data.R.

Note that the BP data must extend to the latest year specified here.

Clean and re-run make. The emissions data should now extend to the latest year specified. The data are simply extrapolated; updating the emission inventory data (and the detailed IEA energy data) will produce a more accurate estimate for recent years.

1.1.2. Adding A New Sector

CEDS has two types of sectors (set in the file Master_Fuel_Sector_List.xlsx):

- Combustion sectors: Emissions from these sectors use energy data by fuel and sector as driver data. Default emissions are calculated by multiplying an emission factor times fuel consumption (minus an optional control fraction).
- Non-combustion sectors: Emissions from these sectors use some other data (default is population) as driver data. (Also referred to in CEDS documents as process emissions.) Default emissions are read in from an external inventory source, user data, or a sector-specific script. Note that, physically, emissions from a CEDS non-combustion sector may be from fuel combustion. This designation refers only to how emissions are calculated within the CEDS system.
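
As a rough illustration of the default combustion calculation described above, the R sketch below multiplies fuel consumption by an emission factor and applies an optional control fraction. The function name, the argument names, and the reading of "minus an optional control fraction" as a factor of (1 - control fraction) are assumptions made for illustration; this is not CEDS code.

```r
# Hypothetical helper (not part of CEDS): default combustion emissions.
# Assumes the control fraction is the share of emissions that is removed.
default_emissions <- function(activity, ef, control_fraction = 0) {
  stopifnot(all(control_fraction >= 0), all(control_fraction <= 1))
  activity * ef * (1 - control_fraction)
}

# 1000 units of fuel, emission factor 0.02, 25% of emissions controlled
default_emissions(activity = 1000, ef = 0.02, control_fraction = 0.25)
```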

Adding a process (non-combustion) emission sector

In addition to indicating your data’s sector in your data source (the U.* file you used to import the data), you will need to edit 2 files in CEDS. They are:

CEDS/input/mappings/Master_Sector_Level_map.csv
- Add a new row to the spreadsheet where appropriate. The row will contain five columns of data:
  1. The detailed sector name: a unique sector ID (one word)
  2. working sectors v1 and
  3. working sector v2: these can be either your detailed name or a first-level aggregation; they appear to serve as process documentation and may not be used in the model itself
  4. The aggregate sector: if appropriate, this should be identical to the name of an existing aggregate sector
  5. Figure_sector: this should be identical to an existing Figure_sector; it is the category in which your data will be displayed in CEDS graphical outputs
CEDS/input/mappings/Master_Fuel_Sector_List.xlsx
- Add a new row to the spreadsheet at the appropriate location in the “Sectors” sheet only. This row will contain 4 columns of data:
  1. The detailed sector name
  2. The activity type
  3. Units of analysis
  4. Type: comb (combustion) or NC (non-combustion)

1.2. Energy Data

The core data needed to run the data system is the IEA OECD and non-OECD energy statistics.

1.2.1. Adding or updating the IEA Energy Statistics

The IEA energy statistics database needs to be purchased from the IEA and the data exported into csv format in order to run the CEDS system. The instructions below refer to the cd-rom distribution: the entire IEA energy database needs to be exported for use in the data system.

Steps to import the IEA energy data

Export the statistics for OECD and non-OECD countries into two .csv files
1. The first column is the country's full name (spelled out).
2. The second column is flow, given as the IEA abbreviation because flow names are not otherwise unique. To change to the abbreviation, click on the flow icon, then go to Dimensions → Change label.
3. The third column is fuel (spelled out).

(To export from the IEA beyond 20/20 data browser, drag the icon for country to the left to form a column and icon for time to the right to create a row with years. Then drag the icon for flows between the column for countries and data for the first year; it will add a column for flows. Then drag the icon for fuel between column for flows and data for the first year. This will result in a large table that contains all the data that can then be exported as a csv.)

In a text editor:
1. Replace .., c, and x, in the data values with zeros (note these can occur at end of lines)
2. Get rid of special characters and apostrophes, for example:
  1. Côte d’Ivoire → Cote dIvoire
  2. Dem. People’s Rep. of Korea → Dem. Peoples Rep. of Korea
  3. People’s Republic of China → Peoples Republic of China
  4. Curaçao → Curacao
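
The text-editor steps above could also be scripted. The following is a minimal R sketch, assuming the export is read as plain lines of comma-separated text with no quoted fields containing commas; the function name and the sample line are made up for illustration, and only the two accented characters that appear in the examples above are handled.

```r
# Hypothetical helper (not part of CEDS): clean lines of the exported csv.
clean_iea_lines <- function(lines) {
  # Transliterate the accented characters from the examples above
  lines <- gsub("\u00f4", "o", lines, fixed = TRUE)  # o with circumflex
  lines <- gsub("\u00e7", "c", lines, fixed = TRUE)  # c with cedilla
  # Drop straight and curly apostrophes
  lines <- gsub("'", "", lines, fixed = TRUE)
  lines <- gsub("\u2019", "", lines, fixed = TRUE)
  # Replace fields that consist only of "..", "c", or "x" with 0,
  # including a field at the end of a line
  gsub("(?<=,)(\\.\\.|c|x)(?=,|$)", "0", lines, perl = TRUE)
}

# Made-up sample line; the flow and fuel labels are placeholders
clean_iea_lines("C\u00f4te d\u2019Ivoire,FLOW,Fuel name,..,c,x,12.5")
```

Anchoring the pattern on the surrounding commas is what keeps a lowercase "c" or "x" inside a longer field from being replaced.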

If the data is the same release used in the version of the CEDS system that you have (you can check this in the metadata file that is released with the system) then there are no further steps.

However, if you are using a newer (or older) version of the IEA/OECD statistics, then the following additional steps are needed.

Update the year ranges in code\parameters\common_data.R. For example, to replace the 2012 edition of the IEA data with the 2015 edition, change the parameter IEA_years <- 1960:2010 to IEA_years <- 1960:2013.

The BP energy statistics are used to extend the data to 2014. If you use the 2015 edition of the IEA data, change the parameter BP_years <- 2011:2014 to BP_years <- 2014.

If there are new countries or new country names, the master country list (input\mappings\Master_Country_List.csv) will need to be updated.
If there are any new fuels, these might need to be added to the master fuel list (input\mappings\energy\IEA_product_fuel.csv).
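
For reference, the year-range parameters from the 2015-edition example above would read as follows in R. The final check is an assumption added for illustration (that the BP years are meant to begin immediately after the last IEA year, as they do in both examples in this section); it is not part of CEDS.

```r
# Values taken from the 2015-edition example in this section
IEA_years <- 1960:2013
BP_years <- 2014

# Assumed sanity check (not part of CEDS): BP years follow the IEA years
stopifnot(min(BP_years) == max(IEA_years) + 1)
```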

1.3. Process Emissions Driver Data

In order to more accurately extend process emissions time series, driver data for the appropriate emissions time series is needed.

In the first phase of this project, where we are focusing on recent decades, complete, consistent time series estimates exist for most emissions (e.g. EDGAR, FAO, etc.). For this reason, process emissions driver data are not critical to this first phase and most of this data has not been incorporated.

1.3.1. User Added Process Emissions

Users can add process (non-combustion) emissions to CEDS either by adding inventory files or by adding instructions for using processed inventory files (from module E) in the intermediate_output folder.

Individual Files

CSV files with process emissions data may be added to the input/default-emissions-data/non-combustion-emissions folder. Files should be named with "U.<em>_" followed by a description or identifier, where <em> is the emission species (example "U.SO2"). The system will not import files named without the .<em> part. Clean commands (executed by the makefile) will delete files in the folder with "C." in their names.

Files should be in standard CEDS format with column headings iso-sector-fuel-units-Xyears, similar to the output emissions and EF files produced by the system. Files must contain the iso, sector, fuel, and units columns. Year columns must be in the format "Xyear", such as X1980 or X2005. Files may contain any number of emission years in any order; the script automatically orders the years and linearly interpolates between them, but it does not extend emissions to years outside the given data.

Entries whose four ID columns do not exactly match entries in the CEDS NC_database will not be added. The script automatically filters out entries which are not mapped to non-combustion sectors (designated by input/mappings/Master_Fuel_Sector_List.xlsx) or have "process" as fuel.
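
The ordering and interpolation behaviour described above can be pictured with the base R sketch below. It is not the CEDS script: the function name, the sector label, and the sample values are invented, and the sketch assumes every row has at least two years of data.

```r
# Hypothetical helper (not part of CEDS): order year columns and fill gaps.
interpolate_years <- function(df, id_cols = c("iso", "sector", "fuel", "units")) {
  year_cols <- setdiff(names(df), id_cols)
  years <- as.integer(sub("^X", "", year_cols))
  year_cols <- year_cols[order(years)]
  years <- sort(years)
  # Only the span between the first and last given year is filled,
  # so nothing is extrapolated beyond the supplied data
  all_years <- seq(min(years), max(years))
  filled <- t(apply(df[, year_cols, drop = FALSE], 1, function(y) {
    approx(x = years, y = as.numeric(y), xout = all_years)$y
  }))
  colnames(filled) <- paste0("X", all_years)
  cbind(df[, id_cols, drop = FALSE], as.data.frame(filled))
}

# Made-up input with year columns out of order
inv <- data.frame(iso = "usa", sector = "example_sector", fuel = "process",
                  units = "kt", X2005 = 30, X1980 = 10, X1990 = 20,
                  stringsAsFactors = FALSE)
out <- interpolate_years(inv)
out[, c("X1980", "X1985", "X1990")]
```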

Select Emissions Inventories from module E

Data lines from processed inventory files (from module E) in the intermediate_output folder may be added by adding lines to input/default-emissions-data/non-combustion-emissions/add_inventory_instructions.csv.

The data specified must be:
- inv: the name of the inventory file in the intermediate-output folder, such as E.SO2_EMEP_NFR09_inventory
- em: the emission species
- iso: the country code
- inv_sector: an exact match of the name of the inventory sector specified in the inventory file (inv)
- ceds_sector: the CEDS sector the emissions should be matched to

Data must be mapped to non-combustion sectors (designated by input/mappings/Master_Fuel_Sector_List.xlsx).

2. CEDS System Code

2.1. Miscellaneous Coding Notes

Note when using GREP to select input files, that one cannot grep for "OC", for example, as this will also capture "NMVOC". You must use an appropriate wildcard match that distinguishes between "NMVOC" and "OC", and "CO" vs "CO2".
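
One way to build such a pattern in R is sketched below. The file names are hypothetical, and the helper is an illustration rather than a CEDS function; it requires that the species name is not directly preceded or followed by a letter or digit.

```r
# Hypothetical file names, for illustration only
files <- c("U.OC_EF.csv", "U.NMVOC_EF.csv", "U.CO_EF.csv", "U.CO2_EF.csv")

# Hypothetical helper (not part of CEDS): match a species as a whole token
match_em <- function(em, files) {
  pattern <- paste0("(^|[^A-Za-z0-9])", em, "([^A-Za-z0-9]|$)")
  grep(pattern, files, value = TRUE)
}

grep("OC", files, value = TRUE)  # naive match also returns the NMVOC file
match_em("OC", files)            # only the OC file
match_em("CO", files)            # only the CO file, not CO2
```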

2.2. R Issues

If you encounter an error where a package is reported as unavailable even though you have already installed it, try installing without specifying a lib argument (e.g., install.packages( 'package-name' ) ) so that the package is installed in the default location. (Note that GUIs such as RStudio might sometimes install a package in the wrong place.)
When continually running code from individual R scripts, using the function logStart() (called in the initialize function at the beginning of every script) without logStop() (called at the end of every script) will keep the log files open. An R session can only handle so many open log files before the following error occurs:
```
Error in sink(paste(logpath, fn, ".log", sep = ""), split = T) :
  sink stack is full
```
To resolve, clear the global environment manually or by restarting the R session.
Similar to the error above, having too many files open can create the following error:
```
Error in textConnection("rval", "w", local = TRUE) :
  all connections are in use
```
To resolve, enter the command closeAllConnections() into the console.
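
For the "sink stack is full" case, closing the open diversions directly may also help. The helper below is an assumption offered as a sketch, not a documented CEDS remedy; it relies only on the base R functions sink() and sink.number().

```r
# Hypothetical helper (not part of CEDS): close every open output sink
# and report how many were closed.
close_all_sinks <- function() {
  closed <- 0
  while (sink.number() > 0) {
    sink()
    closed <- closed + 1
  }
  closed
}

close_all_sinks()
```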

3. How to Include Supplemental Combustion Energy Activity in CEDS

CEDS has the capacity to dynamically include user-defined activity in a number of ways. This section outlines how to include supplemental combustion activity data in a run of CEDS.

3.1. Formatting the Data

Every supplemental dataset is required to be in a .csv format and must be accompanied by a corresponding instructions file. Additionally, a mapping (.xlsx) file is required for any dataset that is not already in the standard CEDS format.

These files are tied together by their root filename, with the non-data files specified by an extension of -instructions.csv or -mapping.xlsx. All files must be saved to the folder input/extension/user-defined-energy in order to be included. For example, your extension directory might look like this:

input/
├── extension/
│   ├── user-defined-energy/
│   │   ├── mydata.csv
│   │   ├── mydata-instructions.csv
│   │   ├── mydata-mapping.xlsx
│   │   ├── USA_historical_coal.csv
│   │   └── USA_historical_coal-instructions.csv
... ...

If the files are formatted correctly, they need only be placed in this folder, and CEDS will automatically identify and process the data.

Below is a detailed guide to creating and formatting these files.

3.1.1. The Data File: [filename].csv

The data file is expected in wide form. There must be exactly one column giving information on the country, and at least one column giving the fuel type. Additionally, one or two columns are allowed for specifying sector depending on the level of specificity. The activity data itself should have year or Xyear headers (e.g. 1950, 1951 or X1950, X1951).

A dataframe in CEDS format with all allowed columns might look like this:

iso	agg_fuel	CEDS_fuel	agg_sector	CEDS_sector	X1970	…
deu	coal	coal_coke	1A1_Energy-transformation	1A1a_Electricity-public	1150.79	…
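
A quick check of these layout rules might look like the R sketch below. It applies only to data that already uses CEDS column names; the function name and messages are invented for illustration and are not part of CEDS.

```r
# Hypothetical helper (not part of CEDS): report layout problems in a
# data file that already uses CEDS column names.
check_user_data <- function(df) {
  cols <- names(df)
  year_cols <- grep("^X?[0-9]{4}$", cols, value = TRUE)
  problems <- character(0)
  if (sum(cols == "iso") != 1) {
    problems <- c(problems, "need exactly one iso column")
  }
  if (!any(c("agg_fuel", "CEDS_fuel") %in% cols)) {
    problems <- c(problems, "need at least one fuel column")
  }
  if (length(year_cols) == 0) {
    problems <- c(problems, "need year or Xyear columns")
  }
  problems
}

# The example row from the table above
good <- data.frame(iso = "deu", agg_fuel = "coal", CEDS_fuel = "coal_coke",
                   agg_sector = "1A1_Energy-transformation",
                   CEDS_sector = "1A1a_Electricity-public",
                   X1970 = 1150.79, stringsAsFactors = FALSE)
check_user_data(good)  # no problems reported
```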

3.1.2. The Mapping File: [filename]-mapping.xlsx

Since CEDS operates under the principle of preserving raw input data when possible, the input dataset does not need to be neatly named to CEDS sectors and fuels. The purpose of the mapping file is so the system can identify how input data corresponds to CEDS data.

There should be one sheet in this Excel file for each ID column in the input data, and the sheet names must be the name of the resulting CEDS column. If a data ID column is already in CEDS form, no mapping sheet is needed. There are five possible sheet names:

CEDS_sector
CEDS_fuel
agg_sector
agg_fuel
iso

Any mapping file may include any or all of these, as needed. Other sheets will not be identified.

Each sheet should contain two columns, one headed by the name of the column (same as the sheet name) and the other bearing the header corresponding to the header in the data frame. The data in the columns are the equivalent IDs.

The following is an example of what a mapping sheet titled "CEDS_sector" might look like:

my_sector_name	CEDS_sector
public_electric	1A1a_Electricity-public
auto_electric	1A1a_Electricity-autoproducer
heat_production	1A1a_Heat-production

The raw data corresponding to this example could look something like this:

iso	my_sector_name	agg_fuel	X1970	…
usa	public_electric	oil	16.21	…
usa	auto_electric	oil	105.5	…
usa	heat_production	oil	124.8	…
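
To make the role of a mapping sheet concrete, the R sketch below applies the example sheet to the example data using match(). It illustrates the idea only and is not the CEDS mapping code; the function name is hypothetical, and the sheet is built in memory rather than read from an .xlsx file.

```r
# The example "CEDS_sector" sheet and raw data from above
sheet <- data.frame(
  my_sector_name = c("public_electric", "auto_electric", "heat_production"),
  CEDS_sector = c("1A1a_Electricity-public", "1A1a_Electricity-autoproducer",
                  "1A1a_Heat-production"),
  stringsAsFactors = FALSE)
raw <- data.frame(
  iso = "usa",
  my_sector_name = c("public_electric", "auto_electric", "heat_production"),
  agg_fuel = "oil",
  X1970 = c(16.21, 105.5, 124.8),
  stringsAsFactors = FALSE)

# Hypothetical helper (not part of CEDS): replace a user ID column with
# the CEDS column named by the sheet.
apply_mapping_sheet <- function(data, sheet, ceds_col) {
  user_col <- setdiff(names(sheet), ceds_col)
  stopifnot(length(user_col) == 1, user_col %in% names(data))
  idx <- match(data[[user_col]], sheet[[user_col]])
  data[[ceds_col]] <- sheet[[ceds_col]][idx]
  data[[user_col]] <- NULL
  data
}

mapped <- apply_mapping_sheet(raw, sheet, "CEDS_sector")
mapped
```

Rows whose user ID has no entry in the sheet end up with NA in the CEDS column in this sketch, which makes unmapped values easy to spot.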

In the case that your data cannot be easily mapped, you can make use of the parameter preprocessing_script described in section 3.2 below. If no mapping file is included, it is assumed the data is already correctly mapped.

3.1.3. The Instructions File: [filename]-instructions.csv

The instructions file is the place to define any parameters for how specifically to process the input dataset. This file is used to determine both which data to bring into the system from your dataset, and how it should be integrated into the default data.

The instructions file should have a row for each combination of data in the corresponding data file:

iso	CEDS_fuel	CEDS_sector	start_year	end_year	options…
deu	coal_coke	1A1a_Electricity-public	1931	1934	…
deu	hard_coal	1A1a_Electricity-public	1932	1936	…
deu	brown_coal	1A1a_Electricity-public	1931	1936	…
deu	coal_coke	1A1a_Electricity-autoproducer	1931	1936	…

This example shows all of the necessary columns for reading in data with CEDS_fuel and CEDS_sector specificity. To include all sectors, simply leave that column out of the instructions file, or alternatively provide the sector name all. CEDS provides several options (listed in Section 3.2 below) for specifying how to integrate the supplemental data into the default data.

Tip	These instructions must be in CEDS ID form because they specify how the system will use the data once mapped—they correspond directly to components of the CEDS activity data.

3.2. Use Instructions Options

There are several use instructions that can be specified by the user. If a given option is not included, it will be set to the default. These options can be set for each row of the instructions file for a dataset by including a column with the option as the header (case-sensitive).

priority is a tool for manually specifying the order in which datasets are included in the system (see Default Order in Notes). Priority is given as integers; data with priority 1 will be dominant over priority 2, which will be dominant over data with no priority specified. Defaults to NA.
override_normalization takes a boolean argument. If TRUE, the data’s aggregate group will not be normalized during incorporation into the activity dataframe. Defaults to FALSE.
use_as_trend takes a boolean argument. If TRUE, the data will be used as a trend rather than as raw data; values will be scaled to CEDS values for a given match_year. Defaults to FALSE.
match_year takes an integer year argument. Required if use_as_trend is TRUE, otherwise defaults to NA.
start_continuity is used to specify whether data should be made continuous at its beginning. Takes a boolean; defaults to TRUE.
end_continuity (see start_continuity)
specified_breakdowns takes a boolean. This is used to indicate that you have included a percent breakdowns file that will be used instead of CEDS default percent breakdowns to disaggregate your dataset. The file must be named [filename]-breakdowns.csv, and it must include data at the most disaggregate CEDS level.
interpolation_method defines how to treat missing values in the data. Must be one of the following:
- linear (default)
- match_to_default — fills in missing values based on the trend of the default activity data
- match_to_trend — fills in missing values based on a trend provided by the user; if specified, the parameter matching_file_name must be present
matching_file_name is the name of a file containing values to be used as a trend for interpolating missing values from the data. Columns outside of the years specified by start_year and end_year will be ignored. Defaults to NA.
preprocessing_script is the name of an R script to be run before attempting to map or load the data associated with this instruction. Expects a file path relative to the user-defined-energy directory.
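
The defaults listed above can be summarised in a short R sketch that fills in any option missing from an instructions table. This is an illustration, not the CEDS implementation: the function and object names are hypothetical, and only options whose default is stated above are included.

```r
# Defaults as stated in the option list above
option_defaults <- list(
  priority = NA_integer_,
  override_normalization = FALSE,
  use_as_trend = FALSE,
  match_year = NA_integer_,
  start_continuity = TRUE,
  end_continuity = TRUE,
  interpolation_method = "linear",
  matching_file_name = NA_character_)

# Hypothetical helper (not part of CEDS): add missing option columns and
# fill blank cells with the defaults.
fill_option_defaults <- function(instr) {
  for (opt in names(option_defaults)) {
    if (!opt %in% names(instr)) {
      instr[[opt]] <- rep(option_defaults[[opt]], nrow(instr))
    } else {
      blank <- is.na(instr[[opt]])
      instr[[opt]][blank] <- option_defaults[[opt]]
    }
  }
  # match_year is required whenever use_as_trend is TRUE
  if (any(instr$use_as_trend & is.na(instr$match_year))) {
    stop("match_year is required when use_as_trend is TRUE")
  }
  instr
}

instr <- data.frame(iso = "deu", CEDS_fuel = "coal_coke",
                    start_year = 1931, end_year = 1934,
                    use_as_trend = TRUE, match_year = 1931,
                    stringsAsFactors = FALSE)
filled_instr <- fill_option_defaults(instr)
filled_instr
```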

3.3. Operations

This section details some of the major functions of the user data processing system.

3.3.1. Mapping

Mapping occurs during pre-processing of data, but after running any user pre-processing script. This step uses user-specified *-mapping.xlsx files to bring data into CEDS form. Any data at the detail level of CEDS_fuel or CEDS_sector will automatically have the aggregate fuel or sector mapped on.

3.3.2. Interpolation

Interpolation occurs during pre-processing of data. The process fills holes in data that has gaps or that has less-than-annual (e.g. every 5 years) data. Interpolation can occur linearly (the default) or on a trend, as selected by the interpolation_method option described in section 3.2.

3.3.3. Normalization

Normalization is the process by which data is included in the greater activity database without losing aggregate totals. CEDS activity defaults are generated by using percentage breakdowns to disaggregate high-level (aggregate fuel per country) data. When user-specified data is added, the system will include it by offsetting the user-defined changes in other areas of the aggregate group.

By adding specific fuel by sector activity in one place, CEDS adjusts the breakdown of fuel activity, not the total fuel activity.

Normalization Exceptions:

Whole-group overwrite: if all elements of an aggregate group are specified, the aggregate sum is overwritten (see Batching).
If a user-specified subset exceeds an aggregate group total, that total will be overwritten.
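
The idea of normalization, including the two exceptions, can be sketched for a single aggregate group and year as follows. How CEDS distributes the offset among the unspecified cells is not stated above; proportional rescaling is an assumption made for this sketch, as are the function name and the numbers.

```r
# Hypothetical helper (not part of CEDS): insert user values into one
# aggregate group while keeping the group total where possible.
# ASSUMPTION: unspecified cells are rescaled in proportion to their defaults.
normalize_group <- function(default, user) {
  stopifnot(all(names(user) %in% names(default)))
  specified <- names(default) %in% names(user)
  out <- default
  out[names(user)] <- user
  remaining <- sum(default) - sum(user)
  others_total <- sum(default[!specified])
  # Whole-group overwrite, user data exceeding the total, or nothing left
  # to offset against: the aggregate total is overwritten
  if (all(specified) || remaining <= 0 || others_total == 0) {
    out[!specified] <- 0
    return(out)
  }
  out[!specified] <- default[!specified] * remaining / others_total
  out
}

default <- c(sector_a = 50, sector_b = 30, sector_c = 20)
normalize_group(default, c(sector_a = 70))   # total stays at 100
normalize_group(default, c(sector_a = 120))  # total is overwritten
```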

3.3.4. Batching

If several instructions correspond to the same aggregate group, these instructions will need to be processed together all at once. Groups of user data in the same batch are handled as a single input, in that they are normalized in one step. In the case that a user specifies rows of data for an entire aggregate group for a given time period, they will be batched together and will overwrite the normalization process. If they have different but overlapping year ranges, each dataset will be subsetted to year ranges allowing for the processing of overlapping sections separate from non-overlapping sections.

3.3.5. Enforcing Continuity

By default, user-specified data is made continuous with the CEDS defaults at its beginning and end. The data are linearly adjusted over a specified year range (7 years by default, fewer if necessary) so that the value of the first year represents 1/7 new data and 6/7 CEDS data and the value of the 7th year is 6/7 of the new data plus 1/7 of CEDS data.
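
A small numeric sketch of the blending at the start of a user series is given below. It takes the two stated end points literally (1/7 new data in the first year, 6/7 new data in the seventh) and spaces the weights linearly between them; the exact weights CEDS uses in the intermediate years, and how shorter windows are handled, are not specified here, so treat the function as an assumption-laden illustration.

```r
# Hypothetical helper (not part of CEDS): blend new data into CEDS defaults
# over the first `window` years of the user series.
# ASSUMPTION: weights run linearly from 1/7 to 6/7 across the window.
blend_at_start <- function(new, ceds, window = 7) {
  stopifnot(length(new) == length(ceds), length(new) >= window, window >= 2)
  w_new <- rep(1, length(new))
  w_new[seq_len(window)] <- seq(1 / 7, 6 / 7, length.out = window)
  w_new * new + (1 - w_new) * ceds
}

# Eight years of made-up data: new series at 14, CEDS default at 7
blended <- blend_at_start(new = rep(14, 8), ceds = rep(7, 8))
blended
```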

3.4. Notes

3.4.1. Default Order of Operations

Instructions are ordered by:

Priority
Aggregation specificity
Start year

This means that all data with high priority will supersede data with lower priority; within equal priority, more specific data will supersede less specific (more aggregate) data; and, all else being equal, older data will supersede newer data. This order only matters if more than one dataset will impact the same activity cell.
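
The ordering rule can be expressed with order() in R, as in the sketch below. The specificity column is a hypothetical numeric score (larger means more specific) introduced only for this example, and the function is not CEDS code.

```r
# Hypothetical helper (not part of CEDS): sort instructions so that the
# dominant rows come first.
order_instructions <- function(instr) {
  # Rows with no priority rank after every explicit priority
  prio <- ifelse(is.na(instr$priority), Inf, instr$priority)
  instr[order(prio, -instr$specificity, instr$start_year), , drop = FALSE]
}

instr <- data.frame(id = 1:4,
                    priority = c(NA, 2, 1, 1),
                    specificity = c(2, 2, 1, 2),
                    start_year = c(1900, 1950, 1950, 1960))
ordered <- order_instructions(instr)
ordered$id
```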

Modules

The Community Emissions Data System (CEDS) is at its core a selection of R scripts and data files linked together by a Makefile. CEDS is flexible to user input. Throughout the system are built-in mechanics for automatically identifying and processing user-added data and scripts.

CEDS code execution is divided into modules, groups of code executed together for a common purpose. The nine CEDS modules are as follows:

Name	Purpose
Module A	Activity and driver data processing
Module B	Combustion emissions factors
Module C	Non-combustion emissions and emissions factors
Module D	Default emissions calculations
Module E	Emissions inventory processing
Module F	Scaling to inventories
Module G	Gridding
Module H	Historical extension
Module S	Summary and final data processing

This documentation provides information module by module. To find instructions for a desired change or input, identify the module purpose which best fits the aspect of CEDS you will change.

1. Module A

Module A runs initial processing on driver data, and creates the total activity driver database.

Module A is not designed to be as flexible as the other modules. Preserving Module A defaults is recommended, except where overwriting a particular input. In general, additional supplemental data is best added later in the system.

Module A is unique in CEDS in that it contains no emissions-specific processing. It handles activity and driver data, and not emissions or emissions factors. Because of this, Module A only needs to be executed once even during a recursive make.

1.1. Module A.1

Population data is created from UN and HYDE population inputs. Adjustments to population data must be made in these inputs or in A1.1.UN_pop_WB_HYDE_extension.R.
A.1* contains other driver scripts dependent on only population (biomass dataset, pre-processing of IEA energy data, coal heat content). Pre-processing emissions-nonspecific scripts can be added to this section.

1.2. Module A.2

Module A.2 handles specific adjustments to IEA data, including converting to CEDS sectors and fuels.

1.3. Module A.3, A.4

Modules A.3 and A.4 handle expanding the activity database to include complete CEDS specificity and fuel/sector combinations. The results of this section are the activity databases A.comb_activity.csv and A.NC_activity_energy.csv, which store activity data defaults used throughout CEDS.

1.4. Module A.5

A.5 is responsible for processing various non-combustion drivers.

1.5. Specifying combustion vs. non-combustion sectors

Combustion Energy data is primarily from IEA and BP data (processed in Module A2-A4), while non-combustion driver data is from various sources (Module A5).
Combustion or non-combustion sectors are specified in the Master_Fuel_Sector_List.xlsx. IEA process sectors are identified in IEA_process_sector.csv.
The important distinction between combustion and non-combustion activity is driver; combustion sectors have fuel drivers, while non-combustion sectors have proxy process driver data (population, pulp paper production, etc.).

2. Module B

Module B is responsible for processing combustion emissions factors.

2.1. Structure

Module B executes in 3 steps:

B1.1 creates blank or base-level databases for default emissions factors, activity data, and default emissions (B1.1.base_…)
B1.2 reformats specific datasets and uses header functions to add the results to their databases. (B1.2.add_…) There can be any number of “add” scripts per section.
B1.3 “processes” activity (B1.3.proc_…)

Module B uses a parental structure to call scripts. “B1.1_base_comb_EF.R” and “B1.2.add_comb_EF.R” are the only two scripts executed by the Makefile. Each script identifies and executes a series of other scripts based on the emissions species, for example

if ( em == "BC" || em == "OC" ){
  scripts <- c( 'B1.2.add_BCOC_recent_control_percent.R' )
}
…
invisible( lapply( scripts, source_child ) )

Any script added to the list “scripts” as a string will be executed by the parent script.

There are two types of B1.2 files. Some files generate processed data as intermediate output files, creating data on control percents, ash content, etc. (most notably for emissions species SO₂). Other scripts read in all data files of a certain type, which may have been produced earlier in B1.2, or may have been included as defaults.

2.2. Adding a Processing Script

Adding a processing script to Module B requires:

A script in the module-B folder, named according to conventions described in the CEDS style guide.
A change to whichever parent file is appropriate for sourcing the new script.
Any input data will need to be included in the input folder.

An example of a change in a parent script: if I want to add a new BCOC processing file, 'B1.2.add_BCOC_additional.R', the above would become:

if ( em == "BC" || em == "OC" ){
  scripts <- c( 'B1.2.add_BCOC_recent_control_percent.R', 'B1.2.add_BCOC_additional.R')
}

This modular structure means that no changes to the Makefile are needed to add scripts in Module B.

2.3. Adding Combustion Emissions Factor Data

Raw emissions factors can be directly incorporated into the CEDS emissions factor database.

Save the data in a .csv file with columns for iso, fuel, sector, unit (usually "fraction"), and data years in Xyears, in the folder input/default-emissions-data/EF_parameters/. Name the file U.[em]_*[suffix].csv where * represents any descriptive, meaningful title for the data and [suffix] is any of the following:

Pattern	Use
"_EF"	Adds the data as raw emissions factors
"_control_percent"	Adds the data as control percents (SO₂ only)
"_s_ash_ret"	Adds data as sulfur ash retention data (SO₂ only)
"s_content"	Adds data as sulfur content data (SO₂ only)

Files without any of these suffixes, or without the emissions species in the file name, are ignored.
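
The naming rule can be checked with a few lines of R, as sketched below. The helper and the file names are hypothetical, and the sketch assumes the suffix must come directly before the .csv extension, which is not stated explicitly above.

```r
# Hypothetical helper (not part of CEDS): return the suffix that applies to
# a file for a given emission species, or NA if the file would be ignored.
classify_ef_file <- function(filename, em) {
  suffixes <- c("_EF", "_control_percent", "_s_ash_ret", "s_content")
  base <- sub("\\.csv$", "", basename(filename))
  # The species must appear in the name as U.[em]_
  if (!startsWith(base, paste0("U.", em, "_"))) return(NA_character_)
  hit <- suffixes[endsWith(base, suffixes)]
  if (length(hit) == 0) return(NA_character_)
  # All suffixes other than "_EF" are listed as SO2 only
  if (hit[1] != "_EF" && em != "SO2") return(NA_character_)
  hit[1]
}

classify_ef_file("U.NOx_mydata_EF.csv", "NOx")
classify_ef_file("U.SO2_mydata_s_content.csv", "SO2")
classify_ef_file("U.NOx_mydata.csv", "NOx")
```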

2.4. Output Files

Module B outputs B.[em]_comb_EF_db.csv, a database of combustion emissions factors.

Scripts in the "add" section (B1.2) also produce files to the folder input/default-emissions-data/EF_parameters/.

3. Module C

Module C is responsible for processing non-combustion emissions and emissions factors.

3.1. Structure

Module C follows the same three-part structure as Module B:

C1.1 creates blank or base-level databases for default emissions factors, activity data, and default emissions (C1.1.base_…)
C1.2 reformats specific datasets and uses header functions to add the results to their databases. (C1.2.add_…) There can be any number of “add” scripts per section. C1.2 uses a parent script model.
C1.3 “processes” activity (C1.3.proc_…) C1.3 (the “process” group) does not use a parent script model, so adding a process script requires editing the Makefile.

3.2. Adding Non-Combustion Emissions Factors

Non-combustion emissions factors can be added to Module C without the inclusion of a new script.

Save a dataset as a .csv file in the folder input/non-combustion-emissions with headers indicating iso, fuel, sector, and years of emissions (the data will be in wide form — a column for each year). The filename must contain the emissions species.

3.3. Adding Non-Combustion Emissions

4. Module D

Module D contains a single script for initializing emissions databases based on driver and activity data calculated in modules A through C. It is relatively inflexible and is meant to bridge emissions factors and drivers on one side and emissions on the other. It calculates emissions, creating a default that will be scaled and extended by modules F and H.

5. Module E

Module E processes emissions inventories. Each script is tailored to its particular inventory. Each script outputs a processed form of the raw inventory made compatible for CEDS analysis.

Module E is typically executed immediately after Module A.

5.1. Scripts

Typically, Module E scripts have three sections.

The first defines inventory-specific parameters: file paths, year ranges, etc.
The second reads in and processes the data, shaping the inventory to a standard format (wide-form, iso tags) but does not map to CEDS sectors or fuels.
The third writes the data to intermediate-output.

Module E scripts diverge from this format when further data processing is required to bring the data into standard form.

5.2. Adding an Inventory Processing Script

Add raw input files to input/emissions-inventories/
Add a processing script to the module-E/ folder
Add a section of code to the Makefile in the area handling emissions inventories. The line should look like the following (for example script “E.myinventory_emissions.R”):
```
# process emissions from 'myinventory'
$(MED_OUT)/E.$(EM)_myinventory.csv : \
	$(MOD_E)/E.myinventory_emissions.R
	Rscript $< $(EM) --nosave --no-restore
```
This code indicates that “module-E/E.myinventory_emissions.R” needs to be executed as an Rscript, and that it will produce the output file E.[em]_myinventory.csv.

6. Module F

The purpose of Module F is to scale subsets of CEDS emissions data to the emissions data reported in other inventories. In doing so, CEDS reinforces its accuracy at an aggregate level while retaining the specificity of CEDS fuels, sectors and isos that distinguish the model from the scaling inventories.

6.1. Structure of Execution

Module F consists of:

A header file, emissions_scaling_functions.R
- The header file contains generalized functions that are called in each scaling script. These functions are used to read and write data, apply mapping files, and perform scaling calculations.
A parent script, F1.inventory_scaling.R
- The parent script calls inventory-specific scaling scripts depending on the emissions species.
A series of scaling scripts, one for each inventory (e.g. F1.1.UNFCCC_scaling.R)
- Each scaling script reads in an inventory dataset and updates the default data in the CEDS data sets.
Mapping files for each inventory dataset used

Module F is executed by running the parent script. Depending on the emissions species provided, the parent script calls a series of scaling scripts, which execute scaling and then write to an intermediate output file to be scaled by the next script. Scaling the same region more than once will overwrite the earlier scaled values. This means that the order of the scaling scripts is important, and inventories with greater accuracy should be included later to avoid being overwritten by a less accurate inventory.

6.2. Structure of Scripts

Each scaling script has a similar structure:

Section 0: Universal section, the same for all scripts
Section 1: Defines inventory-specific variables such as file names, countries, years the inventory includes, and scaling method
Section 1.5: Import inventory-specific data and put in standard inventory format (iso-sector-fuel-years or iso-sector/fuel-years)
Section 2: Read in all other scaling data and define variables using scaling functions
Section 3: Aggregate CEDS and inventory data to scaling sectors/fuels using scaling functions
Section 4: Calculate scaling factors and apply scaling factors to default emissions and emission factors using scaling functions
Section 5: Write scaled data to intermediate output file

Sections 1 and 1.5 are unique to each inventory used for scaling. Sections 0 and 2-5 can be identical for all scaling scripts, unless the user would like to define different default options in Section 4 to create scaling factors with the function “F.scaling”.

6.3. Required Files

Inventory files can be Excel sheets that are imported and processed to standard format within the scaling routine (ex. Canada), or imported and processed within Module E (ex. UNFCCC). By section 2, inventory data must be in standard form with iso, CEDS sector/fuel (or both) columns and years in Xyear format.

Mapping files define how to relate scaling inventory and CEDS default data through scaling sectors or scaling fuels, as well as iso-sector-fuel specific options for scaling routines. Mapping file templates for the 3 possible scaling methods (sector, fuel, both) are in the documentation file. Mapping files must be .xlsx spreadsheets with 3 sheets named ‘map’, ‘method’, and ‘year’, located in the CEDS/input/mappings/scaling folder.

The “map” sheet relates the inventory data to the CEDS data by scaling method: either fuel, sector or both. It relates the inventory sector/fuel to the scaling sector/fuel and the scaling sector/fuel to CEDS sector/fuel. For example using the sector scaling method, the inv_sector column maps to the scaling_sector column, and the ceds_sector column maps to the scaling_sector column, but the inv_sector column does not map to the ceds_sector column. Entries on the same row in the inv_sector and ceds_sector columns have no meaning. Inventory sectors/fuels or CEDS sectors/fuels should only be mapped to one scaling sector (although multiple sectors/fuels can be mapped to one scaling sector). If an inventory or CEDS sector/fuel is mapped to more than one scaling sector/fuel, the system will match to the first pair in the data frame. The selected scaling sectors/fuels are applied to all countries in the inventory. An example map sheet:

inv_sector	scaling_sector	ceds_sector	Notes
Electricity and gas supply	energy	1A1a_Electricity-public
Industry_Electricity	energy	1A1a_Heat-production
Industry_Oil refinery	other-transformation	1A1bc_Other-transformation
…	…	…	…
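
The two-step mapping described above can be sketched as follows. This is an illustrative Python sketch, not CEDS code (CEDS is written in R); the helper names `build_lookup` and `aggregate` and the emission values are hypothetical. It assumes the "first pair wins" rule described above when a sector appears more than once.

```python
# Rows of a "map" sheet: (inv_sector, scaling_sector, ceds_sector).
# Entries on the same row in inv_sector and ceds_sector are unrelated;
# each is only linked to the scaling_sector in the middle column.
map_rows = [
    ("Electricity and gas supply", "energy", "1A1a_Electricity-public"),
    ("Industry_Electricity", "energy", "1A1a_Heat-production"),
    ("Industry_Oil refinery", "other-transformation", "1A1bc_Other-transformation"),
]


def build_lookup(rows, column):
    """Map each entry of one column (0 = inv_sector, 2 = ceds_sector)
    to its scaling sector. If an entry appears twice, the first row wins."""
    lookup = {}
    for row in rows:
        key, scaling_sector = row[column], row[1]
        if key and key not in lookup:
            lookup[key] = scaling_sector
    return lookup


def aggregate(values, lookup):
    """Sum values (keyed by inventory or CEDS sector) to scaling sectors.
    Sectors missing from the lookup are skipped."""
    totals = {}
    for sector, value in values.items():
        if sector in lookup:
            scaling_sector = lookup[sector]
            totals[scaling_sector] = totals.get(scaling_sector, 0.0) + value
    return totals


inv_to_scaling = build_lookup(map_rows, 0)
ceds_to_scaling = build_lookup(map_rows, 2)

# Hypothetical emissions for one country and year.
inv_totals = aggregate(
    {"Electricity and gas supply": 10.0, "Industry_Electricity": 5.0},
    inv_to_scaling,
)
ceds_totals = aggregate(
    {"1A1a_Electricity-public": 8.0, "1A1a_Heat-production": 4.0},
    ceds_to_scaling,
)
```

Both data sets end up keyed by the same scaling sector ("energy" here), which is the level at which they can be compared.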

The “method” sheet defines interpolation and extrapolation methods for handling data if they differ from the default. The F.scaling function is used to execute the instructions in this sheet. Method sheet columns:

iso: can be "all" or specific isos
scaling_sector: cannot be "all". Must be specified for each sector.
other: the purpose of this column is not documented
pre_ext_method: how the data will be extended backward in time from its beginning
interp_method: how internal holes in data will be filled
post_ext_method: how the data will be extended forward in time from its end

An example method sheet:

iso	scaling_sector	other	pre_ext_method	interp_method	post_ext_method
twn	SLV	2000	linear_1	linear	constant
twn	waste_water	2000	linear_1	linear	constant
twn	waste-incineration	2000	linear_1	linear	constant
twn	AGR	2000	linear_1	linear	constant
twn	rail	1999	linear_1	linear	constant

Extension methods:

method	description	valid columns
constant	use the edge scaling factor constantly across all extension years	all
linear	extend the scaling factor trend linearly	all
linear_1	linearly extend the scaling factor to reach a value of 1 in the final extension year	post_ext_method, pre_ext_method
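
The three extension methods can be illustrated with a short Python sketch (not CEDS code, which is R). The function name `extend_forward` is hypothetical, and the 'linear' branch assumes the trend is taken from the last two available years; the guide does not say how CEDS estimates the trend.

```python
def extend_forward(factors, end_year, method):
    """Extend a {year: scaling_factor} series forward to end_year.

    constant: repeat the edge (last) scaling factor.
    linear:   continue the trend, here assumed to be the slope between
              the last two available years.
    linear_1: move linearly from the edge value to 1 in end_year.
    """
    last_year = max(factors)
    edge = factors[last_year]
    n_years = end_year - last_year
    extended = dict(factors)
    for step in range(1, n_years + 1):
        if method == "constant":
            value = edge
        elif method == "linear":
            slope = edge - factors[last_year - 1]
            value = edge + slope * step
        elif method == "linear_1":
            value = edge + (1.0 - edge) * step / n_years
        else:
            raise ValueError("unknown extension method: " + method)
        extended[last_year + step] = value
    return extended


factors = {2000: 1.0, 2001: 1.5}
constant = extend_forward(factors, 2003, "constant")
linear = extend_forward(factors, 2003, "linear")
linear_1 = extend_forward(factors, 2003, "linear_1")
```

Backward extension (pre_ext_method) would follow the same logic, mirrored in time.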

The "year" sheet defines the year extent of the processes defined in the "method" sheet. It allows the user to extend scaling factors to different years for individual iso-sector/fuels. It follows a similar structure to the "method" sheet with these columns:
- iso: can be "all" or specific isos
- scaling_sector: cannot be "all". Must be specified for each sector.
- pre_ext_year: The year in which the scaling data will begin (after extension, if necessary)
- post_ext_year: The year in which the scaling data will end (after extension, if necessary)

6.4. Defined Variables

The following variables must be defined in Section 1 of any scaling script in order to use the modular Sections 2-5.

inventory_data_file - the name of the inventory file, without the extension
inv_data_folder - name of the path to the folder the inventory file is in, from domainmapping.csv (usually "EM_INV" for emissions-inventories/)
sector_fuel_mapping - the name of the inventory mapping file, without the extension
mapping_method - mapping method. Must be "sector", "fuel", or "both"
inv_name - name of the inventory (for labeling diagnostic/intermediate output, not for reading input files)
region - iso countries included in the inventory
inv_years - years covered by the inventory

6.5. Scaling Functions

The following functions are used throughout Module F. They are defined in code/parameters/emissions_scaling_functions.R.

F.readScalingData( inventory=inventory_data_file, inv_data_folder, mapping=sector_fuel_mapping, method=mapping_method, region, inv_name, inv_years )

Reads in all scaling data, defines variables for scaling and assigns them to the global environment.

F.invAggregate( std_form_inv, region, mapping_method, zeroed_terms=c(NA, 'NA', 'NA ', '-'))

Aggregates inventory data to scaling sectors/fuels. There are no user-defined options in this function.

F.cedsAggregate( input_em, region, method=mapping_method )

Aggregates CEDS data to scaling sectors/fuels. There are no user-defined options in this function.

F.scaling( ceds_data, inv_data, region, ext_start_year=start_year, ext_end_year=end_year, ext=TRUE, interp_default='linear', pre_ext_default='constant', post_ext_default='constant', replacement_method='none', max_scaling_factor=100, replacement_scaling_factor=max_scaling_factor )

Calculates scaling factors where both inventory and CEDS data are available. Interpolates and extends scaling factors forward and backward if ‘ext’ = TRUE. Also checks and replaces scaling factors if too small or too large.

Parameters:
- ext_start_year - Year to extend scaling factors back to. Defaults to global environment variable ‘start_year’ (1960)
- ext_end_year - Year to extend scaling factors forward to. Defaults to global environment variable ‘end_year’ (2014)
- interp_default - Default interpolation method for scaling factors within the inventory years. Either ‘linear’ or ‘constant’. Defaults to ‘linear’ (linear interpolation).
- pre_ext_default - Default extrapolation method for pre-inventory years. Takes an extension method name from the table in Section 6.3, such as ‘constant’ or ‘linear’. Defaults to ‘constant’.
- post_ext_default - Default extrapolation method for post-inventory years. Takes an extension method name from the table in Section 6.3, such as ‘constant’ or ‘linear’. Defaults to ‘constant’.
- replacement_method - Either ‘none’ or ‘replace’. If ‘replace’, the function checks scaling factors and replaces values above or below the thresholds defined by max_scaling_factor.
- max_scaling_factor - If replacement_method == ‘replace’, scaling factors greater than max_scaling_factor or less than 1/max_scaling_factor are replaced by replacement_scaling_factor or 1/replacement_scaling_factor, respectively.
- replacement_scaling_factor - value to replace too large scaling factors with. Defaults to max_scaling_factor. Small values are replaced by 1/replacement_scaling_factor.
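
The replacement rule controlled by max_scaling_factor and replacement_scaling_factor can be sketched in Python as below. This is an illustration only, not the F.scaling implementation: the function names are hypothetical, and the sketch assumes a scaling factor is the ratio of the inventory value to the CEDS value, which this guide does not state explicitly.

```python
def scaling_factor(inv_value, ceds_value):
    """Assumed definition: inventory value divided by CEDS default value."""
    return inv_value / ceds_value


def replace_extreme(sf, max_scaling_factor=100, replacement_scaling_factor=None):
    """Mimic replacement_method == 'replace'.

    Factors above max_scaling_factor become replacement_scaling_factor;
    factors below 1/max_scaling_factor become 1/replacement_scaling_factor.
    Everything in between is returned unchanged.
    """
    if replacement_scaling_factor is None:
        # Same default as in the F.scaling signature.
        replacement_scaling_factor = max_scaling_factor
    if sf > max_scaling_factor:
        return replacement_scaling_factor
    if sf < 1 / max_scaling_factor:
        return 1 / replacement_scaling_factor
    return sf


too_large = replace_extreme(250.0)           # above 100
too_small = replace_extreme(0.001)           # below 1/100
unchanged = replace_extreme(scaling_factor(15.0, 12.0))
```

With replacement_method='none', the default in the signature above, no such check is applied.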

F.applyScale( scaling_factors )

Applies scaling factors to CEDS default data. Creates scaled EF and scaled emissions.

F.write( scaled_ef=scaled_ef, scaled_em=scaled_em, domain="MED_OUT")

Writes scaled emissions factors to intermediate output folder.

6.6. Value Metadata

Module F tracks scaling by collecting scaling value metadata. The script global_settings.R contains a boolean switch, Write_value_metadata; if TRUE, CEDS will generate value metadata reports across every combination of fuel, sector, iso, and year indicating which scaling factors were applied and whether the cell was scaled directly to an inventory or to an extension of an inventory.

The output file of this process is F.[em]_scaled_EF-value_metadata.csv.

Two diagnostic scripts, code/diagnostic/Create_Val_Metadata_Heatmap.R and code/diagnostic/Create_Master_Val_Meta_Heatmap.R, provide functions for analyzing and graphically displaying trends in the value metadata.

7. Module G

Module G handles gridding, the process by which spatial distributions of CEDS final emissions are calculated.

7.1. Structure

Module G is composed of three main sections. Each section executes 4 scripts. Scripts are executed sequentially; no parent model is used. Twelve grids are created for each year (monthly emissions, incorporating seasonality) from 1750-2014, for each emissions species and sector.

The three main sections are:

G1 creates yearly spatial grids.
G2 chunks these grids into 50-year groups.
G3 creates grids and chunks for methane RCP.

Each section has four scripts; these each handle a different type of input data.

G*.1 handles bulk emissions. The input data for this grid is CEDS final emissions by country and sector (no fuel information) for all sectors except aircraft. These scripts handle each emissions species and each sector.
G*.2 handles subVOC emissions. For NMVOC emissions, individual grids are generated for each VOC.
G*.3 handles aircraft emissions. In addition to 12 monthly grids for each year, aircraft emissions have 25 levels of gridding corresponding to different altitudes.
G*.4 handles emissions from solid biofuels only.

7.2. General Methodology

Spatial distributions are generated by applying CEDS final emissions to normalized country-level spatial proxy data. Spatial proxies are chosen for each gridding sector, emissions species, and year in input/gridding/gridding-mappings/proxy_mapping.csv.
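
The idea of applying a country total to a normalized proxy can be sketched in Python as follows. This is a minimal illustration with a hypothetical 2x2 proxy grid and a hypothetical emissions total; it is not the Module G code, which also handles sectors, seasonality, and cells shared between countries.

```python
def normalize(proxy):
    """Scale a country's proxy grid so that its cells sum to 1."""
    total = sum(sum(row) for row in proxy)
    if total == 0:
        raise ValueError("proxy grid has no weight for this country")
    return [[cell / total for cell in row] for row in proxy]


def distribute(country_total, proxy):
    """Spread a country's total emissions over the grid in proportion
    to the normalized proxy weights."""
    weights = normalize(proxy)
    return [[country_total * weight for weight in row] for row in weights]


# Hypothetical proxy values (e.g. population) for one country.
proxy = [[1.0, 3.0],
         [0.0, 4.0]]
grid = distribute(80.0, proxy)
```

Because the weights sum to 1, the gridded values add back up to the country total.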

8. Module H

Module H is responsible for the extension of CEDS data from 1960 back to 1750. This documentation corresponds to the planned restructuring of Module H conducted in July 2017 and may not correspond to the code yet.

8.1. Default extension

CEDS activity is extended using CDIAC trends, which hold information per aggregate fuel and country.

8.2. Supplemental Combustion Activity Data

Module H contains modular scripts for including user-specified activity data for combustion sectors in order to allow the flexible addition of specific reliable datasets.

For in-depth instructions on including this data see the User Guide.

9. Module S

Module S conducts final processing and summary procedures. This is the last module in the CEDS system. Its input is intermediate-output/[em]_total_CEDS_emissions.csv and its output is a series of final emissions breakdowns and summaries, listed in Section 9.2.

9.1. Code Structure

The main body of Module S is contained in a single script, S1.1.write_summary_data.R.

Module S begins by reading in the final emissions disaggregated data. The script aggregates the data to all levels required.

The script then checks if an older run of this emissions species is present in the final output folder (which is not wiped clean by the Makefile during an execution of make clean-all).

If no older data is present, the script writes its summary files.

If an older dataset is present for this emissions species, the script executes a comparison between the two datasets. The script overwrites the old data but also produces a series of diagnostic files exploring differences between the outputs of the two runs in the diagnostics/ folder.

The script then sources three files:

Figures.R creates and outputs a series of figures to the summary-plots/ folder including global emissions graphs and further aggregations.
Compare_to_RCP.R is called except when the emissions species is 'CO2'.
- This script creates global, regional, and sectoral comparisons between the CEDS output and the RCP inventory emissions as *.csv files in the ceds-comparisons/ subfolder of diagnostic-output/.
- It also produces graphical comparisons of the same data.
Compare_to_GAINS.R is called except when the emissions species is 'CO2' or 'NH3'.
- This script creates global comparisons, including specific comparisons for residential and non-residential emissions.
- It also produces graphical comparisons of the same data.

9.2. CEDS Final Outputs

Module S produces the following summary files:

All bunker (international aviation and shipping) emissions, S.[em]_bunker_emissions.csv
CEDS final emissions aggregated to different levels:
- Aggregated to each country and aggregate sector, CEDS_[em]_emissions_by_country_sector.csv
- Aggregate to country totals, CEDS_[em]_emissions_by_country.csv
- Global emissions per specific fuel, CEDS_[em]_global_emissions_by_fuel.csv
- Emissions aggregated to CEDS sectors and countries, CEDS_[em]_emissions_by_country_CEDS_sector.csv
- Global emissions per CEDS sector, CEDS_[em]_global_emissions_by_CEDS_sector.csv

Each file is also suffixed by the date of the execution of the run.

If the results of the current run differ from those of the previous run of the CEDS system for this emissions species, the following comparison diagnostics are produced, provided there is relevant data.

Files that show changes in rows and columns:
- CEDS_[em]_emissions_by_country_sector_dropped-rows
- CEDS_[em]_emissions_by_country_sector_added-rows
- CEDS_[em]_emissions_by_country_sector_dropped-cols
- CEDS_[em]_emissions_by_country_sector_added-cols
A percent comparison identifying changes between the two outputs, CEDS_[em]_emissions_by_country_sector_comparison
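
The kind of comparison these diagnostics report can be sketched in Python as below. This is an illustration under assumptions, not the Module S code: the function name `compare_runs` and the data are hypothetical, and the percent change is assumed to be measured relative to the old value, which this guide does not specify.

```python
def compare_runs(old, new):
    """Compare two runs stored as {row_key: {year_column: value}}.

    Returns the rows only in the old run (dropped), the rows only in
    the new run (added), and the percent change for shared cells.
    """
    dropped_rows = sorted(set(old) - set(new))
    added_rows = sorted(set(new) - set(old))
    percent_change = {}
    for key in sorted(set(old) & set(new)):
        shared_cols = sorted(set(old[key]) & set(new[key]))
        percent_change[key] = {
            # None marks cells where the old value is zero.
            col: (100.0 * (new[key][col] - old[key][col]) / old[key][col]
                  if old[key][col] != 0 else None)
            for col in shared_cols
        }
    return dropped_rows, added_rows, percent_change


old_run = {
    ("usa", "energy"): {"X2013": 100.0, "X2014": 200.0},
    ("can", "energy"): {"X2013": 50.0},
}
new_run = {
    ("usa", "energy"): {"X2013": 110.0, "X2014": 200.0},
    ("mex", "energy"): {"X2013": 5.0},
}
dropped, added, change = compare_runs(old_run, new_run)
```

Dropped and added columns could be found the same way by comparing the sets of year columns.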

10. The Makefile

CEDS is executed using a makefile system. A single file, called the Makefile and saved in the main CEDS folder, contains instructions for the execution of the entire CEDS system.

The Makefile is executed from the command line of your choice using the command make *, where * is a valid command line argument.

Some Makefile execution commands:

`make all`	Executes a run of CEDS for each valid emissions species except CH4
`make CO2-emissions`	Executes CEDS for emissions species CO2, or any other specified emissions species (generic `make [em]-emissions`)
`make clean-all`	Deletes all intermediate, diagnostic, and final output files
`make clean-modB`	Deletes all files output by Module B (valid for all modules)
`make clean-CO2`	Deletes all intermediate files relating to CO2

The Makefile is made up of "Code blocks". Each code block is headed by the output file that will be created, and is followed by all of the input files and scripts required to create that file. Most code blocks will include an indicator that one or more Rscripts should be executed.

If an input file is missing, or if an Rscript fails to create an intermediate file needed by another script, the Makefile will throw an error, saying that there is no rule to build the missing file.

11. The Parameters Folder

The CEDS code contains a "parameters" folder. This folder stores header files. These files are sourced at the beginning of some scripts to load functions and global data.

The files in this folder are as follows:

File	Contains
analysis_functions.R	Functions that map to CEDS, check if all sectors/countries/fuels are present.
common_data.R	Global variables, e.g. years, default conversion factors.
data_functions.R	Various data processing functions, e.g. %!in%, replacing data, build CEDS template, remove NAs or blanks, etc.
diagnostic_functions.R	A function that compares two identically formatted dataframes for equality.
emissions_scaling_functions.R	All functions specific to Module F, e.g. value-metadata functions, scaling functions, functions that add to scaled databases, etc.
global_settings.R	To be called at the beginning of every script. Initializes CEDS version number and global options.
gridding_functions.R	All functions specific to Module G.
header.R	Contains functions required for initializing the log and for sourcing other parameters scripts. Contains the `initialize` function for smoothly sourcing header scripts and beginning the log.
interpolation_extension_functions.R	Contains functions for interpolating or extending time series data (NOT interpolate_NAs, extend on trend).
IO_functions.R	Contains readData, writeData, and printLog, along with other functions for reading in or outputting information.
ModH_extension_functions.R	All functions specific to Module H, including data processing, merging, and disaggregating functions.
nc_generation_functions.R	Some supplemental non-combustion gridding functions
process_db_functions.R	Contains functions for generically adding data to databases (e.g. addToEmissionsDb). Also contains a cleanData function.
timeframe_functions.R	Contains a series of helper functions for dealing with data time range, from identification to truncation.

12. The Diagnostic Folder

The CEDS code contains a "diagnostic" folder. This folder stores scripts used for processing graphics and figures for publication. It also contains comparison scripts that report how CEDS outputs compare to specific emissions inventories.

13. Troubleshooting Odd Results

13.1. Results ordered differently

CEDS should produce the same results when running on different machines or versions of R. It is possible, however, that the order of the results will differ when using an older version of R. Recent versions of R have changed the order that base::merge combines two data frames, which will produce final results in a different order, but with equal values.
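
When checking whether two runs agree despite a different row order, one option is to sort both outputs on the same key before comparing. The Python sketch below illustrates the idea with hypothetical rows; it is not part of CEDS.

```python
def same_values_any_order(rows_a, rows_b):
    """Return True if two tables hold the same rows, ignoring row order.
    Each row is a tuple such as (iso, sector, value)."""
    return sorted(rows_a) == sorted(rows_b)


run_on_old_r = [("usa", "energy", 1.5), ("can", "energy", 0.5)]
run_on_new_r = [("can", "energy", 0.5), ("usa", "energy", 1.5)]

identical_order = run_on_old_r == run_on_new_r
identical_values = same_values_any_order(run_on_old_r, run_on_new_r)
```

Here the direct comparison fails because the rows are ordered differently, while the sorted comparison confirms that the values are equal.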

13.2. Packages masking objects

When adding new scripts to CEDS, make sure to correctly scope functions from any packages you are using. For example, plyr and dplyr contain many functions with the same name but different functionality. You cannot count on CEDS to load packages in the order that you want, and you will get errors or wrong results if you make that assumption.