User Guide
- 1. Input Data
- 2. CEDS System Code
- 3. How to Include Supplemental Combustion Energy Activity in CEDS
- Modules
To extend the system to run to a later year, the key input data file to change is the BP energy statistics. Overwrite the current file with a more recent version. The file is located in /input/energy/. Note that this file needs to be in .xlsx format.
Then update the parameters BP_years and end_year in the file: /code/parameters/common_data.r
Note that the BP data must extend to the latest year specified here.
Clean and re-run make. The emissions data should now extend to the latest year specified. The data are simply extrapolated; updating the emission inventory data (and detailed IEA energy data) will produce a more accurate estimate for recent years.
CEDS has two types of sectors (set in the file Master_Fuel_Sector_List.xlsx):
-
combustion sectors: Emissions from these sectors use energy data by fuel and sector as driver data. Default emissions are calculated by multiplying an emission factor times fuel consumption (minus an optional control fraction).
-
non-combustion sectors: Emissions from these sectors use some other data (default is population) as driver data. (Also referred to in CEDS documents as process emissions.) Default emissions are read-in from an external inventory source, user data, or a sector-specific script. Note that, physically, emissions from a CEDS non-combustion sector may be from fuel combustion. This designation refers only to how emissions are calculated within the CEDS system.
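The default combustion calculation described above can be sketched in R as follows. This is a minimal illustration, not the CEDS implementation: the function name, the numbers, and the reading of "minus an optional control fraction" as a multiplicative factor of (1 - control fraction) are all assumptions.

```r
# Hypothetical illustration values; these are not CEDS data.
activity <- c(coal = 1200, oil = 800)         # fuel consumption by fuel
ef <- c(coal = 0.02, oil = 0.01)              # emission factor per unit of fuel
control_fraction <- c(coal = 0.25, oil = 0)   # optional fraction removed by controls

# Assumed form: emissions = activity * EF * (1 - control fraction)
default_emissions <- function(activity, ef, control_fraction = 0) {
  activity * ef * (1 - control_fraction)
}

emissions <- default_emissions(activity, ef, control_fraction)
print(emissions)
```

With a control fraction of zero the result reduces to the plain product of emission factor and fuel consumption.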
Adding a process (non-combustion) emission sector
In addition to indicating your data’s sector in your data source (the U.* file you used to import the data), you will need to edit 2 files in CEDS. They are:
-
CEDS/input/mappings/Master_Sector_Level_map.csv
-
Add a new row to the spreadsheet where appropriate. The row will contain five columns of data:
-
The detailed sector name: a unique sector ID (one word)
-
working sectors v1, and
-
working sectors v2: these can be either your detailed sector name or a first-level aggregation; they appear to serve as process documentation rather than being used in the model itself
-
The aggregate sector: if appropriate, the aggregate sector name will be identical to an existing aggregate sector
-
Figure_sector: this should be identical to an existing Figure_sector; it is the category in which your data will be displayed in CEDS graphical outputs
-
CEDS/input/mappings/Master_Fuel_Sector_List.xlsx
-
Add a new row to the spreadsheet at the appropriate location in the “Sectors” sheet only. This row will contain 4 columns of data:
-
The detailed sector name
-
The activity type
-
Units of analysis
-
Type: comb (combustion) or NC (non-combustion)
The core data needed to run the data system is the IEA OECD and non-OECD energy statistics.
The IEA energy statistics database needs to be purchased from the IEA and the data exported into csv format in order to run the CEDS system. The instructions below refer to the cd-rom distribution: the entire IEA energy database needs to be exported for use in the data system.
Steps to import the IEA energy data
-
Export the statistics for OECD and non-OECD countries into two .csv files
-
The first column is full name (spelled out).
-
The second column is flow (as the IEA abbreviation, because names are otherwise not unique). To change to the abbreviation, click on the flow icon, then go to Dimensions → Change label.
-
The third column is fuel (spelled out).
-
(To export from the IEA beyond 20/20 data browser, drag the icon for country to the left to form a column and icon for time to the right to create a row with years. Then drag the icon for flows between the column for countries and data for the first year; it will add a column for flows. Then drag the icon for fuel between column for flows and data for the first year. This will result in a large table that contains all the data that can then be exported as a csv.)
-
In a text editor:
-
Replace .., c, and x, in the data values with zeros (note these can occur at end of lines)
-
Get rid of special characters and apostrophes
-
Côte d’Ivoire → Cote dIvoire
-
Dem. People’s Rep. of Korea → Dem. Peoples Rep. of Korea
-
People’s Republic of China → Peoples Republic of China
-
Curaçao → Curacao.
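The text-editor clean-up above could also be scripted. The following is a minimal sketch in base R, assuming the exported values have been read in as character strings; the function names are hypothetical, and only the special characters that appear in the examples above are handled.

```r
# Replace the IEA placeholder codes "..", "c" and "x" with zeros.
# Trimming whitespace first also catches codes at the end of a line.
clean_iea_values <- function(x) {
  x <- trimws(x)
  x[x %in% c("..", "c", "x")] <- "0"
  x
}

# Drop apostrophes and replace the accented letters seen in the examples above.
clean_iea_names <- function(x) {
  x <- gsub("\u2019", "", x, fixed = TRUE)   # typographic apostrophe
  x <- gsub("'", "", x, fixed = TRUE)        # plain apostrophe
  x <- gsub("\u00f4", "o", x, fixed = TRUE)  # o with circumflex
  x <- gsub("\u00e7", "c", x, fixed = TRUE)  # c with cedilla
  x
}

print(clean_iea_values(c("12.5", "..", "x ", "c")))
print(clean_iea_names(c("C\u00f4te d\u2019Ivoire", "Cura\u00e7ao")))
```

Other special characters in a given IEA release would need their own replacement lines.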
If the data is the same release used in the version of the CEDS system that you have (you can check this in the metadata file that is released with the system) then there are no further steps.
However, if you are using a newer (or older) version of the IEA/OECD statistics, then the following additional steps are needed.
-
Update year ranges in code\parameters\common_data.R. To move from the 2012 edition of the IEA data to the 2015 edition, change the parameter IEA_years <- 1960:2010 to IEA_years <- 1960:2013.
The BP energy statistics are used to extend data to 2014. If you use the 2015 edition of the IEA data, change the parameter BP_years <- 2011:2014 to BP_years <- 2014.
-
If there are new countries or new country names, the master country list (input\mappings\Master_Country_List.csv) will need to be updated.
-
If there are any new fuels, these might need to be added to the master fuel list (input\mappings\energy\IEA_product_fuel.csv).
In order to more accurately extend process emissions time series, driver data for the appropriate emissions time series is needed.
In the first phase of this project, where we are focusing on recent decades, complete, consistent time series estimates exist for most emissions (e.g. EDGAR, FAO, etc.). For this reason, process emissions driver data are not critical to this first phase and most of this data has not been incorporated.
The User can add process (non combustion) emissions to CEDS by adding inventory files or instructions for using processed inventory files (from module E) in the intermediate_output folder.
CSV files with process emissions data may be added to the input/default-emissions-data/non-combustion-emissions folder. Files should be named with "U.<em>_" followed by a description or identifier, where <em> is the emission species (example: "U.SO2"). The system will not import files named without the .<em>. Clean commands (executed by the makefile) will delete files in the folder with "C.". Files should be in standard CEDS format with column headings iso-sector-fuel-units-Xyears, similar to the output emissions and EF files produced by the system. Year columns must be in the format “Xyear”, such as X1980 or X2005. Files may contain any number of emission years in any order; the script automatically orders the years and linearly interpolates between them. The script does not extend emissions to years outside the given data. Files must contain the four ID columns iso, sector, fuel, and units. Entries that do not exactly match entries in the CEDS NC_database on those four ID columns will not be added. The script automatically filters out entries which are not mapped to non-combustion sectors (designated by input/mappings/Master_Fuel_Sector_List.xlsx) or have “process” as fuel.
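To illustrate the year handling described above, here is a minimal sketch using base R's approx(). The data values are made up, and this is not the CEDS script itself; it only shows years supplied out of order being sorted and linearly interpolated, with nothing extended beyond the first and last given year.

```r
# Hypothetical emissions for three years, supplied in arbitrary order.
years  <- c(2005, 1980, 1990)
values <- c(50, 10, 30)

# Sort by year, then interpolate linearly over every year in the covered range.
ord <- order(years)
all_years <- seq(min(years), max(years))
interp <- approx(x = years[ord], y = values[ord], xout = all_years)$y
names(interp) <- paste0("X", all_years)

print(interp[c("X1980", "X1985", "X1990", "X2005")])
```

Because approx() is only asked for years inside the given range, no values are produced for years before 1980 or after 2005.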
Data lines from processed inventory files (from module E) in the intermediate_output folder may be added by adding lines to input/default-emissions-data/non-combustion-emissions/add_inventory_instructions.csv
Each line must specify:
-
inv: the name of the inventory file in the intermediate-output folder, such as E.SO2_EMEP_NFR09_inventory
-
em: the emission species
-
iso: country code
-
inv_sector: exact match of the name of the inventory sector specified in the inventory file (inv)
-
ceds_sector: the CEDS sector the emissions should be matched to
Data must be mapped to non-combustion sectors (designated by input/mappings/Master_Fuel_Sector_List.xlsx).
-
Note that when using grep to select input files, one cannot grep for "OC", for example, as this will also capture "NMVOC". You must use an appropriate wildcard match that distinguishes between "NMVOC" and "OC", and between "CO" and "CO2".
-
If you encounter an error where a package is reported as unavailable even though you have already installed it, try installing without specifying a lib argument (e.g., install.packages( 'package-name' )) so that the package is installed in the default location. (Note that GUIs such as RStudio might sometimes install a package in the wrong place.)
-
When repeatedly running code from individual R scripts, calling logStart() (called in the initialize function at the beginning of every script) without logStop() (called at the end of every script) will keep the log files open. An R session can only handle so many open log files before the following error occurs:
Error in sink(paste(logpath, fn, ".log", sep = ""), split = T) : sink stack is full
To resolve, clear the global environment manually or restart the R session.
-
Similar to the error above, having too many files open can create the following error:
Error in textConnection("rval", "w", local = TRUE) : all connections are in use
To resolve, enter the command
closeAllConnections()
into the console.
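For the note above about selecting input files by emission species, the following is a minimal sketch of a pattern that separates "OC" from "NMVOC" and "CO" from "CO2". The file names and the helper function are hypothetical, and anchoring on a "U." prefix and a trailing underscore is an assumption based on the file naming convention described earlier, not necessarily the pattern CEDS uses internally.

```r
# Hypothetical file names for illustration only.
files <- c("U.OC_inventory.csv", "U.NMVOC_inventory.csv",
           "U.CO_inventory.csv", "U.CO2_inventory.csv")

# Match the species exactly: it must follow "U." at the start of the name
# and be followed directly by an underscore.
match_em <- function(files, em) {
  pattern <- paste0("^U\\.", em, "_")
  grep(pattern, files, value = TRUE)
}

print(grep("OC", files, value = TRUE))  # naive pattern: also captures NMVOC
print(match_em(files, "OC"))
print(match_em(files, "CO"))
```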
CEDS has the capacity to dynamically include user-defined activity in a number of ways. This section outlines how to include supplemental combustion activity data in a run of CEDS.
Every supplemental dataset is required to be in a .csv format and must be accompanied by a corresponding instructions file. Additionally, a mapping (.xlsx) file is required for any dataset that is not already in the standard CEDS format.
These files are tied together by their root filename, with the non-data files specified by an extension of -instructions.csv or -mapping.xlsx. All files must be saved to the folder input/extension/user-defined-energy in order to be included. For example, your extension directory might look like this:
input/
├── extension/
│ ├── user-defined-energy/
│ │ ├── mydata.csv
│ │ ├── mydata-instructions.csv
│ │ ├── mydata-mapping.xlsx
│ │ ├── USA_historical_coal.csv
│ │ └── USA_historical_coal-instructions.csv
... ...
If the files are formatted correctly, they need only be placed in this folder, and CEDS will automatically identify and process the data.
Below is a detailed guide to creating and formatting these files.
The data file is expected in wide form. There must be exactly one column giving information on the country, and at least one column giving the fuel type. Additionally, one or two columns are allowed for specifying sector depending on the level of specificity. The activity data itself should have year or Xyear headers (e.g. 1950, 1951 or X1950, X1951).
A dataframe in CEDS format with all allowed columns might look like this:
iso | agg_fuel | CEDS_fuel | agg_sector | CEDS_sector | X1970 | …
deu | coal | coal_coke | 1A1_Energy-transformation | 1A1a_Electricity-public | 1150.79 | …
Since CEDS operates under the principle of preserving raw input data when possible, the input dataset does not need to be neatly named to CEDS sectors and fuels. The purpose of the mapping file is so the system can identify how input data corresponds to CEDS data.
There should be one sheet in this Excel file for each ID column in the input data, and the sheet names must be the name of the resulting CEDS column. If a data ID column is already in CEDS form, no mapping sheet is needed. There are five possible sheet names:
-
CEDS_sector
-
CEDS_fuel
-
agg_sector
-
agg_fuel
-
iso
Any mapping file may include any or all of these, as needed. Other sheets will not be identified.
Each sheet should contain two columns, one headed by the name of the column (same as the sheet name) and the other bearing the header corresponding to the header in the data frame. The data in the columns are the equivalent IDs.
The following is an example of what a mapping sheet titled "CEDS_sector" might look like:
my_sector_name | CEDS_sector
public_electric | 1A1a_Electricity-public
auto_electric | 1A1a_Electricity-autoproducer
heat_production | 1A1a_Heat-production
The raw data corresponding to this example could look something like this:
iso | my_sector_name | agg_fuel | X1970 | …
usa | public_electric | oil | 16.21 | …
usa | auto_electric | oil | 105.5 | …
usa | heat_production | oil | 124.8 | …
In the case that your data cannot be easily mapped, you can make use of the parameter preprocessing_script described in section 3.2 below. If no mapping file is included, it is assumed the data is already correctly mapped.
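As an illustration of what a mapping sheet does, the sketch below applies the example "CEDS_sector" sheet to the example raw data using base R's merge(). It is only a sketch of the idea under the assumption that both tables are already loaded as data frames; it is not the CEDS mapping code.

```r
# Example raw data (values from the table above, first year only).
raw <- data.frame(
  iso = "usa",
  my_sector_name = c("public_electric", "auto_electric", "heat_production"),
  agg_fuel = "oil",
  X1970 = c(16.21, 105.5, 124.8),
  stringsAsFactors = FALSE
)

# Example "CEDS_sector" mapping sheet as a data frame.
sector_map <- data.frame(
  my_sector_name = c("public_electric", "auto_electric", "heat_production"),
  CEDS_sector = c("1A1a_Electricity-public", "1A1a_Electricity-autoproducer",
                  "1A1a_Heat-production"),
  stringsAsFactors = FALSE
)

# Join on the user's column name, then keep only the CEDS-form columns.
mapped <- merge(raw, sector_map, by = "my_sector_name")
mapped <- mapped[, c("iso", "CEDS_sector", "agg_fuel", "X1970")]
print(mapped)
```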
The instructions file is the place to define any parameters for how specifically to process the input dataset. This file is used to determine both which data to bring into the system from your dataset, and how it should be integrated into the default data.
The instructions file should have a row for each combination of data in the corresponding data file:
iso | CEDS_fuel | CEDS_sector | start_year | end_year | options…
deu | coal_coke | 1A1a_Electricity-public | 1931 | 1934 | …
deu | hard_coal | 1A1a_Electricity-public | 1932 | 1936 | …
deu | brown_coal | 1A1a_Electricity-public | 1931 | 1936 | …
deu | coal_coke | 1A1a_Electricity-autoproducer | 1931 | 1936 | …
This example shows all of the necessary columns for reading in data with CEDS_fuel and CEDS_sector specificity. To include all sectors, simply leave that column out of the instructions file, or alternatively provide the sector name all.
CEDS provides several options (listed in Section 3.2 below) for specifying how to integrate the supplemental data into the default data.
Tip: These instructions must be in CEDS ID form because they specify how the system will use the data once mapped; they correspond directly to components of the CEDS activity data.
There are several use instructions that can be specified by the user. If a given option is not included, it will be set to the default. These options can be set for each row of the instructions file for a dataset by including a column with the option as the header (case-sensitive).
-
priority is a tool for manually specifying the order in which datasets are included in the system (see Default Order in Notes). Priority is given as integers; data with priority 1 will be dominant over priority 2, which will be dominant over data with no priority specified. Defaults to NA.
-
override_normalization takes a boolean argument. If TRUE, the data’s aggregate group will not be normalized during incorporation into the activity dataframe. Defaults to FALSE.
-
use_as_trend takes a boolean argument. If TRUE, the data will be used as a trend rather than as raw data; values will be scaled to CEDS values for a given match_year. Defaults to FALSE.
-
match_year takes an integer year argument. Required if use_as_trend is TRUE, otherwise defaults to NA.
-
start_continuity is used to specify whether data should be made continuous at its beginning. Takes a boolean; defaults to TRUE.
-
end_continuity (see start_continuity)
-
specified_breakdowns takes a boolean. This is used to indicate that you have included a percent breakdowns file that will be used instead of CEDS default percent breakdowns to disaggregate your dataset. The file must be named [filename]-breakdowns.csv, and it must include data at the most disaggregate CEDS level.
-
interpolation_method defines how to treat missing values in the data. Must be one of the following:
-
linear (default)
-
match_to_default: fills in missing values based on the trend of the default activity data
-
match_to_trend: fills in missing values based on a trend provided by the user; if specified, the parameter matching_file_name must be present
-
matching_file_name is the name of a file containing values to be used as a trend for interpolating missing values from the data. Columns outside of the years specified by start_year and end_year will be ignored. Defaults to NA.
-
preprocessing_script is the name of an R script to be run before attempting to map or load the data associated with this instruction. Expects a file path relative to the user-defined-energy directory.
This section details some of the major functions of the user data processing system.
Mapping occurs during pre-processing of data, but after running any user pre-processing script. This step uses user-specified *-mapping.xlsx files to bring data into CEDS form. Any data at the detail level of CEDS_fuel or CEDS_sector will automatically have the aggregate fuel or sector mapped on.
Interpolation occurs during pre-processing of data. The process fills holes in data that has gaps or that has less-than-annual (e.g. every 5 years) data. Interpolation can occur linearly (the default) or on a trend specified in the Interp_instructions sheet of [filename]-instructions.csv.
Normalization is the process by which data is included in the greater activity database without losing aggregate totals. CEDS activity defaults are generated by using percentage breakdowns to disaggregate high-level (aggregate fuel per country) data. When user-specified data is added, the system will include it by offsetting the user-defined changes in other areas of the aggregate group.
By adding specific fuel by sector activity in one place, CEDS adjusts the breakdown of fuel activity, not the total fuel activity.
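The idea of normalization can be illustrated with a small sketch. It assumes a single aggregate group with made-up sector names and values, and it rescales the unspecified members proportionally so the group total is unchanged; the proportional rescaling and the function name are assumptions for illustration, and the exceptions (whole-group overwrite, user data exceeding the total) are not handled.

```r
# Rescale the members of an aggregate group that the user did not specify
# so that the group total is preserved after user values are inserted.
normalize_group <- function(default, user) {
  fixed <- names(user)
  free <- setdiff(names(default), fixed)
  remaining <- sum(default) - sum(user)
  out <- default
  out[fixed] <- user
  out[free] <- default[free] * remaining / sum(default[free])
  out
}

# Hypothetical default breakdown of one fuel across three sectors.
default <- c(sector_a = 50, sector_b = 30, sector_c = 20)
user <- c(sector_a = 60)   # user-supplied value for one sector

result <- normalize_group(default, user)
print(result)
print(sum(result))
```

The user value is kept as given, while the other sectors shrink in proportion to their default shares.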
Normalization Exceptions:
-
Whole-group overwrite: if all elements of an aggregate group are specified, the aggregate sum is overwritten (see Batching).
-
If a user-specified subset exceeds an aggregate group total, that total will be overwritten.
If several instructions correspond to the same aggregate group, these instructions will need to be processed together all at once. Groups of user data in the same batch are handled as a single input, in that they are normalized in one step. In the case that a user specifies rows of data for an entire aggregate group for a given time period, they will be batched together and will overwrite the normalization process. If they have different but overlapping year ranges, each dataset will be subsetted to year ranges allowing for the processing of overlapping sections separate from non-overlapping sections.
By default, user-specified data is made continuous with the CEDS defaults at its beginning and end. The data are linearly adjusted over a specified year range (7 years by default, fewer if necessary) so that the value of the first year represents 1/7 new data and 6/7 CEDS data and the value of the 7th year is 6/7 of the new data plus 1/7 of CEDS data.
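A minimal sketch of the start-of-series blending follows. It takes the description above literally: over the first 7 years the weight on the new data ramps linearly from 1/7 to 6/7, with the remainder taken from the CEDS default. The function name and data are hypothetical, the exact weights used inside CEDS may differ, and the shorter-range case is not covered.

```r
# Blend the first n years of user data with CEDS defaults.
# Assumption: the weight on the new data ramps linearly from 1/7 to 6/7.
blend_start <- function(new, ceds, n = 7) {
  stopifnot(length(new) == length(ceds), length(new) >= n)
  w <- seq(1 / 7, 6 / 7, length.out = n)
  idx <- seq_len(n)
  out <- new
  out[idx] <- w * new[idx] + (1 - w) * ceds[idx]
  out
}

new  <- rep(100, 10)   # hypothetical user data
ceds <- rep(30, 10)    # hypothetical CEDS default data
blended <- blend_start(new, ceds)
print(round(blended, 2))
```

After the blending window the user data are used unchanged.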
Instructions are ordered by:
-
Priority
-
Aggregation specificity
-
Start year
This means that all data with high priority will supersede data with lower priority; within equal priority, more specific data will supersede less specific (more aggregate) data; and, all else being equal, older data will supersede newer data. This order only matters if more than one dataset will impact the same activity cell.
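The ordering rule can be sketched with base R's order(). The data frame below is hypothetical, and the numeric "specificity" rank (1 = most specific) is an assumed encoding for illustration; CEDS may represent aggregation specificity differently.

```r
# Hypothetical instruction rows.
instructions <- data.frame(
  dataset = c("a", "b", "c", "d"),
  priority = c(NA, 2, 1, 1),        # NA means no priority specified
  specificity = c(1, 1, 2, 1),      # assumed rank: 1 = most specific
  start_year = c(1950, 1960, 1970, 1940),
  stringsAsFactors = FALSE
)

# Sort by priority (unspecified last), then specificity, then start year.
ord <- order(instructions$priority, instructions$specificity,
             instructions$start_year, na.last = TRUE)
ranked <- instructions$dataset[ord]
print(ranked)
```

Here dataset "d" ranks ahead of "c" because both have priority 1 but "d" is more specific, and "a" ranks last because it has no priority.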
The Community Emissions Data System (CEDS) is at its core a selection of R scripts and data files linked together by a Makefile. CEDS is flexible to user input. Throughout the system are built-in mechanics for automatically identifying and processing user-added data and scripts.
CEDS code execution is divided into modules, groups of code executed together for a common purpose. The nine CEDS modules are as follows:
Name | Purpose
Module A | Activity and driver data processing
Module B | Combustion emissions factors
Module C | Non-combustion emissions and emissions factors
Module D | Default emissions calculations
Module E | Emissions inventory processing
Module F | Scaling to inventories
Module G | Gridding
Module H | Historical extension
Module S | Summary and final data processing
This documentation provides information module by module. To find instructions for a desired change or input, identify the module purpose which best fits the aspect of CEDS you will change.
Module A runs initial processing on driver data, and creates the total activity driver database.
Module A is not designed to be as flexible as the other modules. Preserving Module A defaults is recommended, except where overwriting a particular input. In general, additional supplemental data is best added later in the system.
Module A is unique in CEDS in that it contains no emissions-specific processing. It handles activity and driver data, and not emissions or emissions factors. Because of this, Module A only needs to be executed once even during a recursive make.
-
Population data is created from UN and HYDE population inputs. Adjustments to population data must be made in these inputs or in A1.1.UN_pop_WB_HYDE_extension.R.
-
A.1* contains other driver scripts dependent on only population (biomass dataset, pre-processing of IEA energy data, coal heat content). Pre-processing emissions-nonspecific scripts can be added to this section.
-
Module A.2 handles specific adjustments to IEA data, including converting to CEDS sectors and fuels.
-
Modules A.3 and A.4 handle expanding the activity database to include complete CEDS specificity and fuel/sector combinations. The results of this section are the activity databases A.comb_activity.csv and A.NC_activity_energy.csv, which store activity data defaults used throughout CEDS.
-
Combustion Energy data is primarily from IEA and BP data (processed in Module A2-A4), while non-combustion driver data is from various sources (Module A5).
-
Combustion or non-combustion sectors are specified in the Master_Fuel_Sector_List.xlsx. IEA process sectors are identified in IEA_process_sector.csv.
-
The important distinction between combustion and non-combustion activity is driver; combustion sectors have fuel drivers, while non-combustion sectors have proxy process driver data (population, pulp paper production, etc.).
Module B is responsible for processing combustion emissions factors.
Module B executes in 3 steps:
-
B1.1 creates blank or base-level databases for default emissions factors, activity data, and default emissions (B1.1.base_…)
-
B1.2 reformats specific datasets and uses header functions to add the results to their databases (B1.2.add_…). There can be any number of “add” scripts per section.
-
B1.3 “processes” activity (B1.3.proc_…)
Module B uses a parental structure to call scripts. “B1.1_base_comb_EF.R” and “B1.2.add_comb_EF.R” are the only two scripts executed by the Makefile. Each script identifies and executes a series of other scripts based on the emissions species, for example
if ( em == "BC" || em == "OC" ){
scripts <- c( 'B1.2.add_BCOC_recent_control_percent.R' )
}
…
invisible( lapply( scripts, source_child ) )
Any script added to the list “scripts” as a string will be executed by the parent script.
There are two types of B1.2 files. Some files generate processed data as intermediate output files, creating data on control percents, ash content, etc. (most notably for emissions species SO2). Other scripts read in all data files of a certain type, which may have been produced earlier in B1.2, or may have been included as defaults.
Adding a processing script to Module B requires:
-
A script in the module-B folder, named according to conventions described in the CEDS style guide.
-
A change to whichever parent file is appropriate for sourcing the new script.
-
Any input data will need to be included in the input folder.
An example of a change in a parent script: if I want to add a new BCOC processing file, 'B1.2.add_BCOC_additional.R', the above would become:
if ( em == "BC" || em == "OC" ){
scripts <- c( 'B1.2.add_BCOC_recent_control_percent.R', 'B1.2.add_BCOC_additional.R')
}
This modular structure means that no changes to the Makefile are needed to add scripts in Module B.
Raw emissions factors can be directly incorporated into the CEDS emissions factor database.
Save the data in a .csv file with columns for iso, fuel, sector, unit (usually "fraction"), and data years in Xyears, in the folder input/default-emissions-data/EF_parameters/. Name the file U.[em]_*[suffix].csv where * represents any descriptive, meaningful title for the data and [suffix] is any of the following:
Pattern | Use
"_EF" | Adds the data as raw emissions factors
"_control_percent" | Adds the data as control percents (SO2 only)
"_s_ash_ret" | Adds data as sulfur ash retention data (SO2 only)
"s_content" | Adds data as sulfur content data (SO2 only)
Files without any of these suffixes, or without the emissions species in the file name, are ignored.
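The naming rule above can be checked with a short helper. This is a sketch only: the function name and file names are hypothetical, and the exact matching logic in CEDS is not shown in this guide, so the prefix and suffix checks below are an assumed reading of the U.[em]_*[suffix].csv convention.

```r
# Return the recognized suffix for a file name, or NA if the file would be ignored.
classify_ef_file <- function(filename, em) {
  suffixes <- c("_EF", "_control_percent", "_s_ash_ret", "s_content")
  if (!startsWith(filename, paste0("U.", em, "_"))) return(NA_character_)
  stem <- sub("\\.csv$", "", filename)
  hits <- vapply(suffixes, function(s) endsWith(stem, s), logical(1))
  if (any(hits)) unname(suffixes[hits][1]) else NA_character_
}

print(classify_ef_file("U.SO2_china_s_content.csv", "SO2"))
print(classify_ef_file("U.SO2_europe_EF.csv", "SO2"))
print(classify_ef_file("U.SO2_notes.csv", "SO2"))
```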
Module C is responsible for processing non-combustion emissions and emissions factors.
Module C follows the same three-part structure as Module B:
-
C1.1 creates blank or base-level databases for default emissions factors, activity data, and default emissions (C1.1.base_…)
-
C1.2 reformats specific datasets and uses header functions to add the results to their databases (C1.2.add_…). There can be any number of “add” scripts per section. C1.2 uses a parent script model.
-
C1.3 “processes” activity (C1.3.proc_…). C1.3 (the “process” group) does not use a parent script model, so adding a process script requires editing the Makefile.
Non-combustion emissions factors can be added to Module C without the inclusion of a new script.
Save a dataset as a .csv file in the folder input/non-combustion-emissions with headers indicating iso, fuel, sector, and years of emissions (the data will be in wide form, with a column for each year). The filename must contain the emissions species.
Module D contains a single script for initializing emissions databases based on driver and activity data calculated in modules A through C. It is relatively inflexible and is meant to bridge emissions factors + drivers and emissions. It calculates emissions, creating a default that will be scaled and extended by modules F and H.
Module E processes emissions inventories. Each script is tailored to its particular inventory. Each script outputs a processed form of the raw inventory made compatible for CEDS analysis.
Module E is typically executed immediately after Module A.
Typically, Module E scripts have three sections.
-
The first defines inventory-specific parameters: file paths, year ranges, etc.
-
The second reads in and processes the data, shaping the inventory to a standard format (wide-form, iso tags) but does not map to CEDS sectors or fuels.
-
The third writes the data to intermediate-output.
Module E scripts diverge from this format when further data processing is required to bring the data into standard form.
-
Add raw input files to input/emissions-inventories/
-
Add a processing script to the module-E/ folder
-
Add a section of code to the Makefile in the area handling emissions inventories. The lines should look like the following (for an example script “E.myinventory_emissions.R”):
# process emissions from 'myinventory'
$(MED_OUT)/E.$(EM)_myinventory.csv : \
	$(MOD_E)/E.myinventory_emissions.R
	Rscript $< $(EM) --nosave --no-restore
This code indicates that “module-E/E.myinventory_emissions.R” needs to be executed as an Rscript, and that it will produce the output file E.[em]_myinventory.csv.
The purpose of Module F is to scale subsets of CEDS emissions data to the emissions data reported in other inventories. In doing so, CEDS reinforces its accuracy at an aggregate level while retaining the specificity of CEDS fuels, sectors and isos that distinguish the model from the scaling inventories.
Module F consists of:
-
A header file, emissions_scaling_functions.R
-
The header file contains generalized functions that are called in each scaling script. These functions are used to read and write data, apply mapping files, and perform scaling calculations.
-
A parent script, F1.inventory_scaling.R
-
The parent script calls inventory-specific scaling scripts depending on the emissions species.
-
A series of scaling scripts corresponding to each inventory (e.g. F1.1.UNFCCC_scaling.R)
-
Each scaling script reads in an inventory dataset and updates the default data in the CEDS data sets.
-
Mapping files for each inventory dataset used
Module F is executed by running the parent script. Depending on the emissions species provided, the parent script calls a series of scaling scripts, which execute scaling and then write to an intermediate output file to be scaled by the next script. Scaling the same region more than once will overwrite the earlier scaled values. This means that the order of the scaling scripts is important, and inventories with greater accuracy should be included later to avoid being overwritten by a less accurate inventory.
Each Scaling script has a similar structure:
-
Section 0: Universal section, the same for all scripts
-
Section 1: Defines inventory-specific variables such as file names, countries, years the inventory includes, and scaling method
-
Section 1.5: Import inventory-specific data and put in standard inventory format (iso-sector-fuel-years or iso-sector/fuel-years)
-
Section 2: Read in all other scaling data and define variables using scaling functions
-
Section 3: Aggregate CEDS and inventory data to scaling sectors/fuels using scaling functions
-
Section 4: Calculate scaling factors and apply scaling factors to default emissions and emission factors using scaling functions
-
Section 5: Write scaled data to intermediate output file
Section 1 – 1.5 are unique to each inventory used for scaling. Sections 0, 2-5 can be identical for all scaling scripts, unless the user would like to define different default options in Section 4 to create scaling factors with the function “F.scaling”.
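The core of Sections 3 and 4 can be illustrated with a small sketch. It assumes that a scaling factor is the ratio of inventory emissions to aggregated CEDS default emissions for a scaling sector, and that the factor is applied to every CEDS sector mapped to it; the guide does not spell out this formula, so treat it as an assumption. The numbers are made up, and this is not the F.scaling function.

```r
# Hypothetical CEDS default emissions, already tagged with a scaling sector.
ceds <- data.frame(
  ceds_sector = c("1A1a_Electricity-public", "1A1a_Heat-production"),
  scaling_sector = c("energy", "energy"),
  X2000 = c(60, 40),
  stringsAsFactors = FALSE
)

# Hypothetical inventory emissions by scaling sector.
inventory_total <- c(energy = 120)

# Aggregate CEDS data to scaling sectors.
ceds_total <- vapply(split(ceds$X2000, ceds$scaling_sector), sum, numeric(1))

# Assumed scaling factor: inventory total divided by CEDS total.
scaling_factor <- inventory_total[names(ceds_total)] / ceds_total

# Apply the factor to each CEDS sector in the scaling sector.
ceds$X2000_scaled <- unname(ceds$X2000 * scaling_factor[ceds$scaling_sector])
print(ceds)
```

After scaling, the CEDS sectors sum to the inventory total while keeping their relative shares.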
-
Inventory files can be Excel sheets that are imported and processed to standard format within the scaling routine (e.g. Canada), or imported and processed within Module E (e.g. UNFCCC). By Section 2, inventory data must be in standard form with iso, CEDS sector/fuel (or both) columns and years in Xyear format.
-
Mapping files define how to relate scaling inventory and CEDS default data through scaling sectors or scaling fuels, as well as iso-sector-fuel specific options for scaling routines. Mapping file templates for the 3 possible scaling methods (sector, fuel, both) are in the documentation file. Mapping files must be .xlsx spreadsheets with 3 sheets named ‘map’, ‘method’, and ‘year’, located in the CEDS/input/mappings/scaling folder.
-
The “map” sheet relates the inventory data to the CEDS data by scaling method: either fuel, sector or both. It relates the inventory sector/fuel to the scaling sector/fuel and the scaling sector/fuel to CEDS sector/fuel. For example, using the sector scaling method, the inv_sector column maps to the scaling_sector column, and the ceds_sector column maps to the scaling_sector column, but the inv_sector column does not map to the ceds_sector column. Entries on the same row in the inv_sector and ceds_sector columns have no meaning. Each inventory sector/fuel or CEDS sector/fuel should be mapped to only one scaling sector (although multiple sectors/fuels can be mapped to one scaling sector). If an inventory or CEDS sector/fuel is mapped to more than one scaling sector/fuel, the system will match to the first pair in the data frame. The selected scaling sectors/fuels are applied to all countries in the inventory. An example map sheet:
inv_sector | scaling_sector | ceds_sector | Notes
Electricity and gas supply | energy | 1A1a_Electricity-public |
Industry_Electricity | energy | 1A1a_Heat-production |
Industry_Oil refinery | other-transformation | 1A1bc_Other-transformation |
… | … | … | …
-
The “method” sheet defines interpolation and extrapolation methods for handling data if they differ from the default. The
F.scaling
function is used to execute the instructions in this sheet. Method sheet columns:-
iso: can be "all" or specific isos
-
scaling_sector: cannot be "all". Must be specified for each sector.
-
other: [[[[[[[[[ I DON’T KNOW WHAT OTHER DOES ]]]]]]]]]
-
pre_ext_method: how the data will be extended backward in time from its beginning
-
interp_method: how internal holes in data will be filled
-
post_ext_method: how the data will be extended forward in time from its end
-
An example method sheet:

| iso | scaling_sector | other | pre_ext_method | interp_method | post_ext_method |
|---|---|---|---|---|---|
| twn | SLV | 2000 | linear_1 | linear | constant |
| twn | waste_water | 2000 | linear_1 | linear | constant |
| twn | waste-incineration | 2000 | linear_1 | linear | constant |
| twn | AGR | 2000 | linear_1 | linear | constant |
| twn | rail | 1999 | linear_1 | linear | constant |
-
Extension methods:

| method | description | valid columns |
|---|---|---|
| constant | use the edge scaling factor constantly across all extension years | all |
| linear | extend the scaling factor trend linearly | all |
| linear_1 | linearly extend the scaling factor to reach a value of 1 in the final extension year | post_ext_method, pre_ext_method |
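To make the three extension methods concrete, here is a minimal base R sketch. It is not the CEDS implementation: the function `extend_sf` and its arguments are hypothetical, and the `linear` branch assumes the trend is the slope between the edge scaling factor and its nearest neighbor, which may differ from how CEDS derives the trend.

```r
# Extend a series of scaling factors (sf, one per year) to the years in ext_years.
# ext_years must lie entirely before or entirely after the known years.
extend_sf <- function(years, sf, ext_years,
                      method = c("constant", "linear", "linear_1")) {
  method   <- match.arg(method)
  n        <- length(sf)
  forward  <- min(ext_years) > max(years)   # TRUE: post-extension, FALSE: pre-extension
  edge_i   <- if (forward) n else 1         # index of the edge scaling factor
  near_i   <- if (forward) n - 1 else 2     # neighbor used for the slope (assumption)
  far_year <- if (forward) max(ext_years) else min(ext_years)
  switch(method,
    # Hold the edge value over every extension year.
    constant = rep(sf[edge_i], length(ext_years)),
    # Continue the slope found at the edge of the known data.
    linear = {
      slope <- (sf[edge_i] - sf[near_i]) / (years[edge_i] - years[near_i])
      sf[edge_i] + slope * (ext_years - years[edge_i])
    },
    # Straight line from the edge value to 1 in the final extension year.
    linear_1 = approx(x = c(years[edge_i], far_year),
                      y = c(sf[edge_i], 1),
                      xout = ext_years)$y
  )
}

years <- 2000:2002
sf    <- c(1.2, 1.4, 1.6)
post_constant <- extend_sf(years, sf, 2003:2004, "constant")
post_linear   <- extend_sf(years, sf, 2003:2004, "linear")
post_linear_1 <- extend_sf(years, sf, 2003:2004, "linear_1")
pre_linear_1  <- extend_sf(years, sf, 1998:1999, "linear_1")
```

With `linear_1` the scaled values converge back to the CEDS default (a factor of 1) at the far end of the extension, while `constant` carries the edge correction unchanged.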
-
-
The "year" sheet defines the year extent of the processes defined in the "method" sheet. It allows the user to extend scaling factors to different years for individual iso-sector/fuel combinations. It follows a similar structure to the "method" sheet, with these columns:
-
iso: can be "all" or specific isos
-
scaling_sector: cannot be "all". Must be specified for each sector.
-
pre_ext_year: The year in which the scaling data will begin (after extension, if necessary)
-
post_ext_year: The year in which the scaling data will end (after extension, if necessary)
-
-
The following variables must be defined in Section 1 of any scaling script in order to use the modular Sections 2-5.
- `inventory_data_file` - the name of the inventory file, without the extension
- `inv_data_folder` - name of the path to the folder the inventory file is in, from domainmapping.csv (usually "EM_INV" for `emissions-inventories/`)
- `sector_fuel_mapping` - the name of the inventory mapping file, without the extension
- `mapping_method` - mapping method. Must be "sector", "fuel", or "both"
- `inv_name` - name of the inventory (for labeling diagnostic/intermediate output, not for reading input files)
- `region` - iso countries included in the inventory
- `inv_years` - years covered by the inventory
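As an illustration, Section 1 of a scaling script might define these variables as follows. Every value is a placeholder chosen for this sketch (no such inventory or mapping file is implied to exist), and the final check is an addition of the sketch rather than something the scaling scripts are known to do.

```r
# Section 1 of a hypothetical scaling script: all values below are placeholders.
inventory_data_file <- "example_inventory"        # file name without extension
inv_data_folder     <- "EM_INV"                   # domain for emissions-inventories/
sector_fuel_mapping <- "example_scaling_mapping"  # mapping workbook, without extension
mapping_method      <- "sector"                   # "sector", "fuel", or "both"
inv_name            <- "EXAMPLE"                  # label for diagnostic output
region              <- c("twn")                   # iso codes covered by the inventory
inv_years           <- 2003:2010                  # years covered by the inventory

# Fail early if the mapping method is not one of the three documented options.
stopifnot(mapping_method %in% c("sector", "fuel", "both"))
```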
The following functions are used throughout Module F. They are defined in `code/parameters/emissions_scaling_functions.R`.
-
F.readScalingData( inventory=inventory_data_file, inv_data_folder, mapping=sector_fuel_mapping, method=mapping_method, region, inv_name, inv_years )
Reads in all scaling data, defines variables for scaling and assigns them to the global environment.
-
F.invAggregate( std_form_inv, region, mapping_method, zeroed_terms=c(NA, 'NA', 'NA ', '-'))
Aggregates inventory data to scaling sectors/fuels. There are no user-defined options in this function.
-
F.cedsAggregate( input_em, region, method=mapping_method )
Aggregates CEDS data to scaling sectors/fuels. There are no user-defined options in this function.
-
F.scaling( ceds_data, inv_data, region, ext_start_year=start_year, ext_end_year=end_year, ext=TRUE, interp_default='linear', pre_ext_default='constant', post_ext_default='constant', replacement_method='none', max_scaling_factor=100, replacement_scaling_factor=max_scaling_factor )
Calculates scaling factors where both inventory and CEDS data are available. Interpolates and extends scaling factors forward and backward if ‘ext’ = TRUE. Also checks and replaces scaling factors if too small or too large.
Parameters:
- `ext_start_year`: Year to extend scaling factors back to. Defaults to global environment variable ‘start_year’ (1960).
- `ext_end_year`: Year to extend scaling factors forward to. Defaults to global environment variable ‘end_year’ (2014).
- `interp_default`: Default interpolation method for scaling factors within the inventory years. Either ‘linear’ or ‘constant’. Defaults to ‘linear’.
- `pre_ext_default`: Default extrapolation method for pre inventory years. Either ‘interpolation’ or ‘constant’. Defaults to ‘constant’.
- `post_ext_default`: Default extrapolation method for post inventory years. Either ‘interpolation’ or ‘constant’. Defaults to ‘constant’.
- `replacement_method`: Either 'none' or ‘replace’. If ‘replace’, the function checks scaling factors and replaces values above and below the thresholds defined by `max_scaling_factor`.
- `max_scaling_factor`: If `replacement_method` is ‘replace’, scaling factors greater than `max_scaling_factor` or less than 1/`max_scaling_factor` are replaced by `replacement_scaling_factor` or 1/`replacement_scaling_factor`, respectively.
- `replacement_scaling_factor`: Value that replaces scaling factors that are too large. Defaults to `max_scaling_factor`. Values that are too small are replaced by 1/`replacement_scaling_factor`.
-
-
F.applyScale (scaling_factors)
Applies scaling factors to CEDS default data. Creates scaled EF and scaled emissions.
-
F.write( scaled_ef=scaled_ef, scaled_em=scaled_em, domain="MED_OUT")
Writes scaled emissions factors to intermediate output folder.
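The core arithmetic behind the scaling and replacement steps can be sketched in base R. This is an illustration under stated assumptions, not the code of `F.scaling` or `F.applyScale`: it assumes a scaling factor is the ratio of inventory to CEDS emissions and that applying it means multiplying the CEDS default by the factor. The helper `replace_extreme_sf` and all numbers are hypothetical; only the parameter names and the default of 100 come from the signature above.

```r
# Made-up values for one iso and scaling sector, one element per year.
inv_data  <- c(X2000 = 50, X2001 = 3000, X2002 = 0.1)
ceds_data <- c(X2000 = 40, X2001 = 20,   X2002 = 20)

# Assumption: scaling factor = inventory emissions / CEDS default emissions.
sf <- inv_data / ceds_data

# Hypothetical helper mirroring replacement_method = 'replace'.
replace_extreme_sf <- function(sf, max_scaling_factor = 100,
                               replacement_scaling_factor = max_scaling_factor) {
  too_large <- which(sf > max_scaling_factor)
  too_small <- which(sf < 1 / max_scaling_factor)
  sf[too_large] <- replacement_scaling_factor
  sf[too_small] <- 1 / replacement_scaling_factor
  sf
}

sf_checked <- replace_extreme_sf(sf)

# Assumption: scaled emissions = CEDS default emissions * scaling factor.
scaled_em <- ceds_data * sf_checked
```

In this example the factor of 150 is capped at 100 and the factor of 0.005 is raised to 0.01, so the scaled emissions no longer reproduce the inventory exactly in those two years.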
Module F tracks scaling by collecting scaling value metadata. The script `global_settings.R` contains a boolean switch, `Write_value_metadata`; if `TRUE`, CEDS will generate value metadata reports across every combination of fuel, sector, iso, and year indicating which scaling factors were applied and whether the cell was scaled directly to an inventory or to an extension of an inventory. The output file of this process is `F.[em]_scaled_EF-value_metadata.csv`.
Two diagnostic pieces of code, `code/diagnostic/Create_Val_Metadata_Heatmap.R` and `code/diagnostic/Create_Master_Val_Meta_Heatmap.R`, provide functions for analyzing and graphically displaying trends in the value metadata.
Module G handles gridding, the process by which spatial distributions of CEDS final emissions are calculated.
Module G is composed of three main sections. Each section executes 4 scripts. Scripts are executed sequentially; no parent model is used. Twelve grids are created for each year (monthly emissions, incorporating seasonality) from 1750-2014, for each emissions species and sector.
The three main sections are:
-
G1 creates yearly spatial grids.
-
G2 chunks these grids in 50-year groups.
-
G3 creates grids and chunks for methane RCP.
Each section has four scripts; these each handle a different type of input data.
-
G*.1 handles bulk emissions. The input data for this grid is CEDS final emissions by country and sector (no fuel information) for all sectors except aircraft. These scripts handle each emissions species and each sector.
-
G*.2 handles subVOC emissions. For NMVOC emissions, individual grids are generated for each VOC.
-
G*.3 handles aircraft emissions. In addition to 12 monthly grids for each year, aircraft emissions have 25 levels of gridding corresponding to different altitudes.
-
G*.4 handles emissions from solid biofuels only.
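The bookkeeping implied by these numbers can be sketched in base R. This is only an illustration: it assumes the 50-year chunks are counted from the first gridded year, which the text above does not state, and none of the variable names come from the gridding scripts.

```r
years <- 1750:2014

# Assumption: chunks are 50-year blocks counted from the first year.
chunk_id <- (years - min(years)) %/% 50
chunks   <- split(years, chunk_id)

# First and last year of each chunk, e.g. "1750-1799".
chunk_ranges <- vapply(chunks,
                       function(y) paste(min(y), max(y), sep = "-"),
                       character(1))

# Twelve monthly grids per year, for each emissions species and sector.
n_monthly_grids <- length(years) * 12
```

Under this assumption the last chunk is shorter than the others because the gridded period does not end on a 50-year boundary.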
Module H is responsible for the extension of CEDS data from 1960 back to 1750. This documentation corresponds to the planned restructuring of Module H conducted in July 2017 and may not correspond to the code yet.
CEDS activity is extended using CDIAC trends, which hold information per aggregate fuel and country.
Module S conducts final processing and summary procedures. This is the last Module in the CEDS system. Its input is `intermediate-output/[em]_total_CEDS_emissions.csv` and its output is a series of final emissions breakdowns and summaries.
The main body of Module S is contained in a single script, `S1.1.write_summary_data.R`.
Module S begins by reading in the final emissions disaggregated data. The script aggregates the data to all levels required.
The script then checks if an older run of this emissions species is present in the final output folder (which is not wiped clean by the Makefile during an execution of `make clean-all`).
If no older data is present, the script writes its summary files.
If an older dataset is present for this emissions species, the script executes a comparison between the two datasets. The script overwrites the old data but also produces a series of diagnostic files exploring differences between the outputs of the two runs in the `diagnostics/` folder.
The script then sources three files:
- `Figures.R` creates and outputs a series of figures to the `summary-plots/` folder, including global emissions graphs and further aggregations.
- `Compare_to_RCP.R` is called except when the emissions species is 'CO2'.
  - This script creates global, regional, and sectoral comparisons between the CEDS output and the RCP inventory emissions as `*.csv` files in the `ceds-comparisons/` subfolder of `diagnostic-output/`.
  - It also produces graphical comparisons of the same data.
- `Compare_to_GAINS.R` is called except when the emissions species is 'CO2' or 'NH3'.
  - This script creates global comparisons, including specific comparisons for residential and non-residential emissions.
  - It also produces graphical comparisons of the same data.
Module S produces the following summary files:
- All bunker (international aviation and shipping) emissions, `S.[em]_bunker_emissions.csv`
- CEDS final emissions aggregated to different levels:
  - Aggregated to each country and aggregate sector, `CEDS_[em]_emissions_by_country_sector.csv`
  - Aggregated to country totals, `CEDS_[em]_emissions_by_country.csv`
  - Global emissions per specific fuel, `CEDS_[em]_global_emissions_by_fuel.csv`
  - Emissions aggregated to CEDS sectors and countries, `CEDS_[em]_emissions_by_country_CEDS_sector.csv`
  - Global emissions per CEDS sector, `CEDS_[em]_global_emissions_by_CEDS_sector.csv`
Each file is also suffixed by the date of the execution of the run.
If the results of the current run differ from those of the previous run of the CEDS system for this emissions species, the following comparison diagnostics are produced where there is relevant data:
- Files that show changes in rows and columns:
  - `CEDS_[em]_emissions_by_country_sector_dropped-rows`
  - `CEDS_[em]_emissions_by_country_sector_added-rows`
  - `CEDS_[em]_emissions_by_country_sector_dropped-cols`
  - `CEDS_[em]_emissions_by_country_sector_added-cols`
- A percent comparison identifying changes between the two outputs, `CEDS_[em]_emissions_by_country_sector_comparison`
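The kind of comparison these diagnostics describe can be sketched in base R. This is not the code in `S1.1.write_summary_data.R`: the two data frames are made up, the `iso`/`sector`/`X<year>` layout is an assumption about the by-country-sector files, and the percent change is computed relative to the older run.

```r
# Two made-up runs (assumed layout: iso, sector, one column per year).
old_run <- data.frame(iso = c("usa", "chn", "ind"), sector = "energy",
                      X2013 = c(10, 20, 5), X2014 = c(11, 22, 6),
                      stringsAsFactors = FALSE)
new_run <- data.frame(iso = c("usa", "chn", "bra"), sector = "energy",
                      X2014 = c(11, 33, 2), X2015 = c(12, 35, 3),
                      stringsAsFactors = FALSE)

# Rows are identified by their iso-sector combination.
key <- function(df) paste(df$iso, df$sector, sep = "|")

dropped_rows <- old_run[!key(old_run) %in% key(new_run), ]
added_rows   <- new_run[!key(new_run) %in% key(old_run), ]
dropped_cols <- setdiff(names(old_run), names(new_run))
added_cols   <- setdiff(names(new_run), names(old_run))

# Percent change for rows and year columns present in both runs.
shared_cols <- setdiff(intersect(names(old_run), names(new_run)), c("iso", "sector"))
both <- merge(old_run, new_run, by = c("iso", "sector"), suffixes = c(".old", ".new"))
pct  <- both[, c("iso", "sector")]
for (col in shared_cols) {
  old_vals   <- both[[paste0(col, ".old")]]
  new_vals   <- both[[paste0(col, ".new")]]
  pct[[col]] <- 100 * (new_vals - old_vals) / old_vals
}
```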
CEDS is executed using a makefile system. A single file, called the Makefile and saved in the main CEDS folder, contains instructions for the execution of the entire CEDS system.
The Makefile is executed on the command line of your choice using the command `make *`, where * is a valid command line argument.
Some Makefile execution commands:
| Command | Description |
|---|---|
| | Executes a run of CEDS for each valid emissions species except CH4 |
| | Executes CEDS for emissions species CO2, or any other specified emissions species (generic |
| | Deletes all intermediate, diagnostic, and final output files |
| | Deletes all files output by Module B (valid for all modules) |
| | Deletes all intermediate files relating to CO2 |
The Makefile is made up of "Code blocks". Each code block is headed by the output file that will be created, and is followed by all of the input files and scripts required to create that file. Most code blocks will include an indicator that one or more Rscripts should be executed.
If an input file is missing, or if an Rscript fails to create an intermediate file needed by another script, the Makefile will throw an error, saying that there is no rule to build the missing file.
The CEDS code contains a "parameters" folder. This folder stores header files. These files are sourced at the beginning of some scripts to load functions and global data.
The files in this folder are as follows:
| File | Contains |
|---|---|
| analysis_functions.R | Functions that map to CEDS, check if all sectors/countries/fuels are present. |
| common_data.R | Global variables, e.g. years, default conversion factors. |
| data_functions.R | Various data processing functions, e.g. %!in%, replacing data, build CEDS template, remove NAs or blanks, etc. |
| diagnostic_functions.R | A function that compares two identically formatted dataframes for equality. |
| emissions_scaling_functions.R | All functions specific to Module F, e.g. value-metadata functions, scaling functions, functions that add to scaled databases, etc. |
| global_settings.R | To be called at the beginning of every script. Initializes CEDS version number and global options. |
| gridding_functions.R | All functions specific to Module G. |
| header.R | Contains functions required for initializing the log and for sourcing other parameters scripts. Contains the |
| interpolation_extension_functions.R | Contains functions that interpolate or extend time series data (NOT interpolate_NAs, extend on trend). |
| IO_functions.R | Contains readData, writeData, and printLog, along with other functions for reading in or outputting information. |
| ModH_extension_functions.R | All functions specific to Module H, including data processing, merging, and disaggregating functions. |
| nc_generation_functions.R | Some supplemental non-combustion gridding functions |
| process_db_functions.R | Contains functions for generically adding data to databases (e.g. addToEmissionsDb). Also contains a cleanData function. |
| timeframe_functions.R | Contains a series of helper functions for dealing with data time range, from identification to truncation. |
The CEDS code contains a "diagnostic" folder. This folder stores scripts used for processing graphics and figures for publication. It also contains comparison scripts that report how CEDS outputs compare to specific emissions inventories.
CEDS should produce the same results when running on different machines or versions of R. It is possible, however, that the order of the results will differ when using an older version of R. Recent versions of R have changed the order in which `base::merge` combines two data frames, which will produce final results in a different order, but with equal values.
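When comparing outputs from two machines, one way to make the comparison independent of row order is to sort both results on their key columns first. The sketch below uses made-up data, and `sort_rows` is a hypothetical helper, not a CEDS function.

```r
a <- data.frame(iso = c("usa", "chn", "ind"), em = c(3, 1, 2),
                stringsAsFactors = FALSE)
b <- data.frame(iso = c("ind", "usa", "chn"), pop = c(20, 30, 10),
                stringsAsFactors = FALSE)

# sort = FALSE leaves the rows in whatever order merge produces them.
merged <- base::merge(a, b, by = "iso", sort = FALSE)

# Hypothetical helper: sort on the key columns and reset the row names.
sort_rows <- function(df, keys) {
  df <- df[do.call(order, unname(as.list(df[keys]))), , drop = FALSE]
  rownames(df) <- NULL
  df
}

expected <- data.frame(iso = c("chn", "ind", "usa"), em = c(1, 2, 3),
                       pop = c(10, 20, 30), stringsAsFactors = FALSE)

# TRUE when the values agree, regardless of the row order merge returned.
same_values <- isTRUE(all.equal(sort_rows(merged, "iso"), expected))
```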
When adding new scripts to CEDS, make sure to correctly scope functions from any packages you are using. For example, `plyr` and `dplyr` contain many functions with the same name but different functionality. You cannot count on CEDS to load packages in the order that you want and will get errors or wrong results if you make that assumption.
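The effect of masking, and the `package::function` form that avoids it, can be shown with base R alone. In this sketch a locally defined `filter` stands in for a same-named function from a package attached later; the same pattern applies to calls such as `dplyr::summarise()` versus `plyr::summarise()`.

```r
# A later definition (or a package attached later) masks an earlier function
# of the same name. This stand-in plays the role of the masking function.
filter <- function(x, ...) "masked version"

# Unqualified call: which function runs depends on what was defined or attached last.
masked <- filter(c(1, 2, 3, 4))

# Qualified call: always the implementation from the stats package.
explicit <- stats::filter(c(1, 2, 3, 4), rep(0.5, 2))

# Remove the stand-in so the rest of the session is unaffected.
rm(filter)
```

Writing the namespace explicitly costs a few characters per call and removes any dependence on the order in which packages were loaded.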