
Running ORAC

Adam Povey edited this page Feb 4, 2022 · 4 revisions

The Python scripts

Three Python scripts are provided to simplify the process of running ORAC.

  • orac.py fully processes a single file and is the main script you should use;
  • single_process.py runs one step of the ORAC processor on a single file;
  • regression.py runs a suite of regression tests.

A full list of arguments for each script can be found by calling any script with the --help argument. This page introduces the most common arguments.

orac.py

This script has a single mandatory argument: the name of a satellite imagery file to process. If that is not an absolute path, specify the directory with --in_dir. Please do not rename satellite files, as the file-name format is used to determine much of the file's metadata. These formats are specified in the FileName class, which you may edit to accommodate new sensors.

The most commonly used arguments are,

  • --out_dir to specify the output directory;
  • --preset_settings indicates which predefined settings (from your local defaults) should be used;
  • --limit X0 X1 Y0 Y1 limits processing to the rectangle specified (in 1-indexed satellite pixel count);
  • --l1_land_mask uses the land mask in the satellite data and is highly recommended for polar orbiting satellites;
  • --skip_ecmwf_hr is recommended for new users as this feature isn't very important when using new meteorological data;
  • --use_oc uses Ocean Colour CCI data in the sea-surface reflectance calculation and is highly recommended with aerosol retrievals;
  • --revision sets the revision number, which is written into the file names. It must be set if not using a Git repository;
  • --procs N to use N cores during processing;
  • --clobber X sets the clobber level: 3 overwrites all existing files, 2 overwrites only the final results, 1 overwrites everything except pre-processed files, and 0 leaves all existing files in place.
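Pulling several of these options together, a minimal call might look like the following sketch (the input file name and all paths are placeholders, and the option values are purely illustrative):

```shell
# Placeholders throughout: substitute your own L1B file name and paths.
# --limit takes a 1-indexed pixel rectangle as X0 X1 Y0 Y1.
# --clobber 2 regenerates only the final results, reusing earlier stages.
orac.py SATELLITE_L1B_FILE \
    --in_dir /path/to/l1b \
    --out_dir /path/to/output \
    --limit 1 512 1 1024 \
    --l1_land_mask --skip_ecmwf_hr \
    --procs 4 --clobber 2
```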

ORAC can be run through a batch queuing system rather than on your local machine with the --batch argument. The batch system is specified at the bottom of your local_defaults.py. Controls for that are,

  • --label sets the name of the job;
  • --dur X Y Z specifies the maximum allowed duration (in HH:MM) for the pre, main and post processors (X, Y and Z, respectively);
  • --ram X Y Z specifies the maximum allowed memory (in MB) for the pre, main and post processors.
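For instance, a batch submission might be sketched as follows (the file name, job name, durations and memory limits are all illustrative; the queuing system itself is taken from your local_defaults.py):

```shell
# Illustrative values: one hour for the pre and post processors,
# six hours for the main processor, and 4000 MB of RAM per stage.
orac.py SATELLITE_L1B_FILE --batch --label my_orac_job \
    --dur 01:00 06:00 01:00 \
    --ram 4000 4000 4000
```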

The following arguments are useful for debugging,

  • --script_verbose prints the driver files to the screen;
  • --verbose activates full verbosity, printing all progress within the program;
  • --dry_run performs a dry run, printing the driver files to the screen without calling any executables;
  • --keep_driver keeps the driver files after processing;
  • --timing prints the duration of each process;
  • --available_channels specifies the channels to read from the satellite data (though they will not necessarily all be used);
  • --settings allows direct specification of settings (without --preset_settings). It may be given multiple times to specify multiple processings;
  • --settings_file FILE works like --settings, but each line of FILE specifies a processing.
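As a sketch, a --settings_file could hold one processing per line, each built from the main-processor options described below (the particular combinations here are illustrative):

```shell
# Contents of FILE: each line defines one processing.
--approach AppCld1l --phase WAT
--approach AppCld1l --phase ICE
--approach AppAerOx --phase A76
```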

If you wish to run with a different ORAC executable (e.g. because you have made some changes and wish to compare the results with and without them), use --orac_dir to specify the root of the altered source code directory and/or --orac_lib to specify the new library dependencies.

For users familiar with the driver file format, it is possible to write driver-file lines directly by two means. In these, SECTION can take the values pre, main or post to indicate which part of the processor to affect.

  • --additional SECTION KEY VALUE will set variable KEY to equal VALUE in the driver file when running SECTION;
  • --extra_lines SECTION FILE will copy the contents of FILE into the driver file of SECTION.
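For example (KEY and VALUE stand in for a real driver-file variable and its value, and the file names and path are placeholders):

```shell
# Set a single driver-file variable for the main processor.
orac.py SATELLITE_L1B_FILE --additional main KEY VALUE
# Copy prepared lines wholesale into the pre-processor driver file.
orac.py SATELLITE_L1B_FILE --extra_lines pre /path/to/extra_lines.txt
```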

Local defaults

The main job of the Python scripts is, given a satellite file, to locate the appropriate auxiliary files to pass to ORAC. They do so by searching various paths for expected file names. To save typing these in each call, the paths are consolidated in a single file: local_defaults.py. You will need to prepare one to describe your local environment. A description of each variable is provided in our general example, while a specific example is available for processing on JASMIN.

If you installed ORAC using Anaconda, your local defaults file should be stored at ${CONDA_PREFIX}/lib/python3.7/site-packages/pyorac (adjusting the Python version as appropriate). Otherwise, leave it in tools/pyorac.

The values defined in this file are only defaults. All can be overridden for a given call using the arguments --aux (for paths), --global_att (for NCDF attributes) or --batch_settings (for batch-processing settings). Each of these takes a pair of arguments, KEY VALUE, where KEY is the name of the variable you wish to set and VALUE is its new value.
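For example, the following sketch overrides one path, one NCDF attribute and one batch setting for a single call (the file name, path and values are placeholders; ecmwf_dir, project and a batch queue also appear in the worked examples later on this page):

```shell
orac.py SATELLITE_L1B_FILE \
    --aux ecmwf_dir /alternative/path/to/ecmwf \
    --global_att project MY_PROJECT \
    --batch_settings queue short
```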

single_process.py

This script takes a single argument, as above, and

  • if it is a satellite image, runs the pre-processor;
  • if it is any output of the pre-processor, runs the main processor once;
  • if it is any output of the main processor, runs the post-processor.

Each of these has associated arguments to control the operation of ORAC. The same arguments are used by --settings or the retrieval_settings in your local defaults.

Pre-processor
  • --day_flag N specifies if only day (1) or night (2) should be processed. Default behaviour (0) is to process everything. (Twilight is neither day nor night.)
  • --dellat and --dellon set the reciprocal of the resolution of the pre-processing grid. RTTOV is only run over that grid and then interpolated to each satellite pixel. These should be less than or equal to the equivalent values for the meteorological data used.
  • --ir_only skips all visible channels. Saves time for cloud top height retrievals.
  • --camel_emis uses the CAMEL surface emissivity library rather than the RTTOV atlas.
  • --ecmwf_nlevels [60, 91, 137] specifies the number of levels in the meteorological data given.
  • --use_ecmwf_snow uses the snow/ice fields in the meteorological data rather than from NISE.
  • --no_snow_corr skips the snow/ice correction. Saves time for geostationary imagery.
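A hypothetical pre-processor-only call, with illustrative values (the file name is a placeholder):

```shell
# Run only the pre-processor on an L1B file: day-time pixels only,
# a 0.5-degree pre-processing grid (--dellon/--dellat are reciprocals
# of the resolution) and 137-level meteorological data.
single_process.py SATELLITE_L1B_FILE \
    --day_flag 1 --dellon 2.0 --dellat 2.0 \
    --ecmwf_nlevels 137 --no_snow_corr
```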
Main processor
  • --approach gives the forward model to be used. These are,
    • AppCld1l for single-layer cloud;
    • AppCld2l for two-layer cloud;
    • AppAerOx for aerosol over sea (using a BRDF surface model);
    • AppAerSw for aerosol over land and multiple-view imagery (using the Swansea surface model);
    • AppAerO1 for aerosol over land and single-view imagery (using a BRDF surface model).
  • --ret_class allows alteration of the approach and only needs to be set if you wish to experiment with the forward model. Options are,
    • ClsCldWat for water cloud;
    • ClsCldIce for ice cloud;
    • ClsAerOx for aerosol over sea (using a BRDF surface model);
    • ClsAerSw for multiple-view aerosol (using the Swansea surface model);
    • ClsAerBR for multiple-view aerosol (using a BRDF-resolving Swansea surface model);
    • ClsAshEyj for ash.
  • --phase gives the type of particle to evaluate. These are,
    • WAT for water cloud;
    • ICE for ice cloud;
    • A70 for dust;
    • A71 for polluted dust;
    • A72 for light polluted dust;
    • A73 for light dust;
    • A74 for light clean dust;
    • A75 for Northern Hemisphere background;
    • A76 for clean maritime;
    • A77 for dirty maritime;
    • A78 for polluted maritime;
    • A79 for smoke;
    • EYJ for ash.
  • --multilayer PHS CLS sets the --phase and --ret_class for the lower layer in a two-layer retrieval.
  • --use_channels sets which channels should be used. Requested channels that weren't made available with -c (--available_channels) are quietly ignored.
  • --types allows the user to limit which pixels are processed to those listed. Pixels are flagged by type in pre-processing as one of CLEAR, SWITCHED_TO_WATER, FOG, WATER, SUPERCOOLED, SWITCHED_TO_ICE, OPAQUE_ICE, CIRRUS, OVERLAP, PROB_OPAQUE_ICE, or PROB_CLEAR. By default, all are processed.
  • --no_land skips all land pixels;
  • --no_sea skips all ocean pixels;
  • --cloud_only skips all clear-sky pixels;
  • --aerosol_only skips all cloudy pixels.
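For example, a single-layer water-cloud retrieval restricted to cloudy pixels might be sketched as follows (the file name and channel numbers are placeholders):

```shell
# Run the main processor once on a pre-processor output file.
single_process.py PREPROCESSOR_OUTPUT_FILE \
    --approach AppCld1l --phase WAT \
    --use_channels 1 2 3 4 --cloud_only
```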
Post-processor
  • --phases specifies which phases should be combined into this file. The code will not automatically work out which ones you want during single processing (but will work fine during normal running).
  • --chunking splits the satellite orbit into 4096-line chunks. Useful for machines with limited memory.
  • --compress compresses the data in the final output. This can significantly reduce the size of aerosol files, which contain many fill values.
  • --no_night_opt suppresses the output of cloud optical properties at night.
  • --switch_phase applies a correction for cloud-only processing, whereby water pixels with a cloud-top temperature below the freezing point are forced to ice (and vice versa).
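A hypothetical post-processing call combining water and ice phases (the file name is a placeholder, and the space-separated form of --phases is an assumption):

```shell
# Run the post-processor on main-processor output, compressing the result.
single_process.py MAIN_PROCESSOR_OUTPUT_FILE \
    --phases WAT ICE --compress
```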

regression.py

This runs the ORAC regression tests, a sampling of orbits over Australia on 20 June 2008. If you intend to commit code to this repository, make certain that it compiles and can run these tests without unexpected changes.

The script accepts all of the arguments from the scripts above but ignores any --settings in favour of built-in tests. Additional arguments are,

  • --tests specifies which tests should be run. They are,
    • The short tests (five lines containing both cloud and clear-sky) DAYMYDS, NITMYDS, DAYAATSRS, NITAATSRS, DAYAVHRRS, NITAVHRRS. These are sufficient in most circumstances and are run by default.
    • The long tests (processing the entire image) DAYMYD, NITMYD, AATSR, AVHRR. All of these can be called by --long.
  • --test_type specifies which manner of test should be run (specifically, which suffix to use when setting --preset_settings): C for cloud, A for aerosol, or J for joint.
  • --benchmark suppresses comparison. By default, the script will increment the revision number of the repository by 1 and compare the new outputs to the previous.

Examples

orac.py /network/group/aopp/eodg/atsr/aatsr/v3/2008/06/03/ATS_TOA_1PUUPA20080603_160329_000065272069_00111_32730_5967.N1 \
--out_dir /data/MEXICO --available_channels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 \
--limit 1 512 17200 18200 -S AATSR_J --l1_land_mask --use_oc --procs 7

This will process an AATSR orbit from 3 June 2008 stored in Oxford, saving the results to the folder /data/MEXICO. All 14 channels in the data will be used (--available_channels) over a 512x1001 pixel block of the orbit (--limit). The orbit will be evaluated using the preset settings for a joint retrieval (-S AATSR_J; 23 runs covering two single-layer clouds, one multilayer cloud, ten sea-only BRDF-surface aerosol retrievals, and ten land-only Swansea-surface aerosol retrievals) but using the satellite data's own land/sea mask (--l1_land_mask) and input from Ocean Colour CCI data (--use_oc). Seven cores will be used for this processing (--procs 7).

orac.py -i /network/group/aopp/eodg/atsr/aatsr/v3/2008/01/19 \
-o /network/aopp/apres/users/povey/settings_eval/default_land \
--day_flag 1 --dellon 1.5 --dellat 1.5 --ecmwf_flag 4 \
-x ecmwf_dir /network/aopp/matin/eodg/ecmwf/Analysis/REZ_0750 --skip_ecmwf_hr \
--settings_file ~/new_retrieval --use_oc --l1_land_mask --keep_driver \
--batch --ram 4000 4000 4000 -b queue legacy \
-g project AERONETCOLLOCATION -g product_name N0183-L2 \
ATS_TOA_1PUUPA20080119_103548_000065272065_00165_30780_4013.N1

This will process the day-time segment (--day_flag) of an AATSR file from 19 January 2008, saving the result to a folder called default_land (-o). The pre-processing grid will have a resolution of 0.75° (--dellon --dellat) and draw from operational ECMWF forecasts (--ecmwf_flag -x ecmwf_dir) only (--skip_ecmwf_hr). The settings are drawn from the file new_retrieval in my home directory (--settings_file), though Ocean Colour CCI data and the L1 land mask are added. The driver files will be retained after processing (--keep_driver). The job will be submitted to the batch queue legacy (--batch -b queue), allocating 4 GB of RAM to each stage (--ram). The project name will be AERONETCOLLOCATION and the product name will be N0183-L2 (-g project -g product_name).

regression.py -o /data/testoutput --l1_land_mask --procs 8 -r 1870 -C1

This runs the six short cloud regression tests (the default), saving the results to /data/testoutput. The L1 land/sea mask is used. Eight cores will be used and the result labelled as revision 1870. Any existing pre-processed files of that revision will be kept (-C1). (This call was made during debugging, where the pre-processor worked fine but the main processor failed, so there was no need to repeat the successful steps.)

Troubleshooting

The Python scripts in orac/tools are fairly simple wrappers for the code in orac/tools/pyorac. The files there are,

  • arguments.py defines all of the command line arguments for the various scripts and functions that check the inputs are valid;
  • batch.py defines the interface to call a batch queuing system;
  • definitions.py defines some classes used throughout the code. All satellite instruments and particle types need to be defined in here;
  • drivers.py contains functions that create driver files for each part of ORAC. This is where most of the work actually happens;
  • local_defaults.py defines the default locations that the scripts search for input files so you don't have to type the full paths every time;
  • mappable.py contains a class used for plotting satellite swaths on maps (it's sort of like Mappoints from IDL);
  • regression_tests.py defines which files and pixels to run during testing;
  • run.py contains functions that call everything else (process_all contains everything needed to run ORAC);
  • swath.py contains a class used for loading and filtering ORAC data;
  • util.py collects various routine functions used throughout the code.

The most common error is an environment error. To run, the scripts require a number of external libraries to be installed and must be able to find the orac/tools/pyorac folder. The conda installation should do all of that for you. When something goes wrong, try running from $ORACDIR/tools (if that helps, it means PYTHONPATH hasn't been updated correctly). To quickly check that the scripts compile, call orac.py -h to print the help prompt.

The next most common error is from the local_defaults.py file pointing to folders that don't exist. That probably comes up as some sort of "File not found" error. Then there are file name errors. The script makes certain assumptions about the format for the input files and when those change, the script fails with unhelpful error messages from drivers.py.

Other common errors:

  • If you see a cascade of syntax errors, Python cannot compile the code. Check that your ORAC environment is activated (otherwise you may be using the system's default Python, which is likely rather old).
  • If you get error code -11, try ulimit -s 500000 to increase your stack size. If that works, consider adding that line to your .bashrc file or the ORAC activation script.