Creating a robust process for evaluating the GCIS project.
Communicating the provenance provided for the primary publications in GCIS.
Primary Publications: nca3, Impacts of Climate Change on Human Health, Indicators, and others.
Informing decisions on where to direct data management work by identifying weak or broken provenance that can be improved.
Perl v5.14 or higher
apt-get -y update && apt-get -y install perlbrew
perlbrew init
echo "source ~/perl5/perlbrew/etc/bashrc" >>~/.bashrc
source ~/.bashrc
perlbrew install perl-5.20.3 # Takes about 25 minutes!
perlbrew install-cpanm
perlbrew install-patchperl
perlbrew switch perl-5.20.3
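To confirm the switch took effect, a quick sanity check (the version should match whichever perl you installed above):
which perl   # typically points into ~/perl5/perlbrew/perls/perl-5.20.3/bin
perl -v      # should report v5.20.3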
These modules are required; install them via cpanm:
cpanm Getopt::Long Pod::Usage YAML::XS Data::Dumper Clone::PP Time::HiRes Path::Class JSON::XS Mojo::UserAgent
If you run into any error that mentions a missing module, install it similarly: cpanm Module::Name.
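As a quick load test for the installed modules under the perl you just switched to (nothing GCIS-specific, just confirming they import cleanly):
perl -MGetopt::Long -MPod::Usage -MYAML::XS -MClone::PP -MPath::Class -MJSON::XS -MMojo::UserAgent -e 'print "modules OK\n"'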
Required repos:
GCIS Scripts : https://github.com/USGCRP/gcis-scripts/
GCIS Perl Client : https://github.com/USGCRP/gcis-pl-client/
GCIS Provenance Evaluator : https://github.com/USGCRP/gcis-provenance-evaluator/
Clone these and add their lib directories to your local PERL5LIB.
Initial Setup:
mkdir ~/repos # only if you have no existing 'repos' directory
cd ~/repos
ls # check whether 'gcis-scripts' and 'gcis-pl-client' already exist (skip the clones below for any repo that does)
git clone https://github.com/USGCRP/gcis-scripts/
git clone https://github.com/USGCRP/gcis-pl-client/
git clone https://github.com/USGCRP/gcis-provenance-evaluator/
ls # should see all three now
echo "export PERL5LIB=$PERL5LIB:~/repos/gcis-pl-client/lib:~/repos/gcis-scripts/lib/" >>~/.bashrc
. ~/.bashrc
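To confirm the new PERL5LIB entries are picked up, check that perl now finds the client library; this assumes the gcis-pl-client repo exposes its main module as Gcis::Client under lib/:
perl -MGcis::Client -e 'print "Gcis::Client found\n"'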
Refresh the repos if it's been several months!
cd ~/repos/gcis-scripts
git pull
cd ~/repos/gcis-pl-client
git pull
cd ~/repos/gcis-provenance-evaluator
git pull
See the script's documentation and examples:
cd ~/repos/gcis-provenance-evaluator
perldoc ./generate_resource_scores.pl
To Document
- Establish Scores & Configuration
- Establish a scoring metric for each GCIS resource
- See default example, format
- Establish a scoring metric for each GCIS connection
- See default example, format
- Establish the components for each GCIS resource
- See default example, format (a copy-and-edit sketch follows after this list)
- Generate the score tree
- Decide how many levels deep you want to analyse your resource.
- Select the GCIS instance to run against (default: production), or load the pertinent database dump into a local instance.
- Run the command to generate the tree: See next section.
- Name the trees in an informative way and store them somewhere safe!
- Run analysis on the Score Tree
- See each evaluation folder.
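A minimal sketch of the copy-and-edit workflow for custom scoring files (the <...> names are placeholders for whatever default scoring/component files ship with the gcis-provenance-evaluator repo; the /tmp paths simply match the custom-options example further down):
cd ~/repos/gcis-provenance-evaluator
cp <default_connection_scores>.yml /tmp/new_scores.yml      # connection scoring metric
cp <default_internal_scores>.yml /tmp/new_inner_scores.yml  # resource (internal) scoring metric
cp <default_components>.yml /tmp/comps.yml                  # resource components
vi /tmp/new_scores.yml   # edit the copies, keeping the original format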
Note 1: This script is long-running on larger chapters! To be safe, run it in a screen session on a long-lived server to prevent interruptions. For our purposes at USGCRP, I suggest running this script on data-review.
Note 2: This process is likely to generate a multitude of output trees we will want to keep track of. I strongly encourage a strict naming convention: "REPORT_CHAPTER_COMPONENT_SCORING_metrics.yaml".
- So, if you were to run this on the Executive Summary of the NCA3 with the default scores:
- "nca3_ch1_defaultscoring_metrics.yaml"
- Or to use your custom scoring file "super_strict_scores.yaml" on the CSSR Chapter Temperature Changes in the United States Figure 6.1:
- "cssr_ch6_fig1_superstrict_metrics.yaml"
- commit that scoring file!
If you want to use the default scores and configuration, the process is pretty straightforward:
screen -DRS "metrics screen" # creates or reconnects to the screen named "metrics screen"
cd ~/repos/gcis-provenance-evaluator
./generate_resource_scores.pl \
--resource /report/nca3/chapter/executive-summary \
--tree_file ./nca3_ch1_defaultscoring_metrics.yaml
exit
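For a long run you can detach from the screen session with Ctrl-a d instead of waiting, then reconnect later and confirm the tree file was written:
screen -DRS "metrics screen"   # reconnects to the running session
ls -lh ./nca3_ch1_defaultscoring_metrics.yaml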
Running with all the custom options:
screen -DRS "metrics screen"
cd ~/repos/gcis-provenance-evaluator
# WARNING - increasing --depth can make the run _exponentially_ longer!!!
./generate_resource_scores.pl \
--resource report/usgcrp-climate-human-health-assessment-2016/chapter/extreme-events \
--tree_file ./hhs2016_ch4_newscoring_newcomponents_metrics.yaml \
--url https://data-stage.globalchange.gov \
--depth 2 \
--connection_score /tmp/new_scores.yml \
--internal_score /tmp/new_inner_scores.yml \
--components /tmp/comps.yml
Put the generated YAML file in the root of the gcis-provenance-evaluator repository. Make sure the CPAN modules listed above are installed, then run:
./translate_scoretree_for_d3.pl --tree_file nca4_ch22_butterfly_1219.yaml --d3_file nca4_ch22_butterfly_1219.json