GrFN Metadata Enrichment
GrFN metadata enrichment is the process of taking the results of text reading and merging the extracted information into the relevant parts of a GrFN. As a result, the different components of a GrFN (containers, variables, etc.) have metadata fields of different types associated with them. Every metadata object has a field under "provenance" -> "method" that indicates where the metadata came from. In this process, it is filled in as "TEXT_READING".
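As a rough illustration of the provenance field described above, a metadata object could be represented as follows. Only the "provenance" -> "method" path and the "TEXT_READING" value come from this page. The helper name, the "type" key, and the example values are assumptions made for illustration; the actual structures are defined in metadata.py.

```python
def make_text_reading_metadata(metadata_type, **fields):
    """Build a metadata dict tagged as coming from text reading.

    Hypothetical helper: not part of the AutoMATES code base.
    """
    return {
        "type": metadata_type,  # assumed key, for illustration only
        "provenance": {"method": "TEXT_READING"},
        **fields,
    }


example = make_text_reading_metadata("unit", unit="days")
print(example["provenance"]["method"])
```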
The core code for metadata enrichment of a GrFN is located in a few files:
(NOTE: Currently the file text_reading_linker.py is only available under the branch grfn-links-fixes-fixed)
- automates/model_assembly/linking.py
- automates/model_assembly/interfaces.py
- automates/model_assembly/text_reading_linker.py
- automates/model_assembly/metadata.py
Four input files are required to do the metadata enrichment:
- A GrFN with variables (example file: SIR-simple—GrFN.json - Google Drive)
- Source code comments file (example file: SIR-simple—documentation.json - Google Drive)
- Documentation PDF related to code (example file: ideal_sir_model_without_vital_dynamics.pdf - Google Drive)
- Equations file (example file: ideal_sir_model_without_vital_dynamics-equations.txt - Google Drive)
These four files are passed to the TR web app, which produces an alignment file. An alignment file is a JSON file holding links between entities from the four input files. These links look like the following:
The library then processes the alignment file in several ways. Below we discuss how the different library files work.
interfaces.py
This file holds the interface to the TR web app. The code here passes the four input files to the TR app and gets the alignment file back as a result.
There are two interfaces in this file: one that actually calls the TR app, and one for easier development that just serves files located on your machine. Note that the files must already exist on your machine to use the latter interface.
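The two-interface design described above might look roughly like the sketch below. The class and method names are hypothetical and do not reflect the actual contents of interfaces.py. Only the local, file-serving variant is shown, since the TR web app's request format is not described on this page.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class AlignmentInterface(ABC):  # hypothetical name
    """Common contract: four input files in, alignment data out."""

    @abstractmethod
    def get_alignment(self, grfn_path, comments_path, pdf_path, equations_path):
        """Return the alignment data as a dict."""


class LocalFileInterface(AlignmentInterface):  # hypothetical name
    """Development stand-in that serves an alignment file already on disk."""

    def __init__(self, alignment_path):
        self.alignment_path = Path(alignment_path)

    def get_alignment(self, grfn_path, comments_path, pdf_path, equations_path):
        # The input paths are ignored here; the alignment file must already exist.
        if not self.alignment_path.is_file():
            raise FileNotFoundError(f"No alignment file at {self.alignment_path}")
        with self.alignment_path.open() as f:
            return json.load(f)
```

Because both variants would share one method signature, calling code can switch between the real TR app and local files without other changes.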
linking.py
This file operates on the alignment file to build a NetworkX graph of the link structure defined above. It parses all of the information from the alignment file into that structure, giving each edge a weight equal to the score that the alignment results gave the link. After the graph is built, the code processes it to find the strongest link between every Code Var and each of its related metadata nodes (e.g. Unit Data, Parameter setting, etc.). The result is turned into a dictionary whose keys are the Code Vars (GrFN variables) and whose values are their metadata.
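The "strongest link" selection can be sketched as below. To stay dependency-free, this sketch uses plain tuples and dictionaries in place of the NetworkX graph that linking.py builds, and it assumes each link connects a code variable directly to a metadata value. The link layout, the metadata type names, and the scores are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical flattened links: (code variable, metadata type, value, score)
links = [
    ("S", "unit", "people", 0.9),
    ("S", "unit", "days", 0.4),
    ("S", "parameter_setting", "S = 1000", 0.7),
    ("beta", "unit", "1/day", 0.8),
]


def strongest_links(links):
    """For each code variable, keep the highest-scoring link of each metadata type."""
    best = defaultdict(dict)
    for var, meta_type, value, score in links:
        current = best[var].get(meta_type)
        if current is None or score > current["score"]:
            best[var][meta_type] = {"value": value, "score": score}
    return dict(best)


var_metadata = strongest_links(links)
print(var_metadata["S"]["unit"])
```

The resulting dictionary is keyed by code variable name, matching the shape of the output described above.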
metadata.py
This file defines the metadata structures that should exist in a GrFN. These structures differ slightly from the structure used in the alignment file, and they give us actual objects to work with.
text_reading_linker.py
The code in this file "orchestrates" the metadata enrichment. It takes the four input files, calls an interface to get an alignment file, calls the linking.py code, and processes the resulting dictionary further. The metadata inside that dictionary is still structured as it was found in the alignment file, so this code converts it into the expected metadata structures defined in metadata.py. It then attaches these metadata objects to the GrFN variable objects.
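The final conversion and attachment step could be sketched as follows. Both classes are hypothetical stand-ins: the real metadata classes live in metadata.py and the real variable objects belong to the GrFN. The input dictionary shape is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TextReadingMetadata:  # hypothetical stand-in for a class in metadata.py
    metadata_type: str
    value: str
    provenance: dict = field(default_factory=lambda: {"method": "TEXT_READING"})


@dataclass
class Variable:  # hypothetical stand-in for a GrFN variable object
    name: str
    metadata: list = field(default_factory=list)


def enrich(variables, var_metadata):
    """Convert linked metadata into objects and attach them to matching variables."""
    for var in variables:
        for meta_type, value in var_metadata.get(var.name, {}).items():
            var.metadata.append(TextReadingMetadata(meta_type, value))
    return variables


# Assumed input: variable name -> {metadata type -> value}
variables = enrich([Variable("S"), Variable("I")], {"S": {"unit": "people"}})
print(variables[0].metadata)
```

Variables with no entry in the dictionary are left unchanged, so enrichment is safe to run on a GrFN where text reading found links for only some variables.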
These scripts can help with the development of metadata enrichment:
- scripts/gromet/gromet_metadata_enrichment.py
- scripts/model_assembly/grfn_tr_alignment_merge.py
- scripts/model_assembly/grfn_links_to_csv.py
- scripts/model_assembly/variable_name_alignment.py