Introduction to Back End
The back-end consists mostly of the Python uploader for data, Blazegraph, and Flask (as the intermediary between the database and the front-end).
#Uploader
The uploader is responsible for uploading data to Blazegraph, our triplestore database.
Most of the back-end is in the folder `superphy/src/upload`:
upload
├── ontology
├── python
│ ├── classes
│ ├── data
│ ├── ontologies -> ../ontology
│ ├── outputs
│ ├── release-rgi-v3.0.1
│ ├── samples
│ └── tmp
├── tests
└── uml
###data/
Contains JSON files with information on `hosts`, `host_categories`, `microbes`, `sources`, and `syndromes` for pre-loading into the database, as well as the gene data JSON files. A validation database for NCBI BLAST is also present.
Note: if the `superphy_vf.xml` file is missing from this repository when you run the uploader, you can retrieve it from the NAS. It is a virulence factor BLAST result file.
###ontology/
All ontologies used in SuperPhy. More is described below.
###outputs/
Where error logs are sent. Error logging is currently done through file IO statements, but it could be replaced with Python's `logging` module.
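As a rough illustration of what switching to `logging` might look like, the sketch below writes errors to a file inside a given directory. The logger name `uploader`, the file name `errors.log`, and the helper `make_error_logger` are all placeholders invented for this example, not names from the uploader. It is written for Python 3.

```python
import logging
import os
import tempfile


def make_error_logger(log_dir, name="uploader"):
    """Return a logger that appends errors to <log_dir>/errors.log.

    The logger name and the file name are placeholders for this sketch.
    """
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep messages out of the root logger
    if not logger.handlers:
        # Only attach a handler the first time this logger name is set up,
        # so repeated calls do not produce duplicate log lines.
        handler = logging.FileHandler(os.path.join(log_dir, "errors.log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # A temporary directory stands in for outputs/ here.
    log = make_error_logger(tempfile.mkdtemp())
    try:
        int("not a number")
    except ValueError:
        # exception() logs at ERROR level and appends the traceback.
        log.exception("Could not parse value")
```

Compared with hand-written file IO, this gives timestamps, severity levels, and tracebacks without extra code at each call site.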
###python/
Python scripts for the uploader. There are also Python libraries for retrieval that use SPARQL internally.
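To give a feel for what a SPARQL-based retrieval helper does, here is a minimal sketch that uses only the standard library. It is not the project's retrieval code. The endpoint URL assumes a local Blazegraph on its default port with the default `kb` namespace, and the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Blazegraph on its default port and "kb" namespace.
ENDPOINT = "http://localhost:9999/blazegraph/namespace/kb/sparql"


def build_select_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a POST request carrying a SPARQL query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,  # supplying data makes this a POST request
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def run_select(query, endpoint=ENDPOINT):
    """Send the query and return the decoded JSON (needs a running server)."""
    with urllib.request.urlopen(build_select_request(query, endpoint)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_select_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
    # Only inspect the request here; run_select() would contact the server.
    print(req.get_method(), req.full_url)
```

Splitting request building from sending makes the query-building part easy to check without a database running.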
####Workflow
Currently the workflow is divided into separate files.
- `main.py`: Initializes the Blazegraph namespace and uploads all ontologies to the database.
- `metadata_upload.py`: Uploads sample genome files and gene files.
  - For the gene uploads, `data/superphy_vf.json` is the virulence factor file and `data/card.json` is the AMR gene file.
- `contig_upload.py`: Uploads contigs for every genome in the database that does not yet have them, by downloading the sequence FASTA file. Also performs sequence validation using methods from `sequence_validation.py`.
- `gene_location_upload.py`: Performs the gene identification analysis. Reference genes must be found for virulence factors, and then they are BLASTed for gene identification. AMR gene analysis uses the RGI from CARD.
Examples can be found at the bottom of each file.
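The sketch below shows one way the four workflow files could be chained, assuming that the order in which they are listed above is also the order in which they should run. Running each script with no arguments is a further assumption, so check the examples at the bottom of each file first. `build_commands` and `run_workflow` are hypothetical helpers, not part of the uploader.

```python
import subprocess
import sys

# Script names come from the workflow list; the order is assumed.
WORKFLOW = [
    "main.py",
    "metadata_upload.py",
    "contig_upload.py",
    "gene_location_upload.py",
]


def build_commands(scripts=WORKFLOW, python=sys.executable):
    """Return one argv list per script, in workflow order."""
    return [[python, script] for script in scripts]


def run_workflow(commands, runner=subprocess.check_call):
    """Run each command in order.

    With the default runner, check_call raises CalledProcessError on the
    first non-zero exit code, so later steps do not run after a failure.
    """
    for cmd in commands:
        runner(cmd)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in build_commands():
        print(" ".join(cmd))
```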
###release-rgi-v3.0.1/
Folder for RGI. Documentation can be found in this folder's README.
###samples/
Sample files for uploading.
###tests/
Unit tests for the uploader. Some of the tests assume that Blazegraph has no data in it.
###uml/
UML diagrams for the classes in the uploader.
Note that not all the relationships between classes are shown, since the UML of the entire project is broken up into categories below.
Ontologies are the backbone of the triplestore database: they lay out the models and relationships for the data. We use Protégé to edit our ontologies, as it provides a convenient user interface. A tutorial is available on the resources page.
###Notes (i.e. haven't really figured out how to organize these thoughts yet)
- Format the data in Flask using Python when sending/getting requests, not in Mithril.
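
As an illustration of formatting on the Python side, the sketch below flattens a result in the standard SPARQL 1.1 JSON results layout into plain rows before it is sent to the front-end. The function name `flatten_bindings` and the sample data are invented for this example.

```python
def flatten_bindings(sparql_json):
    """Convert SPARQL 1.1 JSON results into a list of plain dicts."""
    variables = sparql_json["head"]["vars"]
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to None
        # so every row has the same keys.
        rows.append({
            var: binding[var]["value"] if var in binding else None
            for var in variables
        })
    return rows


if __name__ == "__main__":
    # Made-up sample data in the SPARQL JSON results layout.
    sample = {
        "head": {"vars": ["genome", "host"]},
        "results": {
            "bindings": [
                {
                    "genome": {"type": "uri", "value": "http://example.org/g1"},
                    "host": {"type": "literal", "value": "Homo sapiens"},
                },
                {
                    "genome": {"type": "uri", "value": "http://example.org/g2"},
                },
            ]
        },
    }
    for row in flatten_bindings(sample):
        print(row)
```

In a Flask view, the rows could then be returned with something like `flask.jsonify(rows=flatten_bindings(result))`, so the Mithril side only has to render them.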