Native Installation
The HCV-GLUE resource can be used "offline" to organise and analyse sequence data on a private computer. Offline HCV-GLUE takes the form of a GLUE project build (a linked dataset and set of analysis functions). This project build can be loaded into an instance of the GLUE engine.
A certain level of Unix command-line computing experience is required to install and use Offline HCV-GLUE. Please follow the instructions below to set up and use offline HCV-GLUE on your computer. A working installation of offline HCV-GLUE is capable of a range of analysis functions. Some examples are given below; the GLUE engine website documents how other functions may be accessed.
- Install the GLUE engine
- Download and install the HCV-GLUE project build
- Check the project build is working
- Example use: Basic analysis of deep sequencing data
- Example use: Combined genotyping and drug resistance analysis reports
Please contact Josh Singer or post on the GLUE support forum with any questions about offline HCV-GLUE.
Please follow the GLUE installation instructions. We would strongly recommend a Docker-based installation of GLUE.
Note: This step will erase anything that was previously in your database.
For a Docker-based installation:
- Make sure the gluetools-mysql container is running.
- The following Unix command will install the HCV-GLUE project build.
$ docker exec gluetools-mysql installGlueProject.sh ncbi_hcv_glue
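As a quick check for the first step, the sketch below tests whether a container with a given name appears in a list of container names. The helper name `container_is_running` is hypothetical. It only reads names from standard input, so the sole Docker feature it relies on is the standard `--format '{{.Names}}'` option of `docker ps`, which lists running containers.

```shell
# Hypothetical helper: succeed if the exact container name given as $1
# appears on standard input (one container name per line).
container_is_running() {
    # -F fixed string, -x whole-line match, -q no output
    grep -Fxq -- "$1"
}

# Example use (requires Docker; `docker ps` lists only running containers):
#   docker ps --format '{{.Names}}' | container_is_running gluetools-mysql \
#       && echo "gluetools-mysql is running"
```

Matching the whole line avoids a false positive from a container whose name merely contains `gluetools-mysql`.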
For a native installation:
- Download ncbi_hcv_glue.sql.gz
- Load the build into your MySQL database using a Unix command line, adjusting the details according to your system:
$ gunzip -c ncbi_hcv_glue.sql.gz | /usr/local/mysql/bin/mysql --user=gluetools --password=glue12345 GLUE_TOOLS
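Because loading the build erases the existing database contents, it may be worth confirming that the download is complete first. The sketch below is one way to do that, using the integrity test built into gzip (`-t`). The helper name `dump_is_intact` is hypothetical and not part of GLUE.

```shell
# Hypothetical helper: succeed only if the file named in $1 exists and
# passes gzip's integrity test (-t checks the archive without writing output).
dump_is_intact() {
    [ -f "$1" ] && gzip -t -- "$1" 2>/dev/null
}

# Example use, before running the load command:
#   dump_is_intact ncbi_hcv_glue.sql.gz || echo "download looks incomplete" >&2
```

This only checks that the file is a valid gzip archive. It does not verify that the SQL inside is the expected project build.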
- Download the files resistanceGeno1.fasta and ngsData.bam to a local directory, e.g. /home/fred/glue_data
- Start the GLUE command line, ensuring that your local directory is accessible to GLUE. For a Docker-based installation:
$ docker run --rm -it --name gluetools -v /home/fred/glue_data:/home/fred/glue_data -w /home/fred/glue_data --link gluetools-mysql cvrbioinformatics/gluetools:latest
For a native installation:
$ cd /home/fred/glue_data
$ gluetools.sh
- Use list project to check that the HCV project is there.
Mode path: /
GLUE> list project
+======+================================+
| name | description                    |
+======+================================+
| hcv  | Hepatitis C variation analysis |
+======+================================+
Projects found: 1
- We will use the maxLikelihoodGenotyper module within HCV-GLUE to assign a genotype and subtype for the sequence in the FASTA file. Enter the GLUE commands below and check the output.
GLUE> project hcv
OK
Mode path: /project/hcv
GLUE> module maxLikelihoodGenotyper genotype file --fileName resistanceGeno1.fasta
+===========+====================+===================+
| queryName | genotypeFinalClade | subtypeFinalClade |
+===========+====================+===================+
| EF407428  | AL_1               | AL_1a             |
+===========+====================+===================+
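If table output like this has been saved to a text file (for instance by copying it from the terminal), it can be converted to tab-separated values for use in other tools. The sketch below is one way to do that with awk. The helper name `glue_table_to_tsv` is hypothetical, and the approach assumes the table layout shown above: cell rows starting with `|` and border rows starting with `+`.

```shell
# Hypothetical helper: convert a GLUE-style text table on standard input
# to tab-separated values, skipping "+====+" border lines and any
# non-table lines (prompts, "Rows 1 to ..." footers).
glue_table_to_tsv() {
    awk -F'|' '
        /^\|/ {
            line = ""
            # Fields 2..NF-1 are the cells; fields 1 and NF are the empty edges.
            for (i = 2; i < NF; i++) {
                cell = $i
                gsub(/^[ \t]+|[ \t]+$/, "", cell)   # trim cell padding
                line = line (i > 2 ? "\t" : "") cell
            }
            print line
        }
    '
}

# Example use, with the table text saved in a file named genotype.txt
# (a hypothetical file name):
#   glue_table_to_tsv < genotype.txt
```

The header row is kept as the first output line, so the result can be read directly by spreadsheet software or by tools that expect a header.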
- We will use the phdrFastaSequenceReporter module within HCV-GLUE to translate the NS5A gene of this sequence to amino acids. Enter the GLUE command below and check the output. Use 'Q' to exit the interactive table.
Mode path: /project/hcv
GLUE> module phdrFastaSequenceReporter amino-acid -i resistanceGeno1.fasta -r REF_MASTER_NC_004102 -f NS5A -t REF_1a_M62321 -a AL_UNCONSTRAINED
+==========+=========+==========+==========+===========+===========+===========+
|codonLabel| queryNt | relRefNt | codonNts | aminoAcid |definiteAas|possibleAas|
+==========+=========+==========+==========+===========+===========+===========+
|1         | 6164    | 6258     | TCC      | S         |S          |S          |
|2         | 6167    | 6261     | GGC      | G         |G          |G          |
|3         | 6170    | 6264     | TCC      | S         |S          |S          |
|4         | 6173    | 6267     | TGG      | W         |W          |W          |
|5         | 6176    | 6270     | CTA      | L         |L          |L          |
|6         | 6179    | 6273     | AGG      | R         |R          |R          |
|7         | 6182    | 6276     | GAC      | D         |D          |D          |
|8         | 6185    | 6279     | ATC      | I         |I          |I          |
|9         | 6188    | 6282     | TGG      | W         |W          |W          |
|10        | 6191    | 6285     | GAC      | D         |D          |D          |
|11        | 6194    | 6288     | TGG      | W         |W          |W          |
|12        | 6197    | 6291     | ATA      | I         |I          |I          |
|13        | 6200    | 6294     | TGC      | C         |C          |C          |
|14        | 6203    | 6297     | GAG      | E         |E          |E          |
|15        | 6206    | 6300     | GTG      | V         |V          |V          |
|16        | 6209    | 6303     | TTG      | L         |L          |L          |
|17        | 6212    | 6306     | AGC      | S         |S          |S          |
|18        | 6215    | 6309     | GAC      | D         |D          |D          |
|19        | 6218    | 6312     | TTT      | F         |F          |F          |
+==========+=========+==========+==========+===========+===========+===========+
Rows 1 to 19 of 448 [F:first, L:last, P:prev, N:next, Q:quit]
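In the rows shown, the relRefNt column advances by three nucleotides per codon, starting from 6258 for codon 1. The sketch below turns that pattern into a small calculation. The names `NS5A_START` and `ns5a_codon_to_ref_nt` are hypothetical, and the formula assumes codon labels are consecutive integers along the reference, as they are in the 19 rows above.

```shell
# NS5A start on REF_MASTER_NC_004102, read off the table above (codon 1 -> 6258).
NS5A_START=6258

# Hypothetical helper: reference coordinate (relRefNt) of the first
# nucleotide of the NS5A codon whose label is given as $1.
ns5a_codon_to_ref_nt() {
    echo $(( NS5A_START + 3 * ($1 - 1) ))
}

ns5a_codon_to_ref_nt 19   # prints 6312, matching the last row shown
```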
Offline HCV-GLUE can be used for minority variant analysis of deep sequencing data in the form of SAM/BAM files. We presume here that a suitable sequence assembly method has been used to generate the SAM/BAM file from the raw sequencing data.
We will use the phdrSamReporter module within HCV-GLUE to translate those reads within the file which map to the NS5A gene. The function reports the proportion of reads carrying each amino acid residue at each location within this gene.
Enter the GLUE command below and check the output. Use 'Q' to exit the interactive table.
GLUE> module phdrSamReporter amino-acid -i ngsData.bam -r REF_MASTER_NC_004102 -f NS5A -p -a AL_UNCONSTRAINED
+============+==========+==========+===========+=============+============+
| codonLabel | samRefNt | relRefNt | aminoAcid | readsWithAA | pctAaReads |
+============+==========+==========+===========+=============+============+
| 1 | 6147 | 6258 | Y | 1 | 0.32 |
| 1 | 6147 | 6258 | S | 314 | 99.37 |
| 1 | 6147 | 6258 | P | 1 | 0.32 |
| 2 | 6150 | 6261 | S | 1 | 0.32 |
| 2 | 6150 | 6261 | R | 1 | 0.32 |
| 2 | 6150 | 6261 | G | 313 | 99.37 |
| 3 | 6153 | 6264 | S | 315 | 100.00 |
| 4 | 6156 | 6267 | W | 315 | 100.00 |
| 5 | 6159 | 6270 | L | 314 | 100.00 |
| 6 | 6162 | 6273 | R | 312 | 99.36 |
| 6 | 6162 | 6273 | K | 1 | 0.32 |
| 6 | 6162 | 6273 | G | 1 | 0.32 |
| 7 | 6165 | 6276 | D | 314 | 100.00 |
| 8 | 6168 | 6279 | I | 274 | 100.00 |
| 9 | 6171 | 6282 | W | 274 | 100.00 |
| 10 | 6174 | 6285 | D | 253 | 100.00 |
| 11 | 6177 | 6288 | * | 1 | 0.40 |
| 11 | 6177 | 6288 | W | 252 | 99.60 |
| 12 | 6180 | 6291 | I | 197 | 100.00 |
+============+==========+==========+===========+=============+============+
Rows 1 to 19 of 753 [F:first, L:last, P:prev, N:next, Q:quit]
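To pick out the minority variants from a table like this, the rows can be filtered on the pctAaReads column. The sketch below assumes the table text has been saved to a file and that codon labels are plain integers, as in the rows shown. The helper name `variants_below_pct` and the choice of cutoff are hypothetical, not part of HCV-GLUE.

```shell
# Hypothetical helper: from a saved phdrSamReporter amino-acid table on
# standard input, print codonLabel, aminoAcid and pctAaReads (tab-separated)
# for rows whose pctAaReads is below the cutoff given as $1.
variants_below_pct() {
    awk -F'|' -v cutoff="$1" '
        # Data rows start with "|" and have an integer codon label in the
        # first column; this skips the header and the "+====+" borders.
        /^\|/ && $2 ~ /^ *[0-9]+ *$/ {
            codon = $2; aa = $5; pct = $7
            gsub(/ /, "", codon); gsub(/ /, "", aa); gsub(/ /, "", pct)
            if (pct + 0 < cutoff + 0) print codon "\t" aa "\t" pct
        }
    '
}

# Example use, with the table saved as ns5a_table.txt (a hypothetical name):
#   variants_below_pct 50 < ns5a_table.txt
```

With a cutoff of 50, the first rows of the table above would yield, for example, codon 1 with Y at 0.32% and P at 0.32%, while the dominant S at 99.37% is excluded.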
Offline HCV-GLUE contains modules which run combined genotyping and drug resistance analysis procedures, and generate detailed reports.
This first example will read in the FASTA file, resistanceGeno1.fasta, and produce an HTML report file, resistanceGeno1.html.
Mode path: /project/hcv
GLUE> module phdrReportingController invoke-function reportFastaAsHtml resistanceGeno1.fasta resistanceGeno1.html
The second example will read in the BAM file, ngsData.bam, and produce a new HTML report, ngsData.html, using a 15% reads percentage minimum threshold for reporting polymorphisms.
Mode path: /project/hcv
GLUE> module phdrReportingController invoke-function reportBamAsHtml ngsData.bam 15.0 ngsData.html
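When several FASTA files need reports, the GLUE command lines can be generated rather than typed by hand. The sketch below prints one reportFastaAsHtml command per `.fasta` file in a directory, following the command form shown above. The helper name `print_fasta_report_commands` is hypothetical, and it assumes file names without spaces. The printed lines are intended to be entered at the GLUE prompt in /project/hcv mode.

```shell
# Hypothetical helper: for each *.fasta file in the directory given as $1,
# print the reportFastaAsHtml command from the example above, with the
# output name derived by swapping the .fasta extension for .html.
print_fasta_report_commands() {
    for path in "$1"/*.fasta; do
        [ -e "$path" ] || continue      # no matches: the glob stays literal
        file=${path##*/}                # strip the directory part
        printf 'module phdrReportingController invoke-function reportFastaAsHtml %s %s\n' \
            "$file" "${file%.fasta}.html"
    done
}

# Example use:
#   print_fasta_report_commands /home/fred/glue_data
```

The helper prints bare file names rather than full paths, matching the examples above, where GLUE is started from the data directory.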
HCV-GLUE can also generate files containing genotyping and drug resistance reporting data in a machine-readable format (XML or JSON). These could be used e.g. for integration into a bioinformatics pipeline. Contact the team if this is of interest.