Semester Final Project for BIO 550 - Bioinformatics. All information and relevant code are included in the PDF; given time, I would be happy to walk through the report in more detail.
In summary, though, the goal of the project was to assemble and identify an Escherichia coli genome present in the provided read data. The primary steps were as follows:
- Visualizing Nanopore and Illumina read data and comparing the major differences between them
- Filtering reads by quality and read length
- Assembling contigs and comparing the different assemblers
- Running a pre-written taxonomized BLAST script prepared by the professor, and parsing its results
- Using QUAST to better compare assembler results after BLAST filtering
- Running and comparing the results of the genome annotation programs PROKKA and DFAST
- Mapping the original reads to our newly assembled reference genome and analyzing the results with samtools
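The quality and read-length filtering step can be illustrated with a minimal Python sketch. It is not the code from the report (that lives in the PDF). It assumes uncompressed FASTQ with four-line records and Phred+33 quality encoding, and the default thresholds `min_len=1000` and `min_q=10.0` are hypothetical placeholders, not the values used in the project. It also takes the arithmetic mean of Phred scores, which is a simplification; some filtering tools average per-base error probabilities instead.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from four-line FASTQ input (no multi-line records)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(lines: Iterable[str], min_len: int = 1000,
                 min_q: float = 10.0) -> List[Record]:
    """Keep reads at least `min_len` long with mean quality >= `min_q`.

    The default thresholds are hypothetical placeholders.
    """
    return [
        (header, seq, qual)
        for header, seq, qual in parse_fastq(lines)
        if len(seq) >= min_len and mean_phred(qual) >= min_q
    ]
```

Called on an open file handle (for example, one opened on a hypothetical `reads.fastq`), `filter_reads` returns only the records that pass both thresholds. Long Nanopore reads and short Illumina reads would normally get very different length cutoffs.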
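Comparing assemblers often starts from simple contiguity statistics such as contig count, total length, longest contig, and N50. QUAST reports N50 among its metrics. The sketch below computes these values from a list of contig lengths. The helper names `n50` and `assembly_stats` are hypothetical, and extracting the lengths from each assembler's FASTA output is left out.

```python
from typing import Dict, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_stats(lengths: Sequence[int]) -> Dict[str, int]:
    """Basic contiguity statistics for one assembly's contig lengths."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

Running `assembly_stats` once per assembler gives dictionaries that can be compared side by side. Fewer contigs and a higher N50 generally indicate a more contiguous assembly, although contiguity alone says nothing about correctness.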
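For the read-mapping step, one common summary is per-contig depth from `samtools depth`, which prints tab-separated reference name, 1-based position, and depth. The sketch below parses that output. `depth_summary` is a hypothetical helper, not part of samtools. By default `samtools depth` omits zero-coverage positions, so the zero-depth fraction is only meaningful if the output was produced with `-a`.

```python
from collections import defaultdict
from typing import Dict, Iterable


def depth_summary(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Summarize `samtools depth` output (contig, position, depth) per contig.

    Hypothetical helper: mean depth is taken over the positions present in
    the input, so run `samtools depth -a` if zero-depth sites should count.
    """
    depth_sum = defaultdict(int)    # summed depth per contig
    n_positions = defaultdict(int)  # number of reported positions per contig
    n_zero = defaultdict(int)       # positions reported with depth 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and a -H header
            continue
        contig, _pos, depth = line.split("\t")[:3]  # first sample column only
        d = int(depth)
        depth_sum[contig] += d
        n_positions[contig] += 1
        if d == 0:
            n_zero[contig] += 1
    return {
        contig: {
            "mean_depth": depth_sum[contig] / n,
            "zero_fraction": n_zero[contig] / n,
        }
        for contig, n in n_positions.items()
    }
```

Contigs with unusually low mean depth, or a large zero-depth fraction, are natural candidates for a closer look when checking the assembly against the original reads.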
In the end, we assembled and annotated an Escherichia coli genome using the steps listed above.