girochat/genome_assembly_course

Genome assembly and annotation

Description of the project

This repository contains all the scripts and data used to build a de novo assembly of the Shahdara accession (Sha) of Arabidopsis thaliana. The resulting Sha assembly and those of four other accessions (An-1, C24, Cvi-0 and Ler-3) were used for a comparative analysis of the corresponding gene and TE annotations. The work was done in collaboration with four other teams, each working on the de novo assembly of one of these accessions.
This project was part of two courses, "Genome and transcriptome assembly" and "Organisation and annotation of eukaryote genomes", organised by the Universities of Bern and Fribourg, respectively, in the context of the Master of Bioinformatics.

Workflow of the project

Assembly part

  • Quality control and k-mer analysis with FastQC and Jellyfish
  • De novo assembly of the long genomic reads with Canu and Flye, and of the transcriptomic reads with Trinity
  • Assembly polishing with Pilon after short-read mapping with Bowtie2
  • Assembly evaluation with BUSCO, QUAST and Merqury
  • Dot plots between the de novo assemblies and the reference genome with MUMmer
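
As an illustration of the assembly evaluation step, the sketch below computes N50, one of the contiguity statistics reported by QUAST. It is not taken from the scripts in this repository; the function name `n50` and the contig lengths are made up for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk through the contigs from longest to shortest and stop as soon
    # as the cumulative length reaches half of the assembly.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in bp), for illustration only.
contig_lengths = [100, 200, 300, 400, 500]
print(n50(contig_lengths))  # prints 400
```

A higher N50 at a similar total length generally indicates a more contiguous assembly, which is one way the Canu and Flye assemblies can be compared.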

Annotation part

  • TE annotation and classification with EDTA and TEsorter
  • TE dynamics analysis: TE insertion dating, TE genomic distribution plotting and phylogeny of TE clades
  • Gene annotation with MAKER
  • Gene annotation evaluation with BUSCO (protein-level) and alignment to UniProt protein sequences with BLAST
  • Comparative genomic analysis between accessions with GENESPACE
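
The README does not say how TE insertions were dated. A common approach for LTR retrotransposons, assumed here purely for illustration, estimates the insertion age from the divergence K between the two LTRs of an element as T = K / (2r), where r is the substitution rate per site per year. The function name and the rate used below are placeholders, not values from this project.

```python
def insertion_age_years(ltr_divergence, substitution_rate):
    """Estimate the insertion age of an LTR retrotransposon in years.

    Assumes both LTRs were identical at insertion and have diverged
    independently since, hence the factor 2 in T = K / (2 * r).
    """
    if ltr_divergence < 0:
        raise ValueError("divergence must be non-negative")
    if substitution_rate <= 0:
        raise ValueError("substitution rate must be positive")
    return ltr_divergence / (2 * substitution_rate)


# Placeholder values: 2% divergence between the two LTRs and an
# illustrative rate of 1e-8 substitutions per site per year.
age = insertion_age_years(0.02, 1e-8)
print(f"{age:.3g} years")
```

With these placeholder values the estimate is on the order of one million years. The result scales inversely with the chosen rate, so the rate should be taken from the literature for the species under study.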

Repository organisation

The repository is organised into three main directories:

  • scripts directory: all scripts that were used throughout the workflow of the project
  • data directory: all the data of the project, from raw reads to intermediate data produced during the steps of the project
  • analysis directory: all the results from any analysis that was performed

More information can be found in the README section of each directory.

Path to the repository of this project on the IBU cluster: /data/users/grochat/genome_assembly_course/
Link to the repository of this project on GitHub: https://github.com/girochat/genome_assembly_course/

Note: all the data is available on the IBU cluster, but the GitHub repository only contains files of reasonable size (less than 100 MB) due to repository size limits.
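
For anyone mirroring the cluster copy to GitHub, a small helper like the one below can list the files that exceed the size limit before committing. This is a hypothetical sketch, not a script from this repository; the function name `oversized_files` is made up, and the limit is read here as 100 × 10^6 bytes, which is an assumption.

```python
from pathlib import Path

# Assumed interpretation of the "100 MB" limit mentioned above.
SIZE_LIMIT_BYTES = 100 * 1000 * 1000


def oversized_files(root, limit=SIZE_LIMIT_BYTES):
    """Return the files under `root` whose size is at least `limit` bytes."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_size >= limit
    )
```

Calling `oversized_files("data")` from the repository root would return the paths that need to be excluded, for example through `.gitignore`.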
