
bergmanlab/yeast-transposons


YEAST TRANSPOSON CANONICAL SEQUENCES

Here we present an open-access library of transposable element canonical sequences for species in the genus Saccharomyces (Ty elements). Currently, this library contains sequences of full-length elements for major LTR retrotransposon families and subfamilies in S. cerevisiae and S. paradoxus.

The library here is derived from a related resource reported in Carr et al. (2012), later adapted for re-use by Nelson et al. (2017) and Czaja et al. (2020). We decided to generate an updated Ty query library to address the following limitations of the old one: (i) the old Ty sequences were predominantly adapted from the RepeatMasker Repbase library, which is currently closed access; (ii) the ultimate provenance of some sequences in the old library was unclear (i.e. Ty1, Ty2), and some Ty sequences were adopted from legacy GenBank records that may have poor sequence quality (i.e. Ty3, Ty4); (iii) "full-length" Ty sequences in the old library were formatted with LTR and internal regions split from one another (to be used as input for RepeatMasker), rather than as true full-length elements; (iv) the species of origin for non-cerevisiae Ty elements was inconsistently labeled (i.e. Ty3_1p, Ty5); (v) the old query sequences lacked systematic structural annotations (coordinates of LTRs and ORFs); and (vi) representatives of major subfamilies (i.e. Ty1' or Ty1p) and some newly discovered Ty families (i.e. Tsu4) were not included.

To overcome these limitations, we created a comprehensive library of full-length canonical Ty elements for S. cerevisiae and S. paradoxus derived from high-quality public long-read assemblies. To select Ty elements for this library, we first applied the RepeatMasker-based Ty annotation pipeline described in Czaja et al. (2020) to high-quality public genome assemblies for S. cerevisiae and S. paradoxus. We next ran LTRharvest and LTRdigest on the same set of yeast genome assemblies to generate de novo predictions of full-length LTR elements. We then generated multiple sequence alignments for each Ty family using all full-length elements detected by the RepeatMasker-based pipeline plus the query element from the Czaja et al. (2020) version of the old Ty library. The multiple alignments were then used to generate neighbor-joining phylogenetic trees for each Ty family with Seaview. From each tree, we selected a representative full-length genomic replacement that was as close as possible to the original query element and was supported by an LTRharvest prediction. Additionally, we generated structural annotations, including flanking LTRs and ORFs (GAG and POL), for all genomic replacements. The coordinates of 5’ and 3’ LTRs were adopted from LTRharvest results and then validated against RepeatMasker annotations to make sure the lengths of the annotated LTRs were consistent between the two independent tools. ORF prediction was performed with NCBI ORFfinder.
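The selection and validation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the library: the `Candidate` record, the `select_representative` and `ltr_lengths_agree` helpers, the element identifiers, and the `tolerance` parameter are all hypothetical, and the tree distances are assumed to have been computed beforehand from the neighbor-joining tree.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one full-length element found in an assembly."""
    element_id: str
    distance_to_query: float    # tree distance to the old query element (precomputed)
    ltrharvest_supported: bool  # True if an LTRharvest prediction matches the element


def select_representative(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the LTRharvest-supported candidate closest to the query, if any."""
    supported = [c for c in candidates if c.ltrharvest_supported]
    if not supported:
        return None
    return min(supported, key=lambda c: c.distance_to_query)


def ltr_lengths_agree(
    ltrharvest_ltr: Tuple[int, int],
    repeatmasker_ltr: Tuple[int, int],
    tolerance: int = 0,
) -> bool:
    """Compare LTR lengths from two tools, given 1-based inclusive (start, end) pairs.

    The tolerance (in bp) is an assumption; the source only says the lengths
    were checked for consistency.
    """
    len_a = ltrharvest_ltr[1] - ltrharvest_ltr[0] + 1
    len_b = repeatmasker_ltr[1] - repeatmasker_ltr[0] + 1
    return abs(len_a - len_b) <= tolerance


# Made-up example data, for illustration only.
candidates = [
    Candidate("elem_a", 0.002, False),  # closest, but no LTRharvest support
    Candidate("elem_b", 0.005, True),
    Candidate("elem_c", 0.020, True),
]
best = select_representative(candidates)
```

Here `best` is `elem_b`: `elem_a` sits closer to the query in the tree but is skipped because it lacks LTRharvest support, which mirrors the two criteria stated in the text.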

To make the nomenclature of canonical Ty sequences generalizable to multiple species but as compatible as possible with previous work, we add the "p" suffix to all S. paradoxus family names. Specific subfamilies are denoted after the "_" delimiter. Subfamilies in S. paradoxus are labelled according to whether they are extracted from host strains from the Old World (ow) or New World (nw).
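As a rough illustration of this naming convention, the sketch below splits a sequence name into its parts. It assumes that only the two species currently in the library occur, so a trailing "p" on the family name always means S. paradoxus. The `parse_ty_name` helper, the `TyName` tuple, and the example names are hypothetical and are not taken from the library files.

```python
from typing import NamedTuple, Optional


class TyName(NamedTuple):
    base_family: str          # family name without the species suffix
    species: str
    subfamily: Optional[str]  # text after the "_" delimiter, if present
    origin: Optional[str]     # Old World / New World, S. paradoxus only


ORIGINS = {"ow": "Old World", "nw": "New World"}


def parse_ty_name(name: str) -> TyName:
    """Split a canonical Ty name into family, species, and subfamily parts."""
    family, _, subfamily = name.partition("_")
    if family.endswith("p"):
        base_family, species = family[:-1], "S. paradoxus"
    else:
        base_family, species = family, "S. cerevisiae"
    # Only S. paradoxus subfamilies carry the ow/nw geographic labels.
    origin = ORIGINS.get(subfamily) if species == "S. paradoxus" else None
    return TyName(base_family, species, subfamily or None, origin)


# Hypothetical names that follow the convention described above.
paradoxus_example = parse_ty_name("Ty1p_nw")
cerevisiae_example = parse_ty_name("Ty2")
```

The parser splits on the first "_" only, so any further structure inside a subfamily label is left untouched.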

Ty library
