Code for Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos)
We pooled all 18 SMRT Cells 1M into a single dataset and ran through Iso-Seq Analysis (v8.1) in SMRTLink. The input to the Iso-Seq Analysis was the pooled HiFi (CCS) reads and the output of the Iso-Seq Analysis was the high-quality, full-length transcript sequences (hq_transcripts.fasta
). The Iso-Seq SMRTLink Analysis Report PDF can be found here.
Additional required files are available as follows:
- hq_transcripts.fasta: https://doi.org/10.7273/000003944
- cluster_report.csv: https://doi.org/10.7273/000003945
We mapped the HQ transcripts to the bear genome using minimap2 and collapsed it using Cupcake, in particular the post-Iso-Seq processing tutorial.
minimap2 -ax splice -t 30 -uf --secondary=no -C5 \
GCF_003584765.1_ASM358476v1_genomic.fna \
hq_transcripts.fasta > \
hq_transcripts.fasta.sam
sort -k 3,3 -k 4,4n hq_transcripts.fasta.sam > hq_transcripts.fasta.sorted.sam
collapse_isoforms_by_sam.py --input hq_transcripts.fasta \
-s hq_transcripts.fasta.sorted.sam \
-c 0.99 -i 0.95 \
--dun-merge-5-shorter \
-o hq.no5merge
get_abundance_post_collapse.py \
hq.no5merge.collapsed \
cluster_report.csv
filter_away_subset.py hq.no5merge.collapsed
We create a custom classify_report.csv
where each full-length (FLNC) read has the proper sample labeling (ex: CF1N
). Samples are named by the convention [bear][tissue-F:fat,M:muscle,L:liver][1N:hibernation or 3N:active]
.
python <path_to_cupcake>/post_isoseq_cluster/demux_isoseq_with_genome.py \
--mapped_fafq hq.no5merge.collapsed.filtered.rep.fa \
--read_stat hq.no5merge.collapsed.read_stat.txt\
--classify_csv classify_report.csv \
-o hq.no5merge.collapsed.filtered.mapped_fl_count.txt
We used SQANTI3 to classify and filter the collapsed transcripts against the bear annotation.
python ~/GitHub/SQANTI3/sqanti3_qc.py \
hq.no5merge.collapsed.filtered_classification.filtered_lite.gtf \
GCF_003584765.1_ASM358476v1_genomic.gtf \
GCF_003584765.1_ASM358476v1_genomic.fna \
--fl_count hq.no5merge.collapsed.filtered.mapped_fl_count.txt \
-c splices_brownbear_shortread.tab \
--genename -n 20 --isoAnnotLite
python <path_to_sqanti3>/sqanti3_RulesFilter.py \
--faa hq.no5merge.collapsed.filtered_corrected.faa \
hq.no5merge.collapsed.filtered_classification.txt \
hq.no5merge.collapsed.filtered_corrected.fasta \
hq.no5merge.collapsed.filtered_corrected.gtf
The post-SQANTI3-filtering results (but before merging with the reference transcriptome) can be found here.
We used gffcompare (v 0.11.2) to merge the new transcripts with the existing reference annotation.
gffcompare \
-p BBEAR -o new_merge GCF_003584765.1_ASM358476v1_genomic.gff \
hq.no5merge.collapsed.filtered_classification.filtered_lite_corrected.gtf
The final, merged transcriptome can be found here.
hisat2 index isoseq-transcriptome hq.no5merge.collapsed.filtered_classification.filtered_lite_corrected.fasta
hisat2 --threads 4 --rf -x isoseq-transcriptome -1 ${reads1} -2 ${reads2} -S ${outfile_name}
kallisto-v0.46.1/kallisto/kallisto quant -i new_merge --rf-stranded -o ${file_name} -b 100 -t 5 ${reads}