Skip to content

Code for Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos)

Notifications You must be signed in to change notification settings

jokelley/brownbear-isoseq-act-hib

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

brownbear-isoseq-act-hib

Code for Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos)

workflow

1. Iso-Seq Analysis

(1a) Pooled Iso-Seq Analysis

We pooled all 18 SMRT Cells 1M into a single dataset and ran through Iso-Seq Analysis (v8.1) in SMRTLink. The input to the Iso-Seq Analysis was the pooled HiFi (CCS) reads and the output of the Iso-Seq Analysis was the high-quality, full-length transcript sequences (hq_transcripts.fasta). The Iso-Seq SMRTLink Analysis Report PDF can be found here.

Additional required files are available as follows:

(1b) Mapping and Collapsing

We mapped the HQ transcripts to the bear genome using minimap2 and collapsed it using Cupcake, in particular the post-Iso-Seq processing tutorial.

minimap2 -ax splice -t 30 -uf --secondary=no -C5 \ 
       GCF_003584765.1_ASM358476v1_genomic.fna \
       hq_transcripts.fasta > \
       hq_transcripts.fasta.sam
       
sort -k 3,3 -k 4,4n hq_transcripts.fasta.sam > hq_transcripts.fasta.sorted.sam

collapse_isoforms_by_sam.py  --input hq_transcripts.fasta \
       -s hq_transcripts.fasta.sorted.sam \
       -c 0.99 -i 0.95 \
       --dun-merge-5-shorter \
       -o hq.no5merge
       
get_abundance_post_collapse.py \
       hq.no5merge.collapsed \
       cluster_report.csv

filter_away_subset.py hq.no5merge.collapsed

(1c) Extracting Per-Sample Counts

We create a custom classify_report.csv where each full-length (FLNC) read has the proper sample labeling (ex: CF1N). Samples are named by the convention [bear][tissue-F:fat,M:muscle,L:liver][1N:hibernation or 3N:active].

python <path_to_cupcake>/post_isoseq_cluster/demux_isoseq_with_genome.py \
        --mapped_fafq hq.no5merge.collapsed.filtered.rep.fa \
        --read_stat hq.no5merge.collapsed.read_stat.txt\
        --classify_csv classify_report.csv \
        -o hq.no5merge.collapsed.filtered.mapped_fl_count.txt

(1d) Classification and Filtering using SQANTI3

We used SQANTI3 to classify and filter the collapsed transcripts against the bear annotation.

python ~/GitHub/SQANTI3/sqanti3_qc.py \
                 hq.no5merge.collapsed.filtered_classification.filtered_lite.gtf \
                 GCF_003584765.1_ASM358476v1_genomic.gtf \
                 GCF_003584765.1_ASM358476v1_genomic.fna \
                 --fl_count hq.no5merge.collapsed.filtered.mapped_fl_count.txt \
                 -c splices_brownbear_shortread.tab \
                 --genename -n 20 --isoAnnotLite
     
             
python <path_to_sqanti3>/sqanti3_RulesFilter.py \
                 --faa hq.no5merge.collapsed.filtered_corrected.faa \
                 hq.no5merge.collapsed.filtered_classification.txt \
                 hq.no5merge.collapsed.filtered_corrected.fasta \
                 hq.no5merge.collapsed.filtered_corrected.gtf

The post-SQANTI3-filtering results (but before merging with the reference transcriptome) can be found here.

(1e) Merging with Refererence Transcriptome

We used gffcompare (v 0.11.2) to merge the new transcripts with the existing reference annotation.

gffcompare \
                -p BBEAR -o new_merge GCF_003584765.1_ASM358476v1_genomic.gff \ 
                hq.no5merge.collapsed.filtered_classification.filtered_lite_corrected.gtf

The final, merged transcriptome can be found here.

2. Mapping short-read data

(2a) Mapping to Iso-Seq reference

hisat2 index isoseq-transcriptome hq.no5merge.collapsed.filtered_classification.filtered_lite_corrected.fasta

hisat2 --threads 4 --rf -x isoseq-transcriptome -1 ${reads1} -2 ${reads2} -S ${outfile_name} 

(2b) Kallisto mapping for quantification

kallisto-v0.46.1/kallisto/kallisto quant -i new_merge --rf-stranded -o ${file_name} -b 100 -t 5 ${reads} 

About

Code for Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •