brownbear-isoseq-act-hib

Code for "Long-read isoform sequencing reveals tissue-specific isoform expression between active and hibernating brown bears (Ursus arctos)"

1. Iso-Seq Analysis

(1a) Pooled Iso-Seq Analysis

We pooled all 18 SMRT Cells 1M into a single dataset and ran it through Iso-Seq Analysis (v8.1) in SMRT Link. The input was the pooled HiFi (CCS) reads, and the output was the set of high-quality, full-length transcript sequences (hq_transcripts.fasta). The Iso-Seq SMRT Link Analysis Report PDF can be found here.

Additional required files are available as follows:

hq_transcripts.fasta: https://doi.org/10.7273/000003944
cluster_report.csv: https://doi.org/10.7273/000003945

(1b) Mapping and Collapsing

We mapped the HQ transcripts to the bear reference genome (ASM358476v1) with minimap2 and collapsed redundant transcripts with Cupcake, following its post-Iso-Seq processing tutorial.

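# Spliced alignment of HQ transcripts; -uf assumes transcripts are in sense orientation
minimap2 -ax splice -t 30 -uf --secondary=no -C5 \
       GCF_003584765.1_ASM358476v1_genomic.fna \
       hq_transcripts.fasta > \
       hq_transcripts.fasta.sam

# Sort alignments by reference name, then by start coordinate
sort -k 3,3 -k 4,4n hq_transcripts.fasta.sam > hq_transcripts.fasta.sorted.sam

# Collapse redundant transcripts (min. 99% coverage, 95% identity),
# without merging transcripts that differ only by a shorter 5' end
collapse_isoforms_by_sam.py --input hq_transcripts.fasta \
       -s hq_transcripts.fasta.sorted.sam \
       -c 0.99 -i 0.95 \
       --dun-merge-5-shorter \
       -o hq.no5merge

# Assign full-length read counts to each collapsed isoform
get_abundance_post_collapse.py \
       hq.no5merge.collapsed \
       cluster_report.csv

# Remove isoforms that are subsets of longer isoforms
filter_away_subset.py hq.no5merge.collapsed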
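Collapsing only uses alignments that pass the `-c 0.99 -i 0.95` thresholds (minimum alignment coverage and identity). The sketch below shows roughly how those two numbers can be derived from a SAM record's CIGAR string and `NM` tag (minimap2 writes `NM` in SAM output). `coverage_identity` and `passes_collapse_filter` are hypothetical helpers, and the identity formula is an approximation chosen for illustration, not Cupcake's exact implementation.

```python
import re
from typing import Tuple

# Hypothetical helpers illustrating the -c (coverage) and -i (identity) cut-offs.
# The identity formula is an approximation, not Cupcake's exact code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def coverage_identity(cigar: str, nm: int) -> Tuple[float, float]:
    """Return (coverage, identity) for one alignment from its CIGAR and NM tag."""
    aligned = inserted = clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # query bases aligned to the reference
            aligned += n
        elif op == "I":      # query bases inserted relative to the reference
            inserted += n
        elif op in "SH":     # soft/hard-clipped query bases (unaligned ends)
            clipped += n
        # D, N (introns) and P consume no query bases, so they do not affect coverage
    aligned_span = aligned + inserted
    if aligned_span == 0:    # unmapped record (CIGAR "*")
        return 0.0, 0.0
    coverage = aligned_span / (aligned_span + clipped)
    # NM = mismatches + inserted + deleted bases; normalise by the aligned query span
    identity = 1.0 - nm / aligned_span
    return coverage, identity


def passes_collapse_filter(cigar: str, nm: int,
                           min_cov: float = 0.99, min_id: float = 0.95) -> bool:
    """True if the alignment meets both the coverage and identity thresholds."""
    cov, ident = coverage_identity(cigar, nm)
    return cov >= min_cov and ident >= min_id


if __name__ == "__main__":
    print(passes_collapse_filter("500M1000N500M", nm=10))  # True: spliced, fully aligned
    print(passes_collapse_filter("100S900M", nm=2))        # False: 10% of the read clipped
```

Introns (`N`) do not count against coverage, which is why a spliced transcript aligned end to end still reaches 100% coverage.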

(1c) Extracting Per-Sample Counts

We created a custom classify_report.csv in which each full-length (FLNC) read carries its sample label (e.g., CF1N). Sample labels follow the convention [bear][tissue][state], where tissue is F (fat), M (muscle), or L (liver), and state is 1N (hibernation) or 3N (active).

# Split FL counts per collapsed isoform into per-sample columns using classify_report.csv
python <path_to_cupcake>/post_isoseq_cluster/demux_isoseq_with_genome.py \
        --mapped_fafq hq.no5merge.collapsed.filtered.rep.fa \
        --read_stat hq.no5merge.collapsed.read_stat.txt \
        --classify_csv classify_report.csv \
        -o hq.no5merge.collapsed.filtered.mapped_fl_count.txt
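To make the sample naming convention concrete, here is a small sketch that decodes labels such as CF1N and sums per-sample counts by tissue and state. `parse_label` and `totals_by_condition` are hypothetical helpers, the bear identifier is assumed to be everything before the tissue letter, and the counts used below are toy values, not results.

```python
import re
from collections import defaultdict
from typing import Dict, NamedTuple, Tuple

# Decode labels of the form [bear][tissue][state], e.g. "CF1N".
# Assumption: the bear ID is everything before the single-letter tissue code.
TISSUES = {"F": "fat", "M": "muscle", "L": "liver"}
STATES = {"1N": "hibernation", "3N": "active"}
LABEL = re.compile(r"^(?P<bear>.+?)(?P<tissue>[FML])(?P<state>1N|3N)$")


class Sample(NamedTuple):
    bear: str
    tissue: str
    state: str


def parse_label(label: str) -> Sample:
    """Split a label such as 'CF1N' into bear, tissue and physiological state."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised sample label: {label!r}")
    return Sample(m["bear"], TISSUES[m["tissue"]], STATES[m["state"]])


def totals_by_condition(counts: Dict[str, int]) -> Dict[Tuple[str, str], int]:
    """Sum per-sample FL counts into (tissue, state) groups across bears."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for label, n in counts.items():
        s = parse_label(label)
        totals[(s.tissue, s.state)] += n
    return dict(totals)


if __name__ == "__main__":
    print(parse_label("CF1N"))  # Sample(bear='C', tissue='fat', state='hibernation')
    # Toy counts for hypothetical bears C and D
    print(totals_by_condition({"CF1N": 3, "DF1N": 2, "CM3N": 5}))
```

Failing loudly on an unrecognised label is deliberate: a typo in classify_report.csv would otherwise silently drop reads from a group.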

(1d) Classification and Filtering using SQANTI3

We used SQANTI3 to classify and filter the collapsed transcripts against the bear annotation.

# Classify isoforms against the RefSeq annotation; -c adds short-read junction
# coverage and --fl_count adds the per-sample FL counts from step (1c)
python <path_to_sqanti3>/sqanti3_qc.py \
                 hq.no5merge.collapsed.filtered_classification.filtered_lite.gtf \
                 GCF_003584765.1_ASM358476v1_genomic.gtf \
                 GCF_003584765.1_ASM358476v1_genomic.fna \
                 --fl_count hq.no5merge.collapsed.filtered.mapped_fl_count.txt \
                 -c splices_brownbear_shortread.tab \
                 --genename -n 20 --isoAnnotLite

# Apply SQANTI3's rules-based filter to remove likely artifacts
python <path_to_sqanti3>/sqanti3_RulesFilter.py \
                 --faa hq.no5merge.collapsed.filtered_corrected.faa \
                 hq.no5merge.collapsed.filtered_classification.txt \
                 hq.no5merge.collapsed.filtered_corrected.fasta \
                 hq.no5merge.collapsed.filtered_corrected.gtf
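A quick way to inspect the SQANTI3 output is to count isoforms per structural category. This is a minimal sketch: `tally_categories` is a hypothetical helper, and the column names `structural_category` and `filter_result` are assumed from the SQANTI3 versions we are aware of, so check the header of your own classification file.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Optional, Tuple


def tally_categories(lines: Iterable[str],
                     keep: Optional[Tuple[str, str]] = None) -> Counter:
    """Count isoforms per structural_category in a SQANTI3 classification table.

    `keep` is an optional (column, value) pair: only rows where
    row[column] == value are counted, e.g. ("filter_result", "Isoform").
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        if keep is not None and row[keep[0]] != keep[1]:
            continue
        counts[row["structural_category"]] += 1
    return counts


if __name__ == "__main__":
    # Toy table with made-up isoform IDs; real files have many more columns.
    toy = ("isoform\tstructural_category\tfilter_result\n"
           "PB.1.1\tfull-splice_match\tIsoform\n"
           "PB.1.2\tnovel_not_in_catalog\tArtifact\n"
           "PB.2.1\tnovel_in_catalog\tIsoform\n")
    print(tally_categories(io.StringIO(toy)))
    print(tally_categories(io.StringIO(toy), keep=("filter_result", "Isoform")))
```

On real data, pass an open file handle instead, e.g. `open("hq.no5merge.collapsed.filtered_classification.txt", newline="")`.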

The post-SQANTI3-filtering results (but before merging with the reference transcriptome) can be found here.

(1e) Merging with Reference Transcriptome

We used gffcompare (v0.11.2) to merge the new transcripts with the existing reference annotation.

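# Merge the SQANTI3-filtered transcripts with the RefSeq annotation;
# -p sets the name prefix for merged transcripts in new_merge.combined.gtf
gffcompare \
                -p BBEAR -o new_merge GCF_003584765.1_ASM358476v1_genomic.gff \
                hq.no5merge.collapsed.filtered_classification.filtered_lite_corrected.gtf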
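gffcompare writes the merged annotation to `<prefix>.combined.gtf` (here `new_merge.combined.gtf`). As a sanity check, the sketch below counts distinct isoforms per locus in any GTF. `isoforms_per_gene` is a hypothetical helper that relies only on the standard `gene_id` and `transcript_id` attributes; the IDs in the example are made up.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# GTF attribute pairs look like: gene_id "XLOC_000001"; transcript_id "BBEAR_00000001";
ATTR = re.compile(r'(\S+) "([^"]*)"')


def isoforms_per_gene(gtf_lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct transcript_ids per gene_id across all GTF records."""
    genes: Dict[str, Set[str]] = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:          # not a complete GTF record
            continue
        attrs = dict(ATTR.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: len(tx) for gene, tx in genes.items()}


if __name__ == "__main__":
    toy = [
        'chr1\tgffcompare\texon\t1\t100\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "BBEAR_1";\n',
        'chr1\tgffcompare\texon\t1\t200\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "BBEAR_2";\n',
    ]
    print(isoforms_per_gene(toy))  # {'XLOC_1': 2}
```

Using sets means exon lines and transcript lines from the same isoform are counted once.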

The final, merged transcriptome can be found here.

2. Mapping short-read data

(2a) Mapping to Iso-Seq reference

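# Build a HISAT2 index (basename isoseq-transcriptome) from the Iso-Seq transcripts
hisat2-build hq.no5merge.collapsed.filtered_classification.filtered_lite_corrected.fasta isoseq-transcriptome

# Align paired-end short reads; --rf sets the expected mate orientation
hisat2 --threads 4 --rf -x isoseq-transcriptome -1 ${reads1} -2 ${reads2} -S ${outfile_name}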
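HISAT2 prints its own alignment summary to stderr. As a complementary check on the SAM output, the sketch below computes the fraction of primary records that are mapped, counted per mate for paired-end data. `mapping_rate` is a hypothetical helper that uses only the standard SAM flag bits; `samtools flagstat` reports comparable counts.

```python
from typing import Iterable, Tuple

# Standard SAM FLAG bits
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (primary records, mapped primary records, fraction mapped)."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):                 # skip header lines
            continue
        flag = int(line.split("\t", 2)[1])       # FLAG is the second column
        if flag & (SECONDARY | SUPPLEMENTARY):   # count each read once
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return total, mapped, (mapped / total if total else 0.0)
```

For example, `with open("sample.sam") as fh: print(mapping_rate(fh))`, where `sample.sam` stands in for `${outfile_name}`.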

(2b) Kallisto mapping for quantification

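# Quantify reads against the kallisto index new_merge
# (--rf-stranded: strand-specific, first read reverse; 100 bootstraps; 5 threads)
kallisto-v0.46.1/kallisto/kallisto quant -i new_merge --rf-stranded -o ${file_name} -b 100 -t 5 ${reads}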
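Each `kallisto quant` run writes an `abundance.tsv` (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`) into its `-o` folder. The sketch below gathers `est_counts` from several such folders into one transcript-by-sample table. `merge_est_counts` and `write_matrix` are hypothetical helpers, sample names are assumed to be the folder names, and transcripts missing from a sample are filled with 0.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Tuple, Union


def merge_est_counts(sample_dirs: Iterable[Union[str, Path]]
                     ) -> Tuple[List[str], Dict[str, List[float]]]:
    """Build a transcript x sample matrix of est_counts from kallisto output folders."""
    samples: List[str] = []
    per_sample: List[Dict[str, float]] = []
    for d in map(Path, sample_dirs):
        with open(d / "abundance.tsv", newline="") as fh:
            counts = {row["target_id"]: float(row["est_counts"])
                      for row in csv.DictReader(fh, delimiter="\t")}
        samples.append(d.name)          # folder name used as sample name
        per_sample.append(counts)
    targets = sorted(set().union(*per_sample)) if per_sample else []
    matrix = {t: [c.get(t, 0.0) for c in per_sample] for t in targets}
    return samples, matrix


def write_matrix(path: Union[str, Path], samples: List[str],
                 matrix: Dict[str, List[float]]) -> None:
    """Write the matrix as a tab-separated file with a target_id column."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["target_id", *samples])
        for target, row in matrix.items():
            writer.writerow([target, *row])


if __name__ == "__main__":
    # Toy demo: two hypothetical sample folders with made-up counts.
    header = "target_id\tlength\teff_length\test_counts\ttpm\n"
    toy = {"CF1N": "tx1\t1000\t800\t10.0\t5.0\ntx2\t900\t700\t3.5\t2.0\n",
           "CF3N": "tx1\t1000\t800\t4.0\t2.0\n"}
    with tempfile.TemporaryDirectory() as tmp:
        for name, body in toy.items():
            (Path(tmp) / name).mkdir()
            (Path(tmp) / name / "abundance.tsv").write_text(header + body)
        print(merge_est_counts([Path(tmp) / name for name in toy]))
```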

jokelley/brownbear-isoseq-act-hib

Folders and files

Latest commit

History

Repository files navigation

brownbear-isoseq-act-hib

1. Iso-Seq Analysis

(1a) Pooled Iso-Seq Analysis

(1b) Mapping and Collapsing

(1c) Extracting Per-Sample Counts

(1d) Classification and Filtering using SQANTI3

(1e) Merging with Refererence Transcriptome

2. Mapping short-read data

(2a) Mapping to Iso-Seq reference

(2b) Kallisto mapping for quantification

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Packages