Retrieving nucleotide coding sequence and genomic origin for peptides #45

Open
spvensko opened this issue Jun 13, 2024 · 1 comment

@spvensko

Hello,

My understanding is that the coord column contains the coordinates of the splicing event (e.g. the coordinates may be that of a skipped exon) and not the genomic coordinates of the sequence encoding the peptide. With that in mind, is it possible to retrieve the coding sequence and genomic origin of each peptide (as in, the actual coordinates where that peptide is encoded) out of the current outputs? It appears the coding sequence is utilized at https://github.com/spvensko/SNAF/blob/v0.7.0/snaf/snaf.py#L1190, but I wanted to check with you before I try to develop my own solution.

Thanks,
Steven V.

@frankligy
Owner

frankligy commented Jun 14, 2024

Hi @spvensko,

I believe I have a solution for this. You can pull the junction sequence for a given event out of the serialized results, as in the example below:

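import snaf

# Load the JunctionCountMatrixQuery object saved after the prediction step
jcmq = snaf.JunctionCountMatrixQuery.deserialize('result/after_prediction.p')
uid = 'ENSG00000065609:E45.1-E47.1'  # splicing event of interest
nj_list = jcmq.results[0]            # entries can be None, hence the check below
for nj in nj_list:
    if nj is not None and nj.uid == uid:
        print(nj.junction)           # junction sequence; a comma separates the two sides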

This prints the junction sequence below, with a comma (,) marking the boundary between the two sides of the splicing junction:

CCTCCTGCTGGGACAGGCATGCCCATGATGCCTCAGCAGCCGGTCATGTTTGCACAGCCCATGATGAGGCCCCCCTTTGGAGCTGCCGCTGTACCTGGCACGCAG,CTGCAATATTTGTGACTGAATAGGAAAATAAATGAGTTTGGAGACTTCAAATAAGATTGATGCTGAGTTTC
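
To get from a junction string like this to the nucleotides that encode a given peptide, one option is to translate the joined sequence in the three forward frames and search for the peptide. The sketch below is not part of SNAF; `locate_peptide` and `translate` are hypothetical helpers, and it assumes the junction string is in transcript (sense) orientation and that the standard genetic code applies. The offsets it returns are relative to the junction string, not genomic coordinates, so each half still has to be mapped to the genome separately.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def locate_peptide(junction, peptide):
    """Find where `peptide` is encoded in a 'first,second' junction string.

    Hypothetical helper: assumes sense orientation and forward frames only.
    Offsets are 0-based, end-exclusive, relative to first + second.
    """
    first, second = junction.split(",")
    merged = first + second
    boundary = len(first)  # index where the second half begins
    hits = []
    for frame in range(3):
        protein = translate(merged[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            end = start + 3 * len(peptide)
            hits.append({
                "frame": frame,
                "start": start,
                "end": end,
                "coding_seq": merged[start:end],
                "spans_junction": start < boundary < end,
            })
            pos = protein.find(peptide, pos + 1)
    return hits


# Toy input (not real data): ATG GCC | AAA TTT encodes M A K F.
toy_hits = locate_peptide("ATGGCC,AAATTT", "AKF")
print(toy_hits)
```

For the toy input, the single hit starts at offset 3, ends at offset 12, and spans the junction. Splitting `coding_seq` at the boundary gives the part contributed by each side, which can then be mapped to the genome independently.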

Now BLAT the first and second halves separately in the UCSC Genome Browser to find where each one maps in the genome:

First half: [screenshot of the BLAT result, 2024-06-14]

Second half: [screenshot of the BLAT result, 2024-06-14]
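
The BLAT lookup above can also be scripted rather than pasted into the browser. The sketch below builds a query URL for UCSC's `hgBlat` endpoint with JSON output; `blat_url` and `blat` are hypothetical helper names, and the `hg38` assembly is an assumption, since the thread does not say which assembly the analysis used. UCSC rate-limits programmatic BLAT queries, so this suits a handful of junctions, not bulk use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

UCSC_BLAT = "https://genome.ucsc.edu/cgi-bin/hgBlat"


def blat_url(seq, db="hg38"):
    """Build a UCSC BLAT query URL (db='hg38' is an assumed default)."""
    params = {"userSeq": seq, "type": "DNA", "db": db, "output": "json"}
    return UCSC_BLAT + "?" + urlencode(params)


def blat(seq, db="hg38"):
    """Send the query and return the parsed JSON response (needs network)."""
    with urlopen(blat_url(seq, db)) as resp:
        return json.load(resp)


# Split the junction string on the comma and query each half separately.
# Toy junction for illustration; substitute the string printed by nj.junction.
first, second = "ACGTACGTACGTACGTACGTACGT,TTGCAATTGCAATTGCAATTGCAA".split(",")
print(blat_url(first))
print(blat_url(second))
```

Calling `blat(first)` and `blat(second)` would return the alignments for each side; the best hit for each half gives the genomic interval on that side of the junction.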

You can also derive the coordinates with code instead of BLAT; I shared one solution in issue #31.

Hopefully this helps a bit,
Frank

@spvensko changed the title on Jun 18, 2024, correcting "oriogin" to "origin".