None of ['query'] are in the columns #32
Comments
Hi @haoqing12, Thanks for reaching out! This issue has been brought up before (#25). As I illustrated in that thread, this error basically means the whole neoantigen prediction step failed, and all the In terms of why the prediction would fail, the two things I have observed so far are: [1] make sure the netMHCpan path is set correctly I have a more detailed explanation in issue #25 (linked above). Would you mind checking whether either of these applies to your case? If not, let me know and we will dig further into it. Best,
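A quick pre-flight check along these lines can be sketched as follows. This is only an illustration and not part of SNAF: `check_executable` and `with_hla_prefix` are hypothetical helper names, and the allele string in the usage note is an example value. The second helper covers the 'HLA-' prefix that turns out to matter in the reply below.

```python
import os


def check_executable(path):
    """Return True if `path` is an existing file that the current user can execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def with_hla_prefix(allele):
    """Return the allele name with a leading 'HLA-', adding it only if missing."""
    return allele if allele.startswith("HLA-") else "HLA-" + allele
```

For example, `check_executable('/user/ligk2e/netMHCpan-4.1/netMHCpan')` should be True before `snaf.initialize` is called with that path, and `with_hla_prefix('A*02:01')` gives `'HLA-A*02:01'`.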
Thank you very much. I had overlooked the ‘HLA-’ prefix; now it's working. However, I ran into a new error. Traceback (most recent call last): Have you experienced this problem?
Hi @haoqing12, I have never run SNAF or DeepImmuno on a GPU node, because DeepImmuno is quite lightweight and can finish on a CPU in seconds. I think the issue here is one of two things: either, since SNAF is memory-intensive (I am not proud of that), the GPU node you were running on doesn't have enough CPU memory, or you ran DeepImmuno on the GPU and it exceeded the GPU memory (I have never tested how much GPU memory it needs when run on a GPU). My suggestion is to run on a CPU node and see whether the issue goes away. In terms of CPU memory, I did a test using netMHCpan for another issue (#27), which can serve as guidance for memory setup. Let me know if the issue persists! Best,
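If switching nodes is not convenient, one way to try the same idea is to hide the GPU from the process and measure peak CPU memory afterwards. This is a minimal sketch, not something SNAF provides: it assumes the deep-learning backend used by DeepImmuno honors the standard `CUDA_VISIBLE_DEVICES` variable, and `peak_memory_gb` is a hypothetical helper that assumes Linux, where `ru_maxrss` is reported in kilobytes.

```python
import os
import resource

# Hide all CUDA devices; this must run before the deep-learning library is imported
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def peak_memory_gb():
    """Peak resident memory of this process in GB (assumes Linux: ru_maxrss is in KB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024 ** 2
```

Calling `peak_memory_gb()` at the end of a run gives a rough figure to compare against the memory requested for the job.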
I am immensely grateful for your invaluable guidance. I've run through the whole pipeline, except for the B-cell antigen prediction. Following your suggestion, I solved the memory problem by running the pipeline on a CPU node and adjusting the parameter t_min to 200. However, I still have a few minor questions:
1. Interpretation of result/T_candidates/T_antigen_candidates_all.txt.
2. If I want to directly open a job that has already been run, which file do I need to load?
3. The detailed sequences for a specific uid.
Thank you very, very much for your time and continuous support. I truly appreciate your dedication to assisting others. Best regards,
Hi @haoqing12, Glad you were able to get it to run. These are all good questions, and I'll add the explanations to the doc as well. [1] [2] Sorry for not explaining that clearly. The issue is that we still need to run the
# imports needed by the snippet below
import os
import pandas as pd
import anndata as ad
import snaf
# read in the splicing junction matrix
df = pd.read_csv('/user/ligk2e/altanalyze_output/ExpressionInput/counts.original.pruned.txt',index_col=0,sep='\t')
# database directory (where you extract the reference tarball file) and netMHCpan folder
db_dir = '/user/ligk2e/download'
netMHCpan_path = '/user/ligk2e/netMHCpan-4.1/netMHCpan'
# demonstrate how to add additional control database, see below note for more
tcga_ctrl_db = ad.read_h5ad(os.path.join(db_dir,'controls','tcga_matched_control_junction_count.h5ad'))
gtex_skin_ctrl_db = ad.read_h5ad(os.path.join(db_dir,'controls','gtex_skin_count.h5ad'))
add_control = {'tcga_control':tcga_ctrl_db,'gtex_skin':gtex_skin_ctrl_db}
# initiate
snaf.initialize(df=df,db_dir=db_dir,binding_method='netMHCpan',software_path=netMHCpan_path,add_control=add_control)
# visualize
jcmq = snaf.JunctionCountMatrixQuery.deserialize('result/after_prediction.p')
jcmq.visualize(uid='ENSG00000167291:E38.6-E39.1',sample='TCGA-DA-A1I1-06A-12R-A18U-07.bed',outdir='./result')
# uid is the junction to inspect, here the same one visualized above
uid = 'ENSG00000167291:E38.6-E39.1'
# interactive
snaf.gtex_visual_combine_plotly(uid=uid,outdir='result_new/common',norm=False,tumor=df)
# static
dff = snaf.gtex_visual_combine(uid=uid,outdir='Frank_inspection',norm=False,tumor=df,group_by_tissue=False)
[3] Let me know if what I am suggesting is not what you were asking for. If you are interested in the flanking sequence, you can take a look at issue #31; if you are interested in obtaining the whole isoform sequence associated with a junction, you can refer to issue #22. If you have more customized needs, you can always take the chromosome coordinates reported in the result and query them in the UCSC Genome Browser. If none of the above is what you asked, let me know and I can clarify further. Thank you,
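For readers who want to work with a uid programmatically, the snippet below sketches how a uid of the form used above splits into a gene ID and two exon labels. `parse_uid` is a hypothetical helper, not a SNAF function, and it assumes the simple `GENE:EXON-EXON` layout seen in this thread; other uid layouts would need extra handling.

```python
def parse_uid(uid):
    """Split 'GENE:EXON1-EXON2' into (gene, exon1, exon2); raise if the layout differs."""
    gene, colon, junction = uid.partition(":")
    start_exon, dash, end_exon = junction.partition("-")
    if not (colon and dash and gene and start_exon and end_exon):
        raise ValueError("unexpected uid layout: {!r}".format(uid))
    return gene, start_exon, end_exon
```

With the uid from this thread, `parse_uid('ENSG00000167291:E38.6-E39.1')` yields the Ensembl gene ID `ENSG00000167291` plus the exon labels `E38.6` and `E39.1`.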
Hi @haoqing12, Both of your observations are correct, and I can clarify:

[1] SNAF's in silico translation is slightly different from the code I provided in #31, which was meant to be a quick way to check the flanking sequence. SNAF's in silico translation rules are illustrated below using the specific example you brought up: That explains why the peptide

But of course you will wonder how I explain the preceding stop codon associated with this translation phase. My rationale (feel free to disagree) for including these in the output is based on the fact that a lot of tumor antigens are derived from non-canonical short ORFs with non-ATG start codons. I want to point you to a recently published paper (https://www.nature.com/articles/s41467-024-46240-9 ) and if you navigate to the

One thing I think we have to keep in mind is that an antigen doesn't need to be a functional, stable protein to be presented by HLA. In fact it may be the exact opposite: unstable, degradable short peptides are rapidly degraded by the proteasome, which is exactly the mechanism that produces HLA-I presented peptides. So these peptides won't be detected by proteomics because they don't exist, but they will be detected in the HLA-I pull-down immunopeptidome (https://aacrjournals.org/cancerimmunolres/article/8/8/1018/470266/Identification-of-the-Cryptic-HLA-I).

[2] For this issue, I think the reason is that only a subset of NeoJunctions have a full-length isoform that can be inferred from the present database and that passes certain criteria (no NMD, transcript annotated as protein-coding, etc.). For instance, the very example you showed probably cannot get a precise full-length isoform prediction from SNAF, because I cannot see any documented transcripts overlapping that junction in the UCSC Genome Browser. It doesn't necessarily mean that no full-length isoform exists, just that the current database is limited; full-length long-read sequencing may provide a richer dataset for better annotating such junctions.

Best,
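To make the idea of translation phases and preceding stop codons concrete, here is a generic three-phase translation sketch. It is not SNAF's actual in silico translation rule (that illustration is not reproduced in this thread); the function names and the `min_len` cutoff are assumptions chosen for illustration, and only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, phase):
    """Translate `dna` starting at offset `phase` (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(phase, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_phase_fragments(dna, min_len=8):
    """For each phase, return the stop-free stretches with at least `min_len` residues."""
    return {
        phase: [seg for seg in translate(dna, phase).split("*") if len(seg) >= min_len]
        for phase in range(3)
    }
```

A stretch that follows a stop codon is still returned here without requiring an ATG, which mirrors the point above that a short ORF with a non-ATG start can still be a source of presented peptides.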
Thank you very much for your explanation, especially the point that a stable protein is not necessary for antigen presentation, which has enlightened me. Regarding indels, I previously chose only NMD-escaping mutations as a source of neoantigens; now I need to revisit this. I have no more questions about SNAF, and this issue can be closed. Best regards,
Hi, your tool is wonderful and your tutorials are very detailed. However, I encountered this error when running the T antigen workflow.
In [14]: snaf.JunctionCountMatrixQuery.generate_results(path='./result/after_prediction.p',outdir='./result')
...:
adding gene symbol
KeyError Traceback (most recent call last)
in
----> 1 snaf.JunctionCountMatrixQuery.generate_results(path='./result/after_prediction.p',outdir='./result')
~/anaconda3/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py in generate_results(path, outdir, criterion)
454 # add additional attributes
455 df = pd.read_csv(os.path.join(outdir,'frequency_stage{}_verbosity1_uid.txt'.format(stage)),sep='\t',index_col=0)
--> 456 enhance_frequency_table(df,True,True,outdir,'frequency_stage{}_verbosity1_uid_gene_symbol_coord_mean_mle.txt'.format(stage))
457 # report candidates
458 if stage == 3:
~/anaconda3/envs/SNAF/lib/python3.7/site-packages/snaf/snaf.py in enhance_frequency_table(df, remove_quote, save, outdir, name)
1401 '''
1402 print('adding gene symbol')
-> 1403 df = add_gene_symbol_frequency_table(df=df,remove_quote=remove_quote)
1404 print('adding chromosome coordinates')
1405 df = add_coord_frequency_table(df,remove_quote=False)
~/anaconda3/envs/SNAF/lib/python3.7/site-packages/snaf/downstream.py in add_gene_symbol_frequency_table(df, remove_quote)
892 df['samples'] = [literal_eval(item) for item in df['samples']]
893 ensg_list = [item.split(',')[1].split(':')[0] for item in df.index]
--> 894 symbol_list = ensemblgene_to_symbol(ensg_list,'human')
895 df['symbol'] = symbol_list
896 return df
~/anaconda3/envs/SNAF/lib/python3.7/site-packages/snaf/downstream.py in ensemblgene_to_symbol(query, species)
920 import mygene
921 mg = mygene.MyGeneInfo()
--> 922 out = mg.querymany(query,scopes='ensemblgene',fileds='symbol',species=species,returnall=True,as_dataframe=True,df_index=True)
923
924 df = out['out']
~/anaconda3/envs/SNAF/lib/python3.7/site-packages/biothings_client/base.py in _querymany(self, qterms, scopes, **kwargs)
597
598 if dataframe:
--> 599 out = self._dataframe(out, dataframe, df_index=df_index)
600 li_dup_df = DataFrame.from_records(li_dup, columns=["query", "duplicate hits"])
601 li_missing_df = DataFrame(li_missing, columns=["query"])
~/anaconda3/envs/SNAF/lib/python3.7/site-packages/biothings_client/base.py in _dataframe(obj, dataframe, df_index)
170 df = DataFrame.from_dict(obj)
171 if df_index:
--> 172 df = df.set_index("query")
173 return df
174
~/anaconda3/envs/SNAF/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
~/anaconda3/envs/SNAF/lib/python3.7/site-packages/pandas/core/frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
5449
5450 if missing:
-> 5451 raise KeyError(f"None of {missing} are in the columns")
5452
5453 if inplace:
KeyError: "None of ['query'] are in the columns"
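The last frames of this traceback can be reproduced with pandas alone. The sketch below assumes that the gene-symbol lookup came back with no records at all (for example because the frequency table was empty after a failed prediction step); that is one plausible reading of the traceback, not a confirmed diagnosis. `index_by_query` is a hypothetical stand-in for the `set_index("query")` call in `biothings_client`.

```python
import pandas as pd


def index_by_query(records):
    """Build a DataFrame from query results and index it by the 'query' column."""
    return pd.DataFrame(records).set_index("query")


# With at least one record, the 'query' column exists and indexing succeeds
ok = index_by_query([{"query": "ENSG00000167291", "symbol": "EXAMPLE"}])

# With no records, the DataFrame has no columns, so set_index raises KeyError
try:
    index_by_query([])
    message = None
except KeyError as err:
    message = err.args[0]
```

Here `"EXAMPLE"` is a placeholder symbol, not a real lookup result. In the empty case, `message` contains the same text as the error reported above, which suggests checking whether the upstream table contains any rows.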
================================
Now, the result directory has these files:
Any help you can provide would be greatly appreciated.