How to generate the gtex_ctrl_db and tcga_matched_control_junction_count.h5ad, it is a good control. #41

Open
renyuan001 opened this issue May 10, 2024 · 5 comments

Comments

@renyuan001

I think these controls are very good. How do I generate gtex_ctrl_db and tcga_matched_control_junction_count.h5ad? Were a number of *.fastq files from TCGA and GTEx analysed by SNAF first?

@frankligy
Owner

Yes, we first ran AltAnalyze on the GTEx and TCGA matched control samples to get the junction count matrices, then converted them into h5ad files. I saved the scripts I used for the conversion (https://github.com/frankligy/SNAF/tree/main/images/db_build).

To generate your own control dataset, you can follow the instructions in another issue (#34).

Thank you,
Frank

@renyuan001
Author

I want to use both controls in the snaf.initialize step, but an error occurred:

import os
import anndata as ad
import snaf

# df (the junction count matrix) is assumed to have been loaded earlier in the session
db_dir = '/home/ry-03/data/SNAF/data'
netMHCpan_path = '/home/ry-03/data/SNAF/netMHCpan-4.1/netMHCpan'
tcga_ctrl_db = ad.read_h5ad(os.path.join(db_dir,'controls','tcga_matched_control_junction_count.h5ad'))
gtex_ctrl_db = ad.read_h5ad(os.path.join(db_dir,'controls','GTEx_junction_counts.h5ad'))
add_control = {'tcga_control':tcga_ctrl_db,'gtex_ctrl':gtex_ctrl_db}
snaf.initialize(df=df,db_dir=db_dir,binding_method='netMHCpan',software_path=netMHCpan_path,add_control=add_control)
2024-05-12 19:15:40 starting initialization
Current loaded gtex cohort with shape (56692, 2629)
Adding cohort tcga_control with shape (54813, 705) to the database
now the shape of control db is (56999, 3334)
Traceback (most recent call last):
File "", line 1, in
File "/home/ry-03/miniconda3/envs/SNAF/lib/python3.7/site-packages/snaf/__init__.py", line 52, in initialize
adata = gtex_configuration(df,gtex_db,t_min,n_max,normal_cutoff, tumor_cutoff, normal_prevalance_cutoff, tumor_prevalance_cutoff, add_control)
File "/home/ry-03/miniconda3/envs/SNAF/lib/python3.7/site-packages/snaf/gtex.py", line 65, in gtex_configuration
assert len(set(control.var_names).intersection(tissue_dict.keys())) == 0
AssertionError

Maybe I need to filter against them one by one, not together?

@renyuan001
Author

But it worked when I passed only the TCGA control:

add_control = {'tcga_control':tcga_ctrl_db}
snaf.initialize(df=df,db_dir=db_dir,binding_method='netMHCpan',software_path=netMHCpan_path,add_control=add_control)
2024-05-12 20:09:18 starting initialization
Current loaded gtex cohort with shape (56692, 2629)
Adding cohort tcga_control with shape (54813, 705) to the database
now the shape of control db is (56999, 3334)
2024-05-12 20:10:10 finishing initialization

@frankligy
Owner

Yes, the reason is the same as in #40: the GTEx control is built in, so you don't need to add it again. Only the TCGA control needs to be passed as add_control.
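
The AssertionError in the traceback above comes from a check that an added control shares no names with the built-in GTEx data. A pre-check along the following lines could flag such a cohort before calling snaf.initialize. This is only a sketch: `find_overlapping_cohorts` is a hypothetical helper, not part of SNAF, the sample names are made up, and it works on plain collections of names (for an AnnData object these would be `list(adata.var_names)`).

```python
from typing import Dict, Iterable, Set


def find_overlapping_cohorts(
    extra_cohorts: Dict[str, Iterable[str]],
    builtin_samples: Iterable[str],
) -> Dict[str, Set[str]]:
    """Return, per cohort, the sample names it shares with the built-in control.

    Cohorts with no shared names are omitted from the result.
    """
    builtin = set(builtin_samples)
    overlaps = {}
    for name, samples in extra_cohorts.items():
        shared = builtin.intersection(samples)
        if shared:
            overlaps[name] = shared
    return overlaps


# Made-up sample names, for illustration only.
builtin_gtex = ["GTEX-0001", "GTEX-0002", "GTEX-0003"]
candidates = {
    "tcga_control": ["TCGA-AA-0001", "TCGA-AA-0002"],
    "gtex_ctrl": ["GTEX-0002", "GTEX-0003"],
}
overlaps = find_overlapping_cohorts(candidates, builtin_gtex)
print(overlaps)  # only "gtex_ctrl" is reported as overlapping
```

Any cohort reported here would be left out of add_control, which matches the advice to pass only the TCGA control.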

@renyuan001
Author

Thank you so much for your patient and thoughtful answers.
