FASTA Input to Generate SCENIC+ Databases for Rat #41

Open
jonhsussman opened this issue Aug 13, 2023 · 2 comments

@jonhsussman

Hello,

Thanks for the helpful tutorial. I am trying to generate the database files needed to run SCENIC+ on the rat genome. I generated the motifs table from the motifs-v10-nr.mgi-m0.00001-o0.0.tbl file by replacing each gene_name with the homologous rat gene. However, it is unclear what to use for the other inputs to create_cistarget_motif_databases.py. For the motif IDs, I am using all the motifs available under "singletons". I am not sure which FASTA file is the right one to use. Currently I am trying the whole genome (Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa), but it seems these regions should be filtered in some way.

Is there an example of what was used to generate the databases for mouse/human or other species?

Thanks,
Jonathan

@ghuls
Member

ghuls commented Aug 16, 2023

You can use a BED file with pseudobulk peaks from your scATAC data and use bedtools to make the FASTA file:

bedtools getfasta \
    -fi Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa \
    -bed pseudobulk_peaks_from_scATAC.bed \
    -fo rattus_norvegicus.pseudobulk_peaks.fa
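
One thing that may be worth checking before running the command above: Ensembl FASTA files usually name chromosomes without a "chr" prefix (1, 2, ...), while peak files from scATAC pipelines often use chr1, chr2, and so on. bedtools getfasta can only extract regions whose chromosome name exists in the FASTA file. The following is a minimal sketch, not part of the SCENIC+ tooling, that lists BED chromosome names missing from the FASTA headers. The function name missing_chroms and the toy files are made up for illustration.

```shell
# Sketch: list chromosome names used in a BED file that have no matching
# sequence name in a FASTA file. "missing_chroms" is a hypothetical helper.
missing_chroms() {
    mc_fasta="$1"
    mc_bed="$2"
    mc_fa_names="$(mktemp)"
    mc_bed_names="$(mktemp)"
    # FASTA sequence name: first whitespace-delimited word after '>'.
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$mc_fasta" \
        | LC_ALL=C sort -u > "$mc_fa_names"
    # BED chromosome: first tab-separated column; skip comment, track,
    # browser, and empty lines.
    awk -F'\t' '!/^(#|track|browser)/ && NF { print $1 }' "$mc_bed" \
        | LC_ALL=C sort -u > "$mc_bed_names"
    # comm -13 keeps only lines unique to the second file (the BED names).
    LC_ALL=C comm -13 "$mc_fa_names" "$mc_bed_names"
    rm -f "$mc_fa_names" "$mc_bed_names"
}

# Toy inputs (assumed for illustration, not real rat data).
demo_dir="$(mktemp -d)"
printf '>1 toy description\nACGTACGTAC\n>2 toy description\nGGGGCCCCAA\n' \
    > "$demo_dir/genome.fa"
printf 'chr1\t0\t4\n2\t2\t6\n' > "$demo_dir/peaks.bed"

missing="$(missing_chroms "$demo_dir/genome.fa" "$demo_dir/peaks.bed")"
echo "BED chromosomes not found in FASTA: ${missing:-none}"
```

If the check reports names such as chr1, the BED file would need its chromosome names converted to the FASTA naming scheme (or the other way around) before extracting sequences.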

@jonhsussman
Author

jonhsussman commented Aug 16, 2023 via email
