Problems encountered during the construction of the Red Eared Turtle database #38

Open
zhongguodiyidao opened this issue May 14, 2023 · 4 comments

Comments

@zhongguodiyidao

Dear author,

image
I built the database following the database-building example you provided. Your code should generate 8 feature files in the end, but I only got three.
image

I also don't know how to use the mouse motifs file in which I replaced the gene names with their turtle homologs.
image

Looking forward to your reply, thank you!

@ghuls
Member

ghuls commented May 17, 2023

You can use the test.regions_vs_motifs.rankings.feather file and motifs-v10nr_clust-nr2.tgi-m0.01-o0.0.tbl with pySCENIC:
https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images

For pySCENIC, though, it would be better to use gene-based databases (-g option).
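To illustrate the difference between a region-based and a gene-based database, here is a minimal Python sketch. It assumes, hypothetically, that region IDs are written as `GENE#N` and that the best-scoring region stands in for each gene. Neither the ID convention nor the aggregation rule is confirmed in this thread, and `region_scores` and `collapse_to_genes` are made-up names used only for illustration.

```python
import re

# Hypothetical per-region scores for one motif; IDs follow an assumed "GENE#N" pattern.
region_scores = {
    "sox2#1": 3.2,
    "sox2#2": 5.1,
    "pax6#1": 1.4,
}

def collapse_to_genes(scores, suffix_regex=r"#[0-9]+$"):
    """Strip the region suffix and keep the highest score per gene (assumed rule)."""
    gene_scores = {}
    for region_id, score in scores.items():
        gene_id = re.sub(suffix_regex, "", region_id)
        gene_scores[gene_id] = max(score, gene_scores.get(gene_id, float("-inf")))
    return gene_scores

gene_scores = collapse_to_genes(region_scores)
print(gene_scores)
```

The point of the sketch is only that a gene-based table is indexed by gene names, which is what a gene expression matrix can be matched against.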

@zhongguodiyidao
Author

First of all, thank you for your reply!
I'm sorry, I've been quite confused lately. The species we are studying is the Brazilian red-eared turtle. For the pySCENIC analysis, I prepared several files. First, we used bedtools getfasta to create feature files for the regions 5 kb upstream and downstream and for the region 500 bp upstream, respectively.
image
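The upstream and downstream windows described above could be computed along these lines before extracting sequences. This is a minimal sketch, assuming the windows are measured from the transcription start site and that gene coordinates are 0-based and half-open as in BED; the function name `promoter_window` and the example gene are hypothetical, and the sketch does not clip at chromosome ends.

```python
def promoter_window(chrom, start, end, strand, name, upstream, downstream):
    """Return a BED6 tuple around the TSS, taking the strand into account."""
    if strand == "+":
        tss = start                      # TSS is at the gene start on the + strand
        win_start = tss - upstream
        win_end = tss + downstream
    else:
        tss = end                        # TSS is at the gene end on the - strand
        win_start = tss - downstream
        win_end = tss + upstream
    win_start = max(win_start, 0)        # never go below the chromosome start
    return (chrom, win_start, win_end, name, 0, strand)

# Hypothetical gene on the + strand: 500 bp upstream only, then 5 kb on both sides.
up500 = promoter_window("chr1", 10000, 12000, "+", "geneA", 500, 0)
up5kb_down5kb = promoter_window("chr1", 10000, 12000, "+", "geneA", 5000, 5000)

for record in (up500, up5kb_down5kb):
    print("\t".join(str(field) for field in record))
```

Each printed line is a BED record whose name column carries the gene name, so the resulting sequences can be traced back to genes.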

Furthermore, as you suggested earlier, we replaced the mouse gene names in the motif-to-TF file with the corresponding turtle gene names.

image

Finally, we replaced the transcription factor gene names in https://resources.aertslab.org/cistarget/tf_lists/allTFs_mm.txt with the turtle ones.
May I ask if these three files are sufficient?
Looking forward to your reply, thank you again!
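The renaming step described in this comment can be sketched as follows. This is only an illustration under stated assumptions: the mouse-to-turtle mapping is a hypothetical ortholog table, the gene names are examples rather than real annotation, and `rename_tfs` is a made-up helper. TFs without a known homolog are reported instead of being silently dropped.

```python
def rename_tfs(mouse_tfs, mouse_to_turtle):
    """Translate mouse TF names to turtle names; collect the ones with no mapping."""
    renamed, unmapped = [], []
    for gene in mouse_tfs:
        if gene in mouse_to_turtle:
            renamed.append(mouse_to_turtle[gene])
        else:
            unmapped.append(gene)
    return renamed, unmapped

# Hypothetical inputs: a few names from a mouse TF list and an ortholog table.
mouse_tfs = ["Sox2", "Pax6", "Zfp42"]
mouse_to_turtle = {"Sox2": "SOX2", "Pax6": "PAX6"}

turtle_tfs, missing = rename_tfs(mouse_tfs, mouse_to_turtle)
print(turtle_tfs)
print(missing)
```

Using the same mapping for both the TF list and the motif-to-TF file keeps the names consistent with each other and with the gene names in the expression matrix.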

@ghuls
Member

ghuls commented May 24, 2023

Yes, those should be sufficient.

@clarkzor

Hi Ghuls, I don't think that these files are actually sufficient based on my experience trying to create the database for Xenopus tropicalis.

I have generated regions_vs_motifs.rankings.feather
image

To do this, I used
image

Where my .fasta was created by
image

I put gene names instead of coordinate positions because I am linking the gene names to GeneList in
image

So, when I load my regions_vs_motifs.rankings.feather, it looks like this
image
image

However, this is different from your sample data, which looks like this
image
image

As you can see, your demo file has "features" with ENCODE motif IDs, but my feather file does NOT have the same information. Unless I have missed something, nowhere in the documentation do you show the FASTA file format required to generate such an output for another species.

When I further run my .feather file it looks like this:
image
image

So I get this weird warning, and then when I look at the output it looks like this:
image

BUT if I run using your demo data I get:
image

So my question is: what do I need to add to my create_cistarget_motif_databases.py command to actually link my .tbl database (with all-lowercase Xenopus tropicalis gene names) to my rankings.feather? I don't understand how to go from .regions_vs_motifs.rankings.feather to .genes_vs_motifs.ranking.feather.

Please help, I really really want to be able to use your awesome software on Xenopus single cell data.
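One check that may help narrow this down is whether the identifiers in the ranking database match the gene names used elsewhere, including letter case. Below is a minimal sketch, assuming the database identifiers and the expression matrix gene names have already been read into plain Python lists; `compare_ids` and the example names are hypothetical, and this does not reproduce what the warning in the screenshots actually says.

```python
def compare_ids(db_ids, expr_genes):
    """Split database IDs into exact matches and matches that differ only by case."""
    db_ids, expr_genes = set(db_ids), set(expr_genes)
    exact = db_ids & expr_genes
    lower_expr = {gene.lower() for gene in expr_genes}
    case_only = {i for i in db_ids - exact if i.lower() in lower_expr}
    return exact, case_only

# Hypothetical identifiers: lowercase names and one coordinate-style region ID.
db_ids = ["sox2", "pax6", "chr1:100-600"]
expr_genes = ["Sox2", "pax6"]

exact, case_only = compare_ids(db_ids, expr_genes)
print(sorted(exact))
print(sorted(case_only))
```

IDs that appear in neither set (such as the coordinate-style entry here) are candidates for region IDs that were never converted to gene names.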


3 participants