> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added: create_cistarget_track_databases.py
#52
Comments
Your BED file is not in valid BED format. This will fix it:
Thank you for your reply! I tried running it with this suggestion, and the BED file now seems to be in the correct format, but the same error came up:
Please let me know if you have any additional suggestions!
Best,
What is the output of, for example, the first
(best to copy it from your Slurm output, as I might have made typos)? You should probably make the BED files like I suggested:
with only 4 columns, as bigWigAverageOverBed probably does not like your 5th column. Also, creating a track database only makes sense when you have several hundred tracks (or even better, thousands), instead of only 4.
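For anyone hitting the same error, here is a minimal sketch of trimming a BED file down to 4 columns. It is not the command from the comment above. The function name `to_bed4` is hypothetical, and building the 4th (name) column as `chrom:start-end` is an assumption; as far as I know bigWigAverageOverBed expects that name column to be unique per region, which coordinate-based names give you as long as no region is listed twice.

```python
def to_bed4(lines):
    """Reduce BED lines to 4 tab-separated columns: chrom, start, end, name.

    The name is rebuilt from the coordinates (assumption: 'chrom:start-end'),
    so any existing 4th, 5th, ... columns are dropped.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"Expected at least 3 tab-separated columns: {line!r}")
        chrom, start, end = fields[:3]
        # Fail early if start/end are not integers.
        int(start)
        int(end)
        out.append(f"{chrom}\t{start}\t{end}\t{chrom}:{start}-{end}")
    return out


if __name__ == "__main__":
    example = ["chr1\t100\t200\tpeak1\t0.5\n", "chr2\t300\t450\tpeak2\t0.9\n"]
    for row in to_bed4(example):
        print(row)
```

To use it on a real file, read the lines from the input BED and write the returned rows (one per line) to a new file, then pass that file to the script.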
Thanks for your help. I dropped it to 4 columns, and it seems to have run and created the .feather rankings and scores.
Can you clarify what you mean by this, to ensure I am using this pipeline correctly? I have run SCENIC+ using motifs and the motif database, which I understand has thousands of motifs. My understanding is that I am creating the track database from my ChIP-seq data (one track per sample) and scoring binding in these tracks against the region BED file created by running pycisTopic on my ATAC-seq data. The result is a database that can be used in the SCENIC+ pipeline as a way to target regions where I have TF binding in my samples. Am I understanding that incorrectly? If so, do you have any suggestions on the proper way to incorporate my TF-binding data into the identification of GRNs?
Thanks,
Do you want to create a single database containing both motif and ChIP-seq scores? The latter probably does not make a lot of sense: given that you only have a few tracks, calculating the enrichment values will be impossible. The former might be a good idea; however, we have not tried this yet in the context of SCENIC+.
All the best,
Seppe
@SeppeDeWinter Ideally, a database containing both motif and ChIP-seq scores. Thank you for the reply!
@SeppeDeWinter Do you think it would be possible to use the outputs of a motif-discovery tool (we have successfully identified motifs in our ChIP-seq data using the MEME suite of tools) to create a motif database for SCENIC+ targeted to motifs in our ChIP-seq set?
@MatthewTCManion That should be possible. Convert your motifs (from Homer/MEME/...) to ClusterBuster format (you can use BioPython for this if you want). Make sure that your PWM contains counts and not frequencies (otherwise, multiply by 100), and use it together with our provided motif collection to make your own database. Later you will have to add your motif to the motif2tf table, so that if your motif is found, it will actually be used by the SCENIC+ analysis. Against which TFs did you do ChIP-seq, and are the motifs you obtained not in our motif collection?
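The frequencies-to-counts step can be sketched in plain Python as below. This is only an illustration, not the BioPython route mentioned above. It assumes the ClusterBuster layout is a `>name` header line followed by one row per motif position with A, C, G, T values. The function name `to_cluster_buster` and the example matrix are hypothetical (the matrix is a made-up placeholder, not a real Nkx2-1 motif).

```python
def to_cluster_buster(name, freq_rows, scale=100):
    """Format a position frequency matrix as ClusterBuster-style text.

    freq_rows: one [A, C, G, T] row of frequencies (each row summing to ~1)
    per motif position. Frequencies are multiplied by `scale` and rounded,
    so the output contains counts rather than frequencies.
    """
    lines = [f">{name}"]
    for row in freq_rows:
        if len(row) != 4:
            raise ValueError(f"Expected 4 values (A, C, G, T), got {len(row)}")
        counts = [round(p * scale) for p in row]
        lines.append("\t".join(str(c) for c in counts))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder 3-position matrix, for illustration only.
    placeholder = [
        [0.25, 0.25, 0.25, 0.25],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.5, 0.5, 0.0],
    ]
    print(to_cluster_buster("my_chipseq_motif", placeholder), end="")
```

The motif name in the header is the identifier that would later need a matching entry in the motif2tf table.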
I will try that! We're using Nkx2.1 for our ChIP-seq.
Looks like a motif for Nkx2-1 is at least in JASPAR, so it should be detected with our default motif collection already:
We have seen some inconsistency in the Nkx2-1 motif between different databases, so one thing we have done is to generate the motif from our own binding data in multiple Nkx2-1-expressing tissues, to hopefully capture a more consistent motif. But I agree it should at least partially resemble the JASPAR Nkx2-1 motif.
Hello, I am running into an issue using this script: the .bed file with regions to score is not recognized correctly, and I have tried a few different formats with no success. For reference, here is a screenshot of my most recent attempt to run the script, as well as the format of my .bed:
I assume the issue is with the format of the .bed or the genes/regions data, but I can't find what the proper format should be.
Thanks,
Matt
Originally posted by @MatthewTCManion in #17 (comment)