> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added :`create_cistarget_track_databases.py` #52

MatthewTCManion · 2024-08-30T18:33:24Z

          > @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added :`create_cistarget_track_databases.py`

https://github.com/aertslab/create_cisTarget_databases#create_cistarget_track_databasespy

Hello , I am running into an issue using this script where the .bed file with regions to score is not recognized correctly, and I have tried a few different formats with no success. For reference, here is a screenshot of my most recent attempt to run the script, as well as the format of my .bed:

REGION_BED="/data/PetrosLab/Matt/scenicplus/chipseq/tracks/fwf_gene_assignments.bed"
DATABASE_PREFIX="CellType_750bp_with_binding"
SCRIPT_DIR="/data/PetrosLab/Matt/scenicplus/create_cisTarget_databases"
TRACKS_DIR="/data/PetrosLab/Matt/scenicplus/chipseq/tracks"
TRACK_LIST="track_names.txt"


"${SCRIPT_DIR}/create_cistarget_track_databases.py" \
	-b "${REGION_BED}" \
    -T "${TRACKS_DIR}" \
    -d "${TRACK_LIST}" \
    -o "${DATABASE_PREFIX}" \
    -t 20

I assume the issue is with the format of the .bed or the genes/regions data, but I can't find what the proper format should be.

Thanks,
Matt

Originally posted by @MatthewTCManion in #17 (comment)

The text was updated successfully, but these errors were encountered:

ghuls · 2024-09-03T08:46:00Z

Your BED file is not in BED format: chr start end name:

This will fix it:

tail -n +2 fwf_gene_assignments.bed | awk -F "\t" -v 'OFS=\t' '{ print $2, $3, $4, $1; }' > fwf_gene_assignments.fixed.bed

MatthewTCManion · 2024-09-03T16:15:51Z

Your BED file is not in BED format: chr start end name:

This will fix it:
tail -n +2 fwf_gene_assignments.bed | awk -F "\t" -v 'OFS=\t' '{ print $2, $3, $4, $1; }' > fwf_gene_assignments.fixed.bed

Thank you for your reply! I tried running it with this suggestion, and the BED file now seems to be in the correct format, but the same error came up:

Please let me know if you have any additional suggestions!

Best,
Matt

MatthewTCManion · 2024-09-03T20:00:30Z

Okay so I was able to get the BED file into proper format,

and ran create_cistarget_track_databases.py,

but I am getting an issue with scoring the tracks for each of my 4 input .bw files:

ghuls · 2024-09-04T15:24:18Z

What is the output of for example the first bigWigAverageOverBed command?:

/data/PetrosLab/Conda/envs/create_cistarget_databases/bin/bigAveragedOverBed -minMax /data/PetrosLab/Matt/scenicplus/chipseq/tracks/MGE2-60_S13.cpm.norm.bw /data/PetrosLab/Matt/scenicplus/chipseq/tracks/consensus_tabbed.bed /dev/stdout

(best copy it from your slurm output as I might have made typos)

You probably should make the BED files like I suggested:

tail -n +2 fwf_gene_assignments.bed | awk -F "\t" -v 'OFS=\t' '{ print $2, $3, $4, $1; }' > fwf_gene_assignments.fixed.bed

with only 4 columns, as bigWigAverageOverBed probably does not like your 5th column.

Creating a track database also only would make some sense when you have several hundreds of tracks (even better thousands), instead of only 4.

MatthewTCManion · 2024-09-04T17:35:35Z

Thanks for your help,I dropped it to 4 columns and it seems to have ran and created the .feather rankings and scores.

MatthewTCManion · 2024-09-04T19:10:28Z

Creating a track database also only would make some sense when you have several hundreds of tracks (even better thousands), instead of only 4.

Can you clarify what you mean by this to ensure I am using this pipeline correctly?

I have run SCENIC+ using motifs and the motif database, which I understand has thousands of motifs.

My understanding is that I am creating the track database from my ChIP-seq data (one track per sample), and scoring binding in these tracks against the region BED created by running pycisTopic on my ATAC-seq data to create a database that can be used in the SCENIC+ pipeline as way to target regions where I have TF binding in my samples. Am I understanding that incorrectly?

If so, do you have any suggestions on the proper way to incorporate my TF-binding data into identification of GRNs?

Thanks,
Matt

SeppeDeWinter · 2024-09-05T11:55:15Z

Hi @MatthewTCManion

Do you want to create a single database containing both motif and ChIP-seq scores?
Or a database containing only ChIP-seq data?

The latter probably does not make a lot of sense, given that you only have few tracks calculating the enrichment values will be impossible.

The former might be a good idea, however in the context of SCENIC+ we have not tried this yet.

All the best,

Seppe

MatthewTCManion · 2024-09-05T14:12:08Z

@SeppeDeWinter Ideally, a database containing both motif and ChIP-seq scores. Thank you for the reply!

MatthewTCManion · 2024-09-05T15:09:26Z

@SeppeDeWinter Do you think it would be possible to use the outputs of a motif-discovery tool (we have successfully identified motifs in our ChIP-seq data using the MEME suite of tools) to create a motif database for SCENIC+ targeted to motifs in our ChIP-seq set?

ghuls · 2024-09-09T15:34:04Z

@MatthewTCManion That should be possible. Convert your motifs (from Homer/MEME/...) to ClusterBuster format (you can use BioPython for this if you want. Make sure that your PWM contains counts and not frequencies (else multiply by 100))) and use that together with our provide motif collection to make your own database. Later you will have to add your motif to the motif2tf table so if your motif is found, it will actually be used by the SCENIC+ analysis.

Against which TFs did you do ChIP-seq and are the motifs you obtain not in our motif collection?

MatthewTCManion · 2024-09-19T16:28:10Z

I will try that!

We're using Nkx2.1 for our ChIP-seq

ghuls · 2024-09-20T15:04:39Z

Looks like a motif for Nkx2-1 is at least in JASPAR, so it should be detected with our default motif collection already:
https://jaspar.elixir.no/matrix/MA1994.2/

MatthewTCManion · 2024-09-20T15:39:25Z

We have seen some inconsistency between the Nkx2-1 motif between different databases, so one thing we' have done is to generate the motif from our own binding data in multiple Nkx2-1 expressing tissues to hopefully capture a more consistent motif, but I agree it should at least partially resemble the JASPAR Nkx2-1 motif

HOCOMOCO

JASPAR

MatthewTCManion closed this as completed Sep 4, 2024

MatthewTCManion reopened this Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added :`create_cistarget_track_databases.py` #52

> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added :`create_cistarget_track_databases.py` #52

MatthewTCManion commented Aug 30, 2024

ghuls commented Sep 3, 2024

MatthewTCManion commented Sep 3, 2024

MatthewTCManion commented Sep 3, 2024 •

edited

Loading

ghuls commented Sep 4, 2024 •

edited

Loading

MatthewTCManion commented Sep 4, 2024

MatthewTCManion commented Sep 4, 2024

SeppeDeWinter commented Sep 5, 2024

MatthewTCManion commented Sep 5, 2024

MatthewTCManion commented Sep 5, 2024

ghuls commented Sep 9, 2024

MatthewTCManion commented Sep 19, 2024

ghuls commented Sep 20, 2024

MatthewTCManion commented Sep 20, 2024 •

edited

Loading

> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added :create_cistarget_track_databases.py #52

> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added :create_cistarget_track_databases.py #52

Comments

MatthewTCManion commented Aug 30, 2024

ghuls commented Sep 3, 2024

MatthewTCManion commented Sep 3, 2024

MatthewTCManion commented Sep 3, 2024 • edited Loading

ghuls commented Sep 4, 2024 • edited Loading

MatthewTCManion commented Sep 4, 2024

MatthewTCManion commented Sep 4, 2024

SeppeDeWinter commented Sep 5, 2024

MatthewTCManion commented Sep 5, 2024

MatthewTCManion commented Sep 5, 2024

ghuls commented Sep 9, 2024

MatthewTCManion commented Sep 19, 2024

ghuls commented Sep 20, 2024

MatthewTCManion commented Sep 20, 2024 • edited Loading

> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added :`create_cistarget_track_databases.py` #52

> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data is now added :`create_cistarget_track_databases.py` #52

MatthewTCManion commented Sep 3, 2024 •

edited

Loading

ghuls commented Sep 4, 2024 •

edited

Loading

MatthewTCManion commented Sep 20, 2024 •

edited

Loading