
Use tracks #17

Open
mtxellrb opened this issue Feb 7, 2022 · 5 comments


@mtxellrb

mtxellrb commented Feb 7, 2022

Hi,

Maybe I'm getting this totally wrong, but from the README file it seems that, to annotate regulatory regions for each gene or region, you can either use a motif annotation generated by Cluster-Buster or use ChIP-seq tracks instead. However, the description seems to focus entirely on motif annotation. Could you be so kind as to provide a pipeline example for bigWig files of TF ChIP-seq data and gene FASTA files? Thanks!

Best,

Meritxell

@ghuls
Member

ghuls commented Feb 7, 2022

Yes, for now the README is focused on motifs. For tracks, the script still needs to be written, but conceptually it is quite similar to https://github.com/aertslab/create_cisTarget_databases/blob/master/create_cistarget_motif_databases.py. Instead of scoring motifs with Cluster-Buster on a FASTA file with the regions/genes of interest, you need a BED file with your regions/genes; then use bigWigAverageOverBed to get the max score per region and rank those. I might look at this code soon, as I have to generate some databases myself.
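The "max score per region, then rank" step can be sketched in shell. This is a minimal illustration, not code from create_cisTarget_databases: it assumes bigWigAverageOverBed was run with `-minMax`, so the per-region max is the 8th tab-separated column of its output (name, size, covered, sum, mean0, mean, min, max), and `rank_regions_by_max` is a hypothetical helper name. Ties are broken by region name here, which is deterministic; the advice later in this thread is to randomize ties instead.

```bash
#!/usr/bin/env bash
# Sketch: rank regions by the maximum signal of one bigWig track.
# Assumed input (stdin): output of
#   bigWigAverageOverBed -minMax track.bw regions.bed scores.tab
# i.e. tab-separated columns: name size covered sum mean0 mean min max.
# rank_regions_by_max is a hypothetical helper, not part of create_cisTarget_databases.
# Output: rank<TAB>name<TAB>max, highest max first.
rank_regions_by_max() {
    # -k8,8gr: sort on column 8 (max) as a general number, descending.
    # -k1,1: ties are broken by region name (deterministic, not randomized).
    LC_ALL=C sort -t $'\t' -k8,8gr -k1,1 \
        | awk -F '\t' -v OFS='\t' '{ print NR, $1, $8 }'
}

# Example:
# printf 'regA\t500\t500\t1000\t2\t2\t0\t7.5\nregB\t500\t500\t4000\t8\t8\t1\t12\n' \
#     | rank_regions_by_max
```

For the example input, regB (max 12) is printed as rank 1 and regA (max 7.5) as rank 2, each as a tab-separated `rank name max` line.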

@wariobrega

@ghuls I am also trying to understand whether I can use peak files generated from my ChIP-seq data!

@ghuls
Member

ghuls commented Feb 17, 2022

Yes, you can, but you need to make sure you have a lot of ChIP-seq tracks in your database, as otherwise they will always be enriched in each analysis. For a cisTarget database you just need some input data that you can rank. Also make sure that, in case of ties, you randomize the rank assignment so you don't get artificially high rankings for your first regions.
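One way to randomize ties is to shuffle the regions and then apply a stable sort on the score: regions with equal scores keep the random order the shuffle produced, so none of them is systematically ranked first. A minimal sketch, assuming GNU coreutils (`shuf`, `sort -s`) and a hypothetical two-column `name<TAB>score` input; `rank_with_random_ties` is an illustrative name, not part of the cisTarget tools.

```bash
#!/usr/bin/env bash
# Sketch: rank regions by score (highest first) with ties broken at random.
# Requires GNU coreutils (shuf, sort -s).
# Assumed input (stdin): name<TAB>score, one region per line.
# rank_with_random_ties is a hypothetical helper name.
# Output: rank<TAB>name<TAB>score.
rank_with_random_ties() {
    # 1. shuf puts the lines in random order.
    # 2. sort -s (stable) orders by score only, so regions with equal scores
    #    keep their random relative order instead of input or alphabetical order.
    shuf \
        | LC_ALL=C sort -s -t $'\t' -k2,2gr \
        | awk -F '\t' -v OFS='\t' '{ print NR, $1, $2 }'
}
```

Because the shuffle is unseeded, rerunning it can swap the ranks of tied regions; regions with distinct scores always keep the same relative order.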

@ghuls
Member

ghuls commented Dec 12, 2022

@mtxellrb
A script for creating a track database from bigWig TF ChIP-seq data has now been added: create_cistarget_track_databases.py

https://github.com/aertslab/create_cisTarget_databases#create_cistarget_track_databasespy

@MatthewTCManion

> @mtxellrb A script for creating a track database from bigWig TF ChIP-seq data has now been added: create_cistarget_track_databases.py
>
> https://github.com/aertslab/create_cisTarget_databases#create_cistarget_track_databasespy

Hello @ghuls, I am running into an issue using this script where the .bed file with regions to score is not recognized correctly, and I have tried a few different formats with no success. For reference, here is a screenshot of my most recent attempt to run the script, as well as the format of my .bed:

REGION_BED="/data/PetrosLab/Matt/scenicplus/chipseq/tracks/fwf_gene_assignments.bed"
DATABASE_PREFIX="CellType_750bp_with_binding"
SCRIPT_DIR="/data/PetrosLab/Matt/scenicplus/create_cisTarget_databases"
TRACKS_DIR="/data/PetrosLab/Matt/scenicplus/chipseq/tracks"
TRACK_LIST="track_names.txt"


"${SCRIPT_DIR}/create_cistarget_track_databases.py" \
    -b "${REGION_BED}" \
    -T "${TRACKS_DIR}" \
    -d "${TRACK_LIST}" \
    -o "${DATABASE_PREFIX}" \
    -t 20


I assume the issue is with the format of the .bed or the genes/regions data, but I can't find what the proper format should be.

Thanks,
Matt
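A quick way to rule out common BED problems (space instead of tab separators, missing name column, bad coordinates, duplicate names, Windows line endings) before running the script. This sketch does not reflect documented requirements of create_cistarget_track_databases.py (the README linked above is authoritative); it assumes a tab-separated file with at least four columns (chrom, start, end, name) and unique names in column 4, based on bigWigAverageOverBed's usage text, which reports one result per BED name and says that name should be unique. `check_regions_bed` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: basic sanity checks for a regions BED file.
# Assumptions (not confirmed for create_cistarget_track_databases.py):
#   - tab-separated, at least 4 columns: chrom, start, end, name
#   - integer coordinates with start < end
#   - names in column 4 are unique (bigWigAverageOverBed reports per name)
# check_regions_bed is a hypothetical helper: it prints one message per problem
# and returns 1 if any problem was found, 0 otherwise.
check_regions_bed() {
    awk -F '\t' '
        # Flag Windows line endings, which end up inside the last column.
        /\r$/ {
            printf "line %d: Windows (CRLF) line ending\n", NR
            bad = 1
        }
        # Skip blank lines, comments and UCSC track/browser header lines.
        /^$/ || /^#/ || /^(track|browser)/ { next }
        NF < 4 {
            printf "line %d: expected at least 4 tab-separated columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d: start and end must be integers with start < end\n", NR
            bad = 1
        }
        ($4 in seen) {
            printf "line %d: duplicate name \"%s\" (first seen on line %d)\n", NR, $4, seen[$4]
            bad = 1
            next
        }
        { seen[$4] = NR }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `check_regions_bed "${REGION_BED}"` prints nothing and returns 0 for a file that passes these checks, while a space-separated file is reported as having fewer than 4 tab-separated columns on each line.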
