-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Output listing only chr1 #15
Comments
Hi Lucas, I will try to debug this with you. It looks like the previous step only generated files for the first chromosome. Can you tell me what command did you use with Jordi |
Hi, |
Okay, so I understand that you submitted a single job for the #!/bin/bash
# Input
mat_files="$1"
phenotype="$2"
# Vars
total_chr=22
# Process BAM file to generate methylation matrices
for ((chr=1; chr<=total_chr; chr++))
do
sbatch << EOJ
#!/bin/sh
#SBATCH --job-name=iME-hg19-RUN-${phenotype}
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --mem=100G
#SBATCH --time=30:00:00
#SBATCH -o /scratch/shared/logfiles/%u/IME-hg19-RUN_%A_${phenotype}_${chr}.out
#SBATCH -e /scratch/shared/logfiles/%u/IME-hg19-RUN_%A_${phenotype}_${chr}.err
# Dependencies
module load matlab/R2017a
module load samtools/0.1.19
# Run
informME_run.sh -q 6 --time_limit 30 ${mat_files} ${phenotype} ${chr}
EOJ
sleep 1
done This submission script can be executed as follows: ./informME_submission.sh toy_sample_normal toy_normal In any case, what are the sizes of the |
I'm not running it on a cluster, just on a regular server this is why I thought I could run it within a loop regular bash loop. Furthermore all the informME_run.sh instances ended correctly without errors, this is why I don't understand what happened. |
I see, so that confirms that something went wrong in a previous step for the rest of the chromosomes. That is why both
-rw-rw-r-- 1 [email protected] group 11M Oct 31 2017 CpGlocationChr1.mat
-rw-rw-r-- 1 [email protected] group 11M Oct 31 2017 CpGlocationChr2.mat
-rw-rw-r-- 1 [email protected] group 7.8M Oct 31 2017 CpGlocationChr3.mat
-rw-rw-r-- 1 [email protected] group 7.1M Oct 31 2017 CpGlocationChr4.mat
-rw-rw-r-- 1 [email protected] group 7.2M Oct 31 2017 CpGlocationChr5.mat
-rw-rw-r-- 1 [email protected] group 7.0M Oct 31 2017 CpGlocationChr6.mat
-rw-rw-r-- 1 [email protected] group 7.4M Oct 31 2017 CpGlocationChr7.mat
-rw-rw-r-- 1 [email protected] group 6.2M Oct 31 2017 CpGlocationChr8.mat
-rw-rw-r-- 1 [email protected] group 5.8M Oct 31 2017 CpGlocationChr9.mat
-rw-rw-r-- 1 [email protected] group 6.4M Oct 31 2017 CpGlocationChr10.mat
-rw-rw-r-- 1 [email protected] group 6.1M Oct 31 2017 CpGlocationChr11.mat
-rw-rw-r-- 1 [email protected] group 6.0M Oct 31 2017 CpGlocationChr12.mat
-rw-rw-r-- 1 [email protected] group 3.9M Oct 31 2017 CpGlocationChr13.mat
-rw-rw-r-- 1 [email protected] group 4.1M Oct 31 2017 CpGlocationChr14.mat
-rw-rw-r-- 1 [email protected] group 4.1M Oct 31 2017 CpGlocationChr15.mat
-rw-rw-r-- 1 [email protected] group 5.0M Oct 31 2017 CpGlocationChr16.mat
-rw-rw-r-- 1 [email protected] group 5.3M Oct 31 2017 CpGlocationChr17.mat
-rw-rw-r-- 1 [email protected] group 3.2M Oct 31 2017 CpGlocationChr18.mat
-rw-rw-r-- 1 [email protected] group 4.8M Oct 31 2017 CpGlocationChr19.mat
-rw-rw-r-- 1 [email protected] group 3.4M Oct 31 2017 CpGlocationChr20.mat
-rw-rw-r-- 1 [email protected] group 1.8M Oct 31 2017 CpGlocationChr21.mat
-rw-rw-r-- 1 [email protected] group 2.7M Oct 31 2017 CpGlocationChr22.mat
|
I used as fasta reference the GRCh38.76_primary_assembly
As you can see, some of them are smaller than 50M but the majority is not. |
I see, alright. So the size of the Going over your emails with Garrett, it seems that you tried running Also, can you let me know the sizes of the BAM file you used as input? |
This is the size of my bam files: |
Sounds good. So the sizes we usually see for human data are between 15-30G's. That usually translates into average coverages of x20. In our experience, WGBS data that is ~125bp paired-end reads at 15X to 20X coverage results in high quality models across most regions of interest in the genome. Can you find out what kind of data you have? WGBS, RRBS? Paired-end, single-end? Read length? Also, do you have more replicates for each phenotype? |
Hi, so I asked to the lab that prepared the libraries and they told me that they followed the protocol provided by roches at this link: |
I see yes, it looks like some sort of targeted BS. We are not familiarized with this type of data, but as long as the reads are distributed uniformly within the target regions, and the regions are much larger than 3 kbp in size, then informME might still produce meaningful results (@GarrettJenkinson chime in if you think differently please). In fact, in the same link you shared with me they provide a way to download a BED file containing the targeted regions (130912_HG19_CpGiant_4M_EPI.bed). Can you download this file and look at the distribution in size of the regions? If indeed, our assumptions of uniformly distributed reads in the modeled 3 kbp windows holds, then you can try using the 3 unaffected samples to build a single pooled model for the unaffected phenotype, and using the 2 affected samples to build a pooled model for the affected phenotype. Hopefully, in doing so, the coverage will triplicate and duplicate respectively, and informME might be able to model more regions. However, let's try to see how the targeted regions look like first. |
Hi Lucas, so Jordi has asked all the right questions while I've been dealing with movers, etc. to get at the heart of your issue. I will chime in with my thoughts: The fact that your data is RRBS/capture rather than WGBS is somewhat problematic for our pipeline, since it has been designed to work with WGBS coverage patterns. See our FAQs for a discussion of this issue. If you do decide to continue to use informME on this data, we would be curious as to how well it works for you, since there is a lot of RRBS data out there, but for the most part you are blazing new trail here. The RRBS issue does not fully explain the problem you are having here, though, because it looks like your matrices files are large enough to produce some models on the other chromosomes if they produced models for chr1. So please let us know the results of your informME_run.sh on a single chromosome (if you are yet to do so, maybe chr2 is a good choice since the matrices files are bigger there). Remember to delete the "old" small ".mat" files that were produced by your previous informME_run submission, since informME will check if these files exist and not do anything if they find them there (which is helpful on clusters, but not so much in your case). One thing that concerns me is that you have listed a gcc compiler that is not compatible with your matlab version, see here: https://www.mathworks.com/support/compilers.html while this may not pose a problem (you did get results for chr1, which I would not expect if the binaries did not compile correctly), it very well could lead to strange/unpredictable behaviors. If the toy model does not work properly, then I can help you install a local gcc version compatible with matlab, since I did this on my ubuntu virtual machine that I use for testing informME. Returning to the issue of coverage, it is possible in theory that the informME_run stage chooses not to build any models in a given chromosome for RRBS depending on how the data is distributed along the genome, but it seems unlikely to me that you would randomly have models built on chr1 and none of the others, unless the design of your RRBS has denser capture on chr1. Also I don't recall if you ever ran the "toy model" that tests your installation, but that could be helpful in turning up any issues with the install, etc. We also understand, however, if you choose not to proceed, since the pipeline is not tailored to your data, just let us know what you decide so we can close this issue, etc. |
Thank you guys, I would like to try using it since is the best tool (mathematically speaking) I've seen so far, so I'm going to try the toy data set and I'm also going to try building a model for my data for a single chromosome (just curious to see what happens). |
Thanks for the compliments :) I hope informME will work for your data even though it is not tailored to it. Regarding truncating the reference genome to be only the targeted regions: I think this would only work if you treated each contiguous targeted region as its own "chromosome", because otherwise, if you just removed gaps between regions, then you would have incorrect distances between CpG sites that come from two different regions, which would lead to inaccurate models. Treating each region as a "chromosome" would work just fine, but I imagine it would be too hackish and clunky to be worth it. Really, there are a lot of tweaks required to "do targeted/RRBS modeling right", which is why we don't recommend informME right now for it. What you will probably end up with is a few good models in locations where the targeting lines up nicely with informME's 3kb estimation windows, but you will probably miss a lot of CpG sites that do have targeted data. Please do update us with the target regions and any results you get from your testing, as we are interested in exploring this RRBS issue with you and getting you up and running. |
After running both "singleMethAnalysisToBed" and "diffMethAnalysisToBed", it gives me output files listing only results for chr1.
I tried specifying as minchr 1 and maxchr 22 (even if it's the default option) but it still reports only results for chr1.
I also tried forcing it to analyze the chr5 (just a random number different from chr1) and it gives me all empty files.
Looking at the log provided by the tool while running there is no error and it says "Command succesful" at the end.
The exact commands I used are:
1)
for i in A C; do singleMethAnalysisToBed.sh --MC --ESI --MSI $i; done
where A and C are my phenotypes (affected and control)
diffMethAnalysisToBed.sh --MC --ESI --MSI A C
I'm running everything on an Ubuntu Server 14.04.5 LTS (32 CPUs, 64GB RAM)
-Samtools 0.1.19
-Matlab 2018a
-informME 0.3.2
-g++ (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
thank you in advance for your help
The text was updated successfully, but these errors were encountered: