Missing start=setting line in the core file generated by uniwig #43

Open
saanikat opened this issue Oct 31, 2024 · 7 comments
Labels: bug (Something isn't working), likely solved, uniwig

Comments

@saanikat
Member

wigToBigWig gives me the following error when I provide the .wig files generated by uniwig (dev branch):

Missing start= setting line 879394241 of /scratch/bam8pm/tlgl_atac/uniwig_wigs/_core.wig

It seems like some kind of a header line is missing in the core wig file generated by uniwig. The command I used to generate the wiggle files was:

cargo run uniwig -f /scratch/bam8pm/tlgl_atac/combined/combined_chrsort.bed -c /scratch/bam8pm/tlgl_atac/hg38.chrom.sizes -m 1 -s 1 -l /scratch/bam8pm/tlgl_atac/ -y wig -t bed

However, when I reset the dev branch to commit 884e7b6, wigToBigWig runs without error, which suggests that the header line with the start= setting is generated correctly in that case.

To reproduce this error, here are the bed file and the chrom sizes file:
/project/shefflab/brickyard/results_pipeline/tlgl_atac/combined_chrsort.bed
/project/shefflab/brickyard/results_pipeline/tlgl_atac/hg38.chrom.sizes

The command I used for wigToBigWig is:

wigToBigWig -clip /scratch/bam8pm/tlgl_atac/uniwig_wigs/_core.wig /scratch/bam8pm/tlgl_atac/hg38.chrom.sizes /scratch/bam8pm/tlgl_atac/tracks/all_core.bw
@nleroy917
Member

> It seems like some kind of a header line is missing in the core wig file generated by uniwig. The command I used to generate the wiggle files was:

What does the header of the bad .wig file look like?

Also, unrelated, but you should run with --release; it will be much faster. I like to just do:

cargo install --profile=release --path=gtars/

Then you can just run gtars:

gtars uniwig ...

@saanikat
Member Author

saanikat commented Oct 31, 2024

The line which is causing an issue (line 879394241) looks like this:

fixedStep chrom=chr4_GL000008v2_random start=0 step=1

For reference, this is what the other fixedStep lines in the same file look like:

fixedStep chrom=chr1 start=803560 step=1
fixedStep chrom=chr1_KI270706v1_random start=2752 step=1
fixedStep chrom=chr1_KI270707v1_random start=23913 step=1

wigToBigWig isn't throwing an error for these lines though.
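
For anyone wanting to locate such lines before running wigToBigWig, here is a minimal sketch. It assumes, as this thread suggests, that the problem is a fixedStep header with start below 1 (positions in wiggle files are 1-based). The function name and the sample lines are hypothetical and not part of uniwig or wigToBigWig.

```python
import re

# Matches the start= field of a fixedStep declaration line.
_START_RE = re.compile(r"\bstart=(-?\d+)\b")


def find_invalid_fixedstep_starts(lines):
    """Return (line_number, header) pairs for fixedStep headers with start < 1.

    Hypothetical helper: it flags headers that lack a start= field or whose
    start is below 1, since wiggle positions are 1-based.
    """
    bad = []
    for line_number, line in enumerate(lines, start=1):
        if not line.startswith("fixedStep"):
            continue  # data values and other declarations are skipped
        match = _START_RE.search(line)
        if match is None or int(match.group(1)) < 1:
            bad.append((line_number, line.rstrip("\n")))
    return bad


# Small in-memory sample modelled on the headers quoted in this thread.
sample = [
    "fixedStep chrom=chr1 start=803560 step=1\n",
    "5\n",
    "fixedStep chrom=chr4_GL000008v2_random start=0 step=1\n",
    "3\n",
]
for line_number, header in find_invalid_fixedstep_starts(sample):
    print(line_number, header)
```

For a real file, passing an open file handle instead of the list (for example `find_invalid_fixedstep_starts(open(path))`) streams the file line by line, which matters for a wig file with hundreds of millions of lines.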

@donaldcampbelljr
Member

Looks like it might be this commit:
9f96c8d

start must be > 0 for wigToBigWig, correct?

In that commit I reverted a change where I was clamping the start position for the core counts. The clamping was causing other issues, but I don't remember now which issues the revert fixed. You could re-add that line and see if it fixes your issue.

Alternatively (and preferably), you can use the newest changes in this branch:
https://github.com/databio/gtars/tree/dev_bam_bedgraph_bigtools

And simply set the output type to bw, skipping the intermediate wiggle step.

donaldcampbelljr added the igd, uniwig, and bug (Something isn't working) labels and removed the igd label on Dec 12, 2024
@donaldcampbelljr
Member

donaldcampbelljr commented Dec 12, 2024

This surfaced again. It happens for core files and is reproducible with a simple example in which the regions on a chromosome start at 0:

chr1	0	360
chr1	0	369
chr1	0	374
chr1	0	394
chr1	10039	10539
chr1	100861	101361

The core file output begins like so:

fixedStep chrom=chr1 start=0 step=1
5
5
5
5
5
5
5

I believe the reason we removed the clamped start position for the core counts is this: if a wiggle file spans the entire chromosome (per the chrom.sizes file) and you shift the position in the header by 1, the wiggle file now extends past the end of the chromosome, which causes issues with wigToBigWig.

The proper fix is to recognize that we are converting a 0-based BED file to a 1-based wiggle file.

However, to do this we would need to shift before or during counting, and we only want to shift for wiggle outputs, not for bedGraph or npy outputs. Therefore, implementing this will require some refactoring.
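
To make the coordinate conversion concrete, here is a minimal sketch. BED intervals are 0-based and half-open, wiggle positions are 1-based, and bedGraph stays 0-based. The function names and the `output_type` strings are hypothetical illustrations and do not reflect how gtars implements this.

```python
def bed_interval_to_wig_positions(bed_start, bed_end):
    """Convert a 0-based, half-open BED interval to 1-based inclusive positions."""
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError("expected 0 <= bed_start < bed_end")
    # BED [start, end) covers the same bases as 1-based [start + 1, end].
    return bed_start + 1, bed_end


def output_start(bed_start, output_type):
    """Return the start coordinate to write for a given output type.

    Assumption for illustration: only "wig" output is shifted; every other
    output type (e.g. bedGraph, npy) keeps the 0-based BED coordinate.
    """
    if output_type == "wig":
        return bed_start + 1
    return bed_start


# First region of the example above: chr1 0 360
first, last = bed_interval_to_wig_positions(0, 360)
print(f"fixedStep chrom=chr1 start={output_start(0, 'wig')} step=1")
print(first, last)
```

Keeping the shift in one small function that depends on the output type is one way to avoid touching the counting logic shared by the bedGraph and npy paths.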

donaldcampbelljr added a commit that referenced this issue Dec 12, 2024
@donaldcampbelljr
Member

I attempted a fix in the commit above. I still need to refactor the added code to remove duplication.

  • Refactor the above commit for less duplication

@donaldcampbelljr
Member

I also enforce an upper bound, based on the chromosome size, on the number of counts written to a wiggle file. This was added for interoperability with downstream tools.
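
A minimal sketch of what such an upper bound could look like, assuming a 1-based fixedStep start with step=1 and a chromosome size taken from the chrom.sizes file. The function name is hypothetical, and the actual gtars code may apply the bound differently.

```python
def clamp_counts_to_chrom(start, counts, chrom_size):
    """Truncate counts so that no written position exceeds chrom_size.

    With a 1-based start and step=1, value i is written at position
    start + i, so at most chrom_size - start + 1 values fit.
    """
    if start < 1:
        raise ValueError("wiggle start must be >= 1")
    max_values = max(chrom_size - start + 1, 0)
    return counts[:max_values]


# Ten counts starting at position 1 on a chromosome of length 8:
# only the first eight values are kept.
print(len(clamp_counts_to_chrom(1, [5] * 10, 8)))
```

Counts that already fit within the chromosome are returned unchanged, so the clamp only affects the tail of a block that would otherwise run past the end.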

@donaldcampbelljr
Member

Per feedback from @saanikat, the above commits appear to have fixed this issue.
