Identify homopolymers-count variants #7

Open
skoren opened this issue Nov 5, 2021 · 2 comments
Labels
question Further information is requested

Comments

@skoren
Member

skoren commented Nov 5, 2021

Identify regions where the haplotypes differ only by homopolymer count and are therefore invisible in homopolymer-compressed space. These regions are likely to cause consensus issues when the two haplotypes are mixed into a smashed consensus.
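To make the idea concrete, here is a minimal Python sketch of how two haplotype sequences can be identical after homopolymer compression while still differing in run lengths. The function names (`run_length_encode`, `hpc`, `homopolymer_only_differences`) are hypothetical and are not part of Verkko; the sketch also assumes the two sequences are already known to cover the same region end to end.

```python
from itertools import groupby


def run_length_encode(seq):
    """Return (base, run_length) pairs, e.g. 'AAAC' -> [('A', 3), ('C', 1)]."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


def hpc(seq):
    """Homopolymer-compress a sequence by collapsing each run to one base."""
    return "".join(base for base, _ in run_length_encode(seq))


def homopolymer_only_differences(hap1, hap2):
    """Compare two haplotype sequences covering the same region (assumption).

    Returns None if they differ in compressed space. Otherwise returns a list
    of (run_index, base, length_in_hap1, length_in_hap2) for every run whose
    length differs; an empty list means the sequences are identical.
    """
    rle1, rle2 = run_length_encode(hap1), run_length_encode(hap2)
    if [b for b, _ in rle1] != [b for b, _ in rle2]:
        return None  # a difference is visible even after compression
    return [
        (i, base, n1, n2)
        for i, ((base, n1), (_, n2)) in enumerate(zip(rle1, rle2))
        if n1 != n2
    ]


if __name__ == "__main__":
    a, b = "ACGTTTTAGG", "ACGTTTAGGG"
    print(hpc(a), hpc(b))
    print(homopolymer_only_differences(a, b))
```

A non-empty result marks a region of the kind described above: the graph built in compressed space sees a single sequence, while the uncompressed haplotypes disagree on run lengths.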

skoren added the question label (Further information is requested) on Dec 10, 2021
@skoren
Member Author

skoren commented Jan 13, 2022

There are some examples of poor consensus in the T2T HG002 XY assembly from the string graph pipeline, so we should check the quality in Verkko and update the consensus if needed.

@skoren
Member Author

skoren commented Feb 17, 2023

A great set of curated examples is now available here: https://github.com/marbl/HG002-issues/issues?q=is%3Aopen+is%3Aissue+label%3Apriority

A good number of missed variants create falsely homozygous regions in simple-sequence repeats. We should consider unzipping these kinds of nodes locally with uncompressed reads, or without simple-sequence compression, to try to phase the reads. I expect the signal is there.
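As a rough illustration of what "the signal" could look like in uncompressed reads, the toy sketch below groups reads by the length of each homopolymer run and reports the first run where at least two different lengths are each supported by several reads. It is not Verkko's method: the function names and the `min_support` threshold are hypothetical, and it assumes the reads have already been trimmed to the same node and share one compressed sequence. Real reads carry homopolymer length errors, so an actual implementation would need an error model rather than exact counts.

```python
from collections import defaultdict
from itertools import groupby


def run_lengths(seq):
    """Lengths of consecutive homopolymer runs, e.g. 'ACCCT' -> [1, 3, 1]."""
    return [len(list(run)) for _, run in groupby(seq)]


def split_reads_by_run_length(reads, min_support=2):
    """Find a run whose length separates the reads into supported groups.

    reads: dict of read name -> uncompressed sequence; all reads are assumed
    to span the same node and to have the same compressed sequence.
    Returns (run_index, {run_length: [read names]}) for the first run with at
    least two lengths backed by >= min_support reads each, or None.
    """
    per_read = {name: run_lengths(seq) for name, seq in reads.items()}
    n_runs = {len(lengths) for lengths in per_read.values()}
    if len(n_runs) != 1:
        raise ValueError("reads do not share one compressed length")
    for i in range(n_runs.pop()):
        groups = defaultdict(list)
        for name, lengths in per_read.items():
            groups[lengths[i]].append(name)
        # Drop lengths seen in too few reads; these are treated as noise.
        supported = {k: v for k, v in groups.items() if len(v) >= min_support}
        if len(supported) >= 2:
            return i, supported
    return None


if __name__ == "__main__":
    reads = {"r1": "ACCCCT", "r2": "ACCCCT", "r3": "ACCCT", "r4": "ACCCT", "r5": "ACCT"}
    print(split_reads_by_run_length(reads))
```

In the toy input, the second run splits the reads into a length-4 group and a length-3 group, while the single length-2 read falls below the support threshold and is ignored.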
