Question: Imputation of scores #7

Open
lima1 opened this issue Jun 12, 2019 · 4 comments

Comments

@lima1

lima1 commented Jun 12, 2019

Hi,

I'm looking for a solution to a fairly straightforward problem: I have scores for all heterozygous SNPs in a pool of normals describing how the allelic fraction (not the population allele frequency) deviates from the expected 0.5. Each available position also has an associated error, based on total coverage and the number of samples carrying the SNP.

I currently have an ad hoc way of imputing scores for variants not in the pool of normals by averaging the scores of the n nearest neighbors, but a weighted running median would be better.
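To make the two approaches in this paragraph concrete, here is a minimal sketch in plain Python. It is not GenomicScores code and not the poster's actual implementation: all function names are hypothetical, positions are assumed to be sorted coordinates on a single chromosome, and the use of inverse-variance weights (1 / error^2) for the weighted median is an assumption about how the per-position error might be used.

```python
from bisect import bisect_left


def nearest_neighbors(positions, query, n):
    """Indices of the n scored positions closest to `query`.

    `positions` must be sorted in increasing order (hypothetical helper).
    """
    i = bisect_left(positions, query)
    lo, hi = i - 1, i
    picked = []
    while len(picked) < n and (lo >= 0 or hi < len(positions)):
        # Take whichever side is closer; ties go to the left neighbor.
        take_left = hi >= len(positions) or (
            lo >= 0 and query - positions[lo] <= positions[hi] - query
        )
        if take_left:
            picked.append(lo)
            lo -= 1
        else:
            picked.append(hi)
            hi += 1
    return picked


def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight
    reaches half of the total weight."""
    if not values:
        raise ValueError("weighted_median needs at least one value")
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    return max(values)  # only reached through floating-point rounding


def impute_mean(positions, scores, query, n):
    """Ad hoc approach: plain average of the n nearest scores."""
    idx = nearest_neighbors(positions, query, n)
    return sum(scores[i] for i in idx) / len(idx)


def impute_weighted_median(positions, scores, errors, query, n):
    """Weighted median of the n nearest scores.

    Assumption: weights are inverse variances, 1 / error**2, so that
    noisy positions contribute less.
    """
    idx = nearest_neighbors(positions, query, n)
    weights = [1.0 / errors[i] ** 2 for i in idx]
    return weighted_median([scores[i] for i in idx], weights)
```

With one noisy neighbor carrying an outlying score, the plain mean is pulled toward the outlier while the weighted median largely ignores it, which is the motivation for preferring the latter.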

Sorry for the basic question, but is this something I can use GenomicScores for, or could I make it work, perhaps by including some fake data points?

Thanks in advance,
Markus

@rcastelo
Owner

Hi Markus,
If I understand you correctly, we could incorporate the scores you have as an AnnotationHub resource available via 'getGScores()'. This is a manual process that requires parsing the files and making them available in the proper format, but once they are in place you can query those scores in a uniform way with the functions 'gscores()' and 'score()'. Is this what you were asking for?

Cheers,

robert.

@lima1
Author

lima1 commented Jul 9, 2019

Hi Robert,

thanks for getting back to me and sorry for my late response.

Now it makes sense, I thought I missed something in the documentation about generating these data structures. Since these scores depend on many things, they would be unique to each user and their normal samples.

My question was: essentially now I have a custom GRanges with scores. Only a (small) fraction of the genome has scores associated, but I'd like to impute the scores for all requested ranges. Do you think GenomicScores is the right tool for this? Looks like not (yet?), right?

Markus

@rcastelo
Owner

Hi,

GenomicScores currently has nothing like that, but I guess it would not be that difficult to implement this feature and enable it through additional arguments to the call to 'gscores()' or 'score()', e.g., impute.method=c("none", "min", "max", "mean") and impute.distance=0L. Every NA value could then be imputed by applying one of these methods to the values observed within a given physical distance, expressed in bp. Is this what you are looking for?
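The behavior proposed here can be sketched as follows. This is a plain-Python illustration only: the arguments impute.method and impute.distance are a suggestion in this thread, not an existing GenomicScores feature, and the function name, the `method` and `distance` parameters, and the use of None in place of NA are all assumptions made for the example. Positions are assumed to be sorted coordinates on a single chromosome.

```python
from bisect import bisect_left, bisect_right
from statistics import mean

# Summary functions matching the methods named in the proposal.
METHODS = {"min": min, "max": max, "mean": mean}


def impute_within_distance(positions, scores, query, method="none", distance=0):
    """Return the observed score at `query`, or impute it.

    Hypothetical helper: if no score is observed at `query`, apply
    `method` to the scores observed within `distance` bp on either side.
    Returns None (standing in for NA) when nothing can be imputed.
    """
    i = bisect_left(positions, query)
    if i < len(positions) and positions[i] == query:
        return scores[i]  # observed value, no imputation needed
    if method == "none":
        return None
    # Scored positions inside [query - distance, query + distance].
    lo = bisect_left(positions, query - distance)
    hi = bisect_right(positions, query + distance)
    window = scores[lo:hi]
    return METHODS[method](window) if window else None


positions = [100, 200, 300, 1000]
scores = [0.1, 0.2, 0.9, 0.3]
imputed_max = impute_within_distance(positions, scores, 250, "max", 60)
still_missing = impute_within_distance(positions, scores, 600, "mean", 60)
```

In this toy example position 250 has scored neighbors at 200 and 300 within 60 bp, so it can be imputed, whereas position 600 has none within that distance and stays missing. With the default distance of 0 only observed positions return a value, which matches the idea that imputation is opt-in.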

@lima1
Author

lima1 commented Sep 14, 2019

Hi Robert,

I'm currently benchmarking the best ways of imputing the scores and will get back to you. But that sounds perfect.

Markus
