Filtering of munge_sumstats.py #439

Open
yesyj-yuns opened this issue Jun 25, 2024 · 5 comments

@yesyj-yuns

Hi,

I tried to convert my GWAS data into a sumstats file using munge_sumstats.py.

The wiki page https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format#sumstats says that munge_sumstats.py filters out ambiguous SNPs and non-SNP variants.
Could you please tell me why ambiguous SNPs are filtered out when GWAS summary statistics are converted to the .sumstats format?

My second question is about #375.
You answered that question there, but I still don't understand the answer, so I'm asking again.
My GWAS summary statistics (from logistic regression) were generated with plink2. In this data, the effect allele, A1, is the alternate allele, not the reference allele.
In this case, should the A1 allele in the input file for munge_sumstats.py be the reference allele or the effect allele?

Thank you very much:)

@aksarkar

@yesyj-yuns Errors made when computing GWAS associations or meta-analyses are more difficult to detect for strand-ambiguous variants. The simplest strategy to avoid introducing noise into ldsc from such errors is to remove ambiguous variants.
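
To make "strand ambiguous" concrete: a SNP is strand ambiguous when its two alleles are complementary bases (A/T or C/G), so the allele pair reads the same on both strands and a strand flip cannot be detected from the alleles alone. The sketch below illustrates that definition only. It is not the code in munge_sumstats.py, and the function names are hypothetical.

```python
# Illustrative only: not taken from munge_sumstats.py.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_snp(a1, a2):
    """True if both alleles are single, distinct bases (A, C, G or T)."""
    a1, a2 = a1.upper(), a2.upper()
    return a1 in COMPLEMENT and a2 in COMPLEMENT and a1 != a2


def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G SNPs, whose alleles are each other's complement."""
    return is_snp(a1, a2) and COMPLEMENT[a1.upper()] == a2.upper()


def keep_variant(a1, a2):
    """Keep only SNPs that are not strand ambiguous."""
    return is_snp(a1, a2) and not is_strand_ambiguous(a1, a2)


# Made-up allele pairs: two ordinary SNPs, two ambiguous SNPs, one indel.
variants = [("A", "G"), ("A", "T"), ("C", "G"), ("AT", "A"), ("c", "t")]
kept = [v for v in variants if keep_variant(*v)]
print(kept)
```

Here the A/T and C/G pairs are dropped as ambiguous and the AT/A pair is dropped as a non-SNP variant.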

For your second question, ldsc flips alleles and effect sizes as necessary to ensure that A1 is ref, A2 is alt, and effect allele is alt. As long as A1 in the input is consistently ref or consistently alt, you will get the correct answer.

@yesyj-yuns

yesyj-yuns commented Jun 26, 2024

Thank you so much for your response.

I'm really sorry, but I'm asking again because I don't yet understand the answer to the second question.

If you look at describe_cname in munge_sumstats.py, the statistics are defined relative to A1, as shown below.

  'A1': 'Allele 1, interpreted as ref allele for signed sumstat.',
  'A2': 'Allele 2, interpreted as non-ref allele for signed sumstat.',
  'Z': 'Z-score (0 --> no effect; above 0 --> A1 is trait/risk increasing)',
  'OR': 'Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)',
  'BETA': '[linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)',
  'LOG_ODDS': 'Log odds ratio (0 --> no effect; above 0 --> A1 is risk increasing)',
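
One way to read the descriptions above is as a lookup from each signed statistic to its "no effect" value, with the sign relative to that value giving the direction of A1. The sketch below is only an illustration of that reading. The names NULL_VALUE and a1_direction are hypothetical and do not exist in munge_sumstats.py.

```python
# "No effect" value for each signed statistic, as given in the descriptions.
NULL_VALUE = {"Z": 0.0, "OR": 1.0, "BETA": 0.0, "LOG_ODDS": 0.0}


def a1_direction(stat_name, value):
    """Describe the direction of effect of A1 for one signed statistic."""
    null = NULL_VALUE[stat_name]
    if value > null:
        return "A1 increasing"
    if value < null:
        return "A1 decreasing"
    return "no effect"


# Made-up values for illustration.
print(a1_direction("OR", 1.3))
print(a1_direction("BETA", -0.2))
```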

In my GWAS file, A1 is the effect allele, not the ref allele.
In this case, can I supply the A1 allele of the input GWAS as the effect allele?

If not, I would like to check whether the A1 allele in the input GWAS summary file should be set to the reference allele, with the statistics (beta, OR) of my GWAS multiplied by -1 to match the ref allele.

I would like to thank you once again.
Best regards.

@aksarkar

I misspoke because I didn't read the source code correctly: you do need to make sure that A1 is ref, and multiply beta by -1 accordingly.
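
A minimal sketch of such a swap for a single row, assuming the row is a plain dict keyed by the column names from describe_cname. The helper swap_alleles is hypothetical and not part of ldsc. Note that an odds ratio is re-expressed for the other allele by taking its reciprocal rather than by changing its sign, while BETA, Z and LOG_ODDS are negated. Which rows need swapping depends on how your own file was produced, which this sketch does not try to detect.

```python
def swap_alleles(row):
    """Return a copy of one summary-statistics row with A1 and A2 exchanged.

    Signed statistics are re-expressed relative to the new A1:
    BETA, Z and LOG_ODDS change sign, OR becomes its reciprocal.
    Other fields (for example P) are copied unchanged.
    """
    out = dict(row)
    out["A1"], out["A2"] = row["A2"], row["A1"]
    for key in ("BETA", "Z", "LOG_ODDS"):
        if key in out:
            out[key] = -out[key]
    if "OR" in out:
        out["OR"] = 1.0 / out["OR"]
    return out


# Made-up row for illustration; "rs1" is a placeholder ID.
row = {"SNP": "rs1", "A1": "T", "A2": "C", "OR": 1.25, "Z": 2.0, "P": 0.05}
flipped = swap_alleles(row)
print(flipped)
```

The input row is left unmodified, so the original and swapped versions can be compared side by side.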

@yesyj-yuns

yesyj-yuns commented Jul 5, 2024

@aksarkar
Thank you for your answer.

If you don't mind, may I ask why "munge_sumstats.py" needs A1 to be set to ref?

@aksarkar

@yesyj-yuns LD scores are computed assuming that A1 is ref, and the summary statistics must be consistent with the LD scores.
