Filtering of munge_sumstats.py #439

yesyj-yuns · 2024-06-25T07:43:04Z

Hi,

I tried to convert my GWAS data into a sumstats file using munge_sumstats.py.

According to https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format#sumstats , munge_sumstats.py filters out ambiguous SNPs and non-SNP variants.
Could you please explain why ambiguous SNPs are filtered out when GWAS summary statistics are converted to a .sumstats file?

My second question relates to #375.
That issue was answered, but I still don't understand the answer, so I'm asking again.
My GWAS summary statistics (logistic regression) were generated with plink2. In this data, the effect allele, A1, is the alternate allele, not the reference allele.
In this case, should A1 in the file passed to munge_sumstats.py be the reference allele or the effect allele?

Thank you very much:)

aksarkar · 2024-06-25T17:14:26Z

@yesyj-yuns It is more difficult to detect errors when computing GWAS associations or meta-analyses for strand ambiguous variants. The simplest strategy to avoid introducing noise into ldsc from such errors is to remove ambiguous variants.
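
To make "strand ambiguous" concrete: a SNP is strand ambiguous when its two alleles are complementary bases (A/T or C/G), so the pair reads the same on either DNA strand and a strand mix-up cannot be detected from the alleles alone. The following is a minimal sketch of that check, not the actual filtering code in munge_sumstats.py; the function name `is_strand_ambiguous` is hypothetical.

```python
# Complement of each base on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1, a2):
    """Return True if the allele pair is a strand-ambiguous SNP (A/T or C/G).

    Hypothetical helper for illustration only. Anything that is not a pair
    of single bases (for example an indel) is reported as not ambiguous
    here; a real pipeline would filter such non-SNP variants separately.
    """
    a1, a2 = a1.upper(), a2.upper()
    if a1 not in COMPLEMENT or a2 not in COMPLEMENT:
        return False
    # Ambiguous when one allele is the complement of the other.
    return COMPLEMENT[a1] == a2


# A/G keeps its identity across strands (it becomes T/C), while A/T does not.
print(is_strand_ambiguous("A", "G"))
print(is_strand_ambiguous("A", "T"))
```

For an A/G SNP reported on the opposite strand, the alleles appear as T/C, which is visibly different and can be corrected. For an A/T SNP the opposite strand gives T/A, which looks like a plain allele swap, so the sign of the effect can be silently wrong.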

For your second question, ldsc flips alleles and effect sizes as necessary to ensure that A1 is ref, A2 is alt, and effect allele is alt. As long as A1 in the input is consistently ref or consistently alt, you will get the correct answer.

yesyj-yuns · 2024-06-26T00:10:42Z

Thank you so much for your response.

I'm really sorry, but I'm inquiring again because I don't understand the answer to the second question yet.

If you look at describe_cname in munge_sumstats.py, the signed statistics are defined relative to A1, as shown below.

  'A1': 'Allele 1, interpreted as ref allele for signed sumstat.',
  'A2': 'Allele 2, interpreted as non-ref allele for signed sumstat.',
  'Z': 'Z-score (0 --> no effect; above 0 --> A1 is trait/risk increasing)',
  'OR': 'Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)',
  'BETA': '[linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)',
  'LOG_ODDS': 'Log odds ratio (0 --> no effect; above 0 --> A1 is risk increasing)',

In my GWAS file, A1 is the effect allele but not the reference allele.
In this case, can I supply A1 from my input GWAS as the effect allele?

If not, should I set A1 in the input GWAS summary file to the reference allele and also multiply my GWAS statistics (beta, OR) by -1 so that they are relative to the reference allele?

I would like to thank you once again.
Best regards.

aksarkar · 2024-06-26T18:46:27Z

I misspoke because I had not read the source code correctly: you do need to make sure that A1 is ref and multiply beta by -1 accordingly.
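
As a rough illustration of the flip described above, the sketch below swaps A1 and A2 for one record and adjusts the signed statistics to match. It assumes column names A1, A2, BETA, Z, LOG_ODDS and OR as in the describe_cname excerpt quoted earlier; the function name `swap_alleles` and the example record are hypothetical. Note that an odds ratio is inverted (1/OR) rather than multiplied by -1, since negation applies on the log scale.

```python
def swap_alleles(row):
    """Return a copy of one summary-statistics record with A1 and A2 swapped.

    Hypothetical helper for illustration only. Additive-scale statistics
    change sign; the odds ratio is replaced by its reciprocal.
    """
    flipped = dict(row)
    flipped["A1"], flipped["A2"] = row["A2"], row["A1"]
    # Statistics on an additive scale change sign with the allele swap.
    for key in ("BETA", "Z", "LOG_ODDS"):
        if key in row:
            flipped[key] = -row[key]
    # An odds ratio is multiplicative, so it is inverted instead of negated.
    if "OR" in row:
        flipped["OR"] = 1.0 / row["OR"]
    return flipped


# Made-up record in which A1 is the alternate (effect) allele.
record = {"SNP": "rs0000001", "A1": "G", "A2": "A", "BETA": 0.25, "OR": 2.0}
print(swap_alleles(record))
```

The swapped record describes the same association, only expressed relative to the other allele, so the operation has to be applied consistently to every variant in the file.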

yesyj-yuns · 2024-07-05T04:30:59Z

@aksarkar
Thank you for your answer.

If you don't mind, may I ask why A1 must be set to ref in the input to munge_sumstats.py?

aksarkar · 2024-07-17T15:31:27Z

@yesyj-yuns LD scores are computed assuming that A1 is ref, and the summary statistics must be consistent with the LD scores.

Al-Murphy mentioned this issue Aug 15, 2024

Option to specify effect allele as A1 instead. Al-Murphy/MungeSumstats#160

Closed
