Q: I don’t know what uc031tcg.1 is. I do understand PCAT1, and I think that naming convention is RefSeq. Do the other names follow it too?
A: I am trying to rebuild gene symbols by collapsing transcripts into genes. When a transcript does not overlap any gene symbol, it keeps its transcript name, in this case uc031tcg.1, which is a UCSC ID. See udf_GeneSymbol for details. This part of the script is quite heavy, and we are working on it.
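As a rough illustration of the fallback naming described above, here is a minimal Python sketch. It is not the udf_GeneSymbol implementation. It assumes an upstream overlap step has already attached a gene symbol (or nothing) to each transcript, and the function name `collapse_to_genes`, the dictionary keys, and the coordinates are all hypothetical.

```python
def collapse_to_genes(transcripts):
    """Collapse transcripts into gene-level ranges.

    Each transcript is a dict with 'id', 'start', 'end' and an optional
    'symbol' (hypothetical layout). A transcript without a symbol keeps
    its own ID as the label, which is how a UCSC ID can end up on the plot.
    """
    genes = {}
    for tx in transcripts:
        # Fall back to the transcript ID when no gene symbol was matched.
        name = tx.get("symbol") or tx["id"]
        start, end = genes.get(name, (tx["start"], tx["end"]))
        # Extend the gene range to cover every transcript with this name.
        genes[name] = (min(start, tx["start"]), max(end, tx["end"]))
    return genes


# Made-up coordinates, for illustration only.
example = [
    {"id": "tx1", "symbol": "PCAT1", "start": 100, "end": 300},
    {"id": "tx2", "symbol": "PCAT1", "start": 250, "end": 500},
    {"id": "uc031tcg.1", "symbol": None, "start": 600, "end": 700},
]
print(collapse_to_genes(example))
# {'PCAT1': (100, 500), 'uc031tcg.1': (600, 700)}
```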
Q: Can you tell me how to interpret the wavy lines that cross the plot (for text in the figure legend)?
So far I am using this sentence: “The colored lines spanning the plotting region indicate the extent of LD for the lead SNPs with the same color designation, where the height of the line represents __ and the length of the line represents ___.” Can you send me a better sentence to describe how these lines should be interpreted if I am not on the right track here?
A: Each line is a loess smoothing of the LD values for the matching hit SNP. If there are 2 hit SNPs, red and green, then there will be 2 loess lines, red and green. The smoothing uses LD values from the 1KG EUR subset. Different LD cut-offs can be used (LD = 0, LD > 0.1, LD >= 0.2, etc.); usually a cut-off of 0, i.e. including all SNP LDs, works best.
The Y axis runs from 0 to 1, the minimum and maximum values of LD (R^2). When the wavy lines have a similar shape, we can safely assume that those SNPs belong to the same signal. There is also an “R^2” track: the darker the lines, the higher the LD. This track also helps to see visually how the hit SNPs overlap.
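To make the idea of one smoothed line per hit SNP concrete, here is a minimal pure-Python sketch. It is not the code used to draw the plot. It assumes LD is available as (hit SNP, position, R^2) rows, and it swaps in a simple local linear regression with tricube weights as a stand-in for a full loess fit. The names `loess`, `smooth_ld_by_hit`, `min_r2` and `span` are hypothetical.

```python
def tricube(u):
    """Tricube kernel weight for a scaled distance u, clipped to [0, 1]."""
    u = min(abs(u), 1.0)
    return (1.0 - u ** 3) ** 3


def loess(xs, ys, span=0.5):
    """Local linear regression evaluated at every x in xs.

    span is the fraction of points that define each local window.
    """
    n = len(xs)
    k = min(n, max(2, int(round(span * n))))
    fitted = []
    for x0 in xs:
        # The distance to the k-th nearest point sets the window width.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def smooth_ld_by_hit(ld_rows, min_r2=0.0, span=0.5):
    """Return {hit_snp: [(position, smoothed_r2), ...]}, one curve per hit SNP.

    ld_rows holds (hit_snp, position, r2) tuples; rows with r2 below
    min_r2 are dropped, and min_r2=0.0 keeps every SNP.
    """
    by_hit = {}
    for hit, pos, r2 in ld_rows:
        if r2 >= min_r2:
            by_hit.setdefault(hit, []).append((pos, r2))
    curves = {}
    for hit, points in by_hit.items():
        points.sort()
        xs = [p for p, _ in points]
        ys = [r for _, r in points]
        curves[hit] = list(zip(xs, loess(xs, ys, span)))
    return curves
```

Each key of the returned dictionary corresponds to one coloured line. Two hit SNPs whose curves rise and fall at the same positions would show the similar shape mentioned above.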