Tokhir Dadaev edited this page Oct 2, 2015 · 20 revisions

Q: Can you tell me the naming convention for the genomic features?

I don’t know what uc031tcg.1 is, but I do understand PCAT1. I think that naming convention is RefSeq; are the others RefSeq too?

A: I am trying to rebuild gene symbols by collapsing transcripts into genes. When transcripts do not overlap with gene symbols, they are named by their transcript names instead, in this case something like uc031tcg.1, which is a UCSC ID. See udf_GeneSymbol for details. This part of the script is quite heavy, and we are still working on it.
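The collapsing step described above could look roughly like the following minimal Python sketch. It is not the actual udf_GeneSymbol code: the `Transcript` and `Gene` types, the `collapse_transcripts` function and the `symbol_of` mapping (UCSC transcript ID to gene symbol) are hypothetical, and it assumes transcripts are matched to symbols by a lookup table rather than by genomic overlap.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Transcript:
    ucsc_id: str  # UCSC transcript ID, e.g. "uc031tcg.1"
    chrom: str
    start: int
    end: int


@dataclass
class Gene:
    label: str  # gene symbol, or the UCSC ID when no symbol is known
    chrom: str
    start: int
    end: int


def collapse_transcripts(transcripts: List[Transcript],
                         symbol_of: Dict[str, str]) -> List[Gene]:
    """Merge transcripts sharing a gene symbol into one gene span.

    Hypothetical helper: transcripts missing from `symbol_of` keep their
    UCSC ID as the label, so they show up as e.g. "uc031tcg.1".
    """
    genes: Dict[Tuple[str, str], Gene] = {}
    for tx in transcripts:
        label = symbol_of.get(tx.ucsc_id, tx.ucsc_id)  # fall back to the UCSC ID
        key = (label, tx.chrom)
        if key in genes:
            # Extend the existing gene span to cover this transcript.
            gene = genes[key]
            gene.start = min(gene.start, tx.start)
            gene.end = max(gene.end, tx.end)
        else:
            genes[key] = Gene(label, tx.chrom, tx.start, tx.end)
    # Return genes in genomic order for plotting.
    return sorted(genes.values(), key=lambda g: (g.chrom, g.start))
```

For example, if two transcripts map to the same symbol, they are merged into one span labelled with that symbol, while a transcript missing from the mapping keeps its UCSC ID, which is how labels like uc031tcg.1 end up in the gene track.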

Q: Can you tell me how to interpret the wavy lines that cross the plot? I need to describe them in the figure legend.

So far I am using this sentence: “The colored lines spanning the plotting region indicate the extent of LD for the lead SNPs with the same color designation, where the height of the line represents __ and the length of the line represents ___.” If I am not on the right track, could you suggest a better sentence describing how these lines should be interpreted?

A: Each line is a LOESS smoothing curve for the matching hit SNP. If there are two hit SNPs, coloured red and green, there will be two LOESS lines, one red and one green. The smoothing uses LD values from the 1000 Genomes (1KG) EUR subset. As you might have noticed, my plots can show different smoothing lines for the same region; this is because I can choose an LD filter: LD = 0, LD > 0.1, LD >= 0.2, etc. A different cut-off may be used for each region, but usually LD = 0 (i.e. include LD values for all SNPs) works best. The y-axis runs from 0 to 1, the minimum and maximum possible LD values. When the “wavy” lines have a similar shape, we can safely assume those SNPs are the same signal.

There is also an “R^2” track, where darker lines indicate higher LD; this track also helps to see visually how the hit SNPs overlap.
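One such line could be computed roughly as in this minimal sketch: for a single hit SNP, take the (position, LD r²) pairs from the reference panel, apply the LD filter, and fit a LOESS curve. It is plain Python with a degree-1 LOESS and tricube weights but no robustness iterations. The function names, the `span` value of 0.5 and the `r2 >= min_r2` form of the filter are assumptions; the plotting code behind these figures may use a different LOESS implementation and settings.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, float]  # (genomic position, LD r2 with the hit SNP)


def ld_filter(points: Sequence[Point], min_r2: float = 0.0) -> List[Point]:
    """Keep SNPs with r2 >= min_r2 (hypothetical filter); 0 keeps every SNP."""
    return [(pos, r2) for pos, r2 in points if r2 >= min_r2]


def tricube(u: float) -> float:
    """Tricube kernel weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_curve(points: Sequence[Point], span: float = 0.5) -> List[Point]:
    """Degree-1 LOESS of r2 against position, without robustness iterations.

    Returns (position, smoothed r2) pairs sorted by position, clamped to
    [0, 1] so the curve stays on the 0-1 LD axis.
    """
    pts = sorted(points)
    xs = [float(pos) for pos, _ in pts]
    ys = [r2 for _, r2 in pts]
    n = len(xs)
    k = min(n, max(2, round(span * n)))  # neighbours used in each local fit
    curve: List[Point] = []
    for (pos, _), x0 in zip(pts, xs):
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # bandwidth: distance to the k-th nearest SNP
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)  # always > 0, because x0 itself has weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local mean
        fit = my + slope * (x0 - mx)  # weighted straight-line fit at x0
        curve.append((pos, min(1.0, max(0.0, fit))))
    return curve
```

Each hit SNP would get its own curve, drawn in that SNP’s colour. Raising `min_r2` drops weakly linked SNPs before smoothing, which is one way the same region can end up with different lines under different cut-offs.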
