Tokhir Dadaev edited this page Oct 2, 2015 · 20 revisions

Q1. Can you tell me the naming convention for the genomic features?

I don’t know what uc031tcg.1 is. I do understand PCAT1, and I think that naming convention is RefSeq. Are the others RefSeq also?

A: uc031tcg.1 is a UCSC transcript ID. I am trying to rebuild gene symbols by collapsing transcripts into genes. When a transcript does not overlap any gene symbol, it keeps its transcript name, in this case uc031tcg.1. See udf_GeneSymbol for details. This part of the script is quite heavy and we are working on it.
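The fallback naming described above can be sketched roughly as follows. This is not the udf_GeneSymbol code, only a minimal illustration of the idea, assuming a simple interval-overlap test. The `Feature` type, the function names, the coordinates, and the `tx_inside_gene` transcript are all hypothetical; only PCAT1 and uc031tcg.1 come from the question.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A named genomic interval (hypothetical helper type)."""
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a: Feature, b: Feature) -> bool:
    """True when both features are on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def label_transcripts(transcripts, genes):
    """Map each transcript name to a display label.

    A transcript takes the symbol of the first gene it overlaps;
    if it overlaps none, it keeps its own transcript ID.
    """
    labels = {}
    for tx in transcripts:
        hits = [g.name for g in genes if overlaps(tx, g)]
        labels[tx.name] = hits[0] if hits else tx.name
    return labels


# Toy coordinates, made up for illustration only.
genes = [Feature("PCAT1", "chr8", 100, 500)]
transcripts = [
    Feature("tx_inside_gene", "chr8", 150, 400),  # overlaps PCAT1
    Feature("uc031tcg.1", "chr8", 900, 1200),     # overlaps no gene symbol
]
labels = label_transcripts(transcripts, genes)
print(labels)
```

A transcript that overlaps several genes simply takes the first match here; the real script may resolve such cases differently.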

Q2. Can you tell me how to interpret the wavy lines that cross the plot (for text in the figure legend)?

So far I am using this sentence: “The colored lines spanning the plotting region indicate the extent of LD for the lead SNPs with the same color designation, where the height of the line represents __ and the length of the line represents ___.” Can you send me a better sentence to describe how these lines should be interpreted if I am not on the right track here?

A: Each line is a LOESS-smoothed curve of LD values for the hit SNP of the matching colour. If there are 2 hit SNPs marked with red and green shape and fill, then there will be 2 matching LOESS lines, one red and one green. The smoothing uses LD values from the 1000G phase EUR subset. We can use different cut-offs for LD: LD = 0, LD > 0.1, LD >= 0.2, etc. Usually a cut-off of 0, i.e. including all SNP LDs, works best.

The Y axis runs from 0 to 1, the minimum and maximum values of LD (R^2). When the wavy lines have a similar shape, we can safely assume that those SNPs are the same signal. There is also an “R^2” track: the darker the lines, the higher the LD. This track also helps to see visually how hit SNPs overlap.
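To make the idea of a smoothed LD line more concrete, here is a minimal sketch of a LOESS-style smoother in plain Python. It is a simplified local linear fit with tricube weights and no robustness iterations, not the plotting code used for the figures. The function names, the `frac` and `min_r2` parameters, the clamping of the fit to the 0 to 1 range, and the toy positions and R^2 values are all assumptions made for illustration.

```python
def tricube(u: float) -> float:
    """Tricube kernel: weight 1 at distance 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_ld(positions, r2_values, frac=0.5, min_r2=0.0):
    """Smooth LD (R^2) against position with a local linear fit.

    Points with R^2 below min_r2 are dropped first (the LD cut-off).
    Returns a list of (position, smoothed R^2) pairs.
    """
    pts = [(x, y) for x, y in zip(positions, r2_values) if y >= min_r2]
    n = len(pts)
    if n == 0:
        return []
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    k = min(n, max(2, round(frac * n)))  # number of neighbours per local fit
    curve = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # bandwidth; avoid dividing by zero
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)  # always > 0 because x0 itself has weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fit = my + slope * (x0 - mx)
        curve.append((x0, min(1.0, max(0.0, fit))))  # keep within the Y axis
    return curve


# Toy data: R^2 of nearby SNPs with one hit SNP (the hit itself has R^2 = 1).
positions = [100, 200, 300, 400, 500, 600, 700]
r2_to_hit = [0.05, 0.2, 0.6, 1.0, 0.7, 0.25, 0.1]
for pos, smooth in loess_ld(positions, r2_to_hit, frac=0.6):
    print(pos, round(smooth, 3))
```

Running the same function once per hit SNP gives one curve per colour, and two hit SNPs whose curves rise and fall together would then be read as the same signal.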

Q3. Can you tell me what is included (maybe even just the data source) for the histone and DNase panels, and what the colour interpretation is for the histone panel?

A: The data are from the ENCODE project; see links for more info.
