2. Input Format

Giulio Caravagna edited this page Oct 31, 2018 · 7 revisions

Input dataframe

Data from a cohort of patients can be represented as a dataframe with 7 fields, where every row represents one genomic alteration annotated for the analysis.

Field      Type     Description
Misc       string   Some custom annotation
patientID  string   Patient ID, without spaces
variantID  string   Alteration ID, without spaces or dash/hyphen (-) symbols
cluster    string   Group ID (e.g., a clone, with CCF data)
is.driver  logical  TRUE if the alteration is annotated as a driver
is.clonal  logical  TRUE if the group is clonal (truncal); there should be only one such clonal group
CCF        string   A parsable format for storage of input CCFs or binary data

The input dataframe has the same structure for both CCF and binary data.
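As a minimal sketch of the structure described above, the following builds a toy dataframe with the 7 fields. The patient, variant and region names (P01, GENE_A, GENE_B, R1, R2) and all values are hypothetical and only illustrate the expected types.

```r
# Hypothetical toy cohort: one patient, two alterations (illustrative values only).
dataset <- data.frame(
  Misc      = c("UNKNOWN", "UNKNOWN"),
  patientID = c("P01", "P01"),
  variantID = c("GENE_A", "GENE_B"),
  cluster   = c("1", "2"),
  is.driver = c(TRUE, FALSE),
  is.clonal = c(TRUE, FALSE),
  CCF       = c("R1:1;R2:1", "R1:0.4;R2:0"),
  stringsAsFactors = FALSE  # keep string fields as character, not factor
)

# Inspect the column types: five character fields and two logical fields.
str(dataset)
```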

Supported alterations

Any SNV, larger chromosomal rearrangement or other covariate that can be encoded in CCF or binary format. The variantID field of driver alterations (is.driver = TRUE) will be matched to detect occurrences in multiple patients, and to correlate trajectories; a variantID must be unique within a patient, that is, it can appear only once per patient.

The ID can be whatever you find most suitable for your analysis, for instance:

  • a Hugo_Symbol (BRAF)
  • a name for a well-known SNV (BRAF_600E)
  • a reference to some cytoband (3q26.32)
  • your custom annotation (MyFavoritePathway).
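The ID constraints stated in this page (no spaces in patientID, no spaces or hyphens in variantID, each variantID at most once per patient) can be checked before running an analysis. The helper below is a hypothetical sketch in base R; check_ids is not a REVOLVER function.

```r
# Hypothetical helper (not part of REVOLVER): flag rows that violate the ID rules.
check_ids <- function(dataset) {
  # patientID must not contain whitespace
  bad_patient <- grepl("[[:space:]]", dataset$patientID)
  # variantID must not contain whitespace or hyphens
  bad_variant <- grepl("[[:space:]]|-", dataset$variantID)
  # the same variantID must not occur twice within one patient
  duplicated_variant <- duplicated(dataset[, c("patientID", "variantID")])
  data.frame(bad_patient, bad_variant, duplicated_variant)
}

# Toy input with one duplicated variant and one malformed row (illustrative only).
toy <- data.frame(
  patientID = c("P1", "P1", "P 2"),
  variantID = c("BRAF", "BRAF", "TP-53"),
  stringsAsFactors = FALSE
)
print(check_ids(toy))
```

Rows where any column is TRUE would need to be renamed or merged before use.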

Alterations are also associated with groups (via cluster), which constitute the nodes of the computed trees. A group can have 0 annotated drivers, but every patient should have at least one driver to be analyzed with REVOLVER.
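A quick way to verify these requirements is to count, per patient, the annotated drivers and the distinct clonal groups. The sketch below uses base R only; summarise_patients is a hypothetical helper, and it assumes the "only one clonal group" rule applies within each patient.

```r
# Hypothetical helper (not part of REVOLVER): per-patient driver and clonal-group counts.
summarise_patients <- function(dataset) {
  patients <- split(dataset, dataset$patientID)
  data.frame(
    patientID = names(patients),
    # number of alterations annotated as drivers
    n_drivers = vapply(patients, function(p) sum(p$is.driver), integer(1)),
    # number of distinct groups flagged as clonal
    n_clonal_groups = vapply(
      patients,
      function(p) length(unique(p$cluster[p$is.clonal])),
      integer(1)
    ),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy input (illustrative only): P2 has no driver and so could not be analyzed.
toy <- data.frame(
  patientID = c("P1", "P1", "P2"),
  cluster   = c("1", "2", "1"),
  is.driver = c(TRUE, FALSE, FALSE),
  is.clonal = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
print(summarise_patients(toy))
```

Patients with n_drivers equal to 0, or with n_clonal_groups different from 1, would need to be fixed or excluded.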

See also Guidelines if you are interested in modelling parallel evolution.

Input values for CCF or binary variables

Field CCF represents either:

  • real-valued CCF values (in [0, 1]),
  • or input binary values (either 0 or 1).

Since patients can have different numbers of associated samples/regions, CCF is a general string. The format that we propose is simple and easy to parse:

R1:0.86;R2:1;R3:1

means a CCF value of 0.86 in region R1, 1 in R2, and so on. Binary data can be encoded in the same format, for instance as R1:1;R2:1;R3:1. If you use this format, a possible parsing function is:

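CCF_parser = function(x)
{
  # Split "R1:0.86;R2:1" into "R1:0.86", "R2:1", then into "R1", "0.86", "R2", "1"
  tk = strsplit(x, ';')[[1]]
  tk = unlist(strsplit(tk, ':'))
  
  # Odd positions hold the sample/region names
  samples = tk[seq(1, length(tk), 2)]
  
  # Even positions hold the values, returned as character strings
  values = tk[seq(2, length(tk), 2)]
  names(values) = samples
  
  return(values)  
}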

This function is available as revolver:::CCF_parser.
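Because the parser returns character strings, the values usually need converting before any numeric check. The sketch below copies the parser from above so that it runs without REVOLVER installed, converts the result, and verifies that values lie in [0, 1]. CCF_encoder is a hypothetical inverse, not a REVOLVER function.

```r
# Copy of the parser shown above, so this sketch is self-contained.
CCF_parser <- function(x) {
  tk <- unlist(strsplit(strsplit(x, ";")[[1]], ":"))
  values <- tk[seq(2, length(tk), 2)]
  names(values) <- tk[seq(1, length(tk), 2)]
  values
}

values <- CCF_parser("R1:0.86;R2:1;R3:1")

# as.numeric() drops names, so restore them after the conversion.
ccf <- as.numeric(values)
names(ccf) <- names(values)

# CCF values must lie in [0, 1]; binary data contain only 0 and 1.
stopifnot(all(ccf >= 0 & ccf <= 1))
is_binary <- all(ccf %in% c(0, 1))

# Hypothetical inverse (not part of REVOLVER): named numeric vector -> string.
CCF_encoder <- function(v) paste(names(v), v, sep = ":", collapse = ";")
print(CCF_encoder(ccf))
```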

Example data

An example binary dataset is the following (we subset it to only driver alterations).

> head(dataset[dataset$is.driver, ])
           Misc patientID  variantID cluster is.driver is.clonal                                                      CCF
	UNKNOWN     EV001     ABHD11       1      TRUE     FALSE      R1:1;R2:1;R3:1;R5:1;R8:1;R9:1;R4:0;M1:0;M2a:0;M2b:0
	UNKNOWN     EV001   ADAMTS10       2      TRUE     FALSE      R1:0;R2:0;R3:0;R5:0;R8:0;R9:1;R4:0;M1:0;M2a:0;M2b:0
	UNKNOWN     EV001   ADAMTSL4       3      TRUE     FALSE      R1:0;R2:0;R3:0;R5:0;R8:0;R9:0;R4:0;M1:1;M2a:1;M2b:0
	UNKNOWN     EV001      AKAP8       4      TRUE     FALSE      R1:1;R2:1;R3:0;R5:1;R8:1;R9:1;R4:0;M1:1;M2a:1;M2b:1
	UNKNOWN     EV001      AKAP9       5      TRUE     FALSE      R1:0;R2:0;R3:0;R5:0;R8:0;R9:0;R4:0;M1:1;M2a:1;M2b:1
	UNKNOWN     EV001     ALKBH8       6      TRUE     FALSE      R1:0;R2:0;R3:0;R5:0;R8:0;R9:0;R4:1;M1:1;M2a:1;M2b:1
	UNKNOWN     EV001   ALS2CR12       7      TRUE     FALSE      R1:1;R2:1;R3:0;R5:1;R8:1;R9:1;R4:1;M1:1;M2a:1;M2b:0
	UNKNOWN     EV001    ANKRD26       2      TRUE     FALSE      R1:0;R2:0;R3:0;R5:0;R8:0;R9:1;R4:0;M1:0;M2a:0;M2b:0
	UNKNOWN     EV001       ANO5       8      TRUE      TRUE      R1:1;R2:1;R3:1;R5:1;R8:1;R9:1;R4:1;M1:1;M2a:1;M2b:1
	UNKNOWN     EV001      ATXN1       8      TRUE      TRUE      R1:1;R2:1;R3:1;R5:1;R8:1;R9:1;R4:1;M1:1;M2a:1;M2b:1
	UNKNOWN     EV001      BCAS2       8      TRUE      TRUE      R1:1;R2:1;R3:1;R5:1;R8:1;R9:1;R4:1;M1:1;M2a:1;M2b:1
	UNKNOWN     EV001     BCL11A       9      TRUE     FALSE      R1:0;R2:0;R3:0;R5:0;R8:0;R9:0;R4:1;M1:0;M2a:0;M2b:0
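For inspection it can help to turn the CCF strings of one patient into a matrix with alterations as rows and regions as columns. The sketch below does this for two CCF strings copied from the example rows above (ABHD11 and ADAMTS10); parse_ccf is a hypothetical helper using the same splitting logic as the parser shown earlier, and the code assumes all strings list the same regions in the same order.

```r
# Two CCF strings copied from the example rows above.
ccf <- c(
  ABHD11   = "R1:1;R2:1;R3:1;R5:1;R8:1;R9:1;R4:0;M1:0;M2a:0;M2b:0",
  ADAMTS10 = "R1:0;R2:0;R3:0;R5:0;R8:0;R9:1;R4:0;M1:0;M2a:0;M2b:0"
)

# Hypothetical helper: parse one string into a named numeric vector.
parse_ccf <- function(x) {
  tk <- unlist(strsplit(strsplit(x, ";")[[1]], ":"))
  values <- as.numeric(tk[seq(2, length(tk), 2)])
  names(values) <- tk[seq(1, length(tk), 2)]
  values
}

# sapply() returns regions x alterations, so transpose to alterations x regions.
ccf_matrix <- t(sapply(ccf, parse_ccf))
print(ccf_matrix)
```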