Output File Formats
For robustness to future development, we designed a file format similar to an edge list that must be used for the input Contact Network. The first portion of the file is a list of nodes, and the second portion is a list of edges.
- "Node" lines have three tab-delimited sections:
  - NODE (i.e., just the string NODE)
  - This node's label
  - Attributes of this node as comma-separated values, or a period (i.e., '.') if this node has no attributes
- "Edge" lines have five tab-delimited sections:
  - EDGE (i.e., just the string EDGE)
  - The label of the node from which this edge leaves
  - The label of the node to which this edge goes
  - Attributes of this edge as comma-separated values, or a period (i.e., '.') if this edge has no attributes
  - d (directed) or u (undirected) to denote whether or not this edge is directed (i.e., u -> v vs. u <-> v)
- Lines beginning with the pound symbol (i.e., '#') and empty lines are ignored
Below is an example of this file format. Note that <TAB> is referring to a single tab character (i.e., '\t').
#NODE<TAB>label<TAB>attributes (csv or .)
#EDGE<TAB>u<TAB>v<TAB>attributes (csv or .)<TAB>(d)irected or (u)ndirected
NODE<TAB>Bill<TAB>USA,Mexico
NODE<TAB>Eric<TAB>USA
NODE<TAB>Curt<TAB>.
EDGE<TAB>Bill<TAB>Eric<TAB>.<TAB>d
EDGE<TAB>Curt<TAB>Eric<TAB>Friends<TAB>u
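The format above can be read with a few lines of Python. The following is a minimal sketch, not part of FAVITES itself: the function name `parse_contact_network` and the returned data structures are illustrative assumptions, and the strict error handling for malformed lines is a choice made here rather than documented behavior.

```python
def parse_contact_network(lines):
    """Parse contact-network lines into (nodes, edges).

    nodes: dict mapping node label -> list of attribute strings
    edges: list of (u, v, attributes, directed) tuples
    Both structures are hypothetical choices made for this sketch.
    """
    nodes = {}
    edges = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Empty lines and lines beginning with '#' are ignored
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if parts[0] == "NODE" and len(parts) == 3:
            _, label, attr = parts
            # A period means "no attributes"
            nodes[label] = [] if attr == "." else attr.split(",")
        elif parts[0] == "EDGE" and len(parts) == 5:
            _, u, v, attr, kind = parts
            if kind not in ("d", "u"):
                raise ValueError(f"Edge type must be 'd' or 'u': {raw!r}")
            attributes = [] if attr == "." else attr.split(",")
            edges.append((u, v, attributes, kind == "d"))
        else:
            # Assumption: anything else is treated as an error
            raise ValueError(f"Malformed line: {raw!r}")
    return nodes, edges


example = [
    "#NODE\tlabel\tattributes (csv or .)",
    "NODE\tBill\tUSA,Mexico",
    "NODE\tEric\tUSA",
    "NODE\tCurt\t.",
    "EDGE\tBill\tEric\t.\td",
    "EDGE\tCurt\tEric\tFriends\tu",
]
nodes, edges = parse_contact_network(example)
```

Here the last element of each edge tuple is `True` for directed edges and `False` for undirected ones; a caller that needs an adjacency structure would add undirected edges in both directions.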
Transmission networks output by FAVITES are in the standard edge list format. Each line represents a single edge via three tab-delimited attributes:
- The label of the node from which this edge leaves
- The label of the node to which this edge goes
- The time at which this transmission event occurred
Self-edges (i.e., same node in columns 1 and 2) denote removal of infection, either via recovery or death. Edges with None in column 1 denote seed infections (i.e., infections from outside the population).
Below is an example of this file format. Note that <TAB> is referring to a single tab character (i.e., '\t').
None<TAB>Eric<TAB>0
Eric<TAB>Bill<TAB>1
Eric<TAB>Curt<TAB>2
Eric<TAB>Curt<TAB>3
Curt<TAB>Bill<TAB>4
Curt<TAB>Bill<TAB>5
Curt<TAB>Curt<TAB>6
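As an illustration of how the three kinds of lines can be told apart, here is a minimal Python sketch. The function name `parse_transmission_network`, the three returned lists, the conversion of times to `float`, and the skipping of blank lines are all assumptions made for this example rather than FAVITES behavior.

```python
def parse_transmission_network(lines):
    """Split transmission-network lines into seeds, transmissions, removals.

    seeds:         list of (node, time) for edges with None in column 1
    transmissions: list of (source, target, time)
    removals:      list of (node, time) for self-edges
    """
    seeds, transmissions, removals = [], [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are skipped
        u, v, t = line.split("\t")
        t = float(t)  # assumption: times are numeric
        if u == "None":
            seeds.append((v, t))          # infection from outside the population
        elif u == v:
            removals.append((v, t))       # recovery or death
        else:
            transmissions.append((u, v, t))
    return seeds, transmissions, removals


example = [
    "None\tEric\t0",
    "Eric\tBill\t1",
    "Eric\tCurt\t2",
    "Eric\tCurt\t3",
    "Curt\tBill\t4",
    "Curt\tBill\t5",
    "Curt\tCurt\t6",
]
seeds, transmissions, removals = parse_transmission_network(example)
```

For the example above, this yields one seed infection (Eric at time 0), five transmission events, and one removal (Curt at time 6).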
When FAVITES outputs viral lineages in the phylogeny and sequence files, the identifiers are in the format viral_lineage|contact_network_node|time, e.g. N19|67|4.118017.
- viral_lineage: Each viral lineage in the simulation process has its own unique identifier for ease of identification
- contact_network_node: The contact network individual from which viral_lineage was sampled
- time: The time at which viral_lineage was sampled from contact_network_node
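Splitting such an identifier back into its three fields is straightforward. The sketch below uses a hypothetical helper, `parse_lineage_id`, and assumes that neither the lineage identifier nor the node label contains a '|' character and that the time is numeric.

```python
def parse_lineage_id(identifier):
    """Split 'viral_lineage|contact_network_node|time' into its parts.

    Assumes exactly two '|' separators; raises ValueError otherwise.
    """
    lineage, node, time = identifier.split("|")
    return lineage, node, float(time)


lineage, node, time = parse_lineage_id("N19|67|4.118017")
# lineage == "N19", node == "67", time == 4.118017
```

The node label is kept as a string because contact network labels are not necessarily numeric.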
Sample times that can be used with FAVITES are specified in a simple tab-delimited format. Each line represents a single sample time via two tab-delimited attributes:
- The label of the node to be sampled
- The sample time
Multiple sample times can be specified per person by simply having multiple lines for that person. Below is an example of this file format. Note that <TAB> is referring to a single tab character (i.e., '\t').
Eric<TAB>1
Eric<TAB>2
Bill<TAB>3
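Because a person may appear on several lines, a reader of this format needs to collect times per label. The following minimal Python sketch shows one way to do that; the function name `parse_sample_times`, the dictionary-of-lists result, the skipping of blank lines, and the conversion of times to `float` are assumptions made for illustration.

```python
from collections import defaultdict


def parse_sample_times(lines):
    """Map each node label to the list of its sample times, in file order."""
    times = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are skipped
        label, t = line.split("\t")
        times[label].append(float(t))  # assumption: times are numeric
    return dict(times)


sample_times = parse_sample_times(["Eric\t1", "Eric\t2", "Bill\t3"])
# sample_times == {"Eric": [1.0, 2.0], "Bill": [3.0]}
```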