forked from qBioTurin/SIR
-
Notifications
You must be signed in to change notification settings - Fork 0
/
ReadME.Rmd
782 lines (592 loc) · 42.1 KB
/
ReadME.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
---
title: "GreatMod: SIR model"
author: "Beccuti Marco, Castagno Paolo, Pernice Simone"
output:
md_document:
# pdf_document:
# fig_caption: yes
toc: true
toc_depth: 3
header-includes:
\usepackage{float}
\usepackage{url}
\usepackage{booktabs}
\usepackage{multirow}
\newcommand{\Nat}{\mathbb{N}}
\newcommand{\Real}{\mathbb{R}}
\newcommand{\cd}{\mathit{cd}}
\newcommand{\mk}{\mathbf{m}}
\newcommand{\Dom}{\mathit{Dom}}
\newcommand{\Tuple}[1]{\ensuremath{\langle 1\rangle}}
\newcommand{\In}{I}
\newcommand{\Out}{O}
\newcommand{\rate}{\varphi}
\newcommand{\Gard}{\Theta}
\newcommand{\Class}{\mathcal{C}}
\renewcommand {\Pr} {\mathbb{P}}
\newcommand {\Es} {{\mathbb E}}
\newcommand{\ABm}{\ensuremath{\mathcal{A}}}
\newcommand{\RBm}{\ensuremath{\mathcal{R}}}
\newcommand{\theLanguage}{\ensuremath{\mathcal L}}
\newcommand{\tuple}[1]{\ensuremath{\langle 1 \rangle}}
\newcommand{\bag}[1]{\ensuremath{Bag[ 1 ]}}
\newcommand{\Inputp}{^{\bullet} \mathbf{t} }
\newcommand{\Outputp}{ \mathbf{t} ^\bullet}
\newcommand{\Inputpprim}{^{\bullet} \mathbf{t'} }
\newcommand{\mm}{\ensuremath{x}}
\floatplacement{figure}{H}
\usepackage{caption}
\captionsetup[table]{skip=10pt}
bibliography: biblio.bib
vignette: >
%\VignetteIndexEntry{my-vignette}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Introduction
In this document we describe how to use the R library *epimod*. In details, *epimod* implements a new general modeling framework to study epidemiological systems, whose novelties and strengths are:
1. the use of a graphical formalism to simplify the model creation phase;
2. the automatic generation of the deterministic and stochastic process underlying the system under study;
3. the implementation of an R package providing a friendly interface to access the analysis techniques implemented in the framework;
4. a high level of portability and reproducibility granted by the containerization [@docker] of all analysis techniques implemented in the framework;
5. a well-defined schema and related infrastructure to allow users to easily integrate their own analysis workflow in the framework.
The effectiveness of this framework is showed through the wellknown and simple SIR model.
# How to start
Before starting the analysis we have to install (1) GreatSPN GUI, (2) docker, and (3) the R package **devtools** for installing *EPIMOD*.
GreatSPN GUI, the graphical editor for drawing Petri Nets formalisms, is available online ([link to install GreatSPN](http://www.di.unito.it/~amparore/mc4cslta/editor.html)), and it can be installed following the steps showed therein. Then, the user must have docker installed on its computer for exploiting the *epimod*'s docker images (for more information on the docker installation see: [link to install docker](https://docs.docker.com/engine/installation/)), and to have authorization to execute docker commands reported in the command page of function install docker.
To do this the following commands must be executed.
1. Create the docker group.
```{bash,eval=F}
$ sudo groupadd docker
```
2. Add your user to the docker group.
```{bash,eval=F}
$ sudo usermod -aG docker $USER
```
The R package *devtools* has to be installed to run *epimod*:
```{r,eval=F}
install.packages("devtools")
library(devtools)
install_github("qBioTurin/epimod", dependencies=TRUE)
```
```{r}
library(epimod)
```
Then, the following function must be used to download all the docker images used by *epimod*:
```{r,eval=F}
downloadContainers()
```
## Something to know
All the *epimod* functions print the following information:
- *Docker ID*, that is the CONTAINER ID which is executed by the function;
- *Docker exit status*, if 0 then the execution completed with success, otherwise an error log file is saved in the working directory.
# Cases of study
In this section we show the steps necessary to model, study and analyze a simple case study. To this aim, we choose to study the diffusion of a disease following the SIR dynamics. We refer to [@keeling2011modeling] for all the details.
### SIR model
The S-I-R models the diffusion of an infection in a population assuming three possible states (or compartments) in which any invidual in the population may move. Specifically, (1) *Susceptible*, individuals unexposed to the disease, (2) *Infected*, individuals currently infected by the disease, and (3) *Recovered*, individuals which were successfully recovered by the infection.
To consider the simplest case, we ignore the population demography (i.e., births and deaths of individuals are omitted), thus we consider only two possible events: the infection (passage from *Susceptible* to *Infected*), and the recovery (passage from *Infected* to *Recovered*). We are also assuming to neglect complex pattern of contacts, by considering an homogeneous mixing.
From a mathematical point of view, the system behaviors can be investigated by exploiting the deterministic approach [@Ku70] which approximates its dynamics through a system of ordinary differential equations (ODEs):
```{r , echo = FALSE, message=FALSE, fig.align='center', fig.pos='H'}
knitr::include_graphics("./Images/equation1.png")
```
where:
- $S,\ I,\ R$ are the number of susceptible, infected, and recovered individuals, respectively;
- $\beta$ is the infection rate;
- $N$ is the constant population size;
- $\gamma$ is the recovery rate, which determines the mean infectious period.
### Model generation
The first step is the model construction. Starting with the GreatSPN editor tool it is possible to draw the model using the PN formalism and its generalizations. We recall that the Petri Nets are bipartite graphs in which we have two type of nodes, places and transitions. Graphically, places are represented as circles and those are the variables of our systems. On the other hand, transitions are depicted as rectangles and are the possible events happening in the system. Variables and events (i.e., places and transitions) are connected through arcs, showing what variable(s) is (are) affected by a specific event.
For more details we refer to [@PetriNet].
Therefore, as represented in figure \ref{fig:SIR_PN}, we add one place for each variable of the system (i.e., S, I, and R represent the susceptible, infected, and recovered individuals respectively), and one transition for each possible event (i.e., *Infection* and *Recovery*). Finally, we save the PN model as a file with extension *.PNPRO* .
```{r , echo = FALSE, message=FALSE, fig.align='center', fig.cap='\\label{fig:SIR_PN} Petri Net representation of the SIR model.', fig.height = 4, fig.width = 6, fig.pos='H'}
knitr::include_graphics("./Images/SIRPNPRO.png")
```
Having constructed the model, the generation of both the stochastic (the Continuous Time Markov Chain) and deterministic (ODEs) processes underlying the model is implemented by the *model.generation()* function. This function takes as input the file generated by the graphical editor, in this case called *SIR.PNPRO*, and automatically derives the processes.
```{r, linewidth = 80,eval=FALSE}
model.generation(net_fname = "./Net/SIR.PNPRO")
```
The binary file *SIR.solver* is generated in which the derived processes and the library used for their simulation are packaged.
Notice that *model.generation()* might take as input parameter a C++ file defining the functions characterizing the behavior of general transitions [@Pernice19], namely *transitions_fname*.
For instance, if we want to define the transition *Infection* as a general transition then we have to set the transition as *General* and name the corresponding rate name as **FN:NameGeneralFN**, where in this case the *NameGeneralFN* is **InfectionFunction**. As showed in figure \ref{fig:SIR_PN_general}, where the transition type is set to *General* and the delay (i.e., the rate) to **FN:InfectionFunction**.
```{r , echo = FALSE, message=FALSE, fig.align='center', fig.cap='\\label{fig:SIR_PN_general} Petri Net representation of the SIR model, modelling the Infection transition as a general transition.', fig.height = 4, fig.width = 6, fig.pos='H'}
knitr::include_graphics("./Images/SIRPNPRO_FNen.png")
```
Then, we have to properly define a C++ function implementing the specific behavior of the transition and save it, for instance in a file named *transition.cpp*, which has to be structured as follow:
```{Rcpp, eval = FALSE}
static double Infection_rate = 1.428;
double InfectionFunction(double *Value,
map <string,int>& NumTrans,
map <string,int>& NumPlaces,
const vector<string> & NameTrans,
const struct InfTr* Trans,
const int T,
const double& time)
{
// Definition of the function exploited to calculate the rate,
// in this case for semplicity we define it throught the Mass Action law
double intensity = 1.0;
for (unsigned int k=0; k<Trans[T].InPlaces.size(); k++)
{
intensity *= pow(Value[Trans[T].InPlaces[k].Id],Trans[T].InPlaces[k].Card);
}
double rate = Infection_rate * intensity;
return(rate);
}
```
where the fixed input parameters are:
- **double \*Value**: marking of the Petri Net at time *time*;
- **map <string,int>& NumTrans**: association between the transition name and the corresponding index of the *NameTrans* vector;
- **map <string,int>& NumPlaces**: association between the place's name and the corresponding index of the *Value* vector;
- **const vector<string> & NameTrans**: transition names;
- **const struct InfTr* Trans**: array of *InfTr* structures (implemented in GreatSPN) which is indexed using the transition index. The structure *InfTr* has the following fileds: (1) *InPlaces*: the input places to a transition, which is characterized by the input place index position in the *Value* vector (*Trans[T].InPlaces[k].Id*) and the arc (linking the *k* place with the *T* transition) molteplicity (*Trans[T].InPlaces[k].Card*). (2) ....
- **const int T**: index of the firing transition;
- **const double& time** : time.
Notice that the function name has to correspond to the rate name associated with the general transition, in this case *InfectionFunction*.
Finally, the process can be derived be the *model.generation()* function as follow.
```{r, linewidth = 80, eval=FALSE}
model.generation(net_fname = "./Net/SIR_generalFN.PNPRO",
transitions_fname = "./Cpp/transition.cpp")
```
### Sensitivity analysis
The second step is represented by the sensitivity analysis, in which the deterministic process is solved several times varying the values of the unknown parameters to identify which are the sensitive ones (i.e., those that have a greater effect on the model behavior), by exploiting the Pearson Ranking Correlation Coefficients (PRCCs).
This may simplify the calibration step reducing (1) the number of variables to be estimated and (2) the search space associated with each estimated parameter.
With this purpose, the function *model.sensitivity()* calculates the PRCCs, and, given a reference dataset and a distance measure, it ranks the simulations according to the distance of each solution with respect to the reference one.
In details, the function *model.sensitivity()* takes in input
1. **solver_fname**: the \emph{.solver} file generated by the *model.generation* function, that is *SIR.solver*;
2. **n_config**: the total number of samples to be performed, for instance 200;
3. **f_time**: the final solution time, for instance 10 weeks (70 days);
4. **s_time**: the time step defining the frequency at which explicit estimates for the system values are desired, in this case it could be set to 1 day;
5. **parameters_fname**: a textual file in which the parameters to be studied are listed associated with their range of variability.
This file is defined by three mandatory columns (*which must separeted using ;*): (1) a tag representing the parameter type: *i* for the complete initial marking (or condition), *m* for the initial marking of a specific place, *c* for a single constant rate, and *g* for a rate associated with general transitions [@Pernice19] (the user must define a file name coherently with the one used in the general transitions file); (2) the name of the transition which is varying (this must correspond to name used in the PN draw in GreatSPN editor), if the complete initial marking is considered (i.e., with tag *i*) then by default the name *init* is used; (3) the function used for sampling the value of the variable considered, it could be either a R function or an user-defined function (in this case it has to be implemented into the R script passed through the *functions_fname* input parameter). Let us note that the output of this function must have size equal to the length of the varying parameter, that is 1 when tags *m*, *c* or *g* are used, and the size of the marking (number of places) when *i* is used.
The remaining columns represent the input parameters needed by the functions defined in the third column. An example is given by the file *Functions_list.csv*, where we decided to vary the rates of the *Recovery* and *Infection* transitions by using the R function which generates values following the uniform probability distribution on the interval from *min* to *max*. We set *n=1* because we must generate one value for each sample.
```{r, echo= FALSE}
Functions_list<-read.csv("Input/Functions_list.csv",header = F,skip = 1, sep=";")
colnames(Functions_list) <- c("Tag","Name","Function","Parameter1","Parameter2","Parameter3")
Functions_list
```
Another example might be *FunctionsSensitivity_list.csv*, where we decide to vary the initial marking using the following function *init_generation* defined in the R script *Functions.R* (see *functions_fname* parameter).
```{r}
Sensitivity_list<-read.csv("Input/FunctionsSensitivity_list.csv", header=FALSE,sep=";")
colnames(Sensitivity_list) <- c("Tag","Name","Function","Parameter1","Parameter2","Parameter3")
Sensitivity_list
```
6. **functions_fname**: an R file storing: 1) the user defined functions to generate instances of the parameters summarized in the *parameters_fname* file, and 2) the functions to compute the distance (or error) between the model output and the reference dataset itself (see *reference_data* and *distance_measure*), to obtain the place or a combination of places from which the PRCCs over the time have to be calculated (see *target_value*).
An example is given by *FunctionSensitivity.R*, where three functions are implemented: *init_generation*, *target*, and *mse*.
*init_generation* introduced in *FunctionsSensitivity_list.csv* file is defined in order to sample the initial number of susceptible between *min_init* and *max_init*, and fixing the number of infected and recovered to 1 and 0 respectively.
```{r, linewidth = 80 }
init_generation<-function(min_init , max_init, n)
{
S=runif(n=1,min=min_init,max=max_init)
# It returns a vector of lenght equal to 3 since the marking is
# defined by the three places: S, I, and R.
return( c(S, 1,0) )
}
```
Differently, *target* is the function to obtain the place or a combination of places from which the PRCCs over the time have to be calculated. In details, the function takes in input a *data.frame*, namely *output*, defined by a number of columns equal to the number of places plus one corresponding to the time, and number of rows equals to number of time steps defined previously. Finally, it must return the column (or a combination of columns) corresponding to the place (or combination of places) for which the PRCCs have to be calculated for each time step. In the example the PRCCs are calculated with respect to place *I* (infected individuals):
```{r, linewidth = 80 }
target<-function(output)
{
I <- output[,"I"]
return(I)
}
```
Finally, the function *mse* defines the distance measure (based on the squared error distance) between the reference data and the simulations; it takes in input only the reference data (defined in *reference_data.csv*), and the *simulation output* with the following structure:
```{r, echo= FALSE}
a<-read.csv("Results/SIR_calibration/SIR-calibration-10.trace", header=TRUE,sep=" ")
a[1:7,]
```
Thus, an example of this function can be as follows:
```{r, linewidth = 80 }
mse<-function(reference, output)
{
reference[1,] -> times_ref
reference[3,] -> infect_ref
# We will consider the same time points
Infect <- output[which(output$Time %in% times_ref),"I"]
infect_ref <- infect_ref[which( times_ref %in% output$Time)]
diff.Infect <- 1/length(times_ref)*sum(( Infect - infect_ref )^2 )
return(diff.Infect)
}
```
7. **target_value**: the function name to exploit for obtaining the PRCCs, which is implemented in *functions_fname*;
8. **reference_data**: a csv file storing the data to be compared with the simulations' result. In *reference_data.csv* we report the SIR evolution starting with 100 susceptible, one infected and zero recovered, with a recovery and infection rates equals to 0.04 and 0.004 respectively. Notice that the **reference_data**'s rows must correspond to the time serie variables (in our example: Susceptible, Infected and Recovered) , and so the columns the corresponding values at a specific time.
```{r, echo= FALSE}
a<-read.csv("Input/reference_data.csv", header=FALSE,sep=" ")
colnames(a) <- c("Time","I")
row.names(a) <- paste0("TimeStep",1:length(a$Time))
a[1:7,]
```
9. **distance_measure**: the distance function name to exploit for ranking the simulations, which is implemented in *functions_fname*;
Let us observe that: (i) *model.sensitivity* exploits also the parallel processing capabilities, and (ii) if the user is not interested on the ranking calculation then the **distance_measure** and **reference_data** are not necessary and can be omitted.
```{r, linewidth = 80,eval=FALSE}
## Simple version where only the transition rates vary.
sensitivity<-model.sensitivity(n_config = 200,
solver_fname = "Net/SIR.solver",
parameters_fname = "Input/FunctionsSensitivity_list.csv",
reference_data = "Input/reference_data.csv",
functions_fname = "Rfunction/FunctionSensitivity.R",
distance_measure = "mse" ,
target_value = "target" ,
i_time = 0,
f_time = 7*10, # weeks
s_time = 1, # days
parallel_processors = 2
)
```
```{r, linewidth = 80,echo=F,warning=F}
## Simple version where only the transition rates vary.
sensitivity<-model.sensitivity(n_config = 200,
solver_fname = "Net/SIR.solver",
parameters_fname = "Input/FunctionsSensitivity_list.csv",
reference_data = "Input/reference_data.csv",
functions_fname = "Rfunction/FunctionSensitivity.R",
distance_measure = "mse" ,
target_value = "target" ,
i_time = 0,
f_time = 7*10, # weeks
s_time = 1, # days
parallel_processors = 2
)
```
Hence, considering the SIR model we can run the *model.sensitivity* varying the *Infection* and *Recovery* transitions rates in order to characterized their effect on the number of infected individuals.
```{r, fig.height = 4, fig.width = 6, fig.align = "center", fig.pos='H', fig.cap=c("\\label{fig:I_traces} The 200 trajectories considering the I place obtained from different parameters configurations.", "\\label{fig:S_traces} The 200 trajectories considering the S place obtained from different parameters configurations.","\\label{fig:R_traces} The 200 trajectories considering the R place obtained from different parameters configuration.","\\label{fig:ScatterPlot} Scatter plot showing the squared error between the reference data and simulated number of infected. The dark blue points represent the parameters configuration with minimum error." ), fig.height = 4, fig.width = 6, fig.pos='H', echo = F}
source("./Rfunction/SensitivityPlot.R")
pl = SensitivityPlot(folder = "SIR_sensitivity/")
pl$TrajI
pl$TrajS
pl$TrajR
pl$Points
```
From the figures \ref{fig:I_traces}, \ref{fig:R_traces}, and \ref{fig:S_traces}, it is possible to observe the different trajectories obtained by solving the system of ODEs, represented by eq. \ref{eq:odesSIR}, with different parameters configurations, sampled by exploiting the function passed through **parameters_fname**. In figure \ref{fig:ScatterPlot} the distance values, obtained using the measure definition described before, are plotted varying the *Recovery* parameter (on the x-axis) and *Infection* parameter (on the y-axis). Each point is colored according to a nonlinear gradient function starting from color dark blue (i.e., lower value) and moving to color light blue (i.e., higher values). From this plot we can observe that lower squared errors are obtained when *Recovery* is around 0.025 and *Infection* around 0.002, thus we can reduce the search space associated with the two parameters around these two values.
```{r old-figure-label, echo = FALSE, message=FALSE, fig.align='center', fig.cap='\\label{fig:prcc} PRCC for the I place over time.', fig.height = 4, fig.width = 6, fig.pos='H'}
knitr::include_graphics("./Images/prcc_SIR-sensitivity.png")
```
The PRCCs values for these two parameters, depicted in figure \ref{fig:prcc}, with respect the number of infections over the entire simulated period are both meaningful, especially in the first part of the simulation, corresponding to the transient part where the parameters affect mostly the output. Differently, this effect decreases after the fifth week where all the deterministic trajectories obtained with different parameters configurations converge to the same states, see figure \ref{fig:I_traces}.
Other possible examples of how to use this function are reported hereafter:
```{r, eval=FALSE }
## Version where only the PRCC is calculated
sensitivity<-model.sensitivity(n_config = 100,
solver_fname = "Net/SIR.solver",
parameters_fname = "Input/Functions_list.csv",
functions_fname = "Rfunction/FunctionSensitivity.R",
target_value = "target" ,
parallel_processors = 1,
f_time = 7*10, # weeks
s_time = 1 # days
)
## Version where only the ranking is calculated
sensitivity<-model.sensitivity(n_config = 100,
solver_fname = "Net/SIR.solver",
parameters_fname = "Input/Functions_list.csv",
functions_fname = "Rfunction/FunctionSensitivity.R",
reference_data = "Input/reference_data.csv",
distance_measure = "mse" ,
parallel_processors = 1,
f_time = 7*10, # weeks
s_time = 1 # days
)
## Complete and more complex version where all the parameters for calculating
## the PRCC and the ranking are considered, and the initial conditions vary too.
sensitivity<-model.sensitivity(n_config = 100,
solver_fname = "Net/SIR.solver",
parameters_fname = "Input/Functions_list.csv",
reference_data = "Input/reference_data.csv",
functions_fname = "Rfunction/FunctionSensitivity.R",
distance_measure = "mse" ,
target_value = "target" ,
f_time = 7*10, # weeks
s_time = 1, # days
parallel_processors = 2
)
```
#### Sensitivity analysis with general transitions
Let us consider the example of the SIR model where the *Infection* transition is defined as general transition, with the porpoise to varying the *Infection_rate* constant of the corresponding Mass Action law. Generally, in order to define the rate of a transition it is required to provide some inputs and, hence, we need to define an R function (in the **functions_fname** file) which provides all the input parameters necessary to the C++ function.
Therefore, we have to modify the *Functions_list* csv as follow in order to associate with the general transition *Infection* the R function, *InfectionValuesGeneration*, which generates the values exploited by the respective function defined in the C++ file, called *trasition.cpp*.
```{r, echo= FALSE}
a<-read.csv("Input/Functions_list_FNgeneral.csv", header=FALSE,sep=";")
colnames(a) <- c("Tag","Name","Function","Parameter1","Parameter2","Parameter3")
a
```
Successively, we have to define the *InfectionValuesGeneration* in *Functions.R*.
```{r, eval= FALSE}
InfectionValuesGeneration<-function(min, max)
{
rate_value <- runif(n=1, min = min, max = max)
return(rate_value)
}
```
Notice that the value (or values) generated are temporarily saved in a file named as the corresponding name in the *Functions_list*, in this case *Infection*.
Hence, the file *transition.cpp* has to be modified in order to read and use the value generated from the R function *InfectionValuesGeneration*.
An example of implementation is the following, where two functions are defined: (1) *read_constant()* in order to read the generated value, which is associated with the right variable, and (2) *init_data_structures()* in order to read the file only the first time that the function is called.
```{Rcpp, eval = FALSE}
static double Flag = -1;
static double Infection_rate = 1.428;
void read_constant(string fname, double& Infection_rate)
{
ifstream f (fname);
string line;
if(f.is_open())
{
int i = 1;
while (getline(f,line))
{
switch(i)
{
case 1:
Infection_rate = stod(line);
//cout << "p" << i << ": " << line << "\t" << p1 << endl;
break;
}
++i;
}
f.close();
}
else
{
std::cerr<<"\nUnable to open " << fname <<
": file do not exists\": file do not exists\n";
exit(EXIT_FAILURE);
}
}
void init_data_structures()
{
read_constant("./Infection", Infection_rate);
Flag = 1;
}
double InfectionFunction(double *Value,
map <string,int>& NumTrans,
map <string,int>& NumPlaces,
const vector<string> & NameTrans,
const struct InfTr* Trans,
const int T,
const double& time)
{
// Definition of the function exploited to calculate the rate,
// in this case for semplicity we define it throught the Mass Action law
if( Flag == -1) init_data_structures();
double intensity = 1.0;
for (unsigned int k=0; k<Trans[T].InPlaces.size(); k++)
{
intensity *= pow(Value[Trans[T].InPlaces[k].Id],Trans[T].InPlaces[k].Card);
}
double rate = Infection_rate * intensity;
return(rate);
}
```
### Calibration analysis
The aim of this phase is to optimize the fit of the simulated behavior to the reference data by adjusting the parameters associated with both Recovery and Infection transitions. This step is performed by the function *model.calibration()*, characterized by the solution of an optimization problem in which the distance between the simulated data and the reference data is minimized, according to the definition of distance provided by the user (**distance_fname**).
```{r, linewidth = 80,eval=FALSE }
model.calibration(parameters_fname = "Input/Functions_list_Calibration.csv",
functions_fname = "Rfunction/FunctionCalibration.R",
solver_fname = "Net/SIR.solver",
reference_data = "Input/reference_data.csv",
distance_measure = "mse" ,
f_time = 7*10, # weeks
s_time = 1, # days
# Vectors to control the optimization
ini_v = c(0.02,0.001),
lb_v = c(0.01, 0.0001),
ub_v = c(0.05, 0.002),
max.time = 2
)
```
```{r, linewidth = 80,echo=F,warning=F }
model.calibration(parameters_fname = "Input/Functions_list_Calibration.csv",
functions_fname = "Rfunction/FunctionCalibration.R",
solver_fname = "Net/SIR.solver",
reference_data = "Input/reference_data.csv",
distance_measure = "mse" ,
f_time = 7*10, # weeks
s_time = 1, # days
# Vectors to control the optimization
ini_v = c(0.02,0.001),
lb_v = c(0.01, 0.0001),
ub_v = c(0.05, 0.002),
max.time = 2
)
```
1. **solver_fname**: the \emph{.solver} file generated by the *model.generation* function, that is *SIR.solver*;
2. **parameters_fname**: a textual file in which the parameters to be studied are listed associated with their range of variability.
This file is defined by three mandatory columns (*which must separeted using ;*): (1) a tag representing the parameter type: *i* for the complete initial marking (or condition), *m* for the initial marking of a specific place, *c* for a single constant rate, and *g* for a rate associated with general transitions [@Pernice19] (the user must define a file name coherently with the one used in the general transitions file); (2) the name of the transition which is varying (this must correspond to name used in the PN draw in GreatSPN editor), if the complete initial marking is considered (i.e., with tag *i*) then by default the name *init* is used; (3) the function used for sampling the value of the variable considered, it could be either a R function or an user-defined function (in this case it has to be implemented into the R script passed through the *functions_fname* input parameter). Let us note that the output of this function must have size equal to the length of the varying parameter, that is 1 when tags *m*, *c* or *g* are used, and the size of the marking (number of places) when *i* is used. The remaining columns represent the input parameters needed by the functions defined in the third column. An example is given by the file *Functions_list_Calibration.csv*:
```{r, echo= FALSE,out.width="50%"}
a<-read.csv("Input/Functions_list_Calibration.csv", header=FALSE,sep=";")
colnames(a) <- c("Tag","Name","Function or fixed parameter")
a
```
where the rates of the *Recovery* and *Infection* transitions can be calibrated by using the R functions stored in the R script *functions_fname*;
3. **functions_fname**: an R file storing: 1) the user defined functions to generate instances of the parameters summarized in the *parameters_fname* file, and 2) the function to compute the distance (or error) between the model output and the reference dataset itself. An example is given by *FunctionCalibration.R*, where three functions are implemented: *fun.recovery*, *fun.infection*, and *mse*. The first two are introduced in *Functions_list_Calibration.csv* file, and they are defined in order to return the value (or a linear transformation) of the vector of the unknown parameters generated from the optimization algorithm, namely **optim_v**, whose size is equal to number of parameters in *parameters_fname*. Let us note that the output of these functions must return a value for each input parameter. For instance, to calibrate the transition rates associated with *Recovery* and *Infection*, the functions recovery and infection have to be defined, returning just the corresponding value from the vector **optim_v**, where **optim_v[1]** = *“Recovery rate”*, **optim_v[2]** = *“Infection rate”*, since we do not want to change the vector generated from the optimization algorithm. The order of values in **optim_v** is given by the order of the parameters in *parameters_fname*. Finally, the function *mse* defines the distance measure (based on the squared error distance) between the reference data and the simulations; it takes in input only the reference data (defined in *reference_data.csv*), and the *simulation output* with the following structure:
```{r, echo= FALSE}
a<-read.csv("Results/SIR_calibration/SIR-calibration-10.trace", header=TRUE,sep=" ")
a[1:7,]
```
Thus, these three functions are defined as follows:
```{r, linewidth = 80 }
fun.recovery<-function(optim_v)
{
return(optim_v[1])
}
fun.infection<-function(optim_v)
{
return(optim_v[2])
}
mse<-function(reference, output)
{
reference[1,] -> times_ref
reference[3,] -> infect_ref
# We will consider the same time points
Infect <- output[which(output$Time %in% times_ref),"I"]
infect_ref <- infect_ref[which( times_ref %in% output$Time)]
diff.Infect <- 1/length(times_ref)*sum(( Infect - infect_ref )^2 )
return(diff.Infect)
}
```
4. **reference_data**: a csv file storing the data to be compared with the simulations' result. In *reference_data.csv* we report the SIR evolution starting with 100 susceptible, one infected and zero recovered, with a recovery and infection rates equals to 0.04 and 0.004 respectively. Notice that the **reference_data**'s rows must correspond to the time serie variables (in our example: Susceptible, Infected and Recovered) , and so the columns the corresponding values at a specific time.
```{r, echo= FALSE}
a<-read.csv("Input/reference_data.csv", header=FALSE,sep=" ")
colnames(a) <- c("Time","I")
row.names(a) <- paste0("TimeStep",1:length(a$Time))
a[1:7,]
```
5. **distance_measure**: the distance function name to exploit for ranking the simulations, which is implemented in *functions_fname*;
6. **f_time**: the final solution time, for instance 10 weeks (70 days);
7. **s_time**: the time step defining the frequency at which explicit estimates for the system values are desired, in this case it could be set to 1 day;
8. **ini_v**: Initial values for the parameters to be optimized.
9. **lb_v, ub_v**: Vectors with length equal to the number of parameters which are varying. Lower/Upper bounds for each parameter.
10. **max_time**: maximum running time.
```{r,eval=F}
# How to generate the plots
source("Rfunction/CalibrationPlot.R")
plots <- calibration.plot(solverName_path = "./SIR_calibration/SIR-calibration-1.trace",
reference_path ="reference_data.csv")
plots$plS
plots$plI
plots$plR
```
```{r, fig.height = 4, fig.width = 6, fig.align = "center", fig.pos='H', fig.cap=c("\\label{fig:S_traces_cal}Trajectories considering the S place.","\\label{fig:I_traces_cal} Trajectories considering the I place.", "\\label{fig:R_traces_cal} Trajectories considering the R place." ), echo = F}
source("Rfunction/CalibrationPlot.R")
plots <- calibration.plot(solverName_path = "./SIR_calibration/SIR-calibration-1.trace",
reference_path ="Input/reference_data.csv",
print=F)
plots$plS
plots$plI
plots$plR
```
In figures \ref{fig:I_traces_cal},\ref{fig:S_traces_cal} and \ref{fig:R_traces_cal} the trajectories with color depending on the squared error w.r.t. reference trend are plotted.
In this case, fixing a maximum number of objective function calls, we obtain the following optimal value for the two parameters:
```{r, echo= FALSE}
load("./SIR_calibration/SIR-calibration_optim.RData")
ret$par
```
#### Calibration analysis with general transitions
Starting from the changes made in the Sensitivity Analysis phase, we have to add the possibility to save the value passed by the optmization algorithm instead of the value generated by the function defined by the user. By default in the calibration phase, the vector *x* of the unknown parameters, in this case the *Recovery* and *Infection* rates, is passed to the R functions defined in *Functions.R*. Therefore, we have to modify the *InfectionValuesGeneration* in order to return the value contained in *x*, i.e. the second one ( the order is given by the order of the parameters in **parameters_fname**). Notice that in the Sensitivity Analisys phase, the vector *x* is not passed in input so we can generalized the *InfectionValuesGeneration* as follow in order to use it in both the analysis phases.
```{r, eval= FALSE}
InfectionValuesGeneration<-function(optim_v= NULL)
{
rate_value <- optim_v[2]
return(rate_value)
}
```
### Model Analysis
Finally, a possible step is the model analysis, where the corresponding function *model.analysis()* executes and tests the behavior of the developed model. Furthermore, by changing the input parameters, it is possible to perform a *what-if* analysis or forecasting the evolution of the diffusion process.
```{r, linewidth=80,eval=F }
model.analysis(solver_fname = "SIR.solver",
i_time = 1,
f_time = 100, # days
s_time = 1,
parameters_fname = "Functions_list_ModelAnalysis.csv"
)
```
1. **solver_fname**: the \emph{.solver} file generated by the *model.generation* function, that is *SIR.solver*;
2. **i_time**: the initial solution time, for instance day 1;
2. **f_time**: the final solution time, for instance 100 days;
3. **s_time**: the time step defining the frequency at which explicit estimates for the system values are desired, in this case it could be set to 1 day.
4. **parameters_fname**: a textual file in which the parameters to be studied are listed associated with their range of variability.
This file is defined by three mandatory columns (*which must separeted using ;*): (1) a tag representing the parameter type: *i* for the complete initial marking (or condition), *m* for the initial marking of a specific place, *c* for a single constant rate, and *g* for a rate associated with general transitions [@Pernice19] (the user must define a file name coherently with the one used in the general transitions file); (2) the name of the transition which is varying (this must correspond to name used in the PN draw in GreatSPN editor), if the complete initial marking is considered (i.e., with tag *i*) then by default the name *init* is used; (3) the function used for sampling the value of the variable considered, it could be either a R function or an user-defined function (in this case it has to be implemented into the R script passed through the *functions_fname* input parameter). Let us note that the output of this function must have size equal to the length of the varying parameter, that is 1 when tags *m*, *c* or *g* are used, and the size of the marking (number of places) when *i* is used.
The remaining columns represent the input parameters needed by the functions defined in the third column.
```{r, out.width="50%"}
Functions_list_ModelAnalysis<-read.csv("Input/Functions_list_ModelAnalysis.csv", header=FALSE,sep=";")
colnames(Functions_list_ModelAnalysis) <- c("Tag","Name","Function or fixed parameter")
Functions_list_ModelAnalysis
```
```{r, linewidth=80,echo=F,message=F, results='hide'}
model.analysis(solver_fname = "Net/SIR.solver",
parameters_fname = "Input/Functions_list_ModelAnalysis.csv",
i_time = 1,
f_time = 100, # days
s_time = 1
)
```
```{r, eval=F}
## How to generate the plots
source("Rfunction/ModelAnalysisPlot.R")
AnalysisPlot = ModelAnalysisPlot(trace_path = "./SIR_analysis/SIR-analysys-1.trace",
Stoch = F,
print=F)
AnalysisPlot$plAll
```
```{r, fig.height = 4, fig.width = 6, fig.align = "center", fig.pos='H', fig.cap=c(" Deterministic Trajectory considering all places" ), echo = F,message=F, results='hide'}
source("Rfunction/ModelAnalysisPlot.R")
AnalysisPlot = ModelAnalysisPlot(trace_path = "./SIR_analysis/SIR-analysis-1.trace",
Stoch = F,print=F)
AnalysisPlot$plAll
```
It is possible also to simulate the stochastic behavior of the system exploiting the Gillespie algorithm, namely **SSA**, which is an exact stochastic method widely used to simulate chemical systems whose behavior can be described by the Master equations.
```{r, linewidth=80,eval=F }
model.analysis(solver_fname = "SIR.solver",
parameters_fname = "Functions_list_ModelAnalysis.csv",
solver_type = "SSA",
n_run = 500,
parallel_processors = 2,
i_time = 1,
f_time = 100, # days
s_time = 1
)
```
1. **solver_type**: type of solver to use;
2. **n_run**: number of stochastic simulations to run.
```{r, linewidth=80,echo=F,message=F, results='hide'}
model.analysis(solver_fname = "Net/SIR.solver",
parameters_fname = "Input/Functions_list_ModelAnalysis.csv",
solver_type = "SSA",
n_run = 500,
parallel_processors = 2,
f_time = 100, # days
s_time = 1
)
```
1. **solver_type**: type of solver to use;
2. **n_run**: number of stochastic simulations to run.
```{r, eval=F}
## How to generate the plots
source("Rfunction/ModelAnalysisPlot.R")
AnalysisPlot = ModelAnalysisPlot(trace_path = "./SIR_analysis/SIR-analysis-1.trace",
Stoch = T)
AnalysisPlot$plAll
AnalysisPlot$plAllMean
```
```{r, fig.height = 4, fig.width = 6, fig.align = "center", fig.pos='H', fig.cap=c(" Stochastic Trajectories considering the S place."," Stochastic Trajectories considering the I place." ), echo = F,message=F, results='hide'}
source("Rfunction/ModelAnalysisPlot.R")
AnalysisPlot = ModelAnalysisPlot(trace_path = "./SIR_analysis/SIR-analysis-1.trace",
Stoch = T,
print=F)
AnalysisPlot$plAll
AnalysisPlot$plAllMean
```
#### What-if analysis
It is possible to change the place marking directly from the *parameters_fname*:
```{r, out.width="50%"}
Functions_list_ModelAnalysis<-read.csv("Input/Functions_list_ModelAnalysis2.csv", header=FALSE,sep=";")
colnames(Functions_list_ModelAnalysis) <- c("Tag","Name","Function or fixed parameter")
Functions_list_ModelAnalysis
```
# References
<div id="refs"></div>
```{r removingFolders, echo= FALSE}
system("rm -r -d SIR_*")
```