-
Notifications
You must be signed in to change notification settings - Fork 16
/
prediction-warning-systems.qmd
664 lines (504 loc) · 38.5 KB
/
prediction-warning-systems.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
---
title: "Warning systems"
editor: visual
editor_options:
chunk_output_type: inline
format:
html:
mermaid:
theme: forest
bibliography: references.bib
---
## Introduction
One practical application of plant disease epidemiology is to predict disease occurrences to guide timely management interventions, reducing crop damage and rationalizing pesticide use. Since the early to mid 1900s, warning systems (synonyms: disease forecaster, predictor) have advanced considerably. More comprehensively, these systems have evolved to a Decision Support System (DSS), when they integrate expert input, models, and databases for more nuanced management recommendations, transcending simple prediction to encompass various goals within computerized frameworks. In fact, technological progress in recent decades has boosted the development and automation of DSSs, now widely available in public and private sectors, providing direct, sophisticated guidance to crop advisors and growers (@fig-dss).
![Core elements of a decision support system that provides risk information for plant disease management](imgs/dss.png){#fig-dss fig-align="center"}
In its core, a DSS targeting risk prediction for plant disease management are based on a disease **model**, or a simplified, often mathematical, representation of a system (here the pathosystem) used for making predictions or suggesting management decisions based on the risk information they provide. The models range from basic rules (e.g. if-then) and static thresholds to sophisticated simulation models covering entire disease epidemics. Historically, these models collect weather information from on site automatic weather stations, but can be fed with data from remote sensing (e.g. satellite) or data reanalysis sources.
## Risk assessment and decision framework
There are basically two types of decisions related to plant disease management: tactical and strategical and these can be related to the distinct time frames of information (i.e., historical data, pre-season, growing season, and future seasons). A risk assessment and decision framework, with associated terminology, can be proposed based on such relations (@fig-timeline) and its components are explained next.
![Framework and associated terminology for plant disease risk assessment](imgs/timeline.png){#fig-timeline fig-align="center"}
**Historical** data and prediction refers to those obtained from or simulated for previous years or seasons. This can include past observed weather patterns, observed or simulated disease outbreaks. The goal is to learn about the "normal" patterns associated with disease prevalence and severity. In the p**re-season** period, predictions are made before the actual planting or growing season starts. This could be based on early predictions of weather patterns for the season. During the **growing season,** real-time data is used to provide short-term predictions. This is the period when crops are in the fields and are actively monitored. Finally, for **future seasons,** projections (e.g. climate change scenarios) can be made for the subsequent years or decades.
**Strategical** decisions are those with a long-term impact and are typically based on historical data and projections for future seasons. Examples might include choosing where to plant the crops in the next years, which cultivar to plant, and making infrastructure investments. **Tactical** are short-term decisions typically based on real-time data or short-term predictions, especially those relevant to the current growing season. A chief example include the timing of fungicide applications.
As to risk terminology, **risk analysis** is a comprehensive assessment of potential disease risks, considering both historical data and projections of the future. The goal is to understand and mitigate potential threats to agricultural production. **Risk prediction** can be broken down into a) **outlook:** Broad predictions or estimations about potential risks, often based on pre-season data. Example: effects that El Niño Southern Oscillation (ENSO) may have on disease patterns; b) **forecasting:** short-term predictions, typically for the growing season, such as critical weather events; and c) **warning:** Immediate alerts about imminent risks, like an upcoming risk of plant infection, crucial for tactical decisions.
In this chapter we will focus on warning systems that make use of disease monitoring or seasonal weather to provide risk information for tactical decisions.
## When is a warning system needed?
The figure and the box below provide some information on their utility. In the figure, risk analysis is defined as an approach for pathogens/diseases that are not present in a target region and for which modeling can be used for risk estimation if the disease is highly damaging where it occurs. Warning systems can be used if the disease is more erratic but still damaging when occuring. If the epidemics are too frequent and no effective and economic control is available (for example, some nematodes and virus), plant host resistance is the way to go. If there are effective and economic control measures (e.g. fungicide sprays) during the season, farmers should follows scheduled application of treatments.
```{mermaid}
%%| label: fig-diagram2
%%| fig-cap: "Decision chart for the need of risk analysis, warning systems, schedule treatment or host resistance in disease management"
%%| fig-width: 4
flowchart
Start((Start))
A{Disease<BR>present?}
B{Highly<BR>damaging?}
C[Risk Analysis]
D{Damaging?}
E{Frequent<BR>epidemics?}
F[Scheduled treatments]
G[Warning System]
H{Economic<BR>control?}
I[Host Resistance]
Start --> A
A -->|N| B
B -->|N| Start
D -->|N| Start
A -->|Y| D
D -->|Y| E
E -->|Y| H
B -->|Y| C
E -->|N| G
H -->|N| I
H -->|Y| F
```
::: callout-warning
## When is a warning system useful?
For operational and economical use, warning systems must fulfill four criteria to be practical. A more comprehensive discussion on this topic is presented in [@Campbell1990]:
1. The targeted disease must be economically damaging to crop yield or quality. Minor diseases with negligible effects on yield or quality are unlikely to gain attention from growers and advisors.
2. The diseases should exhibit variability across seasons in terms of onset, epidemic growth rate, severity, or another aspect, creating uncertainty in decision-making. Diseases with predictable patterns provide minimal information and little management advantage, reducing the relevance of warning systems.
3. Users should be capable of acting on the system's alerts, necessitating available and effective control measures and sufficient response time to prevent crop damage. Systems are unhelpful if practitioners can't adapt their strategies promptly.
4. The system must encompass comprehensive knowledge about the disease, synthesizing accurate risk estimates. Understanding the specific interactions between host, pathogen, and environment is crucial for the system's effectiveness and relevance.
:::
## What types of systems are there?
These systems vary significantly in structure and design, reflecting the multitude of plant diseases, objectives, available data, control strategies, developer preferences, and operational infrastructures. Usually, warning system are based on weather inputs, but they might leverage other inputs like host, pathogen, and economic factors, catering to the complexities of disease prediction.
Disease warning systems can include static or dynamic disease thresholds, direct detection of inoculum, simple rules of thumb (e.g. if-then) based on weather, infection risk during defined periods (risk models), or complex simulation models that estimate all phases of an epidemic. Let's see some examples of these systems together with a possible implementation in R.
### Disease thresholds
Damaging thresholds, integral to integrated pest management in entomology, can serve as a basic disease warning system. They involve economic injury levels, denoting pest abundance that equates control costs with incurred losses, and economic or action thresholds, indicating when action is necessary to avoid reaching injury levels [@pedigo1986]. These concepts, while straightforward, can be quite complex in practical scenarios.
Though less prevalent than in arthropod management, thresholds guide actions like fungicide application in plant diseases, especially those directly impacting yield through photosynthetic area reduction [@leiminger2012; @nutter1993terms]. However, their application is challenging for rapid, recurrent diseases affecting high-value crops, requiring prompt intervention even at minimal disease levels. With potato late blight, for instance, the first fungicide application may need to be applied by the time disease severity reaches as low as 0.1% of the foliage. Hence, disease monitoring for detection and quantification is vital for this system. In reality, for some rapid spreading and highly damaging diseases one cannot wait to "see" the disease to start protecting the crops, for which yield protection is best when applications are made preventatively.
However, the concept of **economic damage threshold** (EDT) may be used as a criteria do indicate when to start with fungicide sprays. By definition, EDT is the amount of disease intensity (e.g. severity when dealing with foliar diseases) that corresponds to an economic loss that equates the control cost to combat the disease. A formula for the EDT was proposed by Mumford and Norton [@mumford1984] and further modified by Reis [@reis2002modelo] for use in foliar fungal diseases, as described in Equation 3:
$EDT = \frac{F_C}{C_P . D_C} . C_e$ ,
where EDT is the disease intensity, $F_C$ is the fungicide cost (USD/ha), $C_P$ is the crop selling price (USD/ton), $D_C$ is the damage coefficient (adjusted to potential yield) and $C_e$ is the control efficacy of the fungicide (proportion of disease reduction relative to non-treated). In practice, sprays should be applied prior to reaching the EDT, which gives rise to the ADT (action damage threshold).
In a study on northern corn leaf blight in Argentina, the following values were used to calculate the EDT [@derossi2022]. Note that the authors adjusted the Dc to potential yield by multiplying by the potential yield value (8.5 ton.ha) in metric tons, since the Dc is normalized to one metric ton. More about Dc in the dedicated chapter on [yield loss](yieldloss-regression-models.html#damage-coefficients). The action damage threshold (ADT) was defined in that study as 20% reduction of the EDT.
```{r}
calculate_EDT <- function(Fc, Cp, Dc, Ec) {
EDT <- (Fc / (Cp * Dc)) * Ce
return(EDT)
}
Fc <- 30 # fixed cost of control is 30 USD/ha.
Cp <- 112 # fixed crop price is 112 USD/ton.
Dc <- 0.1712 # for potential yield of 8.5 t/ha so 8.5 x 0.02015 = 0.1712.
Ce <- 0.70 # control efficacy of fungicide is 70%.
EDT_value <- calculate_EDT(Fc, Cp, Dc, Ce)
print(EDT_value)
ADT = EDT_value * 0.80
ADT
```
As we can see, the EDT is variable depending on these four parameters. But how does the EDT varies according to variations on each of the parameters individually? Let's perform some simulations with vary values of the parameter.
```{r}
#| warning: false
#| message: false
library(tidyverse)
library(r4pde)
# vary Crop price
Cp_values = seq(100, 500, by = 5)
EDT_values <- vector("numeric", length(Cp_values))
for(i in seq_along(Cp_values)) {
Cp = Cp_values[i]
EDT_values[i] <- calculate_EDT(Fc, Cp, Dc, Ce)
}
p_cp <- data.frame(Cp_values, EDT_values) |>
ggplot(aes(Cp_values, EDT_values))+
geom_line()+
labs(x = "Crop price (US$/ton)",
y = "EDT")+
theme_r4pde(font_size =14)
# Vary fungicide price
Fc_values = seq(10, 100, by = 5)
EDT_values <- vector("numeric", length(Fc_values))
for(i in seq_along(Fc_values)) {
Fc = Fc_values[i]
EDT_values[i] <- calculate_EDT(Fc, Cp, Dc, Ce)
}
p_fc <- data.frame(Fc_values, EDT_values) |>
ggplot(aes(Fc_values, EDT_values))+
geom_line()+
labs(x = "Control cost (US$/ha)",
y = "EDT")+
theme_r4pde(font_size =14)
# Vary damage coefficient
Dc_values = seq(0.001, 0.01, by = 0.0001)
EDT_values <- vector("numeric", length(Dc_values))
for(i in seq_along(Dc_values)) {
Dc = Dc_values[i]
EDT_values[i] <- calculate_EDT(Fc, Cp, Dc, Ce)
}
p_dc <- data.frame(Dc_values, EDT_values) |>
ggplot(aes(Dc_values, EDT_values))+
geom_line()+
labs(x = "Damage coefficient",
y = "EDT")+
theme_r4pde(font_size =14)
# Vary Control efficacy
Ce_values = seq(0.2, 1, by = 0.01)
EDT_values <- vector("numeric", length(Ce_values))
for(i in seq_along(Ce_values)) {
Ce = Ce_values[i]
EDT_values[i] <- calculate_EDT(Fc, Cp, Dc, Ce)
}
p_ce <- data.frame(Ce_values, EDT_values) |>
ggplot(aes(Ce_values, EDT_values))+
geom_line()+
labs(x = "Control efficacy (%)",
y = "EDT")+
theme_r4pde(font_size =14)
```
```{r}
#| warning: false
#| label: fig-edt
#| fig-cap: "Variation of the economic damage threshold (EDT) according to fluctuations on each parameter in the formula for its calculation"
library(patchwork)
(p_cp | p_fc) /
(p_dc | p_ce)
```
#### Bonus app: EDT calculator
The scenario for EDT calculation is intricate, influenced by fluctuations in several parameters. To streamline these computations, a Shiny application called [EDT Calculator](https://epidemiologiaufv.shinyapps.io/EDTCalculator/) was developed, enabling users to specify each parameter and assess EDT (and respective 95% CI) under uncertainties in variables such as crop price and fungicide costs.
[![Screenshot of the EDT calculator, a Shiny app for interactive calculation of economic damage threshold (EDT)](imgs/EDT_calculator.png){#fig-EDT fig-align="center"}](https://epidemiologiaufv.shinyapps.io/EDTCalculator/)
### Monitoring season-long weather
Disease warning systems frequently predict conditions conducive to infection of the plant by the pathogen, with wetness and temperature being key variables for many foliar diseases [@bourke1970]. BLITECAST, the first computerized system [@krause1975blitecast] (which combined the Wallin and Hyre systems) provided the means of performing necessary calculations accurately and quickly and issuing recommendations to growers, is an example of successful automated warning system [@krause1975]. While initial inoculum is often undetectable, the presence of inoculum of the pathogen is assumed in many weather-based warning systems. Predictions of an outbreak are possible by tracking environmental conditions favorable for disease development.
Weather-based disease warning systems, like FAST (Forecasting *Alternaria solani* on tomatoes) and Wallin [@madden1978; @wallin1962], continuously monitor moisture and temperature for various crop diseases. These systems calculate weather favorability from environmental data, predicting infection and disease severity. They serve to guide growers on optimal spraying schedules, initiating treatments or determining application intervals based on accumulated severity values over time.
#### Wallin model
J.R. Wallin developed a model in the mid-20th century focusing on forecasting potato late blight, detailed across several publications [@wallin1962]. The model tracks hourly relative humidity and temperature, emphasizing periods with relative humidity of 90% or more. It calculates the number of high-humidity hours and the corresponding average temperature during the wet period. By accumulating 'disease severity values' (DSV) from plant emergence throughout the season, based on humidity and temperature measures, the model predicts the initial onset and subsequent spread of potato late blight. The table below summarizes the way the DSVs are obtained based on combinations of hours of relative humdity \> 90% and the air temperature within the wet period.
**Table.** Relationship of temperature and relative humidity (RH) periods as used in the Wallin late blight forecasting system to predict disease severity values (0 to 4).
| | | | | | |
|---------------------------|-------|----------|-------|-------|------|
| | Daily | severity | value | | |
| Average Temperature Range | 0 | 1 | 2 | 3 | 4 |
| 7.2 - 11.6 C | 15 | 16-18 | 19-21 | 22-24 | \>25 |
| 11.7 - 15.0 C | 12 | 13-15 | 16-18 | 19-21 | \>22 |
| 15.1 - 26.6 C | 9 | 10-12 | 13-15 | 16-18 | \>19 |
Let's calculate DVS in R based on Wallin's system. But first we need to download hourly weather data from NASA Power project using {nasapower} R package for the locality of Viçosa, MG, Brazil during the month of March 2022.
```{r}
#| warning: false
#| message: false
library(nasapower)
weather <- get_power(
community = "ag",
lonlat = c(-42.88, -20.7561),
pars = c("RH2M", "T2M"),
dates = c("2022-03-02", "2022-03-31"),
temporal_api = "hourly"
)
head(weather)
```
We now need to obtain the wet period (let's call it leaf wetness, or LW) based on hours of relative humidity \>90% and then the average temperature during the LW period for each day. We can obtain these by grouping the variables by year, month and day and use `mutate()` and `summarise()`.
```{r}
#| warning: false
#| message: false
library(tidyverse)
weather2 <- weather |>
group_by(YEAR, MO, DY) |>
mutate(LW = case_when(RH2M > 90 ~ 1,
TRUE ~ 0)) |>
filter(LW > 0) |>
summarise(Air_LWD = mean(T2M, na.rm = TRUE),
LWD = n())
```
Now we are ready to calculate the daily DSV based on Wallin's rules on the table and inspect the first 6 rows of the new table called `df_wallin`.
```{r}
df_wallin <- weather2 |>
mutate(
DSV = case_when(
# Temperature Range: 7.2 - 11.6 C
Air_LWD >= 7.2 & Air_LWD <= 11.7 & LWD <= 15 ~ 0,
Air_LWD >= 7.2 & Air_LWD <= 11.7 & LWD > 15 & LWD <= 18 ~ 1,
Air_LWD >= 7.2 & Air_LWD <= 11.7 & LWD > 18 & LWD <= 21 ~ 2,
Air_LWD >= 7.2 & Air_LWD <= 11.7 & LWD > 21 & LWD <= 24 ~ 3,
Air_LWD >= 7.2 & Air_LWD <= 11.7 & LWD > 24 ~ 4,
# Temperature Range: 11.7 - 15.0 C
Air_LWD > 11.7 & Air_LWD <= 15.1 & LWD <= 12 ~ 0,
Air_LWD > 11.7 & Air_LWD <= 15.1 & LWD > 12 & LWD <= 15 ~ 1,
Air_LWD > 11.7 & Air_LWD <= 15.1 & LWD > 15 & LWD <= 18 ~ 2,
Air_LWD > 11.7 & Air_LWD <= 15.1 & LWD > 18 & LWD <= 21 ~ 3,
Air_LWD > 11.7 & Air_LWD <= 15.1 & LWD > 21 ~ 4,
# Temperature Range: 15.1 - 26.6 C
Air_LWD > 15.1 & Air_LWD <= 26.6 & LWD <= 9 ~ 0,
Air_LWD > 15.1 & Air_LWD <= 26.6 & LWD > 9 & LWD <= 12 ~ 1,
Air_LWD > 15.1 & Air_LWD <= 26.6 & LWD > 12 & LWD <= 15 ~ 2,
Air_LWD > 15.1 & Air_LWD <= 26.6 & LWD > 15 & LWD <= 18 ~ 3,
Air_LWD > 15.1 & Air_LWD <= 26.6 & LWD > 18 ~ 4,
# Default (For temperatures out of the specified ranges or any other scenarios)
TRUE ~ 0 # Assigning a default value of 0
)
)
head(df_wallin)
```
We can visualize the daily and cumulative DSV for the monthly period after transforming the date format using `as.Date()` function. The dashed horizontal line in the plot indicates the action threshold of 20 cumulative DSV points, or when a spray should be applied. Please note that in real systems, the DSV is reduced to zero and another DSV counting is initiated after the spray.
```{r}
df_wallin2 <- df_wallin |>
mutate(DSV2 = cumsum(DSV),
date = as.Date(sprintf('%04d-%02d-%02d', YEAR, MO, DY)))
df_wallin2 |>
ggplot(aes(date, DSV))+
geom_col(fill = "darkred")+
geom_line(aes(date, DSV2))+
geom_hline(yintercept = 20, linetype = 2)+
annotate(geom = "text", x = as.Date("2022-03-04"), y = 20.5, label = "Action threshold")+
r4pde::theme_r4pde()+
labs(x = "Date", y = "Daily and cumulative DSV")
```
#### FAST model
Here is a code to calculate daily DSV values based on the FAST table [@madden1978] that relates the hours of relative humidity \> 90% (wet period) and temperature during the wet period during a 24-hour period.
| Mean temp (°C) | 0 | 1 | 2 | 3 | 4 |
|----------------|-----|------|-------|-------|-----|
| 13-17 | 0-6 | 7-15 | 16-20 | 21+ | 23+ |
| 18-20 | 0-3 | 4-8 | 9-15 | 16-22 | 23+ |
| 21-25 | 0-2 | 3-5 | 6-12 | 13-20 | 21+ |
| 26-29 | 0-3 | 4-8 | 9-15 | 16-22 | 23+ |
```{r}
df_fast <- weather2 %>%
mutate(
DSV = case_when(
# Temperature Range: 13 <= T < 18
Air_LWD >= 13 & Air_LWD < 18 & LWD >= 0 & LWD <= 6 ~ 0,
Air_LWD >= 13 & Air_LWD < 18 & LWD >= 7 & LWD <= 15 ~ 1,
Air_LWD >= 13 & Air_LWD < 18 & LWD >= 16 & LWD <= 20 ~ 2,
Air_LWD >= 13 & Air_LWD < 18 & LWD > 20 ~ 3,
# Temperature Range: 18 <= T < 21
Air_LWD >= 18 & Air_LWD < 21 & LWD >= 0 & LWD <= 3 ~ 0,
Air_LWD >= 18 & Air_LWD < 21 & LWD >= 4 & LWD <= 8 ~ 1,
Air_LWD >= 18 & Air_LWD < 21 & LWD >= 9 & LWD <= 15 ~ 2,
Air_LWD >= 18 & Air_LWD < 21 & LWD >= 16 & LWD <= 22 ~ 3,
Air_LWD >= 18 & Air_LWD < 21 & LWD > 22 ~ 4,
# Temperature Range: 21 <= T < 26
Air_LWD >= 21 & Air_LWD < 26 & LWD >= 0 & LWD <= 2 ~ 0,
Air_LWD >= 21 & Air_LWD < 26 & LWD >= 3 & LWD <= 5 ~ 1,
Air_LWD >= 21 & Air_LWD < 26 & LWD >= 6 & LWD <= 12 ~ 2,
Air_LWD >= 21 & Air_LWD < 26 & LWD >= 13 & LWD <= 20 ~ 3,
Air_LWD >= 21 & Air_LWD < 26 & LWD > 20 ~ 4,
# Temperature Range: 26 <= T < 30
Air_LWD >= 26 & Air_LWD < 30 & LWD >= 0 & LWD <= 3 ~ 0,
Air_LWD >= 26 & Air_LWD < 30 & LWD >= 4 & LWD <= 8 ~ 1,
Air_LWD >= 26 & Air_LWD < 30 & LWD >= 9 & LWD <= 15 ~ 2,
Air_LWD >= 26 & Air_LWD < 30 & LWD >= 16 & LWD <= 22 ~ 3,
Air_LWD >= 26 & Air_LWD < 30 & LWD > 22 ~ 4,
# Default (For temperatures out of the specified ranges or any other scenarios)
TRUE ~ 0 # Assigning a default value of 0
)
)
df_fast
df_fast2 <- df_fast |>
mutate(DSV2 = cumsum(DSV),
date = as.Date(sprintf('%04d-%02d-%02d', YEAR, MO, DY)))
df_fast2 |>
ggplot(aes(date, DSV))+
geom_col(fill = "darkred")+
geom_line(aes(date, DSV2))+
geom_hline(yintercept = 20, linetype = 2)+
annotate(geom = "text", x = as.Date("2022-03-04"), y = 21.5, label = "Action threshold")+
r4pde::theme_r4pde()+
labs(x = "Date", y = "Daily and cumulative DSV")
```
#### Bonus app: Dashboard
For educational purposes, an interactive dashboard was developed using R Shiny. This [app](https://epidemiologiaufv.shinyapps.io/warning-systems/) demonstrates the application of the Wallin (forecast for late blight of potato) and FAST (Forecast for *Alternaria solani* on Tomatoes) rules for calculating DSV, determining the appropriate timing for fungicide sprays (based on a defined threshold), and counting the total sprays during a selected period (@fig-ws_systems).
To utilize the system, users should select the model, input the latitude and longitude (or choose a location from the map), and specify the time period for the simulation, such as from plant emergence to harvest. The weather data for this simulation is sourced from the NASA Power project via the {nasapower} R package.
[![Screenshot of a web-based warning system for plant diseases based on Wallin\'s and FAST rules](imgs/ws_systems.png){#fig-ws_systems fig-align="center"}](https://epidemiologiaufv.shinyapps.io/warning-systems/)
### Infection risk during defined periods
Many diseases like late blight of potatoes and early blight of tomatoes, as seen in the previous section, require continuous (season long) risk monitoring to indicate multiple fungicide sprays. However, some warning systems assess risk at only at one time points, such as those crops that are most vulnerable to diseases during specific growth phases. In these cases, a single warning is issued, such as the risk of disease occurrence or occurrence at a intensity above a threshold (e.g. incidence \> 30%).
Some of these systems use statistical (risk algorithms) to predict probabilities of occurrence of the event (yes/no situation) based on weather variables that occur over specific periods of time, usually when primary infections take place. For example, Fusarium head blight (FHB, caused by *Fusarium graminearum*) disease depends on weather events that occur around the flowering stage, when the crop is most vulnerable. This is also the case of *Sclerotinia* (white mold) diseases that affect flowers and the presence/absence of apothecia is key for risk assessment[@willbur2018]. Such events of binary nature can be predicted using risk algorithms that consider weather variables, cultivar susceptibility, soil moisture, etc. A commonly used algorithm is the logistic regression model which deals with a binary classification.
#### Logistic regression: Ohio FHB models
FHB risk models were developed in the United States in the early 2000s to predict the risk of an epidemic, or when disease severity in the field was greater than 10% [@dewolf2003]. In the paper, the epidemic cases were classified as 0 and 1, according to that threshold, and several logistic regression models were fitted to the data using a summary of weather-related variables from two defined periods: seven days prior to flowering date and 10 days following flowering date (50% of wheat heads with anthers).
Among several models, the authors found that a single variable model (model A) exhibited a good accuracy (0.83). The selected variable consisted of a combination of temperature and humidity for the 10-day period after flowering. Named TRH9010, it corresponds to the total number of hours, within the 10-day post-flowering period, when the temperature (T) was \>=15 and \<= 30 ^o^C and relative humidity (RH) was \>90%.
Let's implement this model in R and calculate the probability of infection. As usual, we first need to download hourly data from NASA Power project for a period of three years. Let's work with data from Wooster, OH, United States.
```{r}
library(nasapower)
weather <- get_power(
community = "ag",
lonlat = c(-81.9399, 40.7982),
pars = c("RH2M", "T2M", "PRECTOTCORR"),
dates = c("2020-01-01", "2022-12-31"),
temporal_api = "hourly"
)
head(weather)
```
We can then create a function named `calculate_TRH9010()`. This function calculates the number of hours when specified condition is met (i.e., hours when 15 \<= T \<= 30 & RH \> 90), the probability (*p*) of infection, and the classification of the epidemic (*epi*) as 0 or 1 based on a defined threshold for p (0.36 in this case) [@dewolf2003]. The function's arguments are the weather data frame, YEAR, MO, and DY, allowing us to estimate the risk for a specific day (the flowering date) within the downloaded period. It's important to note that, in the original model, the variables were normalized (its value divided by the maximum observed value). Thus, the hour count for TRH9010 should be divided by the constant 136.
```{r}
calculate_TRH9010 <- function(data, YEAR, MO, DY){
start_row <- which(data$YEAR == YEAR & data$MO == MO & data$DY == DY & data$HR == 0)[1]
if ((nrow(data) - start_row) < 9 * 24){
return(list(TRH9010 = NA, y = NA, p = NA, epi = NA))
}
subset_data <- data[start_row:(start_row + 9 * 24 - 1),]
condition_met <- with(subset_data, T2M >= 15 & T2M <= 30 & RH2M > 90)
total_hours <- sum(condition_met)
y <- -3.3756 + 6.8128 * (total_hours / 136) # divide by maximum of 136 as in the paper
p <- exp(y) / (1 + exp(y))
epi <- ifelse(p > 0.36, 1, 0)
return(list(TRH9010 = total_hours, p = p, epi = epi))
}
```
Now we apply the function for the 1st of June 2021 as flowering date. The output is the hour count, the probability and the epidemic classification.
```{r}
results <- calculate_TRH9010(weather, YEAR = 2021, MO = 6, DY = 1)
results
```
We may want to apply the function for a series of days within a given month. Let's write code to predict the risk for the 15 days of June using a `for()` loop which will store the results in a list, but we can further transform to a dataframe using `map_dfr()` function.
```{r}
num_days <- 15
all_results <- list()
for(day in 1:num_days){
all_results[[paste0("", day)]] <- calculate_TRH9010(weather, YEAR = 2021, MO = 6, DY = day)
}
# transform results to a dataframe
df <- all_results %>%
map_dfr(~ as.data.frame(t(.)), .id = "day")
# Change variables to numeric
df$p <- as.numeric(df$p)
df$day <- as.numeric(df$day)
df$epi <- as.numeric(df$epi)
```
Finally we plot the results.
```{r}
ggplot(df, aes(day, p, fill = factor(epi))) +
geom_col()+
scale_x_continuous(n.breaks = 15)+
scale_fill_manual(values = c("grey60", "darkred"))+
geom_hline(yintercept = 0.36, linetype = 2)+
r4pde::theme_r4pde(font_size = 12)+
ylim(0,1)+
theme(legend.position = "bottom")+
labs(x = "Wheat flowering day in june 2021", y = "Risk probability",
fill = "Epidemic classification",
title = "FHB risk by Ohio (TRH9010) model",
subtitle = "Location: Wooster, Ohio",
caption = "Model source: De Wolf et al. (2002)")
```
#### Mechanistic models: Brazilian FHB model
Different from the previous approach, where statistical models are fitted to data, some models are called mechanistic, dynamic or process-based. These are designed based on the knowledge of the processes of the disease cycle and the factors that affect these processes [@gonzález-domínguez2023]. These usually operate in a hourly to daily basis and can be quite complex and sophisticated with numerous examples being found in the literature [@salotti2023; @rossi2014; @salotti2022; @caffi2010; @caffi2007].
One example is a mechanistic model for FHB developed in Brazil. In its core, the model is based on the daily simulation of the flowering process, the most vulnerable phase of the wheat crop, and events that contribute to infect the flowers. The original model [@delponte2005] integrates several sub models that estimate the daily proportion of flowers, a density of inoculum in a "spore cloud" and a proportion of infected flowers - all these variables being driven by daily variables of temperature and wetness variables such as rainfall and relative humidity.
A modified, simplified version of the model that takes into account only the flowering process and weather-variables that drives infection, was written in R and is presented here for illustration.
![A simplified version of a mechanistic Fusarium head blight model that takes into account the amount of flowers on each day which will be infected depending on rules that define a wet period and temperature during the wet period.](imgs/FHB_diagram.png){#fig-fhbdiagram fig-align="center"}
The function `simulate_FHB()` requires a starting date and weather data frame (adapted here to use NASA Power data set) to run a simulation during 20 days.
Among the several simplifications, which were based on the simulated results of the original model, flowering dynamics was assumed to follow a normal distribution over 10 days, and is adjusted to ensure that the peak in the proportion of anthers is set to 0.8. An adjustment is made so that anthers are set to 0 after the 10th day.
$[ \text{antherproportions} = \left\{ \begin{array}{ll} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2\right) & \text{for } x = 1, \ldots, 10 \\ 0 & \text{for } x = 11, \ldots, 20 \end{array} \right. ]$ ,
where $𝞵$ = 6 and $𝝈$ = 2.
The probability of the infection of the flowers is calculated as in the paper. It is based on an equation that considers daily temperature (T) and gives the proportion of flowers getting infected on any given day when a **condition for infection** is met.
$INF=0.001029×exp(0.1957×T)$
There are three conditions for an infection event, similar to what is described in the article [@delponte2005]:
- **Continuous 2-day Rain & High Humidity**: There should be rain (\> 0.5 mm) and the average humidity over two consecutive days should be high (\> 80%).
- **Rain Followed by Dry Day & High Humidity**: Rain and high humidity (\> 80%), followed by a day without rain but still with high humidity (\> 85%).
- **Dry Day & High Humidity Followed by Rain**: A day without rain but with high humidity (\> 85%), followed by a day with both rain and high humidity (\> 80%).
Infection then occurs when any of the above conditions are met. In this simpler version (without the inoculum factor), the proportion of infected spikelets, assuming that infected anthers will give rise to infected spikelets, is given by
$InfectedSpikelets = AntherProportions×INF$
```{r}
simulate_FHB <- function(start_date, weather_data) {
start_date <- as.Date(start_date, format = "%Y-%m-%d")
# Extract weather data for 30 days starting from the provided start_date
weather_subset <- subset(weather_data, YYYYMMDD >= start_date & YYYYMMDD < start_date + 20)
# Simulation for 20 days
x <- 1:20
# Normal distribution formula for 10 days using fixed peak day (mu) and spread (sigma)
mu <- 6
sigma <- 2
anther_proportions <- c((1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x[1:10] - mu) / sigma)^2), rep(0, 10))
# Adjust the values to have a peak of 0.8
max_value <- max(anther_proportions)
scaling_factor <- 0.8 / max_value
anther_proportions <- anther_proportions * scaling_factor # This scales the entire vector
# Zero out the anthers after the day they should be zero
anther_proportions[11:20] <- 0
# Calculate the INF value
INF <- 0.001029 * exp(0.1957 * weather_subset$T2M)
# Calculate average RH for two consecutive days
avg_RH2M_2days <- (weather_subset$RH2M + c(weather_subset$RH2M[-1], NA)) / 2
# Check conditions for infection
condition1 <- (weather_subset$PRECTOTCORR > 0.3) & (avg_RH2M_2days > 80)
consecutive_rain_high_humidity <- condition1 & c(FALSE, head(condition1, -1))
preceded_condition <- (weather_subset$PRECTOTCORR > 0.3) & (avg_RH2M_2days > 80) &
c(FALSE, head(weather_subset$RH2M, -1) > 85) &
c(head(weather_subset$PRECTOTCORR, -1) <= 0.3, FALSE)
succeeded_condition <- (weather_subset$PRECTOTCORR > 0.3) & (avg_RH2M_2days > 80) &
c(tail(weather_subset$RH2M, -1) > 85, FALSE) &
c(FALSE, tail(weather_subset$PRECTOTCORR, -1) <= 0.3)
# Combine conditions
infection_trigger <- consecutive_rain_high_humidity == 1 | preceded_condition | succeeded_condition
# Calculate infected anthers based on conditions
infected_spikelets <- ifelse(infection_trigger, anther_proportions * INF, 0)
# Cumulative infected anthers
cumulative_infected_spikelets <- cumsum(infected_spikelets)
# Compile results
result_df <- data.frame(
Date = weather_subset$YYYYMMDD,
T2M = weather_subset$T2M,
RAIN = weather_subset$PRECTOTCORR,
RH2M = weather_subset$RH2M,
Flowers = round(anther_proportions,3),
InfSpikelets = round(infected_spikelets,3),
CumInfSpikelets = round(cumulative_infected_spikelets, 3)
)
return(result_df)
}
```
Let's download the data again for the location of Wooster, Ohio during a 9-year period.
```{r}
library(nasapower)
wooster <- get_power(
community = "ag",
lonlat = c(-81.9399, 40.7982),
pars = c("RH2M", "T2M", "PRECTOTCORR"),
dates = c("2015-01-01", "2023-09-30"),
temporal_api = "daily"
)
wooster
```
Now that we have the data, we can simulate FHB for a specific start flowering date.
```{r}
start_date <- "2018-05-15"
result <- simulate_FHB(start_date, wooster)
head(result, 15)
```
Finally we can plot the dynamics of the flowering and the percent of infected spikelets during the period of 20 days after first flowers appear in the field.
```{r}
# Plotting the data
library(tidyverse)
p1 <- result |>
ggplot(aes(x = Date)) +
geom_area(aes(y = Flowers,
color = "Proportion of flowers"),
linewidth = 1, fill = "yellow", alpha = 0.5) +
geom_line(aes(y = CumInfSpikelets,
color = "Cum. Infected Spikelets"),
linewidth =1) +
labs(title = "FHB simulation model",
subtitle = paste("% Infected spikelets = ",
round(max(result$CumInfSpikelets*100, na.rm = TRUE),2), "%"),
y = "Proportion", x = "Date", color = "Legend") +
r4pde::theme_r4pde(font_size = 12)+
theme(legend.position = "bottom")
```
```{r}
p2 <- result |>
ggplot(aes(x = Date))+
geom_col(aes(x = Date, y = RAIN))+
geom_line(aes(y = T2M))+
geom_line(aes(y = RH2M), linetype = 2)+
labs(y = "Value",
title = "Rain, T and RH")+
r4pde::theme_r4pde(font_size = 12)
```
```{r}
library(patchwork)
p1 / p2
```
As in the previous section, we can apply the function for a series of days and see how the percent of infected spikelets fluctuates over time considering each day as a starting day for flowering.
```{r}
# Vector of dates
dates_vector <- seq(as.Date("2019-05-15"), as.Date("2019-05-30"), by="days")
# Apply function and extract max cumulative infection
max_cumulative_infections <- sapply(dates_vector, function(date) {
result <- simulate_FHB(date, wooster)
max(result$CumInfSpikelets, na.rm = TRUE)
})
df_results <- data.frame(
Date = dates_vector,
MaxCumulativeInfection = max_cumulative_infections
)
# plot the results
df_results |>
ggplot(aes(Date, MaxCumulativeInfection*100))+
geom_col(fill = "darkred", alpha = 0.8)+
labs(title = "FHB simulation model",
subtitle = "Wooster, Ohio - 2019",
x = "Starting day of flowering", y = "% Infected Spikelets")+
r4pde::theme_r4pde(font_size = 14)
```
## Practical considerations
While initial disease warning systems faced underuse, this was attributed to their primary deployment by advisors rather than growers directly, and an evolution where users, through experience, develop simplified rules that diminish the need for constant system consultation. Notable examples, such as the CPO in Denmark and FARMSCAPE in Australia, have documented this learning process, leading to less reliance on such tools. Growers often favor 'good enough' solutions that evolve from system use, which may reduce the need for direct interaction with the system over time.
However, certain diseases with specific control timing requirements, regulatory pressures, or those affecting high-value crops necessitate direct engagement with warning systems. In these scenarios, growers prioritize yield consistency and economic security over reduced pesticide usage, especially when faced with asymmetric risks associated with incorrect management decisions. It has been suggested that disease warning systems might be more beneficial and accepted for crops of intermediate or lower value, where the cost of mismanagement is more balanced. Essential to the success and adoption of these systems is the active involvement of end-users throughout the development process to ensure the system's features align with their practical farming goals and constraints.