---
title: "Remote sensing"
editor_options:
  chunk_output_type: console
author:
  name: Ivan Alberto Lizarazo
  email: [email protected]
  affiliations:
    name: Universidad Nacional de Colombia
    city: Bogotá
---
## Introduction
Remote sensing techniques, in the sense of gathering and processing data with a device separated from the object under study, are increasingly providing an important component of the set of technologies available for the study of vegetation systems and their functioning. This is despite the fact that many applications provide only indirect estimations of the biophysical variables of interest [@jones2010remote].
Particular advantages of remote sensing for vegetation studies are that: (i) it is non-contact and non-destructive; and (ii) observations are easily extrapolated to larger scales. Even at the plant scale, remotely sensed imagery is advantageous because it allows rapid sampling of large numbers of plants [@jones2010remote].
This chapter aims at providing a conceptual and practical approach to applying remote sensing data and techniques to derive information useful for monitoring crop diseases. The chapter is divided into four sections. The first introduces basic remote sensing concepts and summarizes applications of remote sensing to crop diseases. The second illustrates a case study focused on identification of banana Fusarium wilt from multispectral UAV imagery. The third illustrates a case study dealing with estimation of cercospora leaf spot disease on table beet. Finally, the chapter concludes with several reflections on the potential and limitations of this technology.
## Remote sensing background
### Optical remote sensing
Optical remote sensing makes use of the radiation reflected by a surface in the visible (\~400-700 nm), near-infrared (700-1300 nm) and shortwave-infrared (1300-\~3000 nm) parts of the electromagnetic spectrum. Spaceborne and airborne remote sensing and field spectroscopy use solar radiation as the illumination source, whereas lab spectroscopy uses a lamp as an artificial illumination source [@fig-RS1].
![Optical remote sensing via spaceborne sensors, field spectroscopy and laboratory spectroscopy (Adapted from [https://pages.cms.hu-berlin.de/EOL/geo_rs/index.htm](https://pages.cms.hu-berlin.de/EOL/geo_rs/)) .](imgs/RS1.png){#fig-RS1}
The proportion of the radiation reflected by a surface depends on the surface's spectral reflection, absorption and transmission properties and varies with wavelength [@fig-RS2]. These spectral properties in turn depend on the surface's physical and chemical constituents. Measuring the reflected radiation therefore allows us to draw conclusions about a surface's characteristics, which is the basic principle behind optical remote sensing.
![Reflection, absorption and transmission by a surface (left). Spectral reflectance profile of vegetation with the major factors determining the reflection (right). Source: <https://pages.cms.hu-berlin.de/EOL/geo_rs/>](imgs/RS-2.jpg){#fig-RS2}
### Vegetation spectral properties
Optical remote sensing enables the deduction of various vegetation-related characteristics, including biochemical properties (e.g., pigments, water content), structural properties (e.g., leaf area index (LAI), biomass) or process properties (e.g., light use efficiency (LUE)). The ability to deduce these characteristics depends on the ability of a sensor to resolve vegetation spectra. Hyperspectral sensors capture spectral information in hundreds of narrow and contiguous bands in the VIS, NIR and SWIR, and, thus, resolve subtle absorption features caused by specific vegetation constituents (e.g. anthocyanins, carotenoids, lignin, cellulose, proteins). In contrast, multispectral sensors capture spectral information in a few broad spectral bands and, thus, only resolve broader spectral features. Still, multispectral systems like Sentinel-2 have been demonstrated to be useful to derive valuable vegetation properties (e.g., LAI, chlorophyll).
![Vegetation spectrum in hyperspectral (ASD FieldSpec 4, EnMAP) and multispectral (Sentinel-2) resolution as well as characteristic spectral features caused by various constituents and processes (absorption lines shown as grey dashed lines). Source: [@hank2018]](imgs/RS-3.png){#fig-RS3}
### What does a remote sensor measure?
Optical sensors/spectrometers measure the radiation reflected by a surface into a certain solid angle, expressed as the physical quantity radiance. The unit of radiance is watts per square meter per steradian (W m^-2^ sr^-1^) [@fig-RS4]. In other words, radiance describes the amount of energy (W) that is reflected from a surface (per m^2^) and arrives at the sensor within a three-dimensional solid angle (per sr).
![Source: https://pages.cms.hu-berlin.de/EOL/geo_rs/](imgs/RS-4.png){#fig-RS4 fig-align="center" width="428"}
A general problem with using radiance as the unit of measurement is that radiance values vary with illumination. For example, the absolute incoming solar radiation varies over the course of the day as a function of the relative position between sun and surface, and so does the absolute amount of radiance measured. We can only compare measurements taken a few hours apart or on different dates when we relate the measured radiance to the incoming illumination.
The quotient between the measured reflected radiance and the measured incoming radiance (Radiance~reflected~ / Radiance~incoming~) is called reflectance (usually denoted $\rho$). Reflectance provides a stable unit of measurement that is independent of illumination: it is the fraction of the total incoming radiation that has been neither absorbed nor transmitted.
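As a toy illustration of this quotient (the radiance values below are made up and do not come from any real sensor):

```{r}
# Reflectance as the ratio of reflected to incoming radiance (made-up values, W m-2 sr-1)
radiance_reflected <- c(35.2, 48.7, 61.3)    # measured at the sensor for three bands
radiance_incoming  <- c(120.5, 118.9, 117.2) # e.g., derived from a calibrated reference panel
reflectance <- radiance_reflected / radiance_incoming
round(reflectance, 3)
```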
### Hyperspectral vs. multispectral imagery
Hyperspectral imaging involves capturing and analyzing data from a large number of narrow, contiguous bands across the electromagnetic spectrum, resulting in a high-resolution spectrum for each pixel in the image. As a result, a hyperspectral camera provides smooth spectra. The spectra provided by multispectral cameras look more like stairs or saw teeth and cannot depict fine spectral features [@fig-RS8].
### Vegetation Indices
A vegetation index (VI) is a spectral transformation of two or more bands of spectral imagery into a single-band image. A VI is designed to enhance the vegetation signal with regard to different vegetation properties, while minimizing confounding factors such as soil background reflectance, directional effects, or atmospheric effects. There are many different VIs, including multispectral broadband indices as well as hyperspectral narrowband indices.
Most of the multispectral broadband indices make use of the inverse relationship between the lower reflectance in the red (through chlorophyll absorption) and higher reflectance in the near-infrared (through leaf structure) to provide a measure of greenness that can be indirectly related to biochemical or structural vegetation properties (e.g., chlorophyll content, LAI). The Normalized Difference Vegetation Index (NDVI) is one of the most commonly used broadband VIs:
$$NDVI = \frac{\rho_{nir} - \rho_{red} }{\rho_{nir} + \rho_{red}}$$
The interpretation of the absolute value of the NDVI is highly informative, as it allows the immediate recognition of the areas of the farm or field that have problems. The NDVI is a simple index to interpret: its values vary between -1 and 1, and each value corresponds to a different agronomic situation, regardless of the crop [@fig-RS6].
![Agronomic conditions depending on the values on an NDVI scale](imgs/RS-6.png){#fig-RS6 fig-align="center"}
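As a minimal illustration of how such an index is computed, the sketch below builds NDVI from two synthetic reflectance layers with *terra* (the values are made up and do not represent any particular crop):

```{r}
library(terra)
# Synthetic red and NIR reflectance layers (made-up values)
red <- rast(nrows = 10, ncols = 10, vals = runif(100, 0.03, 0.10))
nir <- rast(nrows = 10, ncols = 10, vals = runif(100, 0.30, 0.55))
# NDVI = (NIR - Red) / (NIR + Red)
ndvi <- (nir - red) / (nir + red)
plot(ndvi, main = "NDVI (synthetic example)")
```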
## Remote sensing of crop diseases
### Detection of plant stress
One popular use of remote sensing is in the diagnosis and monitoring of plant *responses* to biotic (i.e., disease and insect damage) and abiotic stress (e.g., water stress, heat, high light, pollutants), with hundreds of publications on the topic. It is worth noting that most available techniques monitor the plant *response* rather than the stress itself. For example, with some diseases, it is common to estimate changes in canopy cover (using vegetation indices) as measures of "disease", but this measure could also be associated with water deficit [@jones2010remote]. This highlights the importance of measuring crop conditions in the field and laboratory to collect reliable data and be able to disentangle complex plant responses. Nevertheless, remote sensing can be used as the first step in site-specific disease control and also to phenotype the reactions of plant genotypes to pathogen attack [@lowe2017].
### Optical methods for measuring crop disease
There are a variety of optical sensors for the assessment of plant diseases. Sensors can be based only on the visible spectrum (400-700 nm) or also on the infrared spectrum (700 nm - 1 mm). The latter may include near-infrared (NIR) (0.75--1.4 $\mu m$), short wavelength infrared (SWIR) (1.4--3 $\mu m$), medium wavelength infrared (MWIR) (3--8 $\mu m$), or thermal infrared (8--15 $\mu m$) [@fig-RS8]. Sensors record either imaging or non-imaging (i.e., averaged) spectral radiance values, which need to be converted to reflectance before conducting any crop disease monitoring task.
![source: @delponte2024](imgs/RS-8.png){#fig-RS8}
In a recent chapter of Agrios' Plant Pathology, @delponte2024 highlights that understanding the basic principles of the interaction of light with plant tissue or the plant canopy is a crucial prerequisite for the analysis and interpretation of sensor data for disease assessment. When a plant is infected, there are changes to the physiology and biochemistry of the host, with the eventual development of disease symptoms and/or signs of the pathogen, which may be accompanied by structural and biochemical changes that affect the absorbance, transmittance, and reflectance of light [@fig-RS9].
![source: @delponte2024](imgs/RS-10.png){#fig-RS9}
### Scopes of disease sensing
The quantification of typical disease symptoms (disease severity) and the assessment of leaves infected by several pathogens are relatively simple for imaging systems but may become a challenge for non-imaging sensors and sensors with inadequate spatial resolution [@oerke2020]. Systematic monitoring of a crop by remote sensors can allow farmers to take preventive actions if infections are detected early. Remote sensors and processing techniques need to be carefully selected to be capable of (a) detecting a deviation in the crop's health status brought about by pathogens, (b) identifying the disease, and (c) quantifying the severity of the disease. Remote sensing can also be effectively used in (d) food quality control [@fig-RS10].
![Source: [@oerke2020]](imgs/RS-9.png){#fig-RS10}
### Monitoring plant diseases
Sensing of plants for precision disease control is done in large fields or greenhouses where the aim is to detect the occurrence of diseases at the early stages of epidemics, i.e., at low symptom frequency. @lowe2017 reviewed hyperspectral imaging of plant diseases, focusing on early detection of diseases for crop monitoring. They report several analysis techniques successfully used for the detection of biotic and abiotic stresses with reported levels of accuracy higher than 80%.
| Technique                                | Plant (stress)                   |
|------------------------------------------|----------------------------------|
| Quadratic discriminant analysis (QDA)    | Wheat (yellow rust)              |
|                                          | Avocado (laurel wilt)            |
| Decision tree (DT)                       | Avocado (laurel wilt)            |
|                                          | Sugarbeet (cercospora leaf spot) |
|                                          | Sugarbeet (powdery mildew)       |
|                                          | Sugarbeet (leaf rust)            |
| Multilayer perceptron (MLP)              | Wheat (yellow rust)              |
| Partial least squares regression (PLSR) on raw spectra and Savitzky-Golay 1st and 2nd derivatives | Celery (sclerotinia rot) |
| Partial least squares regression (PLSR)  | Wheat (yellow rust)              |
| Fisher's linear discriminant analysis    | Wheat (aphid)                    |
|                                          | Wheat (powdery mildew)           |
|                                          | Wheat (powdery mildew)           |
| Erosion and dilation                     | Cucumber (downy mildew)          |
| Spectral angle mapper (SAM)              | Sugarbeet (cercospora leaf spot) |
|                                          | Sugarbeet (powdery mildew)       |
|                                          | Sugarbeet (leaf rust)            |
|                                          | Wheat (head blight)              |
| Artificial neural network (ANN)          | Sugarbeet (cercospora leaf spot) |
|                                          | Sugarbeet (powdery mildew)       |
|                                          | Sugarbeet (leaf rust)            |
| Support vector machine (SVM)             | Sugarbeet (cercospora leaf spot) |
|                                          | Sugarbeet (powdery mildew)       |
|                                          | Sugarbeet (leaf rust)            |
|                                          | Barley (drought)                 |
| Spectral information divergence (SID)    | Grapefruit (canker, greasy spot, insect damage, scab, wind scar) |

: Statistical techniques used to detect both biotic and abiotic stresses in crops. Source: @lowe2017
@lowe2017 state that remote sensing of diseases under production conditions is challenging because of variable environmental factors and crop-intrinsic characteristics, e.g., 3D architecture, various growth stages, variety of diseases that may occur simultaneously, and the high sensitivity required to reliably perceive low disease levels suitable for decision-making in disease control. The use of less sensitive systems may be restricted to the assessment of crop damage and yield losses due to diseases.
### UAV applications for plant disease detection and monitoring
@kouadio2023 undertook a systematic quantitative literature review to summarize the existing literature on UAV-based applications for plant disease detection and monitoring. Results reveal a global disparity in research on the topic, with Asian countries being the top contributors, while regions such as Oceania and Africa are comparatively under-represented. To date, research has largely focused on diseases affecting wheat, sugar beet, potato, maize, and grapevine [@fig-RS11]. Multispectral, red-green-blue, and hyperspectral sensors were most often used to detect and identify disease symptoms, with current trends pointing to approaches integrating multiple sensors and the use of machine learning and deep learning techniques. The authors suggest that future research should prioritize (i) development of cost-effective and user-friendly UAVs, (ii) integration with emerging agricultural technologies, (iii) improved data acquisition and processing efficiency, (iv) diverse testing scenarios, and (v) ethical considerations through proper regulations.
![Source: @kouadio2023](imgs/RS-12.png){#fig-RS11 width="488"}
## Disease detection
This section illustrates the use of unmanned aerial vehicle (UAV) remote sensing imagery for identifying banana wilt disease. Fusarium wilt of banana, also known as "banana cancer", threatens banana production areas worldwide. Timely and accurate identification of Fusarium wilt disease is crucial for effective disease control and optimizing agricultural planting structure [@pegg2019].
A common initial symptom of this disease is the appearance of a faint pale yellow streak at the base of the petiole of the oldest leaf. This is followed by leaf chlorosis, which progresses from lower to upper leaves, wilting of leaves and longitudinal splitting of their bases. Pseudostem splitting of leaf bases is more common in young, rapidly growing plants [@pegg2019] [@fig-RSfw].
![Cavendish plant affected by Race 1 Foc (D. Peasley). Source: [@pegg2019]](https://www.frontiersin.org/files/Articles/469624/fpls-10-01395-HTML/image_m/fpls-10-01395-g006.jpg){#fig-RSfw fig-align="center"}
@ye2020 made publicly available experimental data [@HuichunYE2023] on wilted banana plants collected in a banana plantation located in Long'an County, Guangxi (China). The data set includes UAV multispectral reflectance data and ground survey data on the incidence of banana wilt disease. The paper by @ye2020 reports that the banana Fusarium wilt disease can be easily identified using several vegetation indices (VIs) obtained from this data set. Tested VIs include green chlorophyll index (CIgreen), red-edge chlorophyll index (CIRE), normalized difference vegetation index (NDVI), and normalized difference red-edge index (NDRE). The dataset can be downloaded from [here](https://www.scidb.cn/en/detail?dataSetId=8d77781a1d754db4b8842708a69c2c22).
### Software setup
Let's start by cleaning up R memory:
```{r}
rm(list=ls())
```
Then, we need to install several packages (if they are not installed yet):
```{r}
list.of.packages <- c("terra",
"tidyterra",
"stars",
"sf",
"leaflet",
"leafem",
"dplyr",
"ggplot2",
"tidymodels")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
```
Now, let's load all the required packages:
```{r}
#| warning: false
#| message: false
library(terra)
library(tidyterra)
library(stars)
library(sf)
library(leaflet)
library(leafem)
library(dplyr)
library(ggplot2)
library(tidymodels)
```
### Reading the dataset
The next code chunk assumes you have already downloaded the @HuichunYE2023 dataset and unzipped its content under the *data/banana_data* directory.
### File formats
Let's list the files under each subfolder:
```{r}
list.files("data/banana_data/1_UAV multispectral reflectance")
```
Note that the *.tif* file contains an orthophotomosaic of surface reflectance. It was created from UAV images taken with a *Micasense Red Edge M* camera which has five narrow spectral bands: Blue (465--485 nm), green (550--570 nm), red (653--673 nm), red edge (712--722 nm), and near-infrared (800--880 nm). We assume here that those images have been radiometrically and geometrically corrected.
```{r}
list.files("data/banana_data/2_Ground survey data of banana Fusarium wilt")
```
This is a shapefile with 80 points at which plant health status was recorded on the same date as the image acquisition.
```{r}
list.files("data/banana_data/3_Boundary of banana planting region")
```
This is a shapefile with one polygon representing the boundary of the study area.
### Read the orthomosaic and the ground data
Now, let's read the orthomosaic using the *terra* package:
```{r}
# Open the tif
tif <- "data/banana_data/1_UAV multispectral reflectance/UAV multispectral reflectance.tif"
rrr <- terra::rast(tif)
```
Let's check what we get:
```{r}
rrr
```
Note that this is a 5-band multispectral image with 8 cm pixel size.
Now, let's read the ground data:
```{r}
shp <- "data/banana_data/2_Ground survey data of banana Fusarium wilt/Ground_survey_data_of_banana_Fusarium_wilt.shp"
ggg <- sf::st_read(shp)
```
What did we get?
```{r}
ggg
```
Note that the attribute names and values are in Chinese. We will need to make several changes.
### Visualizing the data
As the orthomosaic is too large to visualize quickly, we will need a coarser version of it. Let's use the *terra* package to create one.
```{r}
rrr8 <- terra::aggregate(rrr, 8)  # aggregate blocks of 8 x 8 pixels into one
```
Let's check the output:
```{r}
rrr8
```
Note that the pixel size of the aggregated raster is 64 cm. Now, in order to visualize the ground points, we will need a color palette:
```{r}
pal <- colorFactor(
  palette = c('green', 'red'),
  domain = ggg$样点类型   # 样点类型 = sample point type (healthy vs. wilted plants)
)
```
Then, we will use the *leaflet* package to plot the new image and the ground points:
```{r}
#| warning: false
#| message: false
leaflet(data = ggg) |>
addProviderTiles("Esri.WorldImagery") |>
addRasterImage(rrr8) |>
addCircleMarkers(~x_经度, ~y_纬度,
radius = 5,
label = ~样点类型,
fillColor = ~pal(样点类型),
fillOpacity = 1,
stroke = F)
```
### Extracting image values at sampled points
Now we will extract the raster values at the sampled locations using the {stars} package. We expect to obtain one value per band at each location.
We need to convert the *raster* object into a *stars* object:
```{r}
sss <- st_as_stars(rrr)
```
What did we get?
```{r}
sss
```
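For reference, the simplest approach would be to read the single pixel under each point with `st_extract()`; a minimal sketch (not used further below):

```{r}
# Single-pixel extraction at the sampled points (one value per band per point)
pt_values <- st_extract(sss, ggg)
pt_values
```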
However, rather than relying on a single pixel, it is advisable to collect band values over a small window (e.g., a 3 x 3 pixel neighborhood). Thus, we will start by creating 20 cm buffers around each site:
```{r}
poly <- st_buffer(ggg, dist = 0.20)
```
Now, the extraction task:
```{r}
# Extract the median value per polygon
buf_values <- aggregate(sss, poly, FUN = median) |>
st_as_sf()
```
Here is what we got:
```{r}
buf_values
```
Note that the band names are long and unwieldy:
```{r}
names(buf_values)
```
Let's rename the bands:
```{r}
buf_values |> rename(blue = "UAV multispectral reflectance.tif.V1",
red = "UAV multispectral reflectance.tif.V2",
green = "UAV multispectral reflectance.tif.V3",
redge = "UAV multispectral reflectance.tif.V4",
nir = "UAV multispectral reflectance.tif.V5") -> buf_values2
```
Now we have shorter names for the bands:
```{r}
buf_values2
```
### Computing vegetation indices
@ye2020 used the following indices [@fig-RS12]:
![](imgs/RS_VIs.png){#fig-RS12 fig-align="center"}
Thus, we will compute several of those indices:
```{r}
buf_indices <- buf_values2 |> 
  mutate(ndvi = (nir - red) / (nir + red),
         ndre = (nir - redge) / (nir + redge),
         cire = (nir / redge) - 1,           # chlorophyll index red edge: NIR/RE - 1
         sipi = (nir - blue) / (nir - red)   # structure-insensitive pigment index
         ) |> select(ndvi, ndre, cire, sipi)
```
Here is what we got:
```{r}
buf_indices
```
Note that the health status is missing from *buf_indices*. Therefore, we will use a spatial join to attach it:
```{r}
samples <- st_join(
ggg,
buf_indices,
join = st_intersects)
```
Let's check the output:
```{r}
samples
```
It seems we succeeded. Let's check which class labels are present:
```{r}
unique(samples$样点类型)
```
Now, we will replace the Chinese labels with English words:
```{r}
samples |> 
  mutate(score = ifelse(样点类型 == '健康植株', 'healthy', 'wilted')) |>  # 健康植株 = healthy plant
  rename(east = x_经度,     # 经度 = longitude
         north = y_纬度) |>  # 纬度 = latitude
  select(OBJECTID, score, ndvi, ndre, cire, sipi) -> nsamples
```
Let's check the output:
```{r}
nsamples
```
As we do not intend to use the geometry in our model, we can remove it:
```{r}
st_geometry(nsamples) <- NULL
```
Let's check the output:
```{r}
nsamples
```
As our task is a binary classification (i.e., any site can be either healthy or wilted), the variable to estimate should be a *factor* (not a character).
Let's change the data type of that variable:
```{r}
nsamples$score = as.factor(nsamples$score)
```
Let's check the result:
```{r}
nsamples
```
A simple summary of the extracted data can be useful:
```{r}
nsamples |>
group_by(score) |>
summarize(n())
```
This means the dataset is balanced, which is very good.
### Saving the extracted dataset
Now, let's save the *nsamples* object, just in case R crashes due to lack of memory.
```{r}
# uncomment if needed (nsamples no longer has a geometry, so a plain CSV writer is enough)
# write.csv(nsamples, "data/banana_data/nsamples.csv", row.names = FALSE)
```
### Classification of Fusarium wilt using machine learning (ML)
The overall process to classify the crop disease under study will be conducted using the [*tidymodels* framework](https://www.tidymodels.org/), which is an extension of the *tidyverse* suite. It is especially focused on providing a generalized way to define, run and optimize ML models in R.
#### Exploratory analysis
As a first step in modeling, it's always a good idea to visualize the data. Let's start with a *boxplot* to display the distribution of a vegetation index. It visualizes five summary statistics (the median, two hinges and two whiskers), and all "outlying" points individually.
```{r}
p <- ggplot(nsamples, aes(score, ndre))+
r4pde::theme_r4pde()
p + geom_boxplot()
```
Next, we will do a scatterplot to visualize the indices NDRE and CIRE:
```{r}
ggplot(nsamples) +
aes(x = ndre, y = cire, color = score) +
geom_point(shape = 16, size = 4) +
labs(x = "NDRE", y = "CIRE") +
r4pde::theme_r4pde() +
scale_color_manual(values = c("#71b075", "#ba0600"))
```
#### Splitting the data
The next step is to divide the data into a training and a test set. The `set.seed()` function is used for reproducibility of computations that depend on random numbers. By default, the training/testing split is 0.75 to 0.25.
```{r}
set.seed(42)
data_split <- initial_split(data = nsamples)
data_train <- training(data_split)
data_test <- testing(data_split)
```
Let's check the result:
```{r}
data_train
```
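If the classes had been imbalanced, a stratified split could have been requested instead; a sketch using the `strata` argument of `initial_split()` (the object `data_split_strat` is not used further in this chapter):

```{r}
# A stratified 75/25 split keeps the class proportions similar in both partitions (a sketch)
set.seed(42)
data_split_strat <- initial_split(nsamples, prop = 0.75, strata = score)
data_split_strat
```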
#### Defining the model
We will use logistic regression, which is a simple model. It may be useful to have a look at [this explanation](https://mlu-explain.github.io/logistic-regression/) of the model.
```{r}
spec_lr <-
logistic_reg() |>
set_engine("glm") |>
set_mode("classification")
```
#### Defining the recipe
The `recipe()` function to be used here has two arguments:
- A formula. Any variable on the left-hand side of the tilde (\~) is considered the model outcome (here, `score`). On the right-hand side of the tilde are the predictors. Variables may be listed by name, or you can use the dot (.) to indicate all other variables as predictors.
- The data. A recipe is associated with the data set used to create the model. This will typically be the training set, so `data = data_train` here.
```{r}
recipe_lr <-
recipe(score ~ ., data_train) |>
add_role(OBJECTID, new_role = "id") |>
step_zv(all_predictors()) |>
step_corr(all_predictors())
```
#### Evaluating model performance
Next, we need to specify what we would like to see for determining the performance of the model. Different modelling algorithms have different types of metrics. Because we have a binary classification problem (healthy vs. wilted), we will choose the ROC AUC evaluation metric here, together with overall accuracy.
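As a sketch, such a metric set could be declared explicitly with the yardstick functions bundled in *tidymodels* (the object name `metrics_lr` is ours and is not used elsewhere in this chapter):

```{r}
# ROC AUC plus overall accuracy for a binary classifier (a sketch)
metrics_lr <- metric_set(roc_auc, accuracy)
metrics_lr
```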
#### Combining model and recipe into a workflow
We will want to use our recipe across several steps as we train and test our model. We will:
- Process the recipe using the training set: This involves any estimation or calculations based on the training set. For our recipe, the training set will be used to determine which predictors have zero variance or are highly correlated with other predictors and should be slated for removal.
- Apply the recipe to the training set: We create the final predictor set on the training set.
- Apply the recipe to the test set: We create the final predictor set on the test set. Nothing is recomputed and no information from the test set is used here; the zero-variance and correlation-filter results from the training set are applied to the test set.
To simplify this process, we can use a model workflow, which pairs a model and recipe together. This is a straightforward approach because different recipes are often needed for different models, so when a model and recipe are bundled, it becomes easier to train and test workflows. We'll use the workflows package from *tidymodels* to bundle our model with our recipe.
Now we are ready to set up our complete modelling workflow. This workflow contains the model specification and the recipe.
```{r}
wf_bana_wilt <-
workflow(
spec = spec_lr,
recipe_lr
)
wf_bana_wilt
```
#### Fitting the logistic regression model
Now we use the workflow created previously to fit the model on the training partition of the data.
```{r}
#| warning: false
#| message: false
fit_lr <- wf_bana_wilt |>
fit(data = data_train)
```
Let's check the output:
```{r}
fit_lr
```
Now, we will use the fitted model to estimate health status in the training data:
```{r}
lr_training_pred <- 
predict(fit_lr, data_train) |>
bind_cols(predict(fit_lr, data_train, type = "prob")) |>
# Add the true outcome data back in
bind_cols(data_train |>
select(score))
```
What did we get?
```{r}
lr_training_pred
```
Let's estimate the training accuracy:
```{r}
lr_training_pred |> # training set predictions
accuracy(truth = score, .pred_class) -> acc_train
acc_train
```
The accuracy of the model on the training data is about 0.85, which is well above 0.5 (mere chance). This basically means that the model was able to learn predictive patterns from the training data. To see whether the model can generalize what it learned when exposed to new data, we evaluate it on our hold-out (or so-called test) data. We created a test dataset when splitting the data at the start of the modelling.
#### Evaluating the model on test data
Now, we will use the fitted model to estimate health status in the testing data:
```{r}
lr_testing_pred <-
predict(fit_lr, data_test) |>
bind_cols(predict(fit_lr, data_test, type = "prob")) |>
bind_cols(data_test |> select(score))
```
Here is what we got:
```{r}
lr_testing_pred
```
Let's compute the testing accuracy:
```{r}
lr_testing_pred |> # test set predictions
accuracy(score, .pred_class)
```
The resulting accuracy is similar to the accuracy on the training data, which is good for a first attempt with a relatively simple classification model.
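Beyond overall accuracy, the confusion matrix and the ROC AUC give a fuller picture of test-set performance; a short sketch using the predictions computed above:

```{r}
# Confusion matrix on the test set
lr_testing_pred |>
  conf_mat(truth = score, estimate = .pred_class)

# ROC AUC; event_level = "second" because "wilted" is the second factor level
lr_testing_pred |>
  roc_auc(truth = score, .pred_wilted, event_level = "second")
```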
Let's plot the ROC curve:
```{r}
lr_testing_pred |>
roc_curve(truth = score, .pred_wilted, event_level="second") |>
mutate(model = "Logistic Regression") |>
autoplot()
```
### Conclusions
In this section, we trained and tested a logistic regression model using four spectral indices as predictor variables (i.e., NDVI, NDRE, CIRE and SIPI). Compare this section's results, in terms of model and accuracy, with the individual logistic regression models tested by @ye2020 [@fig-RS13].
![](imgs/RS_logregmodels.png){#fig-RS13 fig-align="center"}
Note that we have not tested other ML algorithms, but many are available through the *tidymodels* framework (e.g., random forests, support vector machines, gradient boosting machines), as sketched below.
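For instance, a random forest could be swapped in by changing only the model specification; a sketch (it assumes the *ranger* package is installed and reuses the recipe defined above):

```{r}
# A random forest specification that could replace spec_lr in the same workflow (a sketch)
spec_rf <-
  rand_forest(trees = 500) |>
  set_engine("ranger") |>
  set_mode("classification")

# wf_rf  <- workflow(spec = spec_rf, recipe_lr)
# fit_rf <- fit(wf_rf, data = data_train)
```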
To conclude, this section illustrated how to use VIs derived from UAV-based multispectral imagery, together with ground data, to develop a model for detecting banana Fusarium wilt. The results showed that a simple logistic regression model is able to identify Fusarium wilt of banana from several VIs with good accuracy. However, before getting too optimistic, I suggest studying the @ye2020 paper and critically evaluating their experimental design, methods and results.
## Disease quantification
### Introduction
This section illustrates the use of UAV multispectral imagery for estimating the severity of cercospora leaf spot (CLS), the most destructive fungal disease of table beet [@skaracis2010; @tan2023]. CLS causes rapid defoliation, and significant crop loss may occur through the inability to harvest with top-pulling machinery. It may also render the produce unsaleable for the fresh market [@skaracis2010].
@Saif2024 recently published on [Mendeley Data](https://data.mendeley.com/datasets/v9b7rwrwx9/1) UAV imagery and ground truth CLS data collected at several table beet plots at Cornell AgriTech, Geneva, New York, USA. Note that, in this study, UAV multispectral images were collected using a Micasense Red Edge camera, similar to the one used in the case study on detection of Fusarium wilt of banana. I have not found any scientific paper estimating CLS severity from this dataset. However, in @saif2023, a similar dataset was used to forecast table beet root yield at Cornell AgriTech. Although it is only a guess that the plots overlap, it is worth visualizing the plots in the latter study.
![Color mosaic of third flight hyperspectral image. The visualization bands are the following: red (639 nm), green (550 nm), and blue (470 nm). The crops in the red boxes in the figure were the plots under study. Source: [@saif2023]](imgs/RS_beetyield.png){#fig-RSbeetyield fig-align="center"}
This section aims at estimating CLS leaf severity using vegetation indices (VIs) derived from the multispectral UAV imagery as spectral covariates. The section comprises two parts: (i) Part 1 creates a spectral covariate table to summarize the complete UAV imagery & ground truth data; and (ii) Part 2 trains and tests a machine learning model to estimate CLS severity.
### CLS severity on leaves
It is very important to visualize how different levels of table beet CLS severity look in RGB color leaf images. The figure below is a standard area diagram set (SAD) used by raters during visual severity assessment to increase the accuracy and reliability of their estimates.
![Standard area diagram set for the severity of table beet cercospora leaf spot. Source: [@delponte2019]](imgs/RS_cls_sev.png){#fig-RS-cls_sev fig-align="center"}
### Software setup
Let's start by cleaning up R memory:
```{r}
rm(list=ls())
```
Then, we need to install several packages (if they are not installed yet):
```{r}
#| warning: true
list.of.packages <- c("readr","terra", "tidyterra", "stars", "sf", "leaflet", "leafem", "dplyr", "ggplot2", "tidymodels")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
```
Now, let's load all the required packages:
```{r}
#| warning: false
#| message: false
library(readr)
library(terra)
library(tidyterra)
library(stars)
library(sf)
library(leaflet)
library(leafem)
library(dplyr)
library(ggplot2)
library(tidymodels)
```
### Read the dataset
The following code assumes you have already downloaded the [dataset](https://data.mendeley.com/datasets/v9b7rwrwx9/1) and unzipped its content under the *data/cercospora_data* directory. What files are in that folder?
```{r}
list.files("data/cercospora_data")
```
Note that there is one CLS_DS.csv file with the CLS ground data and two folders with the UAV multispectral images. We can infer that 2021, 2022, 2023 refer to the image acquisition years. At this point, it is very important to check the README file. I will summarize the following points:
#### File naming convention
Each image file is named according to the plot number and the date of capture, using the format plt_rYYYYMMDD, where:
- **'plt'** stands for the plot number.
- **'YYYYMMDD'** represents the date on which the image was captured (year, month, day).
For example, the file name 5_r20210715 corresponds to an image taken on July 15, 2021, from study plot 5.
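A small sketch of how the plot number and the acquisition date could be parsed from such a file name in R (the object names are ours):

```{r}
# Parse "5_r20210715" into a plot number and a date (a sketch)
fname    <- "5_r20210715"
plot_id  <- as.integer(sub("^(\\d+)_r\\d{8}$", "\\1", fname))
img_date <- as.Date(sub("^\\d+_r(\\d{8})$", "\\1", fname), format = "%Y%m%d")
plot_id
img_date
```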
#### CLS Severity Data
The file named CLS_DS.csv contains visual assessments of CLS disease severity recorded for each plot.
### Inspect the format of each file
Let's list the first images under an image folder:
```{r}
list.files("data/cercospora_data/multispec_2021_2022/")[1:15]
```
Note that, for 2021, there are six dates of image acquisition: 20210707, 20210715, 20210720, 20210726, 20210802, and 20210825.
### Read the orthomosaics
Now, let's read several plot images using the *terra* package:
```{r}
# Paths to the images of study plots 1-10 collected on 7 July 2021
tif1 <- "data/cercospora_data/multispec_2021_2022/1_r20210707.tif"
tif2 <- "data/cercospora_data/multispec_2021_2022/2_r20210707.tif"
tif3 <- "data/cercospora_data/multispec_2021_2022/3_r20210707.tif"
tif4 <- "data/cercospora_data/multispec_2021_2022/4_r20210707.tif"
tif5 <- "data/cercospora_data/multispec_2021_2022/5_r20210707.tif"
tif6 <- "data/cercospora_data/multispec_2021_2022/6_r20210707.tif"
tif7 <- "data/cercospora_data/multispec_2021_2022/7_r20210707.tif"
tif8 <- "data/cercospora_data/multispec_2021_2022/8_r20210707.tif"
tif9 <- "data/cercospora_data/multispec_2021_2022/9_r20210707.tif"
tif10 <- "data/cercospora_data/multispec_2021_2022/10_r20210707.tif"
```
It may be convenient to increase image pixel size (to make the raster lighter):
```{r}
p01 <- terra::rast(tif1) %>% aggregate(20)
p02 <- terra::rast(tif2) %>% aggregate(20)
p03 <- terra::rast(tif3) %>% aggregate(20)
p04 <- terra::rast(tif4) %>% aggregate(20)
p05 <- terra::rast(tif5) %>% aggregate(20)
p06 <- terra::rast(tif6) %>% aggregate(20)
p07 <- terra::rast(tif7) %>% aggregate(20)
p08 <- terra::rast(tif8) %>% aggregate(20)
p09 <- terra::rast(tif9) %>% aggregate(20)
p10 <- terra::rast(tif10) %>% aggregate(20)
```
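The ten calls above could also be generated programmatically; a compact sketch using the same files and the same aggregation factor:

```{r}
# Build the file paths and aggregate each plot image in one pass (a sketch)
tifs  <- sprintf("data/cercospora_data/multispec_2021_2022/%d_r20210707.tif", 1:10)
plots <- lapply(tifs, function(f) terra::aggregate(terra::rast(f), 20))
```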
Let's check what we get:
```{r}
p01
```
Note that each image has 5 bands with \~21 cm pixel size.
As the images have been split into plots, it may be useful to "mosaic" them.
```{r}
# with many SpatRasters, make a SpatRasterCollection from a list
rlist <- list(p01, p02, p03, p04, p05, p06, p07, p08, p09, p10)
rsrc <- sprc(rlist)
m <- merge(rsrc)
```
What did we get?
```{r}
m
```
Let's look at the band names:
```{r}
names(m)
```
We will rename the bands:
```{r}
# Rename
m2 <- m %>%
rename(blue = "1_r20210707_1", green = "1_r20210707_2",
red = "1_r20210707_3", redge= "1_r20210707_4",
nir = "1_r20210707_5")
```
Let's get a summary of the mosaic:
```{r}
summary(m2)
```
Let's also look at histograms of the bands:
```{r}
hist(m2)
```
Let's visualize the mosaic:
```{r}
ggplot() +
geom_spatraster(data = m2) +
facet_wrap(~lyr, ncol = 5) +
r4pde::theme_r4pde(font_size = 10)+
scale_fill_whitebox_c(
palette = "muted",
labels = scales::label_number(suffix = "%"),
n.breaks = 10,
guide = guide_legend(reverse = TRUE)
) +
labs(
fill = "",
title = "UAV multispectral mosaic",
subtitle = "07.07.2021"
)
```
Let's read all images collected at a later date:
```{r}
# Paths to the images of study plots 1-10 collected on 25 August 2021
tif1 <- "data/cercospora_data/multispec_2021_2022/1_r20210825.tif"
tif2 <- "data/cercospora_data/multispec_2021_2022/2_r20210825.tif"
tif3 <- "data/cercospora_data/multispec_2021_2022/3_r20210825.tif"
tif4 <- "data/cercospora_data/multispec_2021_2022/4_r20210825.tif"
tif5 <- "data/cercospora_data/multispec_2021_2022/5_r20210825.tif"
tif6 <- "data/cercospora_data/multispec_2021_2022/6_r20210825.tif"
tif7 <- "data/cercospora_data/multispec_2021_2022/7_r20210825.tif"
tif8 <- "data/cercospora_data/multispec_2021_2022/8_r20210825.tif"
tif9 <- "data/cercospora_data/multispec_2021_2022/9_r20210825.tif"
tif10 <- "data/cercospora_data/multispec_2021_2022/10_r20210825.tif"
```
```{r}
p01 <- terra::rast(tif1) %>% aggregate(20)
p02 <- terra::rast(tif2) %>% aggregate(20)
p03 <- terra::rast(tif3) %>% aggregate(20)
p04 <- terra::rast(tif4) %>% aggregate(20)
p05 <- terra::rast(tif5) %>% aggregate(20)
p06 <- terra::rast(tif6) %>% aggregate(20)
p07 <- terra::rast(tif7) %>% aggregate(20)
p08 <- terra::rast(tif8) %>% aggregate(20)
p09 <- terra::rast(tif9) %>% aggregate(20)
p10 <- terra::rast(tif10) %>% aggregate(20)
```
```{r}
# with many SpatRasters, make a SpatRasterCollection from a list
rlist <- list(p01, p02, p03, p04, p05, p06, p07, p08, p09, p10)
rsrc <- sprc(rlist)
mm <- merge(rsrc)
```
```{r}
names(mm)
```
```{r}
# Rename
mm2 <- mm %>%
rename(blue = "1_r20210825_1", green = "1_r20210825_2",
red = "1_r20210825_3", redge= "1_r20210825_4",
nir = "1_r20210825_5")
```
Now, a visualization task:
```{r}
ggplot() +
geom_spatraster(data = mm2) +
facet_wrap(~lyr, ncol = 5) +
r4pde::theme_r4pde(font_size = 10)+
scale_fill_whitebox_c(
palette = "muted",
labels = scales::label_number(suffix = "%"),
n.breaks = 10,
guide = guide_legend(reverse = TRUE)
) +
labs(
fill = "",
title = "UAV multispectral mosaic",
subtitle = "08.25.2021"
)
```
### Reading severity data
Now, let's read the ground CLS data:
```{r}
file <- "data/cercospora_data/CLS_DS.csv"
cls <- readr::read_csv(file, col_names = TRUE, show_col_types = FALSE)
```
What did we get?
```{r}
cls
```
Note that columns D0 to D6 contain the disease severity, recorded as a percentage on different dates for each study plot. Note also that this object is structured in *wide format* (i.e., each row represents a single plot in a given year). Let's check the plot IDs:
```{r}
unique(cls$Plot)
```
Let's check the values stored in the *year* attribute:
```{r}
unique(cls$year)
```
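Since the severity table is in wide format, it may later be convenient to reshape it so that each row is one plot-by-assessment record; a sketch using `tidyr::pivot_longer()` (the column range D0:D6 is taken from the printout above, and the object `cls_long` is not used further in this chapter):

```{r}
# Reshape the severity table from wide to long format (a sketch)
cls_long <- cls |>
  tidyr::pivot_longer(cols = D0:D6, names_to = "assessment", values_to = "severity")
cls_long
```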
### Preparing a set of spectral covariates per plot and date
The 2021 data collection dates are 20210707, 20210715, 20210720, 20210802, and 20210825.
Let's start by creating a list of images per date for 2021:
```{r}
# Create lists of the .tif files collected on each date
# (these are the DXfiles objects)
D0files <- Sys.glob("data/cercospora_data/multispec_2021_2022/*_r20210707.tif")
D1files <- Sys.glob("data/cercospora_data/multispec_2021_2022/*_r20210715.tif")
D2files <- Sys.glob("data/cercospora_data/multispec_2021_2022/*_r20210720.tif")
D3files <- Sys.glob("data/cercospora_data/multispec_2021_2022/*_r20210802.tif")
D4files <- Sys.glob("data/cercospora_data/multispec_2021_2022/*_r20210825.tif")
```
Now, define date variables:
```{r}
# These are the DX variables
(D0 <- as.Date('7/7/2021',format='%m/%d/%Y'))
(D1 <- as.Date('7/15/2021',format='%m/%d/%Y'))
(D2 <- as.Date('7/20/2021',format='%m/%d/%Y'))
(D3 <- as.Date('8/02/2021',format='%m/%d/%Y'))
(D4 <- as.Date('8/25/2021',format='%m/%d/%Y'))
```
The following block of code should be executed five times (once for each *DXfiles* list and its matching *DX* variable); a sketch that wraps all five runs into a single helper function is given at the end of this section.
The code reads each image in the given *DXfiles* list, computes several vegetation indices (VIs) for that image, and obtains the mean value of the original bands and of the VIs for each plot. The per-plot results are stored in a *list* object.
```{r}
# Read all files for one date
lista <- lapply(D0files, function(x) rast(x))
# Compute plot-level statistics
output <- vector("list", length(lista))          # 1. output (one element per plot image)
for (i in seq_along(lista)) {                    # 2. sequence
  img  <- lista[[i]]
  r    <- clamp(img, 0, 1)
  ndvi <- (r[[5]] - r[[3]]) / (r[[5]] + r[[3]])
  ndre <- (r[[5]] - r[[4]]) / (r[[5]] + r[[4]])
  cire <- (r[[5]] / r[[4]]) - 1                  # chlorophyll index red edge: NIR/RE - 1
  sipi <- (r[[5]] - r[[1]]) / (r[[5]] - r[[3]])  # SIPI: (NIR - Blue) / (NIR - Red)
  min  <- minmax(ndvi)[1]
  max  <- minmax(ndvi)[2]
  avg  <- global(ndvi, fun = "mean", na.rm = TRUE)[, 1]
  mndvi <- 2 + log((ndvi - avg) / (max - min))   # a rescaled NDVI used as an extra covariate
  nimg <- c(r, ndvi, ndre, cire, sipi, mndvi)
  # store the per-layer means (one value per band and per VI) for this plot
  output[[i]] <- global(nimg, fun = "mean", na.rm = TRUE)[, 1]
}
```
The following block of code combines all the vectors produced previously into a matrix, as a first step towards putting the data into a data frame with meaningful column names. It also adds columns storing each plot ID as well as the corresponding date. Note that this code also needs to be executed five times (once for each *DX* variable).
```{r}
# combine all plot-level vectors into a matrix
mat <- do.call("rbind", output)
# convert the matrix to a data frame
df <- as.data.frame(mat)
# specify column names
colnames(df) <- c('blue', 'green', 'red', 'redge', 'nir',
                  'ndvi', 'ndre', 'cire', 'sipi', 'mndvi')
# add the plot ID and the acquisition date
plot  <- 1:40
date  <- rep(D0, 40)
newdf <- cbind(df, plot, date)
```
What did we get?
```{r}
newdf
```
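To avoid running the two blocks above manually five times, they could be bundled into a helper function and mapped over the file lists and dates; a sketch (the name `extract_covariates()` is ours, the band order and VIs are the same as above, and the plot ID is parsed from each file name instead of being assumed to be 1 to 40):

```{r}
# A helper returning one row of plot-level covariates per image for a given date (a sketch)
extract_covariates <- function(files, date) {
  rows <- lapply(files, function(f) {
    r     <- clamp(rast(f), 0, 1)
    ndvi  <- (r[[5]] - r[[3]]) / (r[[5]] + r[[3]])
    ndre  <- (r[[5]] - r[[4]]) / (r[[5]] + r[[4]])
    cire  <- (r[[5]] / r[[4]]) - 1
    sipi  <- (r[[5]] - r[[1]]) / (r[[5]] - r[[3]])
    avg   <- global(ndvi, fun = "mean", na.rm = TRUE)[, 1]
    rng   <- minmax(ndvi)[2] - minmax(ndvi)[1]
    mndvi <- 2 + log((ndvi - avg) / rng)
    global(c(r, ndvi, ndre, cire, sipi, mndvi), fun = "mean", na.rm = TRUE)[, 1]
  })
  df <- as.data.frame(do.call(rbind, rows))
  colnames(df) <- c("blue", "green", "red", "redge", "nir",
                    "ndvi", "ndre", "cire", "sipi", "mndvi")
  df$plot <- as.integer(sub("^(\\d+)_r\\d{8}\\.tif$", "\\1", basename(files)))
  df$date <- date
  df
}

# covariates_2021 <- do.call(rbind, Map(extract_covariates,
#   list(D0files, D1files, D2files, D3files, D4files),
#   list(D0, D1, D2, D3, D4)))
```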