-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path08-model-extension-SAS.Rmd
410 lines (308 loc) · 16 KB
/
08-model-extension-SAS.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
# Other Models - SAS {#model-extension-sas}
```{r include=FALSE}
require(SASmarkdown)
saspath <- "C:/Program Files/SASHome/SASFoundation/9.4/sas.exe"
sasopts <- "-nosplash -ls 200"
knitr::opts_chunk$set(engine.path=list(sas=saspath, saslog=saspath),
engine.opts=list(sas=sasopts, saslog=sasopts),
comment=NA)
```
## Other Experimental and Treatment Designs
Spatial models can be extended to fit other experimental designs such as CRD, Lattice, and split plot or treatment designs such as factorials.
Below are minimal examples that omit several steps conducted in section \@ref(rcbd-sas) (e.g. fitting the empirical variogram) for brevity. Also, although spatial variance is incorporated into each example, we have not made an effort to ensure that each is the best fitting model for the data. The examples are intended to illustrate the correct syntax rather than the complete process of proper model fitting.
Unless indicated otherwise, these data are adapted from the R package agridat. Specific links to csv data files are given in the examples.
### Completely randomized design (CRD)
Data: [cochran_crd.csv](https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/cochran_crd.csv)<br><br>
This study investigated the effect of sulfur on controlling scab disease in potatoes and had seven treatments (trt) measuring the percent surface area infected (inf). Row and column indices are also given. The design was a completely randomized design with 8 control replications and 4 treatment replications.
Two possible methods of adjustment are shown: row column trend adjustment (as suggested in agridat), and a spline adjustment.<br><br>
```{r SAS8_1, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;
proc format;
invalue has_NA
'NA' = .;
run;
filename CRD url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/cochran_crd.csv";
data CRD;
infile CRD firstobs=2 delimiter=',';
input inf trt$ row col;
informat inf has_NA.;
if inf=. then delete;
run;
proc mixed data=crd;
class trt;
model inf=trt row col;
run;
proc glimmix data=crd;
class trt;
effect sp_r = spline(row col);
model inf=trt sp_r;
random row col/type=rsmooth;
run;
ods html;
```
### Multi-way Factorials
Data: [chinloy_fractionalfactorial.csv](https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/chinloy_fractionalfactorial.csv)<br><br>
Factorial experiments consider multiple treatments and their combinations. The study here evaluated the effect of 5 different fertilizer treatments (nitrogen, phosphorus, potassium, bagasse, and a filter mud press), each at various concentrations, on sugarcane yield. The levels of {0,1,2} indicate the relative concentrations of each fertilizer. Because all possible treatment combinations were too numerous, only a subset were used (fractional factorial) and only 2-way interactions with nitrogen are evaluated in the model. The treatments were laid out in a randomized complete block design (block).
A Gaussian spatial model is used below.<br><br>
```{r SAS8_2, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;
proc format;
invalue has_NA
'NA' = .;
run;
filename FACT url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/chinloy_fractionalfactorial.csv";
data Factorial;
infile FACT firstobs=2 delimiter=',';
input yield block$ row col trt N P K B F;
informat yield has_NA.;
if yield=. then delete;
run;
proc mixed data=Factorial maxiter=150;
class block N P K B F;
model yield = N P K B F N*P N*K N*B N*F/ ddfm=kr;
random block;
repeated/subject=intercept type=sp(gau) (row col) local;
parms (6) (.25) (.21) (.1);
run;
ods html;
```
### Alpha lattice
The *burgueno.alpha* data set is set up as an incomplete block alpha design with 16 treatment levels (gen). There are 12 blocks, 2 each within 3 reps. The code below illustrates the layout of the reps and blocks. Two examples of spatial adjustment are shown: 1) assuming a Gaussian spatial adjustment, and 2) utilizing a spline adjustment.<br><br>
```{r SAS8_3, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
proc format;
invalue has_NA
'NA' = .;
run;
filename ALPHA url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/burgueno_alpha.csv";
data Lattice;
infile ALPHA firstobs=2 delimiter=',';
input rep$ block$ row col gen$ yield;
informat yield has_NA.;
if yield=. then delete;
run;
proc sgplot data=Lattice;
styleattrs datacolors=(cx990F26 cx99600F cx54990F);
HEATMAPPARM y=row x=col COLORgroup=rep/ outline;
refline 2.5 4.5/axis=y LINEATTRS=(color=black thickness=4) ;
title1 'Lattice: Layout of Reps';
run;
proc sgplot data=Lattice;
styleattrs datacolors=(cx990F26 cxCC7A88 cxB33E52 cxE6B8BF
cx99600F cxCCAA7A cxB3823E cxE6D2B8
cx54990F cxA3CC7A cx78B33E cxCFE6B8);
HEATMAPPARM y=row x=col COLORgroup=block/ outline;
refline 2.5 4.5/axis=y LINEATTRS=(color=black thickness=4) ;
refline 4.5/axis=x LINEATTRS=(color=black thickness=4);
title1 'Lattice: Layout of Blocks';
run;
```
```{r SAS8_3_1, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;
proc mixed data=Lattice ;
class block rep gen;
model yield = gen/ ddfm=kr;
random rep block(rep);
repeated/subject=intercept type=sp(gau) (row col) local;
parms (4)(30711) (57790)(86861) (133229);
run;
proc glimmix data=Lattice ;
class block rep gen;
effect sp_r = spline(row col);
model yield = gen sp_r/ ddfm=kr;
random row col/type=rsmooth;
run;
ods html;
```
### Latin square
Latin is a special example of a lattice experiment where each treatment occurs once in each row and in each column. As a result, the row and column effects are used to model spatial effects intrinsically.
The *cochran.latin** data set examines the effect of an 6 "operators" (persons) on the difference between the true plot height and operator-measured shoot height from 6 wheat plots. Each person measured the plots in a different order according to a Latin Square design. While this example is not strictly spatial in nature, it illustrates the setup for analysis.<br><br>
```{r SAS8_4, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;
proc format;
invalue has_NA
'NA' = .;
run;
filename LAT url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/cochran_latin.csv";
data Latin;
infile LAT firstobs=2 delimiter=',';
input row col operator$ diff;
informat diff has_NA.;
if diff=. then delete;
run;
proc mixed data=latin;
class row col operator;
model diff = operator;
random row col;
run;
ods html;
```
### Split plot
A split plot is a factorial treatment design with a restriction on the randomization of the factors. In this example, the *durban.splitplot* data looks at the effect of two factors: fungicide and barley varieties. The study is set up with 4 blocks. Within each block, 2 fungicides are randomized as whole or main plots. Within each fungicide treatment, 70 barley varieties are then randomized separately as subplots. The resulting analysis then breaks out separate error terms for the whole plots and the subplots. The plots are arranged into 10 rows x 56 columns ('beds'). Spatial adjustment can then be introduced on top of this structure, if needed. The examples below illustrate a spherical spatial model adjustment and a spline adjustment.<br><br>
```{r SAS8_5, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;
proc format;
invalue has_NA
'NA' = .;
run;
filename SPLIT url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/durban_splitplot.csv";
data splitplot;
infile SPLIT firstobs=2 delimiter=',';
input yield block$ gen$ fung$ row bed;
informat yield has_NA.;
if yield=. then delete;
run;
proc mixed data=splitplot;
class block gen fung row bed;
model yield = fung gen fung*gen/outp=residuals;
random block block*fung;
repeated/subject=intercept type=sp(sph) (row bed) local;
parms (7)(0.03)(0.03)(0.03) (0.01);
run;
proc glimmix data=splitplot;
class block gen fung;
effect sp_r = spline(row bed);
model yield = fung gen fung*gen sp_r;
random block block*fung;
random row bed/type=rsmooth;
run;
ods html;
```
### Split-split plot
Like the Split-Plot above, the Split-Split-Plot is a factorial design, this time with an additional restriction on the randomization. The *archbold.apple* data used here describes the yield of apple trees under the impact of tree spacing (the whole or main plot), tree root stock (the split plot), and tree variety (the split-split plot). That is, 3 tree spacings are randomized in each block. Within those spacings, 4 root stocks are randomized separately, and then within each of those 2 varieties ('gen') are randomized. There are 5 blocks ('rep') and separate error terms for spacing, root stock, and variety are broken out in the analysis. In addition, we have row and column ('pos') information for the plots. This example uses a spline spatial adjustment.<br><br>
```{r SAS8_6, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;
proc format;
invalue has_NA
'NA' = .;
run;
filename SPLIT url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/archbold_apple.csv";
data sp_sp_plot;
infile SPLIT firstobs=2 delimiter=',';
input rep$ row pos spacing$ stock$ gen$ yield trt;
informat yield has_NA.;
if yield=. then delete;
run;
proc glimmix data=sp_sp_plot;
class rep spacing stock gen;
effect sp_r = spline(row pos);
model yield = spacing stock spacing*stock gen gen*spacing gen*stock gen*spacing*stock sp_r;
random rep rep*spacing rep*stock*spacing;
random row pos/type=rsmooth;
run;
ods html;
```
### Split block or Strip plot
This study (*little.splitblock*) investigated the effects of nitrogen rates (4 levels) and harvest dates (5 dates) on sugarbeet yields using 4 blocks. Similar to a Latin square, the split block design has two main factors defined in rows and columns. Within each block the rows and columns, each representing one of the factors, are independently randomized. The effect of this in the analysis of variance is to break out separate error terms and DF for each main effect as well as their interaction. In this example the spline spatial adjustment is demonstrated.
```{r SAS8_7, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;
proc format;
invalue has_NA
'NA' = .;
run;
filename SPLITB url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/little_splitblock.csv";
data sb;
infile SPLITB firstobs=2 delimiter=',';
input row col yield harvest nitro block$;
informat yield has_NA.;
if yield=. then delete;
run;
proc glimmix data=sb;
class harvest nitro block;
effect sp_r = spline(row col);
model yield = harvest nitro harvest*nitro sp_r;
random block harvest*block nitro*block harvest*nitro*block;
random row col/type=rsmooth;
run;
ods html;
```
### Augmented design
The augmented experimental design occurs most commonly, if not exclusively, in plant breeding studies. It can be useful when the number of treatments is very large and the primary goal of the study is to rank or select genotypes that perform to a specified level. The number of treatments and/or limited materials often preclude complete replication of all treatments. To adjust for this, only a select set of genotypes, usually of known performance, are replicated in the design. The error estimated from these select genotypes is then utilized in the analysis to evaluate the remaining genotypes. @Burgueno2018 have a very good discussion of this design and analysis of it. The example below uses their 'P2' model and the reader is referred to @Burgueno2018 for more details.
The data used here refer to a wheat genotype evaluation study carried out near Lind Washington. The study looked at 922 lines ('name'), of which, 8 were replicated known varieties. The P2 model, mentioned above, compares the averages of replicated lines to unreplicated lines. The data steps and procedures used below define these groups in an indicator variable named d2. The response variable, 'yieldg', is converted to kilograms ('yieldkg') to facilitate computations and avoid numeric overflow errors due to large values. <br><br>
```{r SAS8_8, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
filename AUG url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/augmented_lind.csv";
PROC IMPORT OUT= WORK.augmented
DATAFILE= AUG
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
data augmented;
set augmented;
if yieldg = 999999 or yieldg=. then delete; /* Remove missing values */
prow=prow*11.7; /*convert row and column indices to feet */
pcol=pcol*5.5;
run;
proc freq noprint data=augmented;
tables name/out=controls;
run;
data controls;
set controls;
if count >1;
run;
proc sort data=controls;
by name;
run;
proc sort data=augmented;
by name;
run;
data augmented;
merge augmented controls;
by name;
if count=. then d2=2; /* Unreplicated */
else d2=1; /* Replicated */
yieldkg=yieldg/1000;
run;
```
Following the steps described earlier, we first fit a base model, yieldkg = d2, to obtain residuals. We cannot use a base model of `yieldkg = name` because all the unreplicated lines will produce residual values of 0. <br><br>
The residuals are then plotted in a heat map. This map shows evidence of spatial patterns and variability across the field. <br><br>
```{r SAS8_9, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
PROC mixed data=augmented;
class name d2;
model yieldkg = d2/noint outp=residuals ddf=229 229;
lsmeans d2;
*lsmeans name(d2)/slice = d2;
run;
proc sgplot data=residuals;
HEATMAPPARM y=pRow x=pCol COLORRESPONSE=resid/ colormodel=(cx014458 cx1E8C6E cxE1FE01);
title1 'Field Map';
run;
```
The spatial variability evident in the residuals is then modeled as before. Only one model, power, is applicable to this data.<br><br>
```{r SAS8_10, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
/*
proc variogram data=residuals plots=pairs(thr=50) ;
compute novariogram nhclasses=60;
coordinates xc=prow yc=pcol;
var resid;
run;
proc variogram data=residuals plots(only)=(semivar);
compute lagd=6.6 maxlags=20;
coordinates xc=prow yc=pcol;
var resid;
run;
*/
proc variogram data=residuals plots(only)=(fitplot);
where yieldkg ^= .;
coordinates xc=pcol yc=pRow;
compute lagd=6.6 maxlags=25;
model form=auto(mlist=(gau, exp, pow, sph) nest=1);
var resid;
run;
```
Using the variogram model parameters estimated above, an adjusted model is then fit to the P2 hypothesis of @Burgueno2018. The residuals from this model show little, if any, spatial variability remains after adjustment.
*Note: the lsmeans statement is commented out in this example to avoid a large amount of output.*<br><br>
```{r SAS8_11, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
PROC mixed data=augmented;
class name d2;
model yieldkg = d2 name(d2)/outp=adjresiduals ddf=229 229;
lsmeans d2;
repeated/subject=intercept type=sp(pow)(prow pcol) local;
ods output SolutionR =parms;
parms (0.074) (0.0051)(0.475) ;
*lsmeans name(d2)/slice = d2;
run;
proc sgplot data=adjresiduals;
HEATMAPPARM y=pRow x=pCol COLORRESPONSE=resid/ colormodel=(cx014458 cx1E8C6E cxE1FE01);
title1 'Field Map';
run;
```