08-model-extension-SAS.Rmd

# Other Models - SAS {#model-extension-sas}

```{r include=FALSE}
require(SASmarkdown)
saspath <- "C:/Program Files/SASHome/SASFoundation/9.4/sas.exe"
sasopts <- "-nosplash -ls 200"
knitr::opts_chunk$set(engine.path=list(sas=saspath, saslog=saspath),
                  engine.opts=list(sas=sasopts, saslog=sasopts), 
                  comment=NA)
```

## Other Experimental and Treatment Designs 

Spatial models can be extended to fit other experimental designs such as CRD, Lattice, and split plot or treatment designs such as factorials. 

Below are minimal examples that omit several steps conducted in section \@ref(rcbd-sas) (e.g. fitting the empirical variogram) for brevity. Also, although spatial variance is incorporated into each example, we have not made an effort to ensure that each is the best fitting model for the data. The examples are intended to illustrate the correct syntax rather than the complete process of proper model fitting. 

Unless indicated otherwise, these data are adapted from the R package agridat. Specific links to csv data files are given in the examples.

### Completely randomized design (CRD) 

Data: [cochran_crd.csv](https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/cochran_crd.csv)<br><br>

This study investigated the effect of sulfur on controlling scab disease in potatoes and had seven treatments (trt) measuring the percent surface area infected (inf). Row and column indices are also given. The design was a completely randomized design  with 8 control replications  and 4 treatment replications.

Two possible methods of adjustment are shown: row column trend adjustment (as suggested in agridat), and a spline adjustment.<br><br>

```{r SAS8_1, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}

ods html close;
proc format;
invalue has_NA
 'NA' = .;
run;

filename CRD url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/cochran_crd.csv";

data CRD;
    infile CRD firstobs=2 delimiter=',';
    input inf trt$ row col;
  	informat inf has_NA.;
    if inf=. then delete;
run;

proc mixed data=crd;
	class trt;
	model inf=trt row col;
run;

proc glimmix data=crd;
	class trt;
	effect sp_r = spline(row col);
	model inf=trt sp_r;
	random row col/type=rsmooth;
run;
 
ods html; 
```

### Multi-way Factorials 

Data: [chinloy_fractionalfactorial.csv](https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/chinloy_fractionalfactorial.csv)<br><br>

Factorial experiments consider multiple treatments and their combinations. The study here evaluated the effect of 5 different fertilizer treatments (nitrogen, phosphorus, potassium, bagasse, and a filter mud press), each at various concentrations, on sugarcane yield. The levels of {0,1,2} indicate the relative concentrations of each fertilizer. Because all possible treatment combinations were too numerous, only a subset were used (fractional factorial) and only 2-way interactions with nitrogen are evaluated in the model. The treatments were laid out in a randomized complete block design (block).

A Gaussian spatial model is used below.<br><br>

```{r SAS8_2, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;

proc format;
invalue has_NA
 'NA' = .;
run;

filename FACT url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/chinloy_fractionalfactorial.csv";

data Factorial;
    infile FACT firstobs=2 delimiter=',';
    input yield block$ row col trt N P K B F;
  	informat yield has_NA.;
    if yield=. then delete;
run;
 
proc mixed data=Factorial maxiter=150;
	class block N P K B F;
	model yield = N P K B F N*P N*K N*B N*F/ ddfm=kr;
	random block;
	repeated/subject=intercept type=sp(gau) (row col) local;
	parms (6) (.25) (.21) (.1);
run;

ods html;
```

### Alpha lattice

The *burgueno.alpha* data set is set up as an incomplete block alpha design with 16 treatment levels (gen). There are 12 blocks, 2 each within 3 reps. The code below illustrates the layout of the reps and blocks. Two examples of spatial adjustment are shown: 1) assuming a Gaussian spatial adjustment, and 2) utilizing a spline adjustment.<br><br>

```{r SAS8_3, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
proc format;
invalue has_NA
 'NA' = .;
run;

filename ALPHA url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/burgueno_alpha.csv";

data Lattice;
    infile ALPHA firstobs=2 delimiter=',';
    input rep$ block$ row col gen$ yield;
  	informat yield has_NA.;
    if yield=. then delete;
run;

proc sgplot data=Lattice;
	styleattrs datacolors=(cx990F26 cx99600F cx54990F);
	HEATMAPPARM y=row x=col COLORgroup=rep/ outline; 
	refline 2.5 4.5/axis=y LINEATTRS=(color=black thickness=4) ;
title1 'Lattice: Layout of Reps';

run; 

proc sgplot data=Lattice;
	styleattrs datacolors=(cx990F26 cxCC7A88 cxB33E52 cxE6B8BF
						   cx99600F cxCCAA7A cxB3823E cxE6D2B8 
						   cx54990F cxA3CC7A cx78B33E cxCFE6B8);
	HEATMAPPARM y=row x=col COLORgroup=block/ outline; 
	refline 2.5 4.5/axis=y LINEATTRS=(color=black thickness=4) ;
	refline 4.5/axis=x LINEATTRS=(color=black thickness=4);
title1 'Lattice: Layout of Blocks';
run;
```
 
```{r SAS8_3_1, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;

proc mixed data=Lattice ;
	class block rep gen;
	model yield = gen/ ddfm=kr;
	random rep block(rep);
	repeated/subject=intercept type=sp(gau) (row col) local;
	parms (4)(30711) (57790)(86861) (133229);
run;

proc glimmix data=Lattice ;
	class block rep gen;
	effect sp_r = spline(row col);
	model yield = gen sp_r/ ddfm=kr;
	random row col/type=rsmooth;
run;

ods html;
```


### Latin square

Latin is a special example of a lattice experiment where each treatment occurs once in each row and in each column. As a result, the row and column effects are used to model spatial effects intrinsically. 

The *cochran.latin** data set examines the effect of an 6 "operators" (persons) on the difference between the true plot height and operator-measured shoot height from 6 wheat plots. Each person measured the plots in a different order according to a Latin Square design. While this example is not strictly spatial in nature, it illustrates the setup for analysis.<br><br>

```{r SAS8_4, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;

proc format;
invalue has_NA
 'NA' = .;
run;

filename LAT url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/cochran_latin.csv";

data Latin;
    infile LAT firstobs=2 delimiter=',';
    input row col operator$ diff;
  	informat diff has_NA.;
    if diff=. then delete;
run;

proc mixed data=latin;
	class row col operator;
	model diff = operator;
	random row col;
run;

ods html;
```

### Split plot

A split plot is a factorial treatment design with a restriction on the randomization of the factors. In this example, the *durban.splitplot* data looks at the effect of two factors: fungicide and barley varieties. The study is set up with 4 blocks. Within each block, 2 fungicides are randomized as whole or main plots. Within each fungicide treatment, 70 barley varieties are then randomized separately as subplots. The resulting analysis then breaks out separate error terms for the whole plots and the subplots. The plots are arranged into 10 rows x 56 columns ('beds'). Spatial adjustment can then be introduced on top of this structure, if needed. The examples below illustrate a spherical spatial model adjustment and a spline adjustment.<br><br>

```{r SAS8_5, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;

proc format;
invalue has_NA
 'NA' = .;
run;

filename SPLIT url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/durban_splitplot.csv";

data splitplot;
    infile SPLIT firstobs=2 delimiter=',';
    input yield block$ gen$ fung$ row bed;
  	informat yield has_NA.;
    if yield=. then delete;
run;

proc mixed data=splitplot;
	class block gen fung row bed;
	model yield = fung gen fung*gen/outp=residuals;
	random block block*fung;
	repeated/subject=intercept type=sp(sph) (row bed) local;
	parms (7)(0.03)(0.03)(0.03) (0.01);
run;

proc glimmix data=splitplot;
	class block gen fung;
	effect sp_r = spline(row bed);
	model yield = fung gen fung*gen sp_r;
	random block block*fung;
	random row bed/type=rsmooth;
run;

ods html;
```


### Split-split plot

Like the Split-Plot above, the Split-Split-Plot is a factorial design, this time with an additional restriction on the randomization. The *archbold.apple* data used here describes the yield of apple trees under the impact of tree spacing (the whole or main plot), tree root stock (the split plot), and tree variety (the split-split plot). That is, 3 tree spacings are randomized in each block. Within those spacings, 4 root stocks are randomized separately, and then within each of those 2 varieties ('gen') are randomized. There are 5 blocks ('rep') and separate error terms for spacing, root stock, and variety are broken out in the analysis. In addition, we have row and column ('pos') information for the plots. This example uses a spline spatial adjustment.<br><br>

```{r SAS8_6, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;

proc format;
invalue has_NA
 'NA' = .;
run;

filename SPLIT url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/archbold_apple.csv";

data sp_sp_plot;
    infile SPLIT firstobs=2 delimiter=',';
    input rep$ row pos spacing$ stock$ gen$ yield trt;
  	informat yield has_NA.;
    if yield=. then delete;
run;

proc glimmix data=sp_sp_plot;
	class rep spacing stock gen;
	effect sp_r = spline(row pos);
	model yield = spacing stock spacing*stock gen gen*spacing gen*stock gen*spacing*stock sp_r;
	random rep rep*spacing rep*stock*spacing;
	random row pos/type=rsmooth;
run;
 
 ods html;
```

### Split block or Strip plot

This study (*little.splitblock*) investigated the effects of nitrogen rates (4 levels) and harvest dates (5 dates) on sugarbeet yields using 4 blocks. Similar to a Latin square, the split block design has two main factors defined in rows and columns. Within each block the rows and columns, each representing one of the factors, are independently randomized. The effect of this in the analysis of variance is to break out separate error terms and DF for each main effect as well as their interaction. In this example the spline spatial adjustment is demonstrated.

```{r SAS8_7, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
ods html close;

proc format;
invalue has_NA
 'NA' = .;
run;

filename SPLITB url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/little_splitblock.csv";

data sb;
    infile SPLITB firstobs=2 delimiter=',';
    input row col yield harvest nitro block$;
  	informat yield has_NA.;
    if yield=. then delete;
run;

proc glimmix data=sb;
	class harvest nitro block;
	effect sp_r = spline(row col);
	model yield = harvest nitro harvest*nitro sp_r;
	random block harvest*block nitro*block harvest*nitro*block;
	random row col/type=rsmooth;
run;
 
ods html;
```


### Augmented design

The augmented experimental design occurs most commonly, if not exclusively, in plant breeding studies. It can be useful when the number of treatments is very large and the primary goal of the study is to rank or select genotypes that perform to a specified level. The number of treatments and/or limited materials often preclude complete replication of all treatments. To adjust for this, only a select set of genotypes, usually of known performance, are replicated in the design. The error estimated from these select genotypes is then utilized in the analysis to evaluate the remaining genotypes. @Burgueno2018 have a very good discussion of this design and analysis of it. The example below uses their 'P2' model and the reader is referred to @Burgueno2018 for more details.

The data used here refer to a wheat genotype evaluation study carried out near Lind Washington. The study looked at 922 lines ('name'), of which, 8 were replicated known varieties. The P2 model, mentioned above, compares the averages of replicated lines to unreplicated lines. The data steps and procedures used below define these groups in an indicator variable named d2. The response variable, 'yieldg', is converted to kilograms ('yieldkg') to facilitate computations and avoid numeric overflow errors due to large values. <br><br>


```{r SAS8_8, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
filename AUG url "https://raw.githubusercontent.com/IdahoAgStats/guide-to-field-trial-spatial-analysis/master/data/augmented_lind.csv";

PROC IMPORT OUT= WORK.augmented
     DATAFILE= AUG
     DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2; 
RUN;

data augmented;
	set augmented;
	if yieldg = 999999 or yieldg=. then delete; /* Remove missing values */
	prow=prow*11.7; /*convert row and column indices to feet */
	pcol=pcol*5.5;
run;

proc freq noprint data=augmented;
	tables name/out=controls;
run;

data controls;
	set controls;
	if count >1;
run;

proc sort data=controls;
	by name;
run;
     
proc sort data=augmented;
	by name;
run;

data augmented;
	merge augmented controls;
	by name;
	if count=. then d2=2; /* Unreplicated */
	else d2=1;            /* Replicated */
	yieldkg=yieldg/1000;
run;

```

Following the steps described earlier, we first fit a base model, yieldkg = d2, to obtain residuals. We cannot use a base model of `yieldkg = name` because all the unreplicated lines will produce residual values of 0. <br><br>

The residuals are then plotted in a heat map. This map shows evidence of spatial patterns and variability across the field. <br><br>

```{r SAS8_9, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
PROC mixed data=augmented;
	class name d2;
	model yieldkg = d2/noint outp=residuals ddf=229 229;
	lsmeans d2;
	*lsmeans name(d2)/slice = d2;
run;

proc sgplot data=residuals;
	HEATMAPPARM y=pRow x=pCol COLORRESPONSE=resid/ colormodel=(cx014458 cx1E8C6E cxE1FE01); 
title1 'Field Map';
run;
```

The  spatial variability evident in the residuals is then modeled as before. Only one model, power, is applicable to this data.<br><br>

```{r SAS8_10, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
/*
proc variogram data=residuals plots=pairs(thr=50) ;
   compute novariogram nhclasses=60;
   coordinates xc=prow yc=pcol;
   var resid;
run;

proc variogram data=residuals plots(only)=(semivar);
   compute lagd=6.6 maxlags=20;
   coordinates xc=prow yc=pcol;
   var resid;
run;

*/
proc variogram data=residuals plots(only)=(fitplot);
   where yieldkg ^= .;
   coordinates xc=pcol yc=pRow;
   compute lagd=6.6 maxlags=25;
   model form=auto(mlist=(gau, exp, pow, sph) nest=1);
  var resid;
run;
```

Using the variogram model parameters estimated above, an adjusted model is then fit to the P2 hypothesis of @Burgueno2018. The residuals from this model show little, if any, spatial variability remains after adjustment. 

*Note: the lsmeans statement is commented out in this example to avoid a large amount of output.*<br><br>

```{r SAS8_11, engine="sashtml", collectcode=TRUE, engine.path=saspath, include=TRUE, engine.opts=sasopts, comment=""}
PROC mixed data=augmented;
	class name d2;
	model yieldkg = d2 name(d2)/outp=adjresiduals ddf=229 229;
	lsmeans d2;
	repeated/subject=intercept type=sp(pow)(prow pcol) local;
	ods output SolutionR =parms;
	parms (0.074) (0.0051)(0.475)  ;
	*lsmeans name(d2)/slice = d2;
run;

proc sgplot data=adjresiduals;
	HEATMAPPARM y=pRow x=pCol COLORRESPONSE=resid/ colormodel=(cx014458 cx1E8C6E cxE1FE01); 
title1 'Field Map';
run;
```