---
title: "Presentation Slides Notebook"
output:
  html_document:
    df_print: paged
---
# Point Patterns
## Objectives
1. Determine whether events exhibit a systematic pattern as opposed to being randomly distributed, that is, whether there is clustering or regularity in an area rather than random or independent scattering.
2. Determine whether the intensity of the point pattern varies.
3. Check for spatial dependence among the events, for example whether event B tends to be present wherever event A is present.
4. Fit models to the observed patterns.
## Analysis Approach
1. Patterns in event location are the focus.
2. Events may have attributes which can be used to distinguish types.
3. Stochastic aspect is where events are likely to occur.
4. Does a pattern exhibit clustering or regularity?
5. Over what spatial scales do patterns exist?
### 1st Order
* For 1st order effects we are interested only in **means** and **variances**.
* A point pattern has no mean as such; it has an **intensity**, which is the mean number of events per unit area. Intensity is essentially a mean.
* The mathematical formula for lambda(s) looks complicated, but the idea is simple: take the mean number of events in a small area around a location, per unit area, and keep making the area smaller. The limit is the intensity.
* If the intensity shows only small variations, we can say that the point pattern is **homogeneous**.
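The limiting argument for the intensity is usually written as follows. The notation is an assumption borrowed from the usual textbook convention, not from these notes: $ds$ is a small region around location $s$, $|ds|$ is its area, and $N(ds)$ is the number of events in it.

$$\lambda(s)=\lim_{|ds| \to 0}\frac{E[N(ds)]}{|ds|}$$

For a homogeneous pattern, $\lambda(s)$ reduces to a constant $\lambda$.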
### 2nd Order
* The 2nd order property concerns the number of events in pairs of areas.
* It describes spatial dependence: whether the points are related to one another. Do they repel or cluster?
* It matters because it tells us whether the pattern is CSR, clustered, or regular, and each case has its own analysis.
* It is a function of two locations: the expected product of the event counts in small areas around i and j, divided by the product of the two areas.
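In the same assumed notation as for the intensity ($N(ds)$ is the number of events in a small region $ds$ of area $|ds|$), the second-order intensity of two locations $s_i$ and $s_j$ is commonly written as:

$$\gamma(s_i,s_j)=\lim_{|ds_i|,|ds_j| \to 0}\frac{E[N(ds_i)\,N(ds_j)]}{|ds_i|\,|ds_j|}$$

If events at the two locations are independent, this reduces to $\lambda(s_i)\lambda(s_j)$, so departures from that product indicate attraction or repulsion.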
## Visualization Issues
1. There is an underlying population distribution from which events arise in a region.
2. If the population varies, we would expect events to cluster in areas of high population. Is there clustering on the basis of population alone?
3. Are events more or less clustered than we would expect on the basis of population alone?
## Exploring Point Patterns
1st order effects
* Quadrat Methods
* Kernel estimation
2nd order effects
* Nearest neighbors distances
* K-function
### Quadrat methods
Quadrat methods summarize the number of events in each quadrat in region R.
To get the intensity, divide each quadrat count by the quadrat area.
Disadvantages:
* Loses spatial detail. It only counts the events in each quadrat and does not describe how the events are scattered within it. In modelling we still need to describe the events in terms of nearness or scatter.
* Converts point data to area values.
* Making quadrats smaller generates *higher variation*, and in the extreme some quadrats will contain no events at all (a count of 0).
Advantages:
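```{r}
library(spatstat)
data(swedishpines)
x <- swedishpines

# Counts on a 4 x 4 grid of quadrats
Q <- quadratcount(x, nx = 4, ny = 4)
plot(Q)

# Intensity per quadrat: the window is 96 x 100 units, so each of the
# 16 quadrats has area 9600 / 16 = 600
plot(round(Q / 600, 4))

# A finer 8 x 8 grid for comparison
Q2 <- quadratcount(x, nx = 8, ny = 8)
plot(Q2)
```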
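The trade-off between quadrat size and variation can be checked numerically. The following is a minimal sketch, assuming the `swedishpines` data and three arbitrarily chosen grid sizes; the summary columns (`cv`, `empty`) are names introduced here for illustration.

```r
library(spatstat)

# Grid sizes to compare (an arbitrary choice for illustration)
grid_sizes <- c(4, 8, 16)

quadrat_summary <- do.call(rbind, lapply(grid_sizes, function(k) {
  counts <- as.vector(quadratcount(swedishpines, nx = k, ny = k))
  data.frame(
    grid       = k,
    total      = sum(counts),                # each event is counted in one quadrat
    mean_count = mean(counts),
    cv         = sd(counts) / mean(counts),  # relative variation between quadrats
    empty      = sum(counts == 0)            # quadrats with no events
  )
}))
quadrat_summary
```

Comparing the `cv` and `empty` columns across rows shows how much of the apparent variation comes from the choice of grid rather than from the pattern itself.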
### Kernel Estimation
Kernel estimation is like a moving-window quadrat method. It uses a kernel and an edge correction. The kernel is the moving window, and some window positions may include area outside the study region; handling this is what is called edge correction.
Tau is the smoothing parameter. It is used in density mapping to visualize the intensity. Rather than a square moving window, the kernel is a circle with a radius, and it takes in all the events inside it.
When tau is large the estimate is smoother; when tau is small the surface shows spikes, although local detail in the intensity may be better resolved.
The contribution of an event can extend over multiple kernels, unlike in the quadrat method, where each event contributes to exactly one quadrat.
**Adaptive Kernel Estimation** is one way to correct edge defects. Here tau can vary from place to place instead of being constant.
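One common textbook form of the kernel estimate is sketched below. The symbols are assumptions, not taken from these notes: $k(\cdot)$ is a bivariate kernel (a probability density symmetric about the origin), $s_1,\dots,s_n$ are the event locations, and $\delta_\tau(s)$ is the edge correction, the volume under the scaled kernel centred on $s$ that lies inside the study region $R$.

$$\hat{\lambda}_\tau(s)=\frac{1}{\delta_\tau(s)}\sum_{i=1}^{n}\frac{1}{\tau^2}\,k\!\left(\frac{s-s_i}{\tau}\right)$$

In spatstat, `density()` uses a Gaussian kernel by default, and its `sigma` argument plays the role of $\tau$.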
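```{r}
# Kernel intensity estimates with increasing bandwidth (in the units of the data)
plot(density(x, sigma = 5), axes = FALSE)
plot(density(x, sigma = 10), axes = FALSE)
plot(density(x, sigma = 15), axes = FALSE)
```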
### Nearest neighbor distances
This is used to investigate 2nd order properties: whether the pattern is CSR, clustered, or regular. The question is what the distribution of the distances looks like. If many nearest-neighbor distances are small, this suggests clustering; if the distances are large, this suggests regularity.
* event - event: the distance from a randomly chosen event to its nearest event. For example, from a tall tree to the nearest tall tree; both are events, so this is actual data.
* point - event: the distance from a selected point to the nearest event. The point itself can be empty. For example, given a person's location, how far is the nearest tall tree? This represents randomness.
Given a dataset, W is the nearest-neighbor distance of each event. This is the statistical measure, and it yields a collection of distances, one per event.
$$\hat{G}(w)=\frac{\#(w_i \leq w)}{n}$$
Here $\#(w_i \leq w)$ is the number of events whose nearest-neighbor distance is at most the nominated distance $w$, and $n$ is the number of events.
The G function is used for event-event distances (w), and the F function for point-event distances (x).
* A steep early rise could indicate clustering, since many events already have a nearest neighbor at lower values of w (distance). The curve is dominated by small w values.
* A late rise indicates repulsion, because most of the distribution builds up only at higher values of w (distance). The curve is dominated by high w values.
In the graphs, CSR exhibits a stair-like jaggedness, or in general a steady upward slope.
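The two empirical distribution functions can be computed directly from nearest-neighbor distances. This is a minimal sketch without edge correction, assuming the `cells` data from spatstat; the helper names `G_hat` and `F_hat` and the choice of 1000 random points are introduced here for illustration.

```r
library(spatstat)

# Event-event distances: each event to its nearest other event
w_dist <- nndist(cells)
G_hat  <- function(w) mean(w_dist <= w)   # proportion of events with w_i <= w

# Point-event distances: randomly placed points to their nearest event
set.seed(42)
pts    <- runifpoint(1000, win = Window(cells))   # 1000 is an arbitrary choice
x_dist <- nncross(pts, cells)$dist
F_hat  <- function(x) mean(x_dist <= x)   # proportion of points with x_i <= x

# Evaluate both on a small grid of distances
h <- seq(0, 0.2, by = 0.05)
data.frame(h = h, G = sapply(h, G_hat), F = sapply(h, F_hat))
```

`Gest()` and `Fest()` compute the same quantities with edge corrections applied, which is why their curves can differ slightly from these raw proportions.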
EDA is exploratory data analysis.
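```{r}
# Summary functions for the cells data (shipped with spatstat)
Fc <- Fest(cells)  # point-event (empty space) function F
Gc <- Gest(cells)  # event-event nearest-neighbor function G
Kc <- Kest(cells)  # K-function
Lc <- Lest(cells)  # L-function, a transformation of K
plot(Fc, rs ~ r, main = "Border-corrected Estimate of F")
plot(Fc, theo ~ r, main = "Theoretical F under CSR (Poisson)")
plot(Fc)
plot(Gc, rs ~ r, main = "Border-corrected Estimate of G")
plot(Gc, theo ~ r, main = "Theoretical G under CSR (Poisson)")
plot(Gc)
plot(Kc, border ~ r, main = "Border-corrected Estimate of K")
plot(Kc, theo ~ r, main = "Theoretical K under CSR (Poisson)")
plot(Kc)
plot(Lc, border ~ r, main = "Border-corrected Estimate of L")
plot(Lc, theo ~ r, main = "Theoretical L under CSR (Poisson)")
plot(Lc)
```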
What if the nearest events are outside the study region, or are clustered in one corner? This calls for a correction. Edge correction defines an exterior bound. Events outside it are not assigned a nearest neighbor of their own, but they can still act as neighbors of the events inside, and so they contribute to the analysis of the study region.
### K-function
* A limitation of nearest-neighbor methods is that they consider only the shortest scales.
* Given a location, the intensity and the relationship with other events do not depend on direction.
* Isotropy: direction does not matter.
* It compares the expected number of events with the observed number.
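The comparison of expected and observed counts can be made explicit. The notation below is assumed rather than taken from these notes: $\lambda$ is the intensity, $|R|$ the area of the study region, $d_{ij}$ the distance between events $i$ and $j$, and $I(\cdot)$ an indicator function. The K-function is defined through

$$\lambda K(h)=E[\text{number of other events within distance } h \text{ of an arbitrary event}]$$

and one common estimate, ignoring edge correction, is

$$\hat{K}(h)=\frac{|R|}{n^2}\sum_{i=1}^{n}\sum_{j \neq i} I(d_{ij} \leq h)$$

Under CSR, $K(h)=\pi h^2$. Values of $\hat{K}(h)$ above $\pi h^2$ suggest clustering at scale $h$, and values below suggest regularity. The transformation $L(h)=\sqrt{K(h)/\pi}$, which is what `Lest()` plots, equals $h$ under CSR and is easier to read.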
## Modeling Spatial Point Patterns
### Objectives
* To test a baseline hypothesis
* To construct models to explain observed patterns
* The standard model against which a spatial point pattern is compared is **Complete Spatial Randomness**
### Complete Spatial Randomness (CSR)
Reasons for beginning an analysis with a test for CSR
* Rejection of CSR is a prerequisite for any attempt to model an observed pattern.
* Tests can help to explore a dataset and assist in the formulation of alternatives to CSR.
* CSR operates as dividing hypothesis between regular and clustered patterns.
The hypothesis asserts that:
1. The number of events in the study region follows a Poisson distribution. This implies constant intensity, no 1st order effects.
2. Given *n* events in *A*, the events are an independent random sample from a uniform distribution on *A*. This implies no spatial interaction.
Remarks:
1. A pattern for which CSR is not rejected scarcely merits further formal statistical analysis, since rejecting CSR is the prerequisite for modeling the pattern.
## Test for CSR
### Simple Quadrat Tests for CSR
$$X^2=\frac{(m-1)S^2}{\bar X}$$
* $S^2$ = sample variance and $\bar X$ = sample mean of the quadrat counts; $m$ = number of quadrats.
* A large statistic suggests clustering; a small one suggests regularity.
* In R, the output reports both the $X^2$ statistic and its p-value.
$$ICS=\frac{S^2}{\bar X}-1$$
* $ICS < 0$ suggests regularity.
* $ICS > 0$ suggests clustering.
* $ICS = 0$ is consistent with CSR.
* Disadvantage: unequal quadrat sizes affect the analysis.
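Both statistics can be computed by hand from the quadrat counts. This is a minimal sketch, assuming the `swedishpines` data and a 3 x 4 grid; the variable names are introduced here for illustration, and the chi-squared reference distribution with $m-1$ degrees of freedom is the usual approximation under CSR.

```r
library(spatstat)

# Quadrat counts on a 3 x 4 grid
counts <- as.vector(quadratcount(swedishpines, nx = 3, ny = 4))

m     <- length(counts)   # number of quadrats
x_bar <- mean(counts)     # sample mean of the counts
s2    <- var(counts)      # sample variance of the counts

X2  <- (m - 1) * s2 / x_bar   # index-of-dispersion statistic
ICS <- s2 / x_bar - 1         # index of cluster size

# Under CSR, X2 is approximately chi-squared with m - 1 degrees of freedom
p_regular   <- pchisq(X2, df = m - 1)                      # small X2: regularity
p_clustered <- pchisq(X2, df = m - 1, lower.tail = FALSE)  # large X2: clustering

c(X2 = X2, ICS = ICS, p_regular = p_regular, p_clustered = p_clustered)
```

With equal-area quadrats, `X2` should agree with the statistic reported by `quadrat.test()` on the same grid, because the expected count in every quadrat is then the sample mean.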
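```{r}
data("swedishpines")
# Chi-squared quadrat test of CSR on a 3 x 4 grid
M <- quadrat.test(swedishpines, nx = 3, ny = 4)
M
plot(M)
# Overlay the quadrat test results on the point pattern
plot(swedishpines)
plot(M, add = TRUE, cex = 1)
```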
### Nearest Neighbor Tests for CSR
```{r}
# Clark-Evans test without edge correction
clarkevans.test(swedishpines)

# With a guard area (edge correction); clipregion defines the interior region
clip1 <- owin(c(20, 50), c(20, 50))
clarkevans.test(swedishpines, correction = "guard", clipregion = clip1)
clip2 <- owin(c(20, 80), c(20, 80))
clarkevans.test(swedishpines, correction = "guard", clipregion = clip2)
```
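The index behind the Clark-Evans test compares the observed mean nearest-neighbor distance with the value expected under CSR, $1/(2\sqrt{\lambda})$. The following is a minimal sketch of the uncorrected index, assuming the `swedishpines` data; the variable names are introduced here for illustration.

```r
library(spatstat)

nn_dist <- nndist(swedishpines)   # event-event nearest-neighbor distances
lambda  <- npoints(swedishpines) / area(Window(swedishpines))   # events per unit area

d_obs   <- mean(nn_dist)          # observed mean nearest-neighbor distance
d_csr   <- 1 / (2 * sqrt(lambda)) # expected mean distance under CSR
R_naive <- d_obs / d_csr          # Clark-Evans index, no edge correction
R_naive
```

An index above 1 suggests regularity and an index below 1 suggests clustering. `clarkevans(swedishpines, correction = "none")` should return the same value.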
### K-function Tests for CSR
## Analysis of Multiple Types of Events
* A multivariate spatial point process is one in which events are classified into types.
* The univariate processes of each type are referred to as the components of the multivariate process.
Analysis Objectives
* Detect relationships between the pattern of one type and the pattern of another.
* Identify independence of the types of events, as opposed to attraction or repulsion.
* One type of event may attract or repel another. CSR is a property of an individual component process, not of the multivariate process as a whole.
* The alternative hypothesizes a positive or negative dependence among the event types.
* Independence does not imply that any of the components needs to be CSR.
## How to analyze multivariate data
### Quadrat Count Analysis
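This works by using cross tabulations. Each quadrat is classified by the presence or absence of each type. Compare the resulting chi-squared statistic with 3.84, the 5% critical value with 1 degree of freedom; if the statistic is larger, there is evidence that the types are not independent.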
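The cross tabulation can be sketched as follows, assuming the `lansing` data from spatstat. The two species, the 10 x 10 grid, and the helper name `present` are arbitrary choices made here for illustration.

```r
library(spatstat)

types <- split(lansing)   # one point pattern per tree species

# Presence (TRUE) or absence (FALSE) of a species in each quadrat of a 10 x 10 grid
present <- function(pp) {
  factor(as.vector(quadratcount(pp, nx = 10, ny = 10)) > 0,
         levels = c(FALSE, TRUE))
}
tab <- table(redoak = present(types$redoak), whiteoak = present(types$whiteoak))
tab

# Chi-squared test of independence on the 2 x 2 table (no continuity correction)
res  <- chisq.test(tab, correct = FALSE)
crit <- qchisq(0.95, df = 1)   # 5% critical value, about 3.84
unname(res$statistic) > crit   # TRUE suggests the two types are not independent
```

The result depends on the grid size, which is the same weakness noted for quadrat methods with a single type of event.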
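```{r}
# Lansing Woods: a multitype pattern of tree species
data("lansing")
plot(lansing, main = "Lansing Woods")
plot(split(lansing))                         # one panel per species
plot(density(split(lansing)), ribbon = FALSE)  # kernel intensity per species
plot(Kcross(lansing, "redoak", "whiteoak"))  # cross K between two species
plot(Kdot(lansing, "hickory"))               # hickory against all species

# Amacrine cells: two types, "on" and "off"
data(amacrine)
plot(amacrine, main = "Rabbit Amacrine Cells")
Env <- envelope(amacrine, Kcross, nsim = 99, i = "on", j = "off")
plot(Env)
```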
### Nearest Neighbor Analysis
* G(h): given an event of type i, the probability that the nearest event of type j is within distance h.
* F(h): given a point, the probability that the nearest event of type j is at a distance less than or equal to h.
* For an independent pattern, G(h) = F(h): the nearest distances to events of type j must be the same whether measured from events of type i or from randomly selected points.
### Bivariate or Cross K-Function
* Under assumed independence, type i events should be random with respect to type j events.
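In symbols, with $\lambda_j$ denoting the intensity of type $j$ events (notation assumed here, not taken from these notes), the cross K-function is defined through

$$\lambda_j K_{ij}(h)=E[\text{number of type } j \text{ events within distance } h \text{ of an arbitrary type } i \text{ event}]$$

Under independence of the two types, $K_{ij}(h)=\pi h^2$, whatever the pattern of each component. Estimates above $\pi h^2$ suggest attraction between the types at scale $h$, and estimates below suggest repulsion. This is the reference curve that `Kcross()` plots as the theoretical value.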
## Space-Time Interaction
Given events that occurred close together in time, are they also closer in space than would be expected? Were events more scattered earlier? Observe where events actually occur as time goes by. This is spatiotemporal analysis.
Example: onset of a disease, in days.
* CSTR (Complete Spatiotemporal Randomness) is the absence of structure in time as well as in space.
* Space-time clustering is the alternative to CSTR.
* Space-time clustering is said to exist if, among those events that are closer in time, there are events that are also closer in space.
### Knox test for space-time interaction
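* Quantifies space-time interaction based on critical space and time distances.
* Uses s and tau as the critical space and time distances respectively. In the statistic, $s_{ij}=1$ if events i and j are within distance s of each other and $t_{ij}=1$ if they are within time tau of each other; otherwise both are 0.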
$$X=\sum_{i=1} ^N \sum_{j=1} ^{i-1}s_{ij}t_{ij}$$
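The Knox statistic counts the pairs of events that are close in both space and time. The following is a minimal sketch with a permutation test. The coordinates and onset days are hypothetical toy data generated for illustration, and the critical distances `s_crit` and `tau_crit` are assumed values, not recommendations.

```r
set.seed(1)

# Hypothetical toy data: 30 events with planar coordinates and onset days
n_events <- 30
xy    <- cbind(runif(n_events), runif(n_events))
onset <- runif(n_events, 0, 60)

s_crit   <- 0.2   # critical space distance (assumed)
tau_crit <- 7     # critical time distance in days (assumed)

pairs <- lower.tri(matrix(0, n_events, n_events))   # each pair (i, j), j < i, once
s_ij  <- as.matrix(dist(xy))[pairs] < s_crit        # close in space?

knox <- function(times) {
  t_ij <- as.matrix(dist(times))[pairs] < tau_crit  # close in time?
  sum(s_ij & t_ij)                                  # pairs close in both
}

X_obs <- knox(onset)

# Reference distribution: shuffle the onset times over the fixed locations
X_perm  <- replicate(999, knox(sample(onset)))
p_value <- (1 + sum(X_perm >= X_obs)) / (1 + length(X_perm))
c(X = X_obs, p_value = p_value)
```

Shuffling times over locations keeps the purely spatial and purely temporal structure intact, so only the link between the two is tested.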
### Mantel test for space-time interaction
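* This avoids the problem of choosing critical distances; instead it works with the space and time distances between all pairs of events.
* Mantel is more popular than Knox. Its statistic can also be read as a correlation between the space distances and the time distances.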
```{r}
Input =("
Onset Lat Lon
72 34.13583 117.9236
59 34.17611 118.3153
41 33.82361 118.1875
72 34.19944 118.5347
61 34.06694 117.7514
32 33.92917 118.2097
52 34.01500 118.0597
47 34.06722 118.2264
65 34.08333 118.1069
75 34.38750 118.5347
")
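MantelData <- read.table(textConnection(Input), header = TRUE)
# Pairwise distances in space (planar, in degrees) and in time (days)
space.dists <- dist(cbind(MantelData$Lon, MantelData$Lat))
time.dists <- dist(MantelData$Onset)
as.matrix(space.dists)
as.matrix(time.dists)
# Install ade4 only if it is missing, then run the Mantel permutation test
if (!requireNamespace("ade4", quietly = TRUE)) install.packages("ade4")
library(ade4)
mantel.rtest(space.dists, time.dists, nrepet = 9999)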
```
### K-function for Space-time interaction
* If there is space-time interaction, D(h,t) will be large.
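A common way to define $D(h,t)$ is sketched below. The notation is assumed, not taken from these notes: $K(h,t)$ is the space-time K-function, counting events within distance $h$ and time $t$ of an arbitrary event, and $K_S(h)$ and $K_T(t)$ are the purely spatial and purely temporal K-functions.

$$\hat{D}(h,t)=\hat{K}(h,t)-\hat{K}_S(h)\,\hat{K}_T(t)$$

Without space-time interaction, $K(h,t)=K_S(h)K_T(t)$, so $\hat{D}(h,t)$ should be close to zero; clearly positive values point to interaction at space scale $h$ and time scale $t$.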
## Clustering around a Point Source
### Neyman-Scott process
```{r}
library(spatstat)
data("redwood")
x <- redwood
plot(x, main = "Strauss-Ripley Redwood Saplings Dot Map")
summary(x)
qa <- quadratcount(x, 25,25)
plot(intensity(qa, image = TRUE), main = "25x25 Quadrat Map of Redwood Saplings")
# Visualization of intensity using kernel estimation
plot(density(x, 0.05), axes = TRUE, main = "Bandwidth = 0.05")
plot(x, add = TRUE)
contour(density(x, 0.05), axes = FALSE, add = TRUE)
plot(density(x, 0.10), axes = TRUE, main = "Bandwidth = 0.10")
plot(x, add = TRUE)
contour(density(x, 0.10), axes = FALSE, add = TRUE)
#Test for CSR
#Quadrat Test
QT <- quadrat.test(x,3) #two-sided test on a 3x3 grid
QT
plot(QT, main = "Quadrat Test on a 3X3 grid")
plot(x, add = TRUE)
quadrat.test(x, 3, alternative = "clustered") #one-sided test for clustering
plot(intensity(quadratcount(x,3), image = TRUE), main = "3x3 Quadrat Map Saplings")
# Clark-Evans test
clarkevans.test(x)  # no edge correction, two-sided test
clarkevans.test(x, alternative = "clustered")  # no edge correction, one-sided test for clustering
clip <- owin(c(0.2, 0.8), c(-0.8, -0.2))  # defines the "desired interior" for edge correction
clarkevans.test(x, correction = "guard", clipregion = clip)  # edge correction, two-sided test
clarkevans.test(x, correction = "guard", clipregion = clip, alternative = "clustered")  # edge correction, one-sided test for clustering
#Nearest-Neighbor Distance and Simulation Envelopes
Fc <- Fest(x)
Gc <- Gest(x)
Lc <- Lest(x)
Kc <- Kest(x)
plot(Fc)
Fenv <- envelope(x, Fest, nrank = 1, nsim = 999)
plot(Fenv, main = "Simulation Envelope for F")
plot(Gc)
Genv <- envelope(x, Gest, nrank=1, nsim = 999)
plot(Genv, main = "Simulation Envelopes for G")
plot(Lc)
Lenv <- envelope(x, Lest, nsim = 999, nrank = 1)
plot(Lenv, main = "Simulation Envelopes for L")
plot(Kc)
Kenv <- envelope(x, Kest, nsim = 999, nrank = 1)
plot(Kenv, main = "Simulation Envelopes for K")
#Model Fitting under Neyman-Scott Process
model1 <- kppm(x, trend = ~1, clusters = "Thomas", statistic = "K")
plot(model1)
summary(model1)
model1Kenv <- envelope(model1, Kest, nsim = 999, nrank = 1)
plot(model1Kenv)
model1Lenv <- envelope(model1, Lest, nsim = 999, nrank = 1)
plot(model1Lenv)
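# kppm() options listed for reference; the call below picks the first of each
#   clusters:  "Thomas", "MatClust", "Cauchy", "VarGamma", "LGCP"
#   method:    "mincon", "clik2", "palm"
#   statistic: "K", "pcf"
kppm(x, trend = ~1, clusters = "Thomas", method = "mincon", statistic = "K")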
```