This repository has been archived by the owner on Nov 27, 2019. It is now read-only.

snippets.md

Using Weights to Specify Dependence

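$$ r_{12} = \frac{\sum_i (Y_{i1} - \bar{Y}_1)(Y_{i2} - \bar{Y}_2)}{\left[\sum_i (Y_{i1} - \bar{Y}_1)^2 \sum_i (Y_{i2} - \bar{Y}_2)^2\right]^{1/2}} $$

from IPython.display import YouTubeVideo, display
display(YouTubeVideo("MQACCcfTpXc"))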
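
As a quick numerical check of the correlation formula above, here is a minimal pure-Python sketch. It assumes the numerator and each squared term are summed over the observations $i$ (the usual Pearson correlation); the function name `pearson_r` and the sample values are illustrative only.

```python
import math


def pearson_r(y1, y2):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(y1) != len(y2) or len(y1) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean1 = sum(y1) / len(y1)
    mean2 = sum(y2) / len(y2)
    # Numerator: sum of cross-products of deviations from the means
    num = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2))
    # Denominator: square root of the product of the summed squared deviations
    den = math.sqrt(
        sum((a - mean1) ** 2 for a in y1) * sum((b - mean2) ** 2 for b in y2)
    )
    if den == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return num / den


# Hypothetical values: y2 rises exactly linearly with y1
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A perfectly linear increasing relationship gives a value of 1 (up to floating-point rounding), and a perfectly linear decreasing one gives -1.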

  • Measured using a semivariogram *
    • Look at pairs of observations separated by a distance $h$
    • Gives us a lot of data — every observation has $n-1$ possible pairs!
    • Illustrates the spatial structure of a variable
\* We usually just call it a variogram (there is a technical difference between the two but for clarity we'll use the term 'variogram' generally)

Calculating Semivariance

  • For each pair we know the distance between points
    • We measure the similarity of the point pairs (semivariance)
    • We plot the semivariance for a range of distances
\* In practice there is an important distinction between a *theoretical* and an *empirical* variogram — an empirical variogram is used to *fit* a theoretical variogram
  • $N(h)$ denotes the set of pairs of observations separated by distance $h$
  • $|N(h)|$ is the number of pairs in the set, and
  • $h$ is usually an approximate distance implemented using a certain tolerance

$$ \hat\lambda(h) = {1 \over {2|N(h)|}} \sum_{(i,j)\in N(h)} (x_i - x_j)^2 $$

The $(i,j)\in N(h)$ under the summation means the sum runs over all pairs $i$, $j$ in the set $N(h)$
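
The calculation can be sketched in plain Python as follows. This is a minimal illustration, not a replacement for a geostatistics library: the function name `empirical_semivariogram`, the `lags` and `tol` parameters, and the toy coordinates are all assumptions made for the example. A pair is placed in $N(h)$ when its distance is within `tol` of $h$, each unordered pair is counted once, and the factor 0.5 follows the usual definition of semivariance as half the mean squared difference (omit it to get the full variogram value).

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lags, tol):
    """Return a list of (h, n_pairs, semivariance) tuples, one per lag h.

    A pair (i, j) is assigned to N(h) when |distance(i, j) - h| <= tol.
    """
    sums = [0.0] * len(lags)
    counts = [0] * len(lags)
    # Visit every unordered pair of observations exactly once
    for i, j in combinations(range(len(values)), 2):
        dist = math.dist(coords[i], coords[j])
        sq_diff = (values[i] - values[j]) ** 2
        for k, h in enumerate(lags):
            if abs(dist - h) <= tol:
                sums[k] += sq_diff
                counts[k] += 1
    result = []
    for h, total, n in zip(lags, sums, counts):
        # Lags with no pairs have no estimate; report NaN rather than divide by zero
        gamma = 0.5 * total / n if n else float("nan")
        result.append((h, n, gamma))
    return result


# Toy data (hypothetical): three points on a line, one unit apart
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 2.0, 4.0]
table = empirical_semivariogram(coords, values, lags=[1.0, 2.0], tol=0.1)
```

Plotting the third column of `table` against the first gives the empirical variogram. With real data the tolerance is usually set to about half the lag spacing so that every pair falls into a bin.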

Meuse Soil Data

  • The Maas river bank soil pollution data (Limburg, The Netherlands)
    • Sampled along the Dutch bank of the river Maas (Meuse) north of Maastricht
    • Data are those used in Burrough and McDonnell (1998, pp. 309–311)
  • The version we use here is a subset of the data provided with the gstat and sp R packages

Variogram Parts

  • Range
    • Distance at which point pairs stop being similar
    • In our Meuse example
      • Nearby samples have similar levels of zinc
      • Means samples separated by less than ... will be similar
      • Beyond 'range' samples are "independent"
  • Sill
    • Level at which the semivariance flattens out beyond the range; roughly the background variance of the study region
  • Nugget
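    • Semivariance at (near) zero distance: a small-scale discontinuity reflecting measurement error and variation at scales finer than the sampling interval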
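
To see how the three parts shape a theoretical variogram, here is a small sketch of the spherical model, one common choice among several (exponential and Gaussian models are others). The notes above do not say which model is fitted, so the model choice, the function name `spherical_variogram`, and the parameter values are assumptions for illustration. Here `sill` is the total sill, including the nugget.

```python
def spherical_variogram(h, nugget, sill, rng):
    """Spherical model semivariance at distance h.

    nugget: jump in semivariance just above zero distance
    sill:   plateau reached at and beyond the range (total sill)
    rng:    range, the distance at which the plateau is reached
    """
    if h < 0 or rng <= 0:
        raise ValueError("h must be >= 0 and rng must be > 0")
    if h == 0:
        return 0.0  # a point is identical to itself
    if h >= rng:
        return sill  # beyond the range, pairs are treated as uncorrelated
    ratio = h / rng
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)


# Hypothetical parameters, not fitted to any data
curve = [spherical_variogram(h, nugget=1.0, sill=5.0, rng=10.0) for h in (0, 5, 10, 20)]
```

The curve starts at the nugget just above zero distance, rises as pairs become less similar, and stays flat at the sill once the distance reaches the range.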

Loading/Plotting Data

Download the data:

import pandas as pd
meuse = pd.read_csv("https://raw.githubusercontent.com/filipkral/meuse/master/meuse.txt")

Quick plot using simple pandas plotting...

ax = meuse.plot.scatter('x', 'y', c='zinc', s=100)  # colour each point by zinc level
ax.set_axis_off()  # called separately so that ax keeps the Axes object

You can see the top of the data frame with:

meuse.head()