develop and apply a robust "completeness index" #11

Open
dylanbeaudette opened this issue Sep 19, 2017 · 0 comments

dylanbeaudette commented Sep 19, 2017

The current version of the "pedon completeness index" isn't very informative because:

  • it doesn't know about missing horizons
  • it doesn't specifically take into account the properties required by CEAP/APEX
  • ??? other reasons

Data required by CEAP/APEX, via CAH:

  • sand content
  • silt content
  • 1/3 bar bulk density
  • oven-dry (OD) bulk density (if present)
  • sum of bases
  • CaCO3 equivalent
  • EC (sat paste)
  • 1:1 pH
  • rock fragments
  • cec7 (NH4OAc)
  • organic C
  • 1/3 bar water ret.
  • 15 bar water ret.

Ideas

  1. simple score based on number of non-null fields in this list
  2. simple score based on number of non-null fields, weighted by number of expected horizons (hz), via OSD
  3. bit mask (e.g. 1010010101) type reporting so that the user knows what is missing
  4. weighted score, based on column importance: e.g. clay is more important than al_dith
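
A rough sketch of how these four ideas might look in base R, using made-up horizon data for a single profile. Everything here is an assumption for illustration: the toy values, the expected horizon count (`n.expected`, which would come from the OSD), the weights in `w`, and the choice to set a bit to 1 when a variable has at least one non-missing value.

```r
# toy horizon-level data for one hypothetical profile; NA marks a missing value
hz <- data.frame(
  sand = c(35, 30, NA),
  silt = c(40, 42, NA),
  clay = c(25, 28, 31),
  oc   = c(NA, NA, NA),
  cec7 = c(12, NA, 9)
)

# logical matrix: TRUE where a value is present
present <- !is.na(hz)

# idea 1: percent of non-missing cells
simple.score <- 100 * mean(present)

# idea 2: same count, but the denominator uses the number of expected horizons
# n.expected is a hypothetical value standing in for an OSD lookup
n.expected <- 4
hz.score <- 100 * sum(present) / (ncol(hz) * n.expected)

# idea 3: one bit per variable, 1 = at least one value present, 0 = entirely missing
bit.mask <- paste(as.integer(colSums(present) > 0), collapse = '')

# idea 4: weight each variable's fraction of present values by importance
# weights are hypothetical, chosen only to show the mechanics
w <- c(sand = 2, silt = 2, clay = 3, oc = 2, cec7 = 1)
weighted.score <- 100 * sum(w[colnames(present)] * colMeans(present)) / sum(w)
```

With this toy profile the bit mask flags `oc` as the only variable that is missing in every horizon, while the horizon-weighted score drops below the simple score because one expected horizon was never described.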

An example via aqp

library(soilDB)
library(aqp)

# get some example data
x <- fetchKSSL('amador')

# variables of interest
vars <- c('sand', 'silt', 'clay', 'db_13b', 'db_od', 'bs82', 'bs7', 'caco3', 'ec_12pre', 'ph_h2o', 'cec7', 'oc', 'w3cld', 'w15l2')

# get soil depth based on detection of "non-soil horizons"
sdc <- getSoilDepthClass(x, name = "hzn_desgn", top = "hzn_top", bottom = "hzn_bot", p = 'Cr|R|Cd')

# existing aqp function for detecting missing data
# inform max_depth via getSoilDepthClass(...)
# filter out non-soil horizons (there shouldn't be many in the KSSL data)
res <- missingDataGrid(x, max_depth=max(sdc$depth, na.rm = TRUE), vars=vars, filter.column = 'hzn_desgn', filter.regex = 'Cr|R|Cd')

# compute (simple) data completeness index
# 100 - [ sum(pct missing by variable) / n_variables ]
res$dci <- round(100 - (rowSums(res[, -1]) / length(vars)))

# copy back into site-level attributes
site(x) <- res[, c('pedon_key', 'dci')]

# compare with current "pedon completeness index": pretty close
plot(pedon_completeness_index ~ dci, data=site(x))
  pedon_key sand silt clay db_13b db_od bs82 bs7 caco3 ec_12pre ph_h2o cec7  oc w3cld w15l2 dci
1     32585    0    0    0      0     0    0   0   100      100      0    0 100     0     0  79
2     52931    0    0    0    100   100  100   0   100      100    100    0   0   100   100  43
3     53074    0    0    0    100   100  100   0   100      100    100    0   0   100   100  43
4     53081    0    0    0    100   100  100   0   100      100    100    0   0   100   100  43
5     59423    0    0    0    100   100    0 100     0      100      0    0 100   100     0  57
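
As a quick check of the formula, the index can be recomputed by hand from three of the rows printed above. This sketch uses only base R and copies the percent-missing values from the output; selecting columns by name with `vars` rather than `res[, -1]` is a stylistic choice here, not a change to the method.

```r
vars <- c('sand', 'silt', 'clay', 'db_13b', 'db_od', 'bs82', 'bs7', 'caco3',
          'ec_12pre', 'ph_h2o', 'cec7', 'oc', 'w3cld', 'w15l2')

# percent missing by variable, copied from the printed rows for
# pedon_key 32585, 52931, and 59423
pct.missing <- rbind(
  c(0, 0, 0,   0,   0,   0,   0, 100, 100,   0, 0, 100,   0,   0),
  c(0, 0, 0, 100, 100, 100,   0, 100, 100, 100, 0,   0, 100, 100),
  c(0, 0, 0, 100, 100,   0, 100,   0, 100,   0, 0, 100, 100,   0)
)
colnames(pct.missing) <- vars

res <- data.frame(pedon_key = c(32585, 52931, 59423), pct.missing)

# 100 - [ sum(pct missing by variable) / n_variables ]
res$dci <- round(100 - (rowSums(res[, vars]) / length(vars)))

print(res[, c('pedon_key', 'dci')])
```

For pedon 32585, three of the 14 variables are entirely missing, so the index is 100 - 300 / 14, which rounds to 79 and matches the printed value.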

