Not all components are being returned by FetchNASIS #82

Closed
phytoclast opened this issue Oct 19, 2018 · 12 comments


@phytoclast

A minority of map units are not being captured by fetchNASIS(). I am not sure whether this is a bug or a data population/validation error. I queried the national database for all components located in area MI077 (Kalamazoo County) and was able to see that the Schoolcraft component (coiid 291548) was in the local database. But this and other components were missing from my NASIS import into R.

library(soilDB)
fc <- fetchNASIS(from = 'components', fill = TRUE, SS = FALSE)
I've even tried setting rmHzErrors=F.

@brownag
Member

brownag commented Oct 19, 2018

I queried MI077 (all MUs) into my local DB and then into selected set and was unable to locate a component with that combination of name and record ID (Schoolcraft:291548) -- either from within NASIS or via R.

I queried that coiid (Component by coiid) specifically in NASIS. It appears that it refers to a "Brady" component in DMU 149008A?

How exactly are you verifying that you have that component in your local database if you aren't loading it from selected set?

If you are sure it is in your local database, are any portions of the missing data reflected in the lower-level component fetching functions? i.e.:

get_component_data_from_NASIS_db()
get_component_horizon_data_from_NASIS_db()
get_component_copm_data_from_NASIS_db()
get_component_cogeomorph_data_from_NASIS_db()
get_component_otherveg_data_from_NASIS_db()
get_component_esd_data_from_NASIS_db()
get_component_diaghz_from_NASIS_db()

@phytoclast
Author

My apologies. I misread my tabular data from the GIS, and I am not sure how; that ID wasn't even the mukey or dmu. Since then, I have figured out that it wasn't the actual Schoolcraft component that lacked horizon data, but a minor component. I knew that the area was mostly Schoolcraft and saw the map unit name, but forgot to have my export show only major components.
These are the actual record IDs for the various fields:
s.txt.lmapunitiid 187079
s.txt.compname Sleeth
s.txt.coiid 885254
s.txt.dmuiid 144830
s.txt.nationalmusym 68nt
s.txt.muname Schoolcraft loam, 0 to 2 percent slopes

@phytoclast
Author

Apologies. I must have misread the ID from the GIS; I am not sure where that record ID came from. I also forgot to restrict my export to major components, so the actual component was not Schoolcraft. The Schoolcraft component happened to have all its horizon data. This is the component that lacked horizon data:
s.txt.lmapunitiid 187079
s.txt.compname Sleeth
s.txt.coiid 885254
s.txt.dmuiid 144830
s.txt.nationalmusym 68nt
s.txt.muname Schoolcraft loam, 0 to 2 percent slopes

I have confirmed that it is present in NASIS. I also found that the problem is not limited to minor components: in this case, a Riddles map unit has horizon data only for the minor Kalamazoo component:

| lmapunitiid | compname | coiid | dmuiid | nationalmusym | muname | horizon data? | majcompflag |
|---|---|---|---|---|---|---|---|
| 187078 | Kalamazoo | 885249 | 144829 | 68ns | Riddles loam, 2 to 6 percent slopes | yes | no |
| 187078 | Oshtemo | 885250 | 144829 | 68ns | Riddles loam, 2 to 6 percent slopes | no | no |
| 187078 | Sleeth | 885248 | 144829 | 68ns | Riddles loam, 2 to 6 percent slopes | no | no |
| 187078 | Riddles | 269322 | 144829 | 68ns | Riddles loam, 2 to 6 percent slopes | no | yes |

I will follow up by trying those get component functions one by one.

@phytoclast
Author

For each of the following, I filtered the result with $coiid %in% '885254':

get1 <- get_component_data_from_NASIS_db()            # 1 obs
get2 <- get_component_horizon_data_from_NASIS_db()    # 5 obs
get3 <- get_component_copm_data_from_NASIS_db()       # 2 obs
get4 <- get_component_cogeomorph_data_from_NASIS_db() # 2 obs
get5 <- get_component_otherveg_data_from_NASIS_db()   # 0 obs
get6 <- get_component_esd_data_from_NASIS_db()        # 0 obs
get7 <- get_component_diaghz_from_NASIS_db()          # 0 obs

Thus, horizon, parent material, and geomorph data are present; only ecosite, veg, and diaghz data are missing. I also tried a component that fetchNASIS returned without any missing data, and it also lacked the same three tables.

@phytoclast
Author

My earlier claim that rmHzErrors=F did not work turns out to be wrong. I ran the script again with horizon-error removal turned off, and all the components and their horizon data now show up. I don't know why it appeared not to work before; it could be that the output file was not refreshed due to a file lock by ArcGIS. Anyhow, I now think this is probably a matter of data population, and not a bug in the code.

@brownag
Member

brownag commented Oct 22, 2018

No worries! Thank you for including the necessary info so we can try and replicate your issue!

Maybe the number was the COKEY (SSURGO) as opposed to COIID (NASIS)? I didn't check what you gave against COKEYs.

Now, down to the real problem: horizon data in NASIS not being reflected in the SoilProfileCollection (SPC) returned by fetchNASIS.

For the Sleeth component in natmusym '68nt', I get the same results from the lower-level fetch functions:

> library(soilDB)
> any(get_component_data_from_NASIS_db()$coiid %in% '885254')
[1] TRUE
> any(get_component_horizon_data_from_NASIS_db()$coiid %in% '885254')
[1] TRUE
> any(get_component_copm_data_from_NASIS_db()$coiid %in% '885254')
[1] TRUE
> any(get_component_cogeomorph_data_from_NASIS_db()$coiid %in% '885254')
[1] TRUE
> any(get_component_otherveg_data_from_NASIS_db()$coiid %in% '885254')
[1] FALSE
> any(get_component_esd_data_from_NASIS_db()$coiid %in% '885254')
-> QC: multiple ecosites / component. Use `get('multiple.ecosite.per.coiid', envir=soilDB.env)` for related coiid values.
[1] FALSE
> any(get_component_diaghz_from_NASIS_db()$coiid %in% '885254')
[1] FALSE

Now, when we try to get it from fetchNASIS_components(), we find there is a horizon error in component 885254:

> any(fetchNASIS_components()$coiid %in% '885254')
-> QC: multiple ecosites / component. Use `get('multiple.ecosite.per.coiid', envir=soilDB.env)` for related coiid values.
-> QC: horizon errors detected, use `get('component.hz.problems', envir=soilDB.env)` for related coiid values
[1] FALSE
> any(get('component.hz.problems', envir=soilDB.env) %in% '885254')
[1] TRUE

And I just saw your comment. I was going to track down your horizon error.

@brownag brownag reopened this Oct 22, 2018
@brownag
Member

brownag commented Oct 22, 2018

Testing horizon logic for one of the offending components:

> co <- fetchNASIS_components(rmHzErrors = F)
-> QC: multiple ecosites / component. Use `get('multiple.ecosite.per.coiid', envir=soilDB.env)` for related coiid values.
-> QC: horizon errors detected, use `get('component.hz.problems', envir=soilDB.env)` for related coiid values
> test_hz_logic(horizons(subsetProfiles(co, s='coiid == 885254')), 'hzdept_r', 'hzdepb_r')
hz_logic_pass 
        FALSE 

With the test_hz_logic() strict=FALSE flag set, the only things that make the horizon logic test fail are NA depths or overlapping horizons. When you look at the horizons for the offending component (via R), the problem becomes apparent: the Bt horizon is duplicated.

> sub.c <- subsetProfiles(co, s='coiid == 885254')
> test_hz_logic(horizons(sub.c), 'hzdept_r', 'hzdepb_r')
hz_logic_pass 
        FALSE 
> horizons(sub.c)
      coiid   chiid hzname hzdept_r hzdepb_r texture fragvoltot_l fragvoltot_r fragvoltot_h sandtotal_l sandtotal_r
1390 885254 2078195     Ap        0       25       L           NA           NA           NA          NA        43.5
1391 885254 2078196     Bt       25       33      CL           NA           NA           NA          NA        34.7
1392 885254 2078196     Bt       25       33      CL           NA           NA           NA          NA        34.7
1393 885254 2078197    Btg       33      117     SCL           NA           NA           NA          NA        58.9
1394 885254 2078198     2C      117      168    GR-S           NA           NA           NA          NA        95.0
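The non-strict check described above can be mimicked in a few lines of base R to see exactly why this component fails. check_hz_depths() is a hypothetical, simplified helper written for illustration only; it is not the actual test_hz_logic() code. After sorting by depth, a profile fails if any depth is NA or if a horizon begins above the bottom of the previous one. Gaps are allowed, as in the non-strict case. The depths are copied from the horizons(sub.c) output above.

```r
# Hypothetical helper (not from soilDB/aqp): simplified, non-strict
# horizon-depth check. Fails on NA depths or overlapping horizons;
# gaps between horizons are allowed.
check_hz_depths <- function(top, bottom) {
  if (anyNA(top) || anyNA(bottom)) return(FALSE)
  o <- order(top, bottom)
  top <- top[o]
  bottom <- bottom[o]
  n <- length(top)
  if (n < 2) return(TRUE)
  # each horizon must start at or below the bottom of the previous one
  all(top[-1] >= bottom[-n])
}

# Depths for the Sleeth component (coiid 885254), from the output above
hz <- data.frame(
  chiid    = c(2078195, 2078196, 2078196, 2078197, 2078198),
  hzname   = c("Ap", "Bt", "Bt", "Btg", "2C"),
  hzdept_r = c(0, 25, 25, 33, 117),
  hzdepb_r = c(25, 33, 33, 117, 168)
)

check_hz_depths(hz$hzdept_r, hz$hzdepb_r)        # FALSE: the two Bt rows overlap

# dropping the repeated chiid removes the overlap
dedup <- hz[!duplicated(hz$chiid), ]
check_hz_depths(dedup$hzdept_r, dedup$hzdepb_r)  # TRUE
```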

This often happens when one or more child tables of the offending horizon have multiple records marked as RV. fetchNASIS() attempts to "flatten" the hierarchical relations from the database, and the RV record is often used to reduce that many:one relationship. Here, the issue is two RVs marked in the structure table.


This is another instance of issues #32, #58, and #66. The main issue is that NASIS doesn't have any front-end validation to prevent you from having multiple RVs checked off. We need to explicitly handle the cases with multiple RVs.
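To see how a second RV record turns into a duplicated horizon, here is a minimal base-R sketch. The child table, its column names (chiid, structgrpname, rvindicator), and its values are illustrative assumptions, not real NASIS records or soilDB internals; the join only mimics the kind of flattening described above.

```r
# Illustrative child table (e.g. horizon structure groups); names and
# values are assumptions for this sketch, not real NASIS records.
child <- data.frame(
  chiid         = c(2078195, 2078196, 2078196, 2078197),
  structgrpname = c("grp A", "grp B", "grp C", "grp D"),
  rvindicator   = c(TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Parent table: one row per horizon
hz <- data.frame(
  chiid  = c(2078195, 2078196, 2078197),
  hzname = c("Ap", "Bt", "Btg"),
  stringsAsFactors = FALSE
)

# Horizons with more than one child record flagged as RV
rv_counts <- tapply(child$rvindicator, child$chiid, sum)
multi_rv  <- as.numeric(names(rv_counts)[rv_counts > 1])
print(multi_rv)

# "Flattening" by joining RV child records onto horizons duplicates the Bt row
flat <- merge(hz, child[child$rvindicator, ], by = "chiid", all.x = TRUE)
print(nrow(flat))  # 4 rows for 3 horizons

# One naive fix: keep only the first RV record per horizon
flat_first <- flat[!duplicated(flat$chiid), ]
```

Keeping the first RV per horizon is an arbitrary tie-break. It restores one row per horizon but does not settle which of the RV records is actually representative.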

@brownag brownag closed this as completed Oct 22, 2018
@dylanbeaudette
Member

This might be another reason to switch the default value of rmHzErrors to FALSE.

@phytoclast
Author

To avoid giving too much weight to this horizon, I should probably collapse the duplicate by averaging the quantitative values and returning only the first of the text values. If this is at all representative of the errors, however, all I'll really need to do is omit the 'structgrpname' field as irrelevant to ecosites, and then redefine the table using the unique() function.
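Both approaches can be sketched in base R on a toy version of the Sleeth horizon table. The depth and sandtotal_r values come from the horizons(sub.c) output earlier in the thread; the structgrpname values are hypothetical placeholders, chosen so that the duplicated Bt rows differ only in that column. collapse_hz() is an illustrative helper, not a soilDB function.

```r
hz <- data.frame(
  coiid         = 885254,
  chiid         = c(2078195, 2078196, 2078196, 2078197, 2078198),
  hzname        = c("Ap", "Bt", "Bt", "Btg", "2C"),
  hzdept_r      = c(0, 25, 25, 33, 117),
  hzdepb_r      = c(25, 33, 33, 117, 168),
  sandtotal_r   = c(43.5, 34.7, 34.7, 58.9, 95.0),
  structgrpname = c("grp A", "grp B", "grp C", "grp D", "grp E"),  # hypothetical
  stringsAsFactors = FALSE
)

# Option 1: drop the column that carries the extra RV, then keep unique rows
hz_unique <- unique(hz[, setdiff(names(hz), "structgrpname")])

# Option 2: one row per chiid; mean of numeric columns, first value otherwise
collapse_hz <- function(d) {
  out <- lapply(d, function(x) if (is.numeric(x)) mean(x, na.rm = TRUE) else x[1])
  as.data.frame(out, stringsAsFactors = FALSE)
}
hz_avg <- do.call(rbind, lapply(split(hz, hz$chiid), collapse_hz))
```

Option 1 only collapses rows that are identical once structgrpname is dropped, so it quietly leaves behind any duplicates that differ in other attributes. Option 2 always returns one row per chiid, at the cost of averaging values that may not belong together.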

@brownag
Member

brownag commented Oct 22, 2018

Greg, yes, I would also be concerned about accidentally including the duplicated HZ in subsequent analysis if you leave rmHzErrors=F.

Unfortunately, the issue with many:one flattening / duplication of RV records in child tables also affects things like texture group which are likely more related to ES concepts than structuregrp is.

The reason this has not been fixed sooner is the decision about which RV to reflect in the result is not straightforward. If two different records are marked RV, which one is actually representative?

In the case where you get overlap from multiple RVs within a single NASIS record, all data will be the same in the record with the exception of the attribute(s) that had multiple RVs.

This issue is complicated further by the fact that certain types of horizons (combination horizons; e.g. A/C) are internally represented by NASIS as overlapping horizons. Presumably, at least some of the component horizon level attributes (texture, clay content, etc.) will be different in these "intentionally overlapping" cases.

Your approach of averaging would be a good one in that it would deal with both the identical-data issue and combination horizons (assuming you got the weighting of the different "kinds" of horizon material sorted out). Averaging is easy enough to implement in a particular function that summarizes a pedon/component's horizon data and returns a value (or set of values) for each input profile, but putting your modified representation back into the original SPC is more challenging.

@phytoclast
Author

Thanks Andrew. I can dispense with the texture field as well, since I compute weighted averages of sand, clay, organic matter, and pH for 0-50 cm and 0-150 cm depth zones before I arrive at my groupings. As far as I am aware, our B/E horizons are populated for the components as a composite texture of the two materials. However, I am told that this is not the case with pedons.
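A depth-weighted average over a fixed zone can be sketched in base R as follows. wt_mean_zone() is a hypothetical helper, not a soilDB or aqp function. Each horizon is weighted by the part of its thickness that falls inside the zone, and NA values are skipped. The inputs are the de-duplicated sandtotal_r values for the Sleeth component from the horizon listing earlier in the thread.

```r
# Hypothetical helper: thickness-weighted mean of a horizon property
# within [zone_top, zone_bottom] (cm).
wt_mean_zone <- function(top, bottom, value, zone_top = 0, zone_bottom = 50) {
  # thickness of each horizon that lies inside the zone (0 if outside)
  thick <- pmax(0, pmin(bottom, zone_bottom) - pmax(top, zone_top))
  ok <- !is.na(value) & thick > 0
  sum(value[ok] * thick[ok]) / sum(thick[ok])
}

# De-duplicated Sleeth horizons: Ap, Bt, Btg, 2C
top    <- c(0, 25, 33, 117)
bottom <- c(25, 33, 117, 168)
sand   <- c(43.5, 34.7, 58.9, 95.0)

wt_mean_zone(top, bottom, sand, 0, 50)   # 47.328
wt_mean_zone(top, bottom, sand, 0, 150)

# With the duplicated Bt row left in, Bt is counted twice in the weights
wt_mean_zone(c(0, 25, 25, 33, 117), c(25, 33, 33, 117, 168),
             c(43.5, 34.7, 34.7, 58.9, 95.0), 0, 50)
```

The last call shows why the duplicate matters for this kind of summary: the weights inside 0-50 cm add up to 58 cm instead of 50 cm, so the Bt horizon has more influence than its thickness justifies.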

@brownag
Member

brownag commented Oct 22, 2018

OK, sounds excellent!

I think the composite approach is probably very common for component data, especially where any of the NASIS horizon-level calculations are involved. And luckily, combination horizons are pretty rare in general, so they are unlikely to cause major problems; nonetheless, the averaging approach would be the safest if you had to deal with those data.
