Ensure that new rock fragment sorting code gives similar results as old code #1
Fixed: results are almost identical (< 1-2% difference in all size fractions), apart from bugs in the original code related to odd fragment populations spanning multiple size classes. The new values are more accurate than the old values.
I've implemented a new SQL version that I think is preferable to this new version. The most compelling reason is that it should be more efficient to implement with SDA and NASIS Web Reports: SDA has a limit of 100k (?) records, while NASIS Web Reports are already quite slow. My SQL code uses temporary tables, which eliminates some of the redundancy in the original SQL code. Currently I've added the results to `get_extended_data_from_NASIS_db()`. See the test example below, run against `usiteiid LIKE '%CA071%'` and `areasymbol LIKE 'IN%'`.

```r
remotes::install_github("ncss-tech/soilDB", dependencies = FALSE, upgrade = FALSE, build = FALSE)
library(soilDB)

ed <- get_extended_data_from_NASIS_db()
ed$frag_summary <- ed$frag_summary[order(ed$frag_summary$phiid), ]
test <- diff_data(ed$frag_summary, ed$frag_summary_v2)
```
If others could test with their data, I think we can safely update to this new version. Stephen
**Additions**

There are 175 (of 23,031 total) rows in the SQL-based summary that are additional to the R-based approach. It looks like these may be an artifact of NA fragment volume? It is hard to say because all of the […]

**Modifications**

Use of 76mm as […]:

- The R-based summary pushes these fragments into CB (not ideal).
- The SQL-based summary uses the RV and keeps them as GR.

In my example (~4k pedons), 30% of what should be GR use 76mm as […]. This seems like a data population problem that should be fixed rather than worked around.

**Fragment size spans more than 1 class**

Interpretation of […]:

- R-based: […]
- SQL-based: […]
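To make the R-based vs. SQL-based difference concrete, here is a minimal sketch in Python (hypothetical function and names; soilDB itself implements this in R/SQL) showing how sieving on H vs. RV changes the class assigned to a fragment populated with `fragsize_h = 76mm`:

```python
# Hypothetical sketch, not soilDB code: class limits follow the conventional
# 76mm gravel/cobble break used by the original code.

def size_class(d_mm):
    """Assign a fragment size class from a single diameter in mm."""
    if d_mm < 2:
        return None   # fine-earth fraction, not a fragment
    if d_mm < 76:
        return "GR"   # gravel
    if d_mm < 250:
        return "CB"   # cobble
    if d_mm < 600:
        return "ST"   # stone
    return "BY"       # boulder (>= 600mm)

record = {"fragsize_l": 2, "fragsize_r": 40, "fragsize_h": 76}

print(size_class(record["fragsize_h"]))  # sieve on H (R-based)   -> 'CB'
print(size_class(record["fragsize_r"]))  # sieve on RV (SQL-based) -> 'GR'
```

A high value sitting exactly on the class break is what flips these records from GR to CB when H takes precedence.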
**Summary**

The differences are slight (< 1% of 8,460,509 cells). I suggest we decide on the assumptions of the underlying algorithm (see above) and what fraction of data errors we are willing to leave for QC activities.
Some more ideas after a conversation with Jay.
I am going to write a very basic test for the 75/76mm issue and add it to the tests.
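A minimal sketch of what such a boundary test could look like (Python, illustrative only; names are hypothetical and not the actual soilDB test code):

```python
# Hypothetical sieve with breaks at the current NSSH limits; the assertions
# pin down behavior at and around the 75mm gravel/cobble break.

def sieve(d_mm, breaks=(2, 75, 250, 600), classes=("GR", "CB", "ST", "BY")):
    """Return the class whose half-open interval [break, next_break) contains d_mm."""
    for i, lo in enumerate(breaks):
        hi = breaks[i + 1] if i + 1 < len(breaks) else float("inf")
        if lo <= d_mm < hi:
            return classes[i]
    return None  # below 2mm: fine-earth, not a fragment

assert sieve(74) == "GR"
assert sieve(75) == "CB"   # 75mm falls on the cobble side of the break
assert sieve(76) == "CB"   # legacy 76mm upper limits also classify as cobble
```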
Current standards aren't all that clear.
A thought: we might consider including some "slop" in our evaluation of fragsize to catch values that are within 1mm of a class break.
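One way the "slop" idea could work, sketched in Python (hypothetical; not a proposal for the actual soilDB implementation): values within 1mm above a break are snapped down so that, e.g., a 76mm gravel upper limit stays GR.

```python
# Hypothetical tolerance-based sieve: snap diameters within `slop` mm above
# a class break down to just below the break before classifying.

def sieve_with_slop(d_mm, breaks=(2, 75, 250, 600), slop=1):
    for b in breaks:
        if b < d_mm <= b + slop:
            d_mm = b - 1e-9  # treat as belonging to the class below the break
            break
    classes = ("GR", "CB", "ST", "BY")
    for i, lo in enumerate(breaks):
        hi = breaks[i + 1] if i + 1 < len(breaks) else float("inf")
        if lo <= d_mm < hi:
            return classes[i]
    return None

print(sieve_with_slop(76))  # 'GR': within 1mm of the 75mm break
print(sieve_with_slop(80))  # 'CB': outside the tolerance
```

The trade-off is that legitimate small cobbles (75-76mm) would be silently reclassified as gravel, so the tolerance would mask real data as well as typos.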
I just heard from Curtis Monger that the 75mm gravel/cobble cutoff may be a typo. I'll post back when I hear more.
It's unclear to me why the default for `.sieve()` is the `fragsize_h` instead of `fragsize_r`. Can you explain your rationale? I am unaware of another instance in soilDB or NASIS where the H is selected before the RV. We've spent a lot of time discussing the L, RV, and H issue in the past, and I don't recall any preference for the H. It seems to me that if the fragsize range overlaps a class, or there is confusion as to the class limits, then the RV would be a more robust estimate. Also, if the user purposely entered an RV, then they're in effect saying that is the most common fragsize. If using the H results in a misclassification, then the effect is a shift in the particle size distribution.
I agree that the use of low-rv-high in my argument is the opposite of what I typically advocate. I think that there is a distinction to be made here:
The use of H vs. RV (when present) is largely by convention, as inherited from the original SQL and (I think) data population practices in the West. It is very rare for folks to describe an RV for records in the […] table. Using the RV first (when present) might be a reasonable compromise, but I can think of cases where the RV is completely arbitrary:
Either way, the choice of precedence (RV over H, or H over RV) will lead to some (< 1%?) errors. How about we decide and then go from there? There will be very little effect on pedons in SSR2 since the RV is very rarely populated. I do not think that there is a case to be made for computing the RV from L/H and then using that. Overlapping ranges are (in my opinion) a data population error. Again, this line of reasoning only applies to pedon data.
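The RV-first precedence with fallbacks described above can be sketched as follows (Python, hypothetical function; field names follow NASIS conventions, but this is not the soilDB implementation):

```python
# Hypothetical precedence rule: use the RV when populated, otherwise fall
# back to the H, then the L. Since the RV is rarely populated in SSR2 pedon
# data, this behaves like the H-first convention almost everywhere.

def fragsize_for_sieve(fragsize_l=None, fragsize_r=None, fragsize_h=None):
    """Pick the single diameter used for sieving a fragment record."""
    for v in (fragsize_r, fragsize_h, fragsize_l):
        if v is not None:
            return v
    return None

print(fragsize_for_sieve(fragsize_l=2, fragsize_h=76))                  # 76 (no RV: fall back to H)
print(fragsize_for_sieve(fragsize_l=2, fragsize_r=40, fragsize_h=76))   # 40 (RV wins when present)
```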
Settling on a compromise due to the various upper ends used to define gravels (74mm, 75mm, 76mm). The latest NSSH part 618 defines gravel as 2mm ≤ x < 75mm. Relevant changes:
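For reference, the NSSH part 618 gravel definition quoted above reduces to a one-line check (illustrative Python; function name is hypothetical, not soilDB code):

```python
# Gravel per NSSH part 618: 2mm <= x < 75mm (lower limit inclusive,
# upper limit exclusive).

def is_gravel(d_mm):
    return 2 <= d_mm < 75

assert is_gravel(2)        # lower limit is inclusive
assert is_gravel(74.9)
assert not is_gravel(75)   # 75mm falls on the cobble side
assert not is_gravel(1.9)  # fine-earth fraction
```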
The old code (SQL) may have missed some data that spanned multiple size classes.
Test: