Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data loss when using uncode() #253

Closed
monikashea opened this issue Jun 29, 2022 · 3 comments · Fixed by #254
Closed

Data loss when using uncode() #253

monikashea opened this issue Jun 29, 2022 · 3 comments · Fixed by #254

Comments

@monikashea
Copy link

I recently updated soilDB from 2.6.14 to 2.7.2. Since then I have had an issue with the cosoilmoist table, and I think I was able to track the issue down to the uncode() function.

Bear with me: I am trying to access the cosoilmoist table so that I can get water table depth low, representative, and high values. In the past I was able to isolate this by filtering the rows where status == "Wet" and selecting dept_l, dept_r, and dept_h.

For the sake of this example:
where <- "mukey IN ('426539', '426540', '426541')"

When I use soilDB::get_cosoilmoist_from_SDA(WHERE = where) there is no data in the "status" column (and also the "month" column is empty). I've looked at this in a few different counties throughout WI and IL and it's always the case, so it's not an issue with inconsistent data collection.

Before I knew this function was available in soilDB I made my own query and ran it through SDA_query(). Here is my query:
q.cosoilmoist <- paste0("SELECT co.mukey, co.cokey, co.comppct_r, com.*, cosm.* FROM component AS co LEFT JOIN comonth AS com ON com.cokey = co.cokey LEFT JOIN cosoilmoist AS cosm ON cosm.comonthkey = com.comonthkey WHERE ", where, " ORDER BY co.mukey;")

After running d.cosoilmoist <- soilDB::SDA_query(q.cosoilmoist), the month and soimoiststat values are in the table! But after running d.cosoilmoist <- soilDB::uncode(d.cosoilmoist, droplevels = FALSE), the month and soimoiststat values are gone. So it looks like there is something going on in uncode().

I looked at a couple other get_(something)_from_SDA() functions and they also seem to have some lost data.

@brownag
Copy link
Member

brownag commented Jun 29, 2022

Hey, thanks for bringing this to our attention. There are a couple issues with uncoding coded values and setting the factor levels, and we have been meaning to cycle back to column name uncoding and factor related problems in #241, #242

As you have observed SDA has all of its data pre-decoded. I believe the only reason that uncode() is used in the SDA functions is to set factor levels--which is clearly breaking in your case.

The plan was to abstract out the factor level finding code from uncode so it does not need to be handled this way, which has a high likelihood of wiping out values when data are factors or something is a bit different about the database/generic assumptions associated with a database. Thanks for bringing it up, I think I had been aware this could happen, and thought it had been fixed. Since we have no direct tests of the output of these functions at present, it escaped me, sorry for any inconvenience

I will move resolving this and related factor issues to the top of the pile. I found an issue in uncode() pertaining to SDA that should be a simple fix, I think there are some paired issues the post-processing e.g. .cosoilmoist_prep()

@brownag
Copy link
Member

brownag commented Jun 30, 2022

I made some progress on this yesterday, and think the primary issue has been resolved at least for get_cosoilmoist_from_SDA(). Still thinking on interfaces to new functions for helping with metadta, and need to test the rest of the SDA functions and some more edge cases, see draft PR #254

You can install a development branch with:

remotes::install_github("ncss-tech/soilDB@factorsETC")

Would be glad to have you test it out, let me know if you see anything else that is off.

@monikashea
Copy link
Author

Thanks for fixing this @brownag! The development version is working great now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants