[traits.build workflow] Precedence of column-level metadata over trait-level metadata over location-level metadata not working #60
Comments
This is actually intentional: we had to code either trait-level or location-level metadata to take precedence over the other, and fairly arbitrarily picked location. If you have a good reason to swap it around so that trait-level metadata takes precedence, you're welcome to change it.
@ehwenk According to the original tests before the big test-structure overhaul, the intention was that column metadata (dataset level) > trait metadata > location metadata. Let me know if you agree with these precedences, and I can change them.
@yangsophieee I am very happy for you to switch it to trait metadata > location metadata > dataset metadata!
@ehwenk Thanks! What about the column precedences?
@yangsophieee I think it would probably always be the same: you shouldn't fill in anything at the trait level if there is a universal value or column given at the dataset level. Filling in at the trait level (or location level) should be a clear signal that this is the correct variable value/column for that trait/location.
@yangsophieee I hadn't thought about the different workflows for fixed values vs values from columns. For fixed values, we decided that values specified in the traits section overwrite values from the dataset section, and location-section values overwrite traits-section values. (Although I actually have no preference for the order of traits vs location.) Meanwhile, dataset-level column-specified values should have ultimate precedence; otherwise, what does it mean to read in a value from a column if you're going to overwrite it? In that case, that cell in the data.csv file should be edited instead (either directly, or using `custom_R_code`). Is it realistic to add an …
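The precedence rules described above (fixed values: dataset < traits < locations; dataset-level column values beat everything) can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not the traits.build implementation (which is written in R): `resolve_field` and its parameter names are hypothetical, and each argument holds the value given for one field, such as `basis_of_value`, at one level, or `None` if that level leaves the field unspecified.

```python
from typing import Optional


def resolve_field(
    dataset_fixed: Optional[str] = None,
    trait_fixed: Optional[str] = None,
    location_fixed: Optional[str] = None,
    dataset_column_value: Optional[str] = None,
) -> Optional[str]:
    """Effective value of one metadata field for one row (hypothetical helper).

    None means "not specified at this level".
    """
    # A value read from a data.csv column named at the dataset level has
    # ultimate precedence: it is never overwritten by fixed values.
    if dataset_column_value is not None:
        return dataset_column_value
    # Fixed values are applied in order and later layers overwrite earlier
    # ones, giving dataset < traits < locations.
    resolved = None
    for value in (dataset_fixed, trait_fixed, location_fixed):
        if value is not None:
            resolved = value
    return resolved
```

For example, `resolve_field(trait_fixed="measurement", location_fixed="expert_score")` returns `"expert_score"`, while adding `dataset_column_value="model_derived"` makes it return `"model_derived"`. Swapping `trait_fixed` and `location_fixed` in the loop would give the alternative trait > location order discussed in this thread.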
I've just spent a long time staring at the Test_2023_4 and Test_2023_2 outputs and am not sure anything needs to be changed. That is, I don't think issue #60 and issue #38 are actually bugs. Which specific variables and datasets do you think are giving erroneous outcomes? I also think that I misstated something in my last comment. In order of information being read in (i.e. items lower in this list overwrite items higher up):
As I said before, it was arbitrary whether 3-4 or 5-6 occurs first, but I don't want to change that right now. However, I have some (possibly unfounded) concerns with this list of variables in Line 137 of `process.R`.
I don't think …
I'm pretty sure the code doesn't allow you to read in a column of metadata from a location right now. I discussed this with @ehwenk and we agree it's not something we need to implement.
In terms of the precedence of trait vs location metadata, both @ehwenk and I agree that trait metadata should take precedence over location metadata. But in terms of urgency, this can be fixed later if it's too difficult.
Addresses #60 by checking that the precedence of metadata fields across the various sections of metadata works correctly (e.g. location > trait metadata).

- Added additional tests for checking the precedence of metadata
- Fixed an error with `dataset_test` where I didn't realise the data variable already uses `process_custom_code`, so it was being applied twice
- Removed old testing files
- @ehwenk and I agree that trait metadata should probably take precedence over location metadata, but we will leave this for a later date. Trait metadata is read in via `process_parse_data`, and location metadata replaces it afterwards in `dataset_process` around Line 141, so you'd have to move the location metadata step into `process_parse_data`. That will presumably require splitting location data into location properties vs the other variables and having them input at different times.
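Why the order of the two steps decides precedence can be sketched as "last writer wins". This is a simplified Python illustration, not the R code in `process.R`: `apply_layer`, `current_order` and `proposed_order` are hypothetical names, each metadata layer is assumed to apply to every row (in practice trait metadata applies only to rows of that trait and location metadata only to rows at that location), and the split into location properties vs other variables is not modelled.

```python
from typing import Dict, List

Row = Dict[str, str]


def apply_layer(rows: List[Row], layer: Dict[str, str]) -> List[Row]:
    """Return new rows with fields overwritten by the layer's values (last writer wins)."""
    return [{**row, **layer} for row in rows]


def current_order(rows: List[Row], trait_meta: Dict[str, str],
                  location_meta: Dict[str, str]) -> List[Row]:
    # Trait metadata is applied first (cf. process_parse_data), then location
    # metadata replaces it afterwards (cf. dataset_process): location wins.
    return apply_layer(apply_layer(rows, trait_meta), location_meta)


def proposed_order(rows: List[Row], trait_meta: Dict[str, str],
                   location_meta: Dict[str, str]) -> List[Row]:
    # Applying location metadata before trait metadata lets trait metadata win.
    return apply_layer(apply_layer(rows, location_meta), trait_meta)
```

Using the Alstonia scholaris case reported in this issue, with trait metadata `{"basis_of_value": "measurement"}` and location metadata `{"basis_of_value": "expert_score"}`, `current_order` yields `expert_score` and `proposed_order` yields `measurement`.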
Currently, location-level metadata is overwriting trait-level metadata. For example, in Test_2023_2, `leaf_mass_per_area` for Alstonia scholaris should have `basis_of_value` as `measurement`, but it's currently `expert_score` (the value for the location Cape Tribulation). I think this is occurring in the loop around Line 161 of `process.R`. It's happening for both long (see Test_2023_4) and wide datasets.

In addition, column metadata at the dataset level is being overwritten by trait-level metadata for Test_2023_2 and Test_2023_4. For example, in Test_2023_2, Homalanthus novoguineensis should have `basis_of_value` as `model_derived`, but instead it's overwritten by the trait-level value, `measurement`.