TG2- Use the word "EMPTY" instead of NULL and provide definition #111
Comments
Thanks Christian - we came to much the same conclusion, with EMPTY covering empty, NULL, /N, -9999, etc. We have made a note to define the term and will bulk change the names. We are changing the descriptions, etc. as we work through them. |
Yes, we need a standard definition for a function isEmpty(informationElement) which returns true or false. Some values for isEmpty=true are obvious: empty string, null (if the language supports it, e.g. in Java, JavaScript, SQL), undefined (e.g. in JavaScript), a char array of size 0 (e.g. in C). Other values are probably reasonable, as we see them in data as serializations of null out of relational databases. These include "\N", "NULL", "null". Other values need substantive discussion. One class of these is strings that users put into data to mean an empty value; these include "n/a", "not applicable", "[not applicable]", "[data not available]". The scope of what is considered empty by a standard isEmpty() function needs discussion. |
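To make the tiers in the comment above concrete, here is a minimal Java sketch. It is only an illustration: the class and method names are hypothetical, it is not the implementation linked in this thread, and it deliberately leaves out user-entered placeholders such as "n/a", since their status is still under discussion.

```java
import java.util.Set;

/** Hypothetical sketch; names are illustrative and not part of any TG2 library. */
final class EmptyCheckSketch {

    // Serializations of null seen in relational database exports,
    // taken from the examples listed in the comment above.
    private static final Set<String> NULL_SERIALIZATIONS = Set.of("\\N", "NULL", "null");

    private EmptyCheckSketch() { }

    /** Strict check: only a null reference or a zero-length string is empty. */
    static boolean isEmpty(String informationElement) {
        return informationElement == null || informationElement.isEmpty();
    }

    /** Looser check: also counts the listed null serializations as empty. */
    static boolean isEmptyOrNullSerialization(String informationElement) {
        return isEmpty(informationElement)
                || NULL_SERIALIZATIONS.contains(informationElement);
    }
}
```

Keeping the two checks separate means the question of scope (whether serialized nulls count as empty) stays an explicit choice for the caller rather than being baked into one function.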
An example (minimal) implementation of an isEmpty() function can be found at https://github.com/FilteredPush/event_date_qc/blob/1abbd3f02eb6c28129764defab78f72156972864/src/main/java/org/filteredpush/qc/date/DateUtils.java#L1832 |
@cgendreau Plan is currently for Alex to update all of the NULL names to EMPTY in bulk after the Gainesville meeting. |
See also discussion under #147 |
Talking to @tucotuco we think we need to combine elements of both versions. We will discuss and come back with a suggestion tomorrow. |
In the light of discussions, I have amended the definition of EMPTY in #152 to read "A field that is present but does not contain any characters or values. A field containing non-printing or other invalid characters or values may be separately detected." The reasoning seems OK, as we already have Expected Responses that state, for example (#162): "INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonRank is not present or is EMPTY; COMPLIANT if the value of the field dwc:taxonRank is in the specified source authority; otherwise NOT_COMPLIANT." In theory, we are then allowing separately for a field that is not present and a field that is EMPTY. |
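The Expected Response quoted above maps onto a three-way decision. A rough Java sketch follows, with several assumptions: the class, enum, and method names are hypothetical, a null argument stands in for "field not present", a zero-length string stands in for EMPTY, and the source authority is passed in as a plain set of accepted values rather than looked up from a real vocabulary.

```java
import java.util.Set;

/** Hypothetical sketch of the quoted Expected Response; not an actual TG2 implementation. */
final class TaxonRankResponseSketch {

    enum Response { INTERNAL_PREREQUISITES_NOT_MET, COMPLIANT, NOT_COMPLIANT }

    private TaxonRankResponseSketch() { }

    static Response validate(String taxonRank, Set<String> sourceAuthority) {
        // Assumption: null models "not present", "" models EMPTY.
        // Both lead to the same response, as in the quoted text.
        if (taxonRank == null || taxonRank.isEmpty()) {
            return Response.INTERNAL_PREREQUISITES_NOT_MET;
        }
        return sourceAuthority.contains(taxonRank)
                ? Response.COMPLIANT
                : Response.NOT_COMPLIANT;
    }
}
```

Because "not present" and EMPTY lead to the same response here, a test written this way does not need to tell the two cases apart, even though the definition allows for both.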
I'm putting together test data for #49 VALIDATION_YEAR_EMPTY, and I think we need to revert to the rendition from @tucotuco. We also need to be explicit about spaces: elsewhere we have asserted that whitespace should be trimmed before testing, so a value containing only whitespace would be empty. The meaning of invalid characters is very different from that of non-printing characters; the two shouldn't be mixed, and should be treated separately. I'd suggest: EMPTY: A field that is needed as input is not present, or, the input field |
Thanks @chicoreus. This would seem to cover it but the wording is a little odd. How about EMPTY: A field that is needed as input is not present, or the input field We will need to update the TG2 vocabulary @ArthurChapman. |
@Tasilee that works. |
@ArthurChapman the entry in the vocabulary #152 is out of sync with this discussion, though the entry in the vocabulary feels more current than the discussion here. "EMPTY: A field that is either not present or does not contain any characters or values other than white space. Note: A field containing invalid characters or values (including serializations of NULL values) are NOT_EMPTY and may be separately detected but fields containing only non-printing characters (q.v.) are treated as EMPTY." It is worth looking at the documentation of String.trim() in Java, which contains the following text: "where space is defined as any character whose codepoint is less than or equal to 'U+0020' (the space character)." I would suggest that we follow that text and amend the current definition of empty to make use of that definition of space, which includes the other whitespace characters (tab, line feed, carriage return) and non-printing characters. "EMPTY: An information element that is either not present or does not contain any characters or values other than those in the range U+0000 to U+0020. Note: An information element containing invalid characters (e.g. letters in an information element that would be expected to contain integers) or values (including string serializations of the NULL value) are NOT_EMPTY and may be separately detected." Treating values that contain only characters in the range U+0000 to U+0020 as EMPTY also reduces the need to tackle this at the interface where a mechanism implementing the tests presents data to a test: the mechanism is likely to be unable to distinguish between a term that was absent in the original data and a term that was present but contained no data. Either case could be handled by the mechanism by presenting a test with a null or an empty string. By starting at U+0000 we are effectively being explicit that null objects are also EMPTY. |
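The definition proposed above can be checked in a few lines of Java. This is a sketch under stated assumptions: the class and method names are hypothetical, and a null reference is taken to represent an information element that is not present.

```java
/** Hypothetical sketch of the proposed EMPTY definition; names are illustrative. */
final class EmptyRangeSketch {

    // Upper bound of the range treated as "space" (U+0020, the space character).
    private static final char MAX_SPACE = 0x20;

    private EmptyRangeSketch() { }

    /**
     * Returns true if the value is null (assumed to mean "not present")
     * or contains no character above U+0020.
     */
    static boolean isEmpty(String value) {
        if (value == null) {
            return true;
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > MAX_SPACE) {
                return false; // found a character outside U+0000..U+0020
            }
        }
        return true; // zero length, or only characters in U+0000..U+0020
    }
}
```

For a non-null value this gives the same answer as `value.trim().isEmpty()`, since `String.trim()` strips exactly the characters at or below U+0020. String serializations such as "NULL" stay NOT_EMPTY under this check, in line with the note in the proposed definition.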
I like - and have changed definition in #152 |
Can this issue be closed? |
I don't think the word NULL should be used in test titles and definitions. I think the word EMPTY should be used instead, and a definition (of EMPTY) should be added to a glossary. See #20 (comment)