
TG2-ISSUE_COORDINATEPRECISION_UNLIKELY #293

Closed
ArthurChapman opened this issue Feb 12, 2024 · 33 comments
Labels
Conformance Immature/Incomplete A test where substantial work is needed to develop the specification to the point where the test ca Issue A potential issue Parameterized Test requires a parameter SPACE Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 VOCABULARY

Comments

@ArthurChapman
Collaborator

ArthurChapman commented Feb 12, 2024

TestField Value
GUID 32aca770-1f99-45f1-87a4-f4a582c02b50
Label ISSUE_COORDINATEPRECISION_UNLIKELY
Description Is the value of dwc:coordinatePrecision a likely value?
TestType Issue
Darwin Core Class Location
Information Elements ActedUpon dwc:coordinatePrecision
Information Elements Consulted
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:coordinatePrecision is bdq:Empty; POTENTIAL_ISSUE if the value of dwc:coordinatePrecision is not in the bdq:sourceAuthority; otherwise NOT_ISSUE.
Data Quality Dimension Likelihood
Term-Actions COORDINATEPRECISION_UNLIKELY
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = "Darwin coordinatePrecision" {[http://rs.tdwg.org/dwc/terms/coordinatePrecision]} {dwc:coordinatePrecision vocabulary API [NO CURRENT API EXISTS]}
Specification Last Updated 2024-02-13
Examples [dwc:coordinatePrecision="15": Response.status=RUN_HAS_RESULT, Response.result=POTENTIAL_ISSUE, Response.comment="dwc:coordinatePrecision does not have an equivalent in the bdq:sourceAuthority"]
[dwc:coordinatePrecision="0.01667": Response.status=RUN_HAS_RESULT, Response.result=NOT_ISSUE, Response.comment="dwc:coordinatePrecision has an equivalent in the bdq:sourceAuthority"]
Source TG2
References
Example Implementations (Mechanisms)
Link to Specification Source Code
Notes Zero, for example, is not a valid value for dwc:coordinatePrecision. Neither are most real numbers between 0 and 1 (e.g., 0.2 is an unlikely value for dwc:coordinatePrecision, because no one would record coordinates to the nearest fifth of a degree). This bdq:Supplementary test is not regarded as CORE (cf. bdq:CORE) for one or more of the following reasons: not being widely applicable; not informative; not straightforward to implement; or likely to return a high percentage of either bdq:COMPLIANT or bdq:NOT_COMPLIANT results (cf. bdq:Response.result). A Supplementary test may be implemented as CORE when a suitable use case exists.
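The Expected Response above can be sketched in code. A minimal Python illustration follows, assuming the bdq:sourceAuthority could be fetched as a simple set of likely string values; the set shown is a hypothetical placeholder, since the specification notes that no vocabulary API for dwc:coordinatePrecision currently exists.

```python
# Minimal sketch of the Expected Response logic for
# ISSUE_COORDINATEPRECISION_UNLIKELY. The set of likely values is
# hypothetical: no vocabulary API for dwc:coordinatePrecision exists yet.

LIKELY_VALUES = {"1", "0.1", "0.01", "0.001", "0.0001", "0.00001",
                 "0.000278", "0.0002778"}  # illustrative placeholder only

def issue_coordinateprecision_unlikely(coordinate_precision,
                                       source_authority=LIKELY_VALUES):
    # EXTERNAL_PREREQUISITES_NOT_MET if the source authority is unavailable
    if source_authority is None:
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    # INTERNAL_PREREQUISITES_NOT_MET if dwc:coordinatePrecision is bdq:Empty
    if coordinate_precision is None or coordinate_precision.strip() == "":
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    # POTENTIAL_ISSUE if the value is not in the source authority
    if coordinate_precision.strip() not in source_authority:
        return ("RUN_HAS_RESULT", "POTENTIAL_ISSUE")
    return ("RUN_HAS_RESULT", "NOT_ISSUE")
```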
@ArthurChapman ArthurChapman added TG2 Validation SPACE Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT VOCABULARY Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. Conformance Parameterized Test requires a parameter labels Feb 12, 2024
@ArthurChapman
Collaborator Author

I have created this test as a "STANDARD" test as it conforms with similar tests that are tested against a bdq:sourceAuthority. Perhaps this should be CORE, but can't be until we have a Lookup Table that we can link to.

@chicoreus
Collaborator

@ArthurChapman this feels like it should be an INRANGE test rather than a STANDARD test. The definition of dwc:coordinatePrecision is "A decimal representation of the precision of the coordinates given in the dwc:decimalLatitude and dwc:decimalLongitude." This indicates an arbitrary positive real number, with some upper limit (my math isn't good enough to be sure, but I'm guessing that a precision of 360.0 with any coordinate implies a location anywhere on the surface of the earth). Examples start at a precision of 1.0 and get smaller from there, but there are probably reasonable values for precision when the location is known to a resolution of more than one degree.

The value is an arbitrary real number, so this is a test of the value against a range, rather than a test against a vocabulary, thus no vocabulary needed. There are standard values for precision when translating from one form of coordinates to another, but they aren't the only possibilities.

@chicoreus
Collaborator

Propose:

INTERNAL_PREREQUISITES_NOT_MET if dwc:latitude is NOT_EMPTY and dwc:coordinatePrecision is EMPTY; COMPLIANT if dwc:latitude is EMPTY or if the value of dwc:coordinatePrecision is a positive real number less than or equal to 360; otherwise NOT_COMPLIANT.

We need to not assert that data are not fit for purpose by asserting INTERNAL_PREREQUISITES_NOT_MET for an empty value when an empty value is correct for the situation, that is, when the metadata term is expected to be empty because there is no georeference. The absence of a georeference is a different data quality problem assessed by other tests.
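The proposed range-based specification above could be sketched as follows; this is a rough Python illustration of the wording in the proposal, not an agreed implementation, and it takes an empty latitude to mean there is no georeference.

```python
# Rough sketch of the proposed INRANGE check. An empty latitude is taken
# to mean there is no georeference, so an empty precision is compliant.

def validation_coordinateprecision_inrange(decimal_latitude,
                                           coordinate_precision):
    lat_empty = decimal_latitude is None or decimal_latitude.strip() == ""
    prec_empty = (coordinate_precision is None
                  or coordinate_precision.strip() == "")
    if not lat_empty and prec_empty:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    if lat_empty:
        return "COMPLIANT"
    try:
        value = float(coordinate_precision)
    except ValueError:
        return "NOT_COMPLIANT"
    # positive real number less than or equal to 360
    return "COMPLIANT" if 0 < value <= 360 else "NOT_COMPLIANT"
```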

@chicoreus
Collaborator

And not parameterized, and no source authority.

@chicoreus
Collaborator

This might belong in CORE. Good georeference metadata is important for analysis of georeferences, though coordinate precision may be more important for downstream presentation than analytical purposes. It provides a means of asserting how many decimal places to display (and how many are relevant for analysis) when numeric coordinate data are serialized into strings and deserialized, potentially at several steps in the pathway, any of which could add arbitrary numbers of trailing zeroes, or alter low significance digits as the result of moving between string serializations and floating point numbers.

@tucotuco
Member

An INRANGE version of this test is simpler, but it would be much less useful. The STANDARD version of the test can tell people if the precision term was very likely misunderstood.

In an INRANGE version I would say that the valid range is 0.0000001 < coordinatePrecision < 180. That would cover a proper georeference on the low end and a quadrant of the globe on the high end.

@chicoreus
Collaborator

@tucotuco I can see that some values would be expected to be standard (e.g. 0.000278 for translation from degrees, minutes, seconds with a precision of 1 second to decimal degrees), but there are a large number of potential sources of original coordinate data (PLSS, state plane feet, OSGB coordinates, etc.), and effects of coordinate transformations, which in effect mean that the precision is an arbitrary value, not constrainable by a vocabulary. I can't see a clear way of distinguishing between misunderstandings of the term and valid precision values produced from a range of different original forms with various transformations into decimal degrees. Perhaps an ISSUE that flags cases where the precision is not one of a small set of typical expected values (decimal degrees to n digits of precision, decimal degrees from degrees/minutes, decimal degrees from degrees/minutes and tenths of a minute, decimal degrees from degrees/minutes/seconds to one second), but not a validation.

@tucotuco
Member

I can see ISSUE as a more appropriate test than STANDARD. A vocabulary would have issues with the values being strings to represent the numbers, and with the precision of the values representing the precision. An ISSUE test would be much more useful than an INRANGE test.

@ArthurChapman
Collaborator Author

I've changed to an ISSUE test - @tucotuco do you still see it having a Lookup Table? Otherwise the Expected Response would be difficult to write.

@chicoreus
Collaborator

@tucotuco "A vocabulary would have issues with the values being strings to represent the numbers, and with the precision of the values representing the precision." I'm not sure what you mean here. Can you provide some examples of what an issue would assert is correct and not correct based on a controlled vocabulary?

@ArthurChapman ArthurChapman added Issue A potential issue and removed Validation labels Feb 12, 2024
@chicoreus
Collaborator

@tucotuco it seems likely that I'm misunderstanding the nature of the problem and what this test is intended to detect...

@chicoreus chicoreus changed the title TG2-VALIDATION_COORDINATEPRECISION_STANDARD TG2-ISSUE_COORDINATEPRECISION_STANDARD Feb 12, 2024
@ArthurChapman
Collaborator Author

ArthurChapman commented Feb 12, 2024

@chicoreus - you have changed to ISSUE-COORDINATEPRECISION_STANDARD - I was suggesting ISSUE_COORDINATEPRECISION_LIKELY

@chicoreus chicoreus changed the title TG2-ISSUE_COORDINATEPRECISION_STANDARD TG2-ISSUE_COORDINATEPRECISION_LIKELY Feb 12, 2024
@chicoreus
Collaborator

@ArthurChapman missed the likely.... Fixed.

@tucotuco
Member

A lookup table would be required, but implementation would likely need more than that because of the precision of the representation of the precision. For example, "0.000278" and "0.0002778" are both likely values, but is "0.00028"? Or "0.0003"? The perfectly precise values of the likely values cannot even be expressed as numbers with finite precision.
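One way to handle this "precision of the precision" problem would be a relative-tolerance comparison against the exact values (such as 1/3600 of a degree for one arcsecond) rather than string matching. A sketch, where the 5% tolerance and the list of exact values are arbitrary assumptions for illustration:

```python
import math

# Compare a reported precision against exact likely values (whole degrees,
# decimal subdivisions, one minute, tenth of a minute, one second) using a
# relative tolerance, since values like 1/3600 have no finite decimal form.
LIKELY_EXACT = [1.0, 0.1, 0.01, 0.001, 1/60, 1/600, 1/3600]

def is_close_to_likely(value_str, rel_tol=0.05):  # 5% is an arbitrary choice
    try:
        value = float(value_str)
    except ValueError:
        return False
    return any(math.isclose(value, v, rel_tol=rel_tol)
               for v in LIKELY_EXACT)
```

Under this sketch "0.000278", "0.0002778", and "0.00028" all match 1/3600, while "0.0003" falls outside the tolerance and would be flagged.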

@ArthurChapman
Collaborator Author

Thanks @chicoreus - Standard implies a Conformance test whereas Likely implies a Likelihood Test

@ArthurChapman
Collaborator Author

OK - for now, if we leave it as a sourceAuthority and say that NO CURRENT API EXISTS, we can label it "Incomplete" and NEEDS WORK. Once we have a Lookup Table we can change it to either Supplementary or CORE.

@chicoreus
Collaborator

So we need to be concerned with the precision of the precision...

This feels like a test (or perhaps that is a related one) that needs to take the verbatim coordinate as an information element consulted, assess whether the verbatim coordinate is decimal degrees, degrees decimal minutes, or degrees minutes seconds, and if so test to see if the precision is a reasonable value for one of those. A very large number of cases can probably be covered with a small number of likely ranges that can be specified within the specification of the test and don't need an external vocabulary. If we exclude transformations from other coordinate systems from consideration, there are probably only a small number of cases to consider. If we include transformations from other coordinate systems we are likely into the place where we just have to ask if arbitrary values are in range. Assuming, of course, that I'm understanding the problem, something I'm not yet convinced of.
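The small number of cases described above could be enumerated directly in the test specification. A hypothetical sketch of likely precision values derived from the verbatim coordinate form (the form names and the cutoff of seven decimal digits are illustrative assumptions):

```python
# Hypothetical enumeration of likely dwc:coordinatePrecision values for
# common verbatim coordinate forms, as suggested above. Transformations
# from other coordinate systems (PLSS, state plane, OSGB) are excluded.

def likely_precisions(verbatim_form, max_decimal_digits=7):
    if verbatim_form == "decimal_degrees":
        # decimal degrees recorded to n digits of precision
        return [10.0 ** -n for n in range(0, max_decimal_digits + 1)]
    if verbatim_form == "degrees_minutes":
        # whole minutes and tenths of a minute
        return [1 / 60, 1 / 600]
    if verbatim_form == "degrees_minutes_seconds":
        # precision of one second
        return [1 / 3600]
    return []
```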

@chicoreus
Collaborator

@ArthurChapman @tucotuco I'm just not seeing the need for a source authority. This very much feels like a case where a likely set of values could be enumerated in the test, with an extension point provided by a parameter that allows the addition of a list of other values (e.g. where the original data were likely to have included PLSS data, thus precise to section, half section, quarter section, quarter quarter section, etc.).

@ArthurChapman
Collaborator Author

@chicoreus - I don't see why not a sourceAuthority. If you hard-wire a set of values in the test itself, it is a big job to change later and would require a lengthy process (through the TDWG system), but if you use a sourceAuthority with a list of likely values, it is easy to add new ones as they arise without having to go through a lengthy and difficult process. I think it is the simplest solution.

@chicoreus
Collaborator

This is also a place we could explore the extension point in the response for representing uncertainty. In particular, values between 1 and 180 are possible (known to quadrant, known to 10 degrees, etc.), but less likely than values between 0.00001 and 1 inclusive, where particular values in that range have high likelihood; a value of 0.000001 is unlikely but not totally implausible, and values less than 0.000001 are implausible. Thus the issue could assert POTENTIAL_ISSUE with a qualifier of the uncertainty of the issue.

@chicoreus
Collaborator

@ArthurChapman it isn't taxon names or values for sex, it is math. I'm not very comfortable with a controlled vocabulary for mathematical values. A parameter would readily allow other cases without a change to the test specifications.

@ArthurChapman
Collaborator Author

@chicoreus We define bdq:sourceAuthority as "namespace that provides a reference for values required for a test evaluation". To me it makes no difference if that list of values is an alphabetical list or a numerical list - especially if that numeric list is a list of discrete values.

@ArthurChapman
Collaborator Author

@chicoreus - bdq:sourceAuthority is not the same as a controlled Vocabulary, even though many of the sourceAuthority are controlled vocabularies.

@ymgan
Collaborator

ymgan commented Feb 14, 2024

uhm ... maybe replace the individualCount examples with coordinatePrecision?

@ArthurChapman
Collaborator Author

Thanks @ymgan - My mistake - done.

@Tasilee
Collaborator

Tasilee commented Feb 18, 2024

The lengthy discussion on this 'test' strongly suggests taking the simpler of the two strategies (range and likely values), using ISSUE_COORDINATE_PRECISION_LIKELY with a range of 0.00001 to 1.0 being LIKELY, and setting the status to Immature/Incomplete. We can note a potential implementation of a Source Authority list of likely values.

@ArthurChapman
Collaborator Author

If @tucotuco believes that a SourceAuthority is the best way to go with this test and that he believes that he could create one when needed - I'd go that way. It may be years before anyone decides to take it further. I'd suggest we label this test Immature/Incomplete at this stage.

@ArthurChapman ArthurChapman added Immature/Incomplete A test where substantial work is needed to develop the specification to the point where the test ca and removed Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. labels Feb 18, 2024
@tucotuco
Member

A useful implementation would require a SourceAuthority of values combined with an algorithm to determine if a value is "close enough" to one of those values. Given the requirement of an algorithm, all of it could be done in code without a SourceAuthority. In any case, the range implementation would be of much less utility, only catching if the value is larger than or smaller than expected.

@chicoreus chicoreus changed the title TG2-ISSUE_COORDINATEPRECISION_LIKELY TG2-ISSUE_COORDINATEPRECISION_UNLIKELY Feb 22, 2024
@chicoreus
Collaborator

Fixing name and label to reflect this being an issue rather than a validation.

@ArthurChapman
Collaborator Author

@chicoreus - We don't have any tests expressed in the negative - we had years of discussion on this. You've changed this to negative! To fit with ALL other tests - it should be called ISSUE_COORDINATEPRECISION_LIKELY. The Expected Response stays as is.

@chicoreus
Collaborator

chicoreus commented Feb 22, 2024 via email

@ArthurChapman
Collaborator Author

In that case, we will need to change #29, #72, #94

@chicoreus
Collaborator

I should rephrase: all issues should be phrased to point out what the issue is, that is, what is the thing that is a problem.

The following are correct:

#293 ISSUE_COORDINATEPRECISION_UNLIKELY

ISSUE_COORDINATES_OUTSIDEEXPERTRANGE #292 (fixed from IN to OUTSIDE)

ISSUE_OUTLIER_DETECTED #291

ISSUE_COORDINATES_CENTEROFCOUNTRY #287

ISSUE_ESTABLISHMENTMEANS_NOTEMPTY #94 (correctly phrased: if there is a value in the term, then the data may lack quality for uses concerned with what organisms occur where (but have value for studies of introduction of taxa)).

ISSUE_DATAGENERALIZATIONS_NOTEMPTY #72 (likewise, if there is a value in dataGeneralizations, then there may be a quality issue for use of the data (depending on the need for precision and the value of the dataGeneralizations))

ISSUE_ANNOTATION_NOTEMPTY #29 (likewise correctly phrased, if an annotation exists it might point to a data quality concern).
