TG2-ISSUE_COORDINATES_CENTEROFCOUNTRY #287
Comments
@jhnwllr Could you check this TEST please? Is there an API that we can link to? |
Is the spatial buffer dependent on the size of the country? Is the spatial buffer dependent on a combination of the size of the country and the resolution of the country shape spatial data? |
The spatial buffer is set as a default under Parameterized, so people can supply a different value if they wish. 3000 meters is thought to be a good value given work carried out by John Waller. @jhnwllr replied separately as I have now separated out PCLI and ADM1 types into separate files. "I use PCLI as a politically neutral name for 'countries'. So see this file I just generated for 'countries'. There isn't yet an API endpoint which just lists the centroids GBIF is using, but you can use occurrence search to get a 'list of the centroids with occurrences', so to speak." |
Source Authority and Notes updated following advice from @jhnwllr above. |
Is this now an Immature/Incomplete or something else? If the former, we need to start adding relevant Notes. |
I think this is Supplementary, given that we do have a good SourceAuthority. Although there is not an API at the moment, the link that @jhnwllr provided is an alternative that should work. |
Should be straightforward to implement without an API given https://raw.githubusercontent.com/jhnwllr/catalogue-of-centroids/master/PCLI.tsv: ask whether the coordinate is near one of the points given for the country code in that file.

Propose changing from:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:country as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

to:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY or if dwc:geodeticDatum is not EPSG:4326; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

Remove country as an information element; just use dwc:countryCode as consulted.

Alternately:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:country, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

with a slightly larger spatial buffer to add in uncertainty from potential differences in the datum. |
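A minimal Python sketch of the "alternately" response logic proposed above, assuming the centroids from the source authority have already been loaded into a dict keyed by country code. The function names, the dict layout, the use of a haversine (spherical) distance, and the placeholder code "XX" with its coordinates are illustrative assumptions, not part of the specification or of the PCLI.tsv file. No datum check is made.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_empty(value):
    """Treat None and blank strings as EMPTY."""
    return value is None or str(value).strip() == ""


def center_of_country_response(country_code, lat, lon, centroids, buffer_m=3000.0):
    """Return the response status for one record.

    centroids: hypothetical dict mapping a country code to a list of
    (latitude, longitude) tuples, or None if the source authority
    could not be loaded.
    """
    if centroids is None:
        return "EXTERNAL_PREREQUISITES_NOT_MET"
    if any(is_empty(v) for v in (country_code, lat, lon)):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # A country code may have more than one center; any match flags the record.
    for clat, clon in centroids.get(str(country_code).strip().upper(), []):
        if haversine_m(float(lat), float(lon), clat, clon) <= buffer_m:
            return "POTENTIAL_ISSUE"
    return "NOT_ISSUE"


# Placeholder centroid for a made-up country code, for illustration only.
example_centroids = {"XX": [(10.0, 20.0)]}
print(center_of_country_response("XX", "10.01", "20.0", example_centroids))
```

A point 0.01 degrees of latitude from the placeholder center is roughly 1.1 km away, so it falls inside the default 3000 m buffer.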
Expected Response modified to cater for the possibility of more than one centroid, Specification Last Updated added, and Notes modified. Test made CORE rather than Supplementary as don't need an API, as we can use the file prepared by @jhnwllr |
…f-xml from the test specifications as of 2024-08-20 (AM) following discussions of issues in TG2 working meeting in Seattle. Adding #287 as core test. Regenerating human readable markdown lists of tests.
Expected response doesn't quite read right in the bits about multiple possible centers. It also needs to allow points centered on the country with a coordinateUncertaintyInMeters approximating the country. Perhaps change from:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers), of the bdq:sourceAuthority provides more than one per country code of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

To:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is less than half the square root of the area of the country; otherwise NOT_ISSUE.

Adding coordinateUncertaintyInMeters as an information element consulted. We could be more general about the coordinateUncertaintyInMeters being large, e.g. "large relative to the size of the country", and put the half the square root of the area in the notes.

The square root of the area of the country is available in the default source authority, and using it wouldn't force us to add a spatial source authority for country boundaries. (We could do that, and specify a coordinate uncertainty in meters that is less than the radius of a circle that the country fits into, which could be precalculated from country shape data.) The square root of the area is a simple, pragmatic way to estimate a large uncertainty relative to the size of the country that would make the behavior of the test consistent across implementations. |
…uery Getty TGN, throwing source authority exceptions, more cleanup of handling source authorities with exceptions. Adding a stub for tdwg/bdq#287.
…troids PCLI file converted to a shape file. This was passing all but one row in @Tasilee's test validation data, but have also added the proposed test for large coordinate uncertainty relative to country size proposed as a change to the specification.
I don't think that works @chicoreus. If dwc:coordinateUncertaintyInMeters is EMPTY then NOT_ISSUE as you have (1) and (2) for POTENTIAL_ISSUE? |
@Tasilee good catch, needs explicit handling of empty for coordinateUncertaintyInMeters. How about: EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is EMPTY or less than half the square root of the area of the country; otherwise NOT_ISSUE. |
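The two-part POTENTIAL_ISSUE condition, including the explicit EMPTY handling, can be sketched in Python as below. This is only an illustration under stated assumptions: the function name is hypothetical, the prerequisite checks are taken to have passed already, the result of condition (1) is passed in as a boolean, and the country area is assumed to be supplied in km² by the source authority.

```python
import math


def uncertainty_aware_response(near_center, coordinate_uncertainty_m, country_area_km2):
    """Combine conditions (1) and (2) of the proposed expected response.

    near_center: True if the coordinates are within the spatial buffer of
        a center for the country code (condition 1, computed elsewhere).
    coordinate_uncertainty_m: dwc:coordinateUncertaintyInMeters, or None/blank.
    country_area_km2: area of the country in square kilometers (assumed unit).
    """
    if not near_center:
        return "NOT_ISSUE"
    # EMPTY uncertainty gives no evidence that the point stands for the whole country.
    if coordinate_uncertainty_m is None or str(coordinate_uncertainty_m).strip() == "":
        return "POTENTIAL_ISSUE"
    # Half the square root of the area, converted from km to meters.
    threshold_m = 0.5 * math.sqrt(country_area_km2) * 1000.0
    if float(coordinate_uncertainty_m) < threshold_m:
        return "POTENTIAL_ISSUE"
    return "NOT_ISSUE"


print(uncertainty_aware_response(True, None, 736593))
print(uncertainty_aware_response(True, 500000, 736593))
```

With this shape, a record sitting on a centroid is flagged unless it declares an uncertainty large enough to plausibly represent the whole country.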
I think that works @chicoreus - but then I have just flown half way around the world and may have brain fog! Just thinking of the cases where one has a country (e.g. Australia or Nova Hollandia - quite common) and a center of the country is given. In that case the half the square root of the area of the country - Square Root of the area of Australia is ~2,782 km - that is greater than the distance from the center to any part of mainland Australia - it works. Chile, I'm not so sure though being long and thin! |
@ArthurChapman in the PCLI country centroid data set, Chile has an area of about 736593 km². This would give a radius of 429 km, and the conclusion that a coordinate uncertainty in meters larger than 429000 would be large relative to the country. That isn't being precise and asserting what coordinate uncertainty in meters would produce a circle that entirely encloses the country (for Chile, much of the country would be outside that circle), but it does feel like a good pragmatic estimator of uncertainties that are relatively large in comparison to the country. The alternative is to include another source authority for country shapes and obtain from there the radius of a circle that would contain the entire country. But then there would be uncertainties in how people representing uncertainties containing a country did so, and using half the square root of the area seems like a reasonable conservative estimator for large uncertainty relative to country size, which is, in essence, what we are trying to exclude from being flagged as potentially problematic here. |
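As a worked check of the Chile figures above (a sketch, taking the quoted area at face value):

$$
\tfrac{1}{2}\sqrt{A} = \tfrac{1}{2}\sqrt{736593\ \mathrm{km}^2} \approx \tfrac{1}{2}\,(858.25\ \mathrm{km}) \approx 429\ \mathrm{km}
$$

For comparison, a circle with the same area as the country would have radius

$$
\sqrt{A/\pi} \approx 0.564\,\sqrt{A} \approx 484\ \mathrm{km},
$$

so half the square root of the area is slightly smaller than the equal-area radius, and neither is the radius of a circle that encloses the whole country.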
It seems reasonable to improve the Expected Response from

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center (or one of the centers), of the bdq:sourceAuthority provides more than one per country code of the supplied dwc:countryCode as represented in the bdq:sourceAuthority; otherwise NOT_ISSUE.

to

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if any of dwc:countryCode, dwc:decimalLatitude, dwc:decimalLongitude are EMPTY; POTENTIAL_ISSUE if (1) the geographic coordinates are within the distance given by bdq:spatialBufferInMeters from the center of the supplied dwc:countryCode as represented in the bdq:sourceAuthority (or one of the centers, if the bdq:sourceAuthority provides more than one per country code) and (2) the dwc:coordinateUncertaintyInMeters is EMPTY or less than half the square root of the area of the country; otherwise NOT_ISSUE.

NEEDS WORK?? |
I am happy with that. It will be interesting to see how it works in practice. Perhaps another, more complicated, way is to look at dwc:locality if it only contains a country name, but that would be difficult to work in practice. For example if the dwc:locality only said "Australia" or "Chile", but then you'd need to find all the synonyms "Nova Hollandia", etc. and country names at the time of the event and then use the centroid of those historical countries over time and that we don't have. It may be possible, but I think extremely difficult to do well. I am happy to use the @Tasilee suggestion and see what feedback one gets over time. |
I've added dwc:coordinateUncertaintyInMeters as an information element consulted for the new specification. I think we can take the needs work off. |