Define Geospatial Accepted Formats for DCAT-US #5010

Open
1 task
jbrown-xentity opened this issue Dec 9, 2024 · 7 comments

@jbrown-xentity
Contributor

User Story

In order to support data providers and questions around DCAT-US accepted spatial field values, data.gov admins want a detailed list and test/example use cases for what should be valid spatial values for DCAT-US.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN a DCAT-US dataset object with a spatial field filled out
    WHEN that field is examined
    THEN it is clear whether that format is supported or not.

Background

Current examples: https://github.com/GSA/ckanext-datajson/tree/main/ckanext/datajson/tests/datajson-samples

Security Considerations (required)

None

Sketch

We need to fully define the list of acceptable sources. The current logic of support is here: https://github.com/GSA/ckanext-geodatagov/blob/main/ckanext/geodatagov/logic.py#L445-L515
Need to start the list of use cases, and decide if the envelope use case (see here) is an acceptable format and should be included.
Make sure every test case is defined.

rshewitt moved this to 🏗 In Progress [8] in data.gov team board Dec 12, 2024
rshewitt self-assigned this Dec 12, 2024
@rshewitt
Contributor

the schema error occurs here in the datajson extension

@rshewitt
Contributor

rshewitt commented Dec 12, 2024

What does DCATUS define as valid "spatial" values? Of those, what do we support? spec

the "spatial" field is optional in DCATUS 1.1. if it exists it must be a string with at least 1 character.

  • a bounding coordinate box for the dataset represented in latitude / longitude pairs where the coordinates are specified in decimal degrees and in the order of: minimum longitude, minimum latitude, maximum longitude, maximum latitude
    • 1.0,2.0,3.5,5.5
  • a latitude / longitude pair (in decimal degrees) representing a point where the dataset is relevant
  • a geographic feature expressed in Geography Markup Language using the Simple Features Profile
  • a geographic feature from the GeoNames database
    • United States
    • California
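To make the coordinate order in the first bullet concrete, here is a minimal sketch of parsing a bounding-box string. `BoundingBox` and `parse_bbox` are hypothetical names, not part of any data.gov extension, and the range checks assume boxes that do not cross the antimeridian.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Bounding box in the order the spec lists: min lon, min lat, max lon, max lat."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def parse_bbox(value: str) -> BoundingBox:
    """Parse 'minLon,minLat,maxLon,maxLat' (decimal degrees) or raise ValueError."""
    # float() tolerates surrounding whitespace and raises ValueError on non-numbers.
    parts = [float(p) for p in value.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}")
    box = BoundingBox(*parts)
    # Assumption: min <= max on both axes (no antimeridian-crossing boxes).
    if not (-180 <= box.min_lon <= box.max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min <= max <= 180")
    if not (-90 <= box.min_lat <= box.max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min <= max <= 90")
    return box


print(parse_bbox("1.0,2.0,3.5,5.5"))
```

Note that the bounding box is longitude-first, while the point bullet describes a latitude / longitude pair, so the two formats cannot share one ordering rule.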

Other cases

  • If the input can be JSON deserialized and it's a list of 2 points (e.g. [[3,4],[5,6]]), it is translated into a GeoJSON Polygon; otherwise the string is returned as-is. For example,
two_points = "[[3,4],[5,6]]"
translate_spatial(two_points) # returns => '{"type": "Polygon", "coordinates": [[[3, 4], [3, 6], [5, 6], [5, 4], [3, 4]]]}'

geojson = '{"type":"Polygon","coordinates":[[[-124.3926,32.5358],[-124.3926,42.0022],[-114.1252,42.0022],[-114.1252,32.5358],[-124.3926,32.5358]]]}'
translate_spatial(geojson) # returns => same as input
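
The two-point case above can be sketched roughly as follows. This is not the actual `translate_spatial` code from ckanext-geodatagov; `two_points_to_polygon` is a hypothetical stand-in that only reproduces the behavior shown in the two examples.

```python
import json


def two_points_to_polygon(value: str) -> str:
    """Turn '[[minX, minY], [maxX, maxY]]' into a GeoJSON Polygon string.

    Anything else (including valid GeoJSON) is returned unchanged.
    """
    try:
        parsed = json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return value

    is_point_pair = (
        isinstance(parsed, list)
        and len(parsed) == 2
        and all(isinstance(p, list) and len(p) == 2 for p in parsed)
    )
    if not is_point_pair:
        return value

    (min_x, min_y), (max_x, max_y) = parsed
    # Same corner order as the example output above; the ring is closed.
    ring = [
        [min_x, min_y],
        [min_x, max_y],
        [max_x, max_y],
        [max_x, min_y],
        [min_x, min_y],
    ]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


print(two_points_to_polygon("[[3,4],[5,6]]"))
```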

just because the input can be JSON deserialized doesn't mean it's compatible with solr

we could check whether the input is valid geojson up front instead of letting solr complain when something is incompatible (assuming solr is where it fails; the point is that some downstream process complains)

import geojson

data = '{"type":"Polygon","coordinates":[[[-124.3926,32.5358],[-124.3926,42.0022],[-114.1252,42.0022],[-114.1252,32.5358],[-124.3926,32.5358]]]}'

geojson.loads(data) # => doesn't throw an exception, so it parsed as a geojson object (the result's is_valid property is a stricter check)

Conclusion

  • we support the following string formats
    • "minX, minY, maxX, maxY"
    • "[ [ minX, minY ], [ maxX, maxY ] ]"
    • a geonames value
    • anything that can be JSON deserialized
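
As a rough illustration of how these four cases could be told apart, here is a sketch of a classifier. `classify_spatial` and its labels are hypothetical, and a "text" result only means the value might be a GeoNames name; nothing here looks it up.

```python
import json


def classify_spatial(value: str) -> str:
    """Return 'bbox', 'two_points', 'json', or 'text' for a spatial string."""
    # Case 1: "minX, minY, maxX, maxY" -> exactly four numeric fields.
    parts = value.split(",")
    if len(parts) == 4:
        try:
            for part in parts:
                float(part)
            return "bbox"
        except ValueError:
            pass  # four fields, but not all numeric; keep checking

    # Cases 2 and 4: anything that can be JSON deserialized.
    try:
        parsed = json.loads(value)
    except ValueError:
        # Case 3 (assumption): plain text, possibly a GeoNames value.
        return "text"

    if (
        isinstance(parsed, list)
        and len(parsed) == 2
        and all(isinstance(p, list) and len(p) == 2 for p in parsed)
    ):
        return "two_points"
    return "json"


for sample in ("1.0,2.0,3.5,5.5", "[[3,4],[5,6]]", "California",
               '{"type": "Point", "coordinates": [1, 2]}'):
    print(sample, "->", classify_spatial(sample))
```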

@tdlowden
Member

I don't know how to interpret what "a geographic feature expressed in Geography Markup Language using the Simple Features Profile" means. docs in GML mention both simplePolygon and gridEnvelope, but is envelope not valid bc it's not... simple?

@tdlowden
Member

tdlowden commented Dec 12, 2024

simple features profile: https://portal.ogc.org/files/?artifact_id=39853

image

@rshewitt
Contributor

rshewitt commented Dec 12, 2024

"spatial" is optional in all dcatus schemas, but if present it needs to be a string with at least 1 character in it. if the spatial data is an object like the 3rd example in this source ( control+f "spatial" and navigate to it ) then validation will fail. dcatus specifies a JSON object as an acceptable value in some circumstances, which is different from a string. basically, the root of a common problem we see ( e.g. "ERROR #2: 'spatial':{'coordinates': [[-78.9823, 35.5216], [-78.2607, 36.0742]], 'type': 'envelope'} is not valid under any of the given schemas" ) isn't deep: they're not providing the correct data type.
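
A minimal sketch of the data-type check described here, assuming only the rule stated in this thread (optional, but a non-empty string when present). `spatial_type_error` is a hypothetical helper, not the validator the datajson extension actually runs.

```python
from typing import Optional


def spatial_type_error(dataset: dict) -> Optional[str]:
    """Return a message if 'spatial' has the wrong data type, else None."""
    if "spatial" not in dataset:
        return None  # the field is optional
    value = dataset["spatial"]
    if not isinstance(value, str):
        return f"'spatial' must be a string, got {type(value).__name__}"
    if len(value) < 1:
        return "'spatial' must contain at least 1 character"
    return None


# The envelope from the error message is an object, so it fails the type check.
envelope = {
    "coordinates": [[-78.9823, 35.5216], [-78.2607, 36.0742]],
    "type": "envelope",
}
print(spatial_type_error({"spatial": envelope}))
print(spatial_type_error({"spatial": "-78.9823,35.5216,-78.2607,36.0742"}))
```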

@tdlowden
Member

Understood. The issue here is that the source you cited is from ArcGIS, which specifically offers this export and says it DOES abide by the DCAT-US format

image

So regarding this envelope type.... do we need to ask ESRI to adapt?

@rshewitt
Contributor

as long as we're using solr for search we have to conform to what it supports. "spatial" data expressed as geojson (e.g. {"type": "Polygon", "coordinates": [[[10.0, 0.0], [10.0, 5.0], [15.0, 5.0], [15.0, 0.0], [10.0, 0.0]]]}) must be one of these types

  • "Point"
  • "MultiPoint"
  • "LineString"
  • "MultiLineString"
  • "Polygon"
  • "MultiPolygon"
  • "GeometryCollection"
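
For illustration, a check against this list could look like the sketch below. `SUPPORTED_TYPES` and `has_supported_type` are hypothetical names; the check only inspects the top-level "type" and does not validate the coordinates.

```python
import json

# The geometry types listed above.
SUPPORTED_TYPES = {
    "Point",
    "MultiPoint",
    "LineString",
    "MultiLineString",
    "Polygon",
    "MultiPolygon",
    "GeometryCollection",
}


def has_supported_type(spatial: str) -> bool:
    """True if the string is a JSON object whose 'type' is in the list above."""
    try:
        parsed = json.loads(spatial)
    except ValueError:
        return False
    if not isinstance(parsed, dict):
        return False
    geometry_type = parsed.get("type")
    # The comparison is case-sensitive, so "envelope" and "polygon" both fail.
    return isinstance(geometry_type, str) and geometry_type in SUPPORTED_TYPES


print(has_supported_type('{"type": "Polygon", "coordinates": []}'))
print(has_supported_type('{"type": "envelope", "coordinates": []}'))
```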

We can add support for translating a geojson "envelope" in [ southwestPnt, northeastPnt ] format into a polygon compatible with solr, but until that happens the data provider needs to update the value to something we support, so they would have to convert

# from this
{'coordinates': [[-78.9823, 35.5216], [-78.2607, 36.0742]], 'type': 'envelope'}

# into this
"""{
    "type": "Polygon",
    "coordinates": [
      [[-78.9823, 35.5216], [-78.2607, 35.5216], [-78.2607, 36.0742], [-78.9823, 36.0742], [-78.9823, 35.5216]]
    ]
}"""
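
The conversion above could be automated along these lines. This is a sketch only: `envelope_to_polygon` is a hypothetical helper, and it assumes the [ southwestPnt, northeastPnt ] ordering described in this comment, which other envelope producers may not follow.

```python
import json


def envelope_to_polygon(spatial: dict) -> str:
    """Convert an 'envelope' object into a GeoJSON Polygon string."""
    if str(spatial.get("type", "")).lower() != "envelope":
        raise ValueError("expected an object with type 'envelope'")
    # Assumption: coordinates are [[west, south], [east, north]].
    (west, south), (east, north) = spatial["coordinates"]
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # repeat the first corner to close the ring
    ]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


envelope = {
    "coordinates": [[-78.9823, 35.5216], [-78.2607, 36.0742]],
    "type": "envelope",
}
print(envelope_to_polygon(envelope))
```

The result is a string, so it also satisfies the "spatial must be a string" rule from earlier in the thread.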

Projects
Status: 🏗 In Progress [8]

3 participants