Spatial data support #7859
Comments
There are no plans that I know of yet. In theory, we should be able to create an extension package, much like the DuckDB model, rather than extending the core DataFusion engine. I suspect there would be certain things that are not yet feasible (like adding a [...]). Perhaps we can do something similar for the JSON/BSON support we are discussing in #7845.
Thanks @alamb. Here is my example of handling geometry Parquet data: apache/arrow-rs#4945
Since it hasn't been mentioned yet, I'd add that there is already a project for Arrow extension types for geospatial data: https://github.com/geoarrow/geoarrow/blob/main/extension-types.md This is related to the GeoParquet project.
Thanks @wjones127 -- I had forgotten about extension types. Maybe we could add support for extension types in DataFusion's core and use that extension point to implement a geospatial package on top of DataFusion 🤔 Having a good first use case (geospatial and possibly JSON) to drive the requirements seems like a good idea. If you agree, I can try to write up a larger project description.
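For readers unfamiliar with the mechanism discussed above: the Arrow columnar format identifies an extension type through two reserved field metadata keys, `ARROW:extension:name` and `ARROW:extension:metadata`, attached to a field whose storage is an ordinary Arrow type. The following is a minimal std-only sketch of that idea, not DataFusion or arrow-rs code. `FieldSketch`, `tag_as_extension`, and `extension_name` are hypothetical names made up for illustration; only the two metadata keys and the `geoarrow.*` naming scheme come from the Arrow and GeoArrow specifications.

```rust
use std::collections::HashMap;

/// Metadata keys reserved by the Arrow columnar format for extension types.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
const EXTENSION_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical stand-in for an Arrow field: a name plus string metadata.
/// A real field would also carry a storage data type (for example Binary for WKB).
#[derive(Debug, Clone)]
struct FieldSketch {
    name: String,
    metadata: HashMap<String, String>,
}

/// Mark a field as an extension type by writing the two reserved keys.
/// `extension_metadata` is an opaque string; GeoArrow uses JSON here.
fn tag_as_extension(name: &str, extension_name: &str, extension_metadata: &str) -> FieldSketch {
    let mut metadata = HashMap::new();
    metadata.insert(EXTENSION_NAME_KEY.to_string(), extension_name.to_string());
    metadata.insert(
        EXTENSION_METADATA_KEY.to_string(),
        extension_metadata.to_string(),
    );
    FieldSketch {
        name: name.to_string(),
        metadata,
    }
}

/// Return the extension name if the field carries one.
/// An engine without extension support simply ignores the metadata
/// and sees only the storage type.
fn extension_name(field: &FieldSketch) -> Option<&str> {
    field.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
}
```

Under these assumptions, `tag_as_extension("geometry", "geoarrow.wkb", "{}")` yields a field that a geospatial package could recognize by name, while a field with empty metadata is treated as a plain column.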
@alamb I have the same requirement as well and hope to initiate it as soon as possible. If possible, I can also contribute code for this.
That is great news @yukkit -- I don't think I have the bandwidth to try to organize an effort to add geospatial support to DataFusion in the near term. I wonder if anyone has the bandwidth to help organize an effort to add extension type support? I don't know enough about how this works to really do so without additional research, and sadly I don't have the time at the moment to devote there.
@alamb OK, if possible, I plan to support UDTs (user-defined types) in DataFusion. I will post my ideas in the next few days for anyone to discuss.
I would love to see a design proposal for user-defined types. ❤️ -- thank you!
Of course, it's absolutely essential!
My goal is to enable spatial support in projects such as DataFusion via https://github.com/geoarrow/geoarrow-rs
As noted in #13797 (comment), spatial data support is now unblocked by using a dense union to support multiple geometry types! Progress is now underway in https://github.com/geoarrow/geoarrow-rs/tree/main/rust/geodatafusion to match as much of the PostGIS API as possible. Having true extension types (in the sense of #12644) will be massively beneficial, because then we can keep track of the data's coordinate reference system and use more memory-efficient types where possible. But at least this unblocks us for the time being. Contributions are welcome 🙂
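To illustrate why a dense union can hold several geometry types in one column: in Arrow's dense union layout, each row stores a type id that selects a child array and an offset into that child, so each child contains only the rows of its own type. The sketch below models that layout with plain `Vec`s. It is an assumption-laden illustration, not the geodatafusion implementation: `DenseGeometryUnion`, `Geometry`, and the type id assignments are hypothetical, and only two geometry kinds are shown.

```rust
/// Hypothetical type ids; a real dense union declares these in its schema.
const POINT_ID: i8 = 0;
const LINE_STRING_ID: i8 = 1;

/// An owned geometry value returned when reading one row.
#[derive(Debug, Clone, PartialEq)]
enum Geometry {
    Point((f64, f64)),
    LineString(Vec<(f64, f64)>),
}

/// Sketch of a dense union column with two children.
/// `type_ids[i]` selects the child and `offsets[i]` indexes into it.
struct DenseGeometryUnion {
    type_ids: Vec<i8>,
    offsets: Vec<i32>,
    points: Vec<(f64, f64)>,
    line_strings: Vec<Vec<(f64, f64)>>,
}

impl DenseGeometryUnion {
    fn new() -> Self {
        Self {
            type_ids: Vec::new(),
            offsets: Vec::new(),
            points: Vec::new(),
            line_strings: Vec::new(),
        }
    }

    /// Append a point row: record its type id and its position in `points`.
    fn push_point(&mut self, point: (f64, f64)) {
        self.type_ids.push(POINT_ID);
        self.offsets.push(self.points.len() as i32);
        self.points.push(point);
    }

    /// Append a line string row in the same way, using the other child.
    fn push_line_string(&mut self, line: Vec<(f64, f64)>) {
        self.type_ids.push(LINE_STRING_ID);
        self.offsets.push(self.line_strings.len() as i32);
        self.line_strings.push(line);
    }

    /// Read row `row` by following its type id and offset.
    fn get(&self, row: usize) -> Option<Geometry> {
        let type_id = *self.type_ids.get(row)?;
        let offset = *self.offsets.get(row)? as usize;
        match type_id {
            POINT_ID => self.points.get(offset).copied().map(Geometry::Point),
            LINE_STRING_ID => self
                .line_strings
                .get(offset)
                .cloned()
                .map(Geometry::LineString),
            _ => None,
        }
    }
}
```

Pushing a point, a line string, and another point gives type ids `[0, 1, 0]` and offsets `[0, 0, 1]`: the second point is row 2 of the column but element 1 of the point child. Note that nothing in this layout records a coordinate reference system, which is the gap the comment above expects true extension types to fill.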
Is your feature request related to a problem or challenge?
Currently, DataFusion does not support spatial data. Is there any plan to support this?
Describe the solution you'd like
Similar to duckdb: https://duckdb.org/docs/extensions/spatial.html
Describe alternatives you've considered
DuckDB
Additional context
https://cloud.google.com/bigquery/docs/geospatial-data