Spatial data support #7859

Open
Folyd opened this issue Oct 18, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@Folyd
Contributor

Folyd commented Oct 18, 2023

Is your feature request related to a problem or challenge?

Currently, DataFusion does not support spatial data. Is there any plan to support this?

Describe the solution you'd like

Similar to duckdb: https://duckdb.org/docs/extensions/spatial.html

Describe alternatives you've considered

Duckdb

Additional context

https://cloud.google.com/bigquery/docs/geospatial-data

@Folyd Folyd added the enhancement New feature or request label Oct 18, 2023
@alamb
Contributor

alamb commented Oct 18, 2023

There are no plans that I know of yet.

In theory, we should be able to create an extension package, much like the duckdb model, rather than extending the core DataFusion engine.

I suspect there would be certain things that are not yet feasible (like adding a GEOMETRY type / alias, for example), but otherwise the existing extension points for DataFusion should be sufficient (ScalarUDFs, AggregateUDFs, etc.).

Perhaps we can do something similar for the JSON/BSON support we are discussing in #7845
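
To make the extension-package idea above concrete: a scalar UDF is, at its core, a function from input columns to an output column of the same length. The sketch below is plain Python rather than the DataFusion API. The name `st_distance` and the list-based columns are hypothetical stand-ins for what a spatial package might register as a ScalarUDF.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def st_distance(
    left: List[Optional[Point]], right: List[Optional[Point]]
) -> List[Optional[float]]:
    """Row-wise planar distance between two point columns (hypothetical UDF body).

    A null (None) in either input row yields a null in the output row.
    """
    if len(left) != len(right):
        raise ValueError("input columns must have the same length")
    out: List[Optional[float]] = []
    for a, b in zip(left, right):
        if a is None or b is None:
            out.append(None)
        else:
            out.append(math.hypot(a[0] - b[0], a[1] - b[1]))
    return out
```

A real implementation would operate on Arrow arrays instead of Python lists, but the contract (equal-length columns in, one column out, nulls propagated) would likely look similar.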

@Folyd
Contributor Author

Folyd commented Oct 19, 2023

Thanks @alamb.

Here is my example of handling geometry parquet data: apache/arrow-rs#4945
There is also the GeoParquet format in the community: https://geoparquet.org/
Also see: https://getindata.com/blog/introducing-geoparquet-data-format/
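
For context, GeoParquet stores geometry columns as WKB (well-known binary) in a binary column. The sketch below is not taken from the linked arrow-rs issue. It only illustrates, under that assumption, how a 2D WKB point is laid out and decoded. The helper names `encode_wkb_point` and `decode_wkb_point` are hypothetical.

```python
import struct
from typing import Tuple

WKB_POINT = 1  # geometry type code for a 2D Point in WKB


def encode_wkb_point(x: float, y: float) -> bytes:
    # Byte-order marker (1 = little-endian), uint32 geometry type, two float64s.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def decode_wkb_point(buf: bytes) -> Tuple[float, float]:
    # The first byte says how the rest of the buffer is encoded.
    endian = "<" if buf[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", buf, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D WKB point: type {geom_type}")
    # Coordinates start after the 1-byte marker and the 4-byte type code.
    return struct.unpack_from(endian + "dd", buf, 5)
```

A query engine reading such a column sees only opaque bytes unless something tells it the bytes are geometry, which is where the type-metadata discussion in the following comments comes in.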

@wjones127
Member

Since it hasn't been mentioned yet, I'd add there is already a project for Arrow extension types for geospatial data: https://github.com/geoarrow/geoarrow/blob/main/extension-types.md

This is related to the GeoParquet project.
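
As background, Arrow extension types travel as field-level metadata under the keys `ARROW:extension:name` and `ARROW:extension:metadata`, and GeoArrow uses names such as `geoarrow.wkb` with a JSON metadata payload. The sketch below models that with a hypothetical `Field` dataclass, not a real Arrow binding. The `"EPSG:4326"` CRS value is illustrative only.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

EXT_NAME_KEY = "ARROW:extension:name"
EXT_META_KEY = "ARROW:extension:metadata"


@dataclass
class Field:
    """Hypothetical stand-in for an Arrow field: name, storage type, metadata."""

    name: str
    storage_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def extension_name(f: Field) -> Optional[str]:
    # Returns None for ordinary (non-extension) fields.
    return f.metadata.get(EXT_NAME_KEY)


def extension_crs(f: Field) -> Optional[Any]:
    # The extension metadata is a serialized JSON object; "crs" is optional.
    raw = f.metadata.get(EXT_META_KEY)
    if not raw:
        return None
    return json.loads(raw).get("crs")


geometry = Field(
    "geometry",
    "binary",
    {EXT_NAME_KEY: "geoarrow.wkb", EXT_META_KEY: json.dumps({"crs": "EPSG:4326"})},
)
```

If an engine drops this metadata while planning or executing a query, the column degrades to plain binary and the coordinate reference system is lost.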

@alamb
Contributor

alamb commented Oct 19, 2023

Thanks @wjones127 -- I had forgotten about extension types.

Maybe we could add support for extension types in DataFusion's core and use that extension point to implement a geospatial package on top of DataFusion 🤔

Having a good first use case (geospatial and possibly JSON) to drive the requirements seems like a good idea.

If you agree, I can try and write up a larger project description

@yukkit
Contributor

yukkit commented Oct 19, 2023

@alamb I have the same requirement as well and hope to initiate it as soon as possible. If possible, I can also contribute code for this.

@alamb alamb changed the title Spatial data support [EPIC] Spatial data support Oct 20, 2023
@alamb alamb changed the title [EPIC] Spatial data support Spatial data support? Oct 20, 2023
@alamb alamb changed the title Spatial data support? [Epic] Spatial data support? Oct 20, 2023
@alamb alamb changed the title [Epic] Spatial data support? Spatial data support Oct 20, 2023
@alamb
Contributor

alamb commented Oct 20, 2023

> @alamb I have the same requirement as well and hope to initiate it as soon as possible. If possible, I can also contribute code for this.

That is great news @yukkit -- I don't think I have the bandwidth to organize an effort to add geospatial support to DataFusion in the near term. I wonder if anyone has the bandwidth to help organize an effort to add extension type support? I don't know enough about how this works to do so without additional research, and sadly I don't have the time to devote to it at the moment.

@yukkit
Contributor

yukkit commented Oct 23, 2023

@alamb OK, if possible, I plan to add UDT (user-defined type) support to DataFusion. I will post my ideas in the next few days for anyone to discuss.

@alamb
Contributor

alamb commented Oct 23, 2023

I would love to see a design proposal for user defined types. ❤️ -- thank you!

@yukkit
Contributor

yukkit commented Oct 24, 2023

> I would love to see a design proposal for user defined types. ❤️ -- thank you!

Of course, it's absolutely essential!

@kylebarron
Contributor

My goal is to enable spatial support in projects such as datafusion via https://github.com/geoarrow/geoarrow-rs

@kylebarron
Contributor

I'd argue that spatial data support is pretty much blocked until DataFusion has support for user-defined types, since it's crucial to pass along GeoArrow metadata, so it's really exciting to see #11513 / #11160!

@kylebarron
Contributor

As noted in #13797 (comment) spatial data support is now unblocked by using a dense union to support multiple geometry types! Progress is now underway in https://github.com/geoarrow/geoarrow-rs/tree/main/rust/geodatafusion to match as much of the PostGIS API as is possible.

Having true extension types (in the sense of #12644) will be massively beneficial, because then we can keep track of the data's coordinate reference system and use more memory-efficient types where possible. But at least this unblocks us for the time being.

Contributions are welcome 🙂
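
For readers unfamiliar with the dense union approach mentioned above: an Arrow dense union keeps one child array per variant, plus a type-id buffer and an offsets buffer that together locate each row's value. The sketch below models that layout with Python lists. The `DenseUnion` class and the `POINT` / `LINESTRING` type ids are hypothetical and are not the geoarrow-rs implementation.

```python
from dataclasses import dataclass, field
from typing import Any, List

POINT, LINESTRING = 0, 1  # hypothetical type ids for this sketch


@dataclass
class DenseUnion:
    """Toy dense union: row i lives at children[type_ids[i]][offsets[i]]."""

    type_ids: List[int] = field(default_factory=list)
    offsets: List[int] = field(default_factory=list)
    children: List[List[Any]] = field(default_factory=lambda: [[], []])

    def append(self, type_id: int, value: Any) -> None:
        child = self.children[type_id]
        self.type_ids.append(type_id)
        self.offsets.append(len(child))  # position the value will take in its child
        child.append(value)

    def get(self, i: int) -> Any:
        return self.children[self.type_ids[i]][self.offsets[i]]

    def __len__(self) -> int:
        return len(self.type_ids)


column = DenseUnion()
column.append(POINT, (1.0, 2.0))
column.append(LINESTRING, [(0.0, 0.0), (1.0, 1.0)])
column.append(POINT, (3.0, 4.0))
```

Because each child array holds only its own variant, a single column can mix geometry types while every child keeps a uniform type.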


5 participants