ENH: geography constructor from geoarrow #49
Merged
18 commits
577821e ENH: geography constructor from geoarrow (jorisvandenbossche)
703707a add support for planar and oriented keywords (jorisvandenbossche)
08fa01b make arguments position/keyword only (jorisvandenbossche)
6778ae1 only build function for recent s2geography (jorisvandenbossche)
a01dc4b TEMP test with my branch of s2geography (jorisvandenbossche)
d69b272 init_geoarrow only when available (jorisvandenbossche)
03a07bf fix env (jorisvandenbossche)
8716817 add geometry_encoding option for WKT/WKB without extenstion type (jorisvandenbossche)
5c4367d correct test name (jorisvandenbossche)
dc993aa separate Arrow ABI to separate file + include before s2geography (jorisvandenbossche)
09be68a Revert "TEMP test with my branch of s2geography" (jorisvandenbossche)
37cb4ef Merge remote-tracking branch 'upstream/main' into geoarrow-input (jorisvandenbossche)
4784c18 clean-up + add tessellate_tolerance (jorisvandenbossche)
1b43efd input validation (jorisvandenbossche)
b2ff23d importorksip for pyarrow (jorisvandenbossche)
6b88d22 Merge remote-tracking branch 'upstream/main' into geoarrow-input (jorisvandenbossche)
4a58567 Update src/geoarrow.cpp (jorisvandenbossche)
829646f re-raise RuntimeError as ValueError (jorisvandenbossche)
@@ -13,3 +13,4 @@ dependencies:
   - ninja
   - pytest
   - pip
+  - geoarrow-pyarrow
@@ -14,3 +14,4 @@ dependencies:
   - ninja
   - pytest
   - pip
+  - geoarrow-pyarrow
@@ -71,3 +71,4 @@ Input/Output

    from_wkt
    to_wkt
+   from_geoarrow
@@ -0,0 +1,57 @@
#pragma once

#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

// Extra guard for versions of Arrow without the canonical guard
#ifndef ARROW_FLAG_DICTIONARY_ORDERED

#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE

#define ARROW_FLAG_DICTIONARY_ORDERED 1
#define ARROW_FLAG_NULLABLE 2
#define ARROW_FLAG_MAP_KEYS_SORTED 4

struct ArrowSchema {
    // Array type description
    const char* format;
    const char* name;
    const char* metadata;
    int64_t flags;
    int64_t n_children;
    struct ArrowSchema** children;
    struct ArrowSchema* dictionary;

    // Release callback
    void (*release)(struct ArrowSchema*);
    // Opaque producer-specific data
    void* private_data;
};

struct ArrowArray {
    // Array data description
    int64_t length;
    int64_t null_count;
    int64_t offset;
    int64_t n_buffers;
    int64_t n_children;
    const void** buffers;
    struct ArrowArray** children;
    struct ArrowArray* dictionary;

    // Release callback
    void (*release)(struct ArrowArray*);
    // Opaque producer-specific data
    void* private_data;
};

#endif  // ARROW_C_DATA_INTERFACE
#endif  // ARROW_FLAG_DICTIONARY_ORDERED

#ifdef __cplusplus
}
#endif
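For readers less used to the Arrow C data interface, the `ArrowSchema` layout in the header above can be mirrored from Python with the standard `ctypes` module. This is only an illustrative sketch, not part of the PR: the helper `flag_names` is a hypothetical name, and `metadata` is typed as an untyped pointer here on the assumption that it is a binary blob that may contain NUL bytes.

```python
import ctypes

# Flag values copied from the header above.
ARROW_FLAG_DICTIONARY_ORDERED = 1
ARROW_FLAG_NULLABLE = 2
ARROW_FLAG_MAP_KEYS_SORTED = 4


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of `struct ArrowSchema`; fields are attached below."""


# The struct points to itself, so the fields can only be declared
# once the class object exists.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_void_p),  # assumption: treated as an opaque blob
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]


def flag_names(flags):
    """Return the names of the flag bits set in `flags` (hypothetical helper)."""
    known = [
        (ARROW_FLAG_DICTIONARY_ORDERED, "DICTIONARY_ORDERED"),
        (ARROW_FLAG_NULLABLE, "NULLABLE"),
        (ARROW_FLAG_MAP_KEYS_SORTED, "MAP_KEYS_SORTED"),
    ]
    return [name for bit, name in known if flags & bit]


# "u" is the C data interface format string for a UTF-8 string column,
# which is what a plain WKT array would carry.
schema = ArrowSchema(format=b"u", name=b"geometry", flags=ARROW_FLAG_NULLABLE)
print(schema.format, flag_names(schema.flags))
```

The field order matters: a consumer such as the C++ code in this PR reads the struct by offset, so any mirror has to list the members exactly as the header does.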
@@ -0,0 +1,127 @@
#include <s2geography.h>

#include "arrow_abi.h"
#include "constants.hpp"
#include "creation.hpp"
#include "geography.hpp"
#include "pybind11.hpp"

namespace py = pybind11;
namespace s2geog = s2geography;
using namespace spherely;

py::array_t<PyObjectGeography> from_geoarrow(py::object input,
                                             bool oriented,
                                             bool planar,
                                             float tessellate_tolerance,
                                             py::object geometry_encoding) {
    if (!py::hasattr(input, "__arrow_c_array__")) {
        throw std::invalid_argument(
            "input should be an Arrow-compatible array object (i.e. has an '__arrow_c_array__' "
            "method)");
    }
    py::tuple capsules = input.attr("__arrow_c_array__")();
    py::capsule schema_capsule = capsules[0];
    py::capsule array_capsule = capsules[1];

    const ArrowSchema* schema = static_cast<const ArrowSchema*>(schema_capsule);
    const ArrowArray* array = static_cast<const ArrowArray*>(array_capsule);

    s2geog::geoarrow::Reader reader;
    std::vector<std::unique_ptr<s2geog::Geography>> s2geog_vec;

    s2geog::geoarrow::ImportOptions options;
    options.set_oriented(oriented);
    if (planar) {
        auto tol = S1Angle::Radians(tessellate_tolerance / EARTH_RADIUS_METERS);
        options.set_tessellate_tolerance(tol);
    }
    if (geometry_encoding.is(py::none())) {
        try {
            reader.Init(schema, options);
        } catch (const std::exception& ex) {
            // re-raise RuntimeError as ValueError
            throw py::value_error(ex.what());
        }
    } else if (geometry_encoding.equal(py::str("WKT"))) {
        reader.Init(s2geog::geoarrow::Reader::InputType::kWKT, options);
    } else if (geometry_encoding.equal(py::str("WKB"))) {
        reader.Init(s2geog::geoarrow::Reader::InputType::kWKB, options);
    } else {
        throw std::invalid_argument("'geometry_encoding' should be one of None, 'WKT' or 'WKB'");
    }

    try {
        reader.ReadGeography(array, 0, array->length, &s2geog_vec);
    } catch (const std::exception& ex) {
        // re-raise RuntimeError as ValueError
        throw py::value_error(ex.what());
    }

    // Convert resulting vector to array of python objects
    auto result = py::array_t<PyObjectGeography>(array->length);
    py::buffer_info rbuf = result.request();
    py::object* rptr = static_cast<py::object*>(rbuf.ptr);

    py::ssize_t i = 0;
    for (auto& s2geog_ptr : s2geog_vec) {
        rptr[i] = make_py_geography(std::move(s2geog_ptr));
        i++;
    }
    return result;
}

void init_geoarrow(py::module& m) {
    m.def("from_geoarrow",
          &from_geoarrow,
          py::arg("input"),
          py::pos_only(),
          py::kw_only(),
          py::arg("oriented") = false,
          py::arg("planar") = false,
          py::arg("tessellate_tolerance") = 100.0,
          py::arg("geometry_encoding") = py::none(),
          R"pbdoc(
        Create an array of geographies from an Arrow array object with a GeoArrow
        extension type.

        See https://geoarrow.org/ for details on the GeoArrow specification.

        This function accepts any Arrow array object implementing
        the `Arrow PyCapsule Protocol`_ (i.e. having an ``__arrow_c_array__``
        method).

        .. _Arrow PyCapsule Protocol: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html

        Parameters
        ----------
        input : pyarrow.Array, Arrow array
            Any array object implementing the Arrow PyCapsule Protocol
            (i.e. has a ``__arrow_c_array__`` method). The type of the array
            should be one of the geoarrow geometry types.
        oriented : bool, default False
            Set to True if polygon ring directions are known to be correct
            (i.e., exterior rings are defined counter clockwise and interior
            rings are defined clockwise).
            By default (False), it will return the polygon with the smaller
            area.
        planar : bool, default False
            If set to True, the edges of linestrings and polygons are assumed
            to be linear on the plane. In that case, additional points will
            be added to the line while creating the geography objects, to
            ensure every point is within 100m of the original line.
            By default (False), it is assumed that the edges are spherical
            (i.e. represent the shortest path on the sphere between two points).
        tessellate_tolerance : float, default 100.0
            The maximum distance in meters that a point must be moved to
            satisfy the planar edge constraint. This is only used if `planar`
            is set to True.
        geometry_encoding : str, default None
            By default, the encoding is inferred from the GeoArrow extension
            type of the input array.
            However, for parsing WKT and WKB it is also possible to pass an
            Arrow array without geoarrow type but with a plain string or
            binary type, if specifying this keyword with "WKT" or "WKB",
            respectively.
    )pbdoc");
}
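The argument handling in `from_geoarrow` above boils down to two checks: the input must expose `__arrow_c_array__`, and `geometry_encoding` must be `None`, `"WKT"` or `"WKB"`. A rough pure-Python restatement of that control flow might look like the following. The function `select_reader_input` and the class `FakeArrowArray` are hypothetical names made up for this sketch, and the returned strings are just labels, not values from s2geography.

```python
def select_reader_input(obj, geometry_encoding=None):
    """Restate the validation order of the C++ from_geoarrow (illustrative only)."""
    # 1. The input must implement the Arrow PyCapsule protocol.
    if not hasattr(obj, "__arrow_c_array__"):
        raise ValueError(
            "input should be an Arrow-compatible array object "
            "(i.e. has an '__arrow_c_array__' method)"
        )
    # 2. Without an explicit encoding, the reader is initialised from the schema,
    #    which is where a missing GeoArrow extension type would be reported.
    if geometry_encoding is None:
        return "infer-from-schema"
    # 3. Plain string/binary arrays need the encoding spelled out.
    if geometry_encoding == "WKT":
        return "WKT"
    if geometry_encoding == "WKB":
        return "WKB"
    raise ValueError("'geometry_encoding' should be one of None, 'WKT' or 'WKB'")


class FakeArrowArray:
    """Hypothetical stand-in that only advertises the protocol method."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError("no real Arrow data behind this object")


print(select_reader_input(FakeArrowArray()))
print(select_reader_input(FakeArrowArray(), geometry_encoding="WKB"))
```

Both failure modes surface as `ValueError` on the Python side, which matches what the tests in this PR expect for a non-Arrow input and for an unknown encoding.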
@@ -0,0 +1,111 @@
from packaging.version import Version

import numpy as np

import pytest

import spherely


pytestmark = pytest.mark.skipif(
    Version(spherely.__s2geography_version__) < Version("0.2.0"),
    reason="Needs s2geography >= 0.2.0",
)

pa = pytest.importorskip("pyarrow")
ga = pytest.importorskip("geoarrow.pyarrow")


def test_from_geoarrow_wkt():

    arr = ga.as_wkt(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])

    result = spherely.from_geoarrow(arr)
    expected = spherely.points([1, 2, 3], [1, 2, 3])
    # object equality does not yet work
    # np.testing.assert_array_equal(result, expected)
    assert spherely.equals(result, expected).all()

    # without extension type
    arr = pa.array(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])
    result = spherely.from_geoarrow(arr, geometry_encoding="WKT")
    assert spherely.equals(result, expected).all()


def test_from_geoarrow_wkb():

    arr = ga.as_wkt(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])
    arr_wkb = ga.as_wkb(arr)

    result = spherely.from_geoarrow(arr_wkb)
    expected = spherely.points([1, 2, 3], [1, 2, 3])
    assert spherely.equals(result, expected).all()

    # without extension type
    arr_wkb = ga.as_wkb(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])
    arr = arr_wkb.cast(pa.binary())
    result = spherely.from_geoarrow(arr, geometry_encoding="WKB")
    assert spherely.equals(result, expected).all()


def test_from_geoarrow_native():

    arr = ga.as_wkt(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])
    arr_point = ga.as_geoarrow(arr)

    result = spherely.from_geoarrow(arr_point)
    expected = spherely.points([1, 2, 3], [1, 2, 3])
    assert spherely.equals(result, expected).all()


polygon_with_bad_hole_wkt = (
    "POLYGON "
    "((20 35, 10 30, 10 10, 30 5, 45 20, 20 35),"
    "(30 20, 20 25, 20 15, 30 20))"
)


def test_from_geoarrow_oriented():
    # by default re-orients the inner ring
    arr = ga.as_geoarrow([polygon_with_bad_hole_wkt])

    result = spherely.from_geoarrow(arr)
    assert (
        str(result[0])
        == "POLYGON ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (20 15, 20 25, 30 20, 20 15))"
    )

    # if we force to not orient, we get an error
    with pytest.raises(ValueError, match="Inconsistent loop orientations detected"):
        spherely.from_geoarrow(arr, oriented=True)


def test_from_wkt_planar():
    arr = ga.as_geoarrow(["LINESTRING (-64 45, 0 45)"])
    result = spherely.from_geoarrow(arr)
    assert spherely.distance(result, spherely.point(-30.1, 45)) > 10000

    result = spherely.from_geoarrow(arr, planar=True)
    assert spherely.distance(result, spherely.point(-30.1, 45)) < 100

    result = spherely.from_geoarrow(arr, planar=True, tessellate_tolerance=10)
    assert spherely.distance(result, spherely.point(-30.1, 45)) < 10


def test_from_geoarrow_no_extension_type():
    arr = pa.array(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])

    with pytest.raises(ValueError, match="Expected extension type"):
        spherely.from_geoarrow(arr)


def test_from_geoarrow_invalid_encoding():
    arr = pa.array(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])

    with pytest.raises(ValueError, match="'geometry_encoding' should be one"):
        spherely.from_geoarrow(arr, geometry_encoding="point")


def test_from_geoarrow_no_arrow_object():
    with pytest.raises(ValueError, match="input should be an Arrow-compatible array"):
        spherely.from_geoarrow(np.array(["POINT (1 1)"], dtype=object))
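The planar test above relies on the fact that the great-circle edge between (-64, 45) and (0, 45) bulges far north of the 45° parallel, so a point sitting on that parallel is well over 10 km from the spherical edge. A back-of-the-envelope check with only the standard library is sketched below. The radius value is an assumption (the actual `EARTH_RADIUS_METERS` constant lives in `constants.hpp`, which is not shown in this diff), and the helper names are made up for illustration. Note that it measures the distance to the full great circle rather than to the segment, which is the same thing here because the closest point lies between the endpoints.

```python
import math

# Assumed mean Earth radius in meters; the real constant is defined elsewhere.
EARTH_RADIUS_METERS = 6371010.0


def to_unit_vector(lon_deg, lat_deg):
    """Convert lon/lat in degrees to a 3D unit vector on the sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def cross_track_distance_m(start, end, point):
    """Distance in meters from `point` to the great circle through start and end."""
    a, b, p = (to_unit_vector(*c) for c in (start, end, point))
    normal = cross(a, b)
    norm = math.sqrt(sum(c * c for c in normal))
    # Sine of the angle between the point and the plane of the great circle.
    sin_angle = sum(n * q for n, q in zip(normal, p)) / norm
    return abs(math.asin(sin_angle)) * EARTH_RADIUS_METERS


def tolerance_to_radians(tolerance_m):
    """Same conversion as in the C++ code: meters divided by the Earth radius."""
    return tolerance_m / EARTH_RADIUS_METERS


offset = cross_track_distance_m((-64, 45), (0, 45), (-30.1, 45))
print(f"distance to spherical edge: {offset / 1000:.0f} km")
print(f"default tolerance as angle: {tolerance_to_radians(100.0):.3e} rad")
```

With `planar=True` the reader inserts extra vertices so the edge follows the straight line in lon/lat space instead, which is why the same point then falls within the tessellation tolerance.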
Not sure which is best, though.
Yeah, given this is for geoarrow IO and in geoarrow we generally use the "geometry" term, I thought to keep that as well for the keyword ... But either way it is not ideal.