Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PointCloud subclass of Vector #492

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

rhugonnet
Copy link
Member

@rhugonnet rhugonnet commented Feb 16, 2024

This PR adds the PointCloud class to facilitate interfacing with this special sub-type of vector data very common in geospatial analysis.
In short, a point cloud is a subclass of Vector containing only 2D point geometries and associated to a main data column (+ optionally other auxiliary data columns). This main data column represents the values .data of the point cloud, that can be compared to the .data of a raster format (which is impossible for a typical vector), or of that of another point cloud if some of their coordinates are the same, or within close tolerance.

Additionally, all functionalities of Raster using .data that just need coordinates can then be ported to PointCloud (array interface with NumPy, subsampling, binning, zonal stats, variography, etc).

Summary

The idea behind the implementation of the PointCloud class is two-fold:

  1. Substantially improve the interfacing of point clouds with vectors or raster, which are currently based on passing volatile arguments of "coordinates" (that could be defined with many shapes/formats) or returning volative 1D arrays of data values associated with no coordinates: https://geoutils.readthedocs.io/en/stable/raster_vector_point.html#rasterpoint-operations.
  2. Natively support large point cloud files of type LAS that are also used widely, are not supported by GeoPandas (didn't find a single issue on this, not planned at all), which can benefit from the same load mechanism as Raster and support chunked reading.

For this, we add the new optional dependency laspy for reading and writing LAS-type format. And we override some of the behaviour of Vector in PointCloud to allow for a load() functionality.

Details

This PR adds the PointCloud class, including:

  • A new data_column attribute containing the data column name of the point cloud,
  • A new data attribute corresponding to the data of the point cloud,
  • A new all_columns attribute containing all data columns of the point cloud (Vector.columns also contains geometry),
  • A new is_loaded attribute and load() method for when data is read from a LAS file,
  • A new nb_points attribute returning the number of points in the point cloud,
  • A new set of from_array(), from_tuples() and from_xyz() methods to easily create a point cloud from different inputs, and their equivalent to_array(), to_tuples() and to_xyz(),
  • A new pointcloud_equal() method simply checking vector_equal() and that the data_column is equal as well,
  • Wraps the existing grid() functionality as a class method.

The new PointCloud class overrides the attributes ds, crs, bounds and columns to allow loading only metadata, and implicitly loading the point cloud.

This PR updates existing input/output of some functions:

  • Raster.to_pointcloud() now returns a PointCloud instead of a Vector.

Left to-do for this PR

Small changes:

  • Add LAS file and Point-only vector file in geoutils-data to run tests,
  • Update Raster.from_regular_pointcloud() to accept a PointCloud input,
  • Update Raster.interp_points() and Raster.reduce_points() to accept a PointCloud as input for coordinates,
  • Update Vector.create_mask() to accept a PointCloud the manipulation of masking and booleans for point clouds,
  • Mirror every function relying only on adimensional .data (same whether 1D or 2D) to point cloud: subsample(), get_stats() (and function that work on 1D sets of coordinates like variogram(), once moved to GeoUtils).
  • Add __array_interface__ support for point clouds with the same coordinates, simply pointing to that of self.ds[data_column]?
  • Populate documentation page "The georeferenced point cloud", update the "Raster-Point interface" sections, update Quick start and Feature overview, and add a gallery example?

Potential bigger change?

This last change could be to:

  • Allow for arithmetic operations between Raster and PointCloud? (e.g., defaulting to using interp_points at the PointCloud coordinates, the comparison method could be tweaked in geoutils.config),

It could work as:

pc = PointCloud("pc_example.shp")
raster = Raster("rast_example.tif")

# An arithmetic operation between a raster and point cloud would always return a PointCloud
pc_diff = pc - raster

# Users could change the behaviour in the config
geoutils.config["raster_point.comparison_method"] = "reduce_points"
geoutils.config["raster_point.resampling"] = "cubic"

The only problem I see is that interpolation is not ideal for a point cloud much denser than a raster (e.g., a lidar point cloud). In that case, it's better to grid the point cloud into a raster, and compare the two rasters. But that is computationally-intensive, so happening under-the-hood is not ideal.
We could have a criteria based on point cloud density relative to the raster size:

  • Above a certain density, interpolate raster at the points; below that, grid the points,
  • Or always follow the same behaviour, but raise a warning if the point density is not ideal?

Or we do not add such a functionality and always leave this to the user, which remains fairly short as long as we support Raster/Raster arithmetic and PointCloud/PointCloud arithmetic:

rast_from_pc = pc.grid(rast)
my_op = (rast + rast_from_pc) / 4

pc_from_rast = rast.interp_points(pc)
my_op = (pc_from_rast + pc) / 4

And add a public function to calculate relative point density to help users decide when need be?

Future functionalities

Resolves #463

@rhugonnet rhugonnet marked this pull request as ready for review December 6, 2024 23:40
@rhugonnet
Copy link
Member Author

@adehecq @atedstone @erikmannerfelt This PR is now advanced enough to get your feedback! 😊
I think it's good we discuss it now before moving further forward, documenting and finalizing, to be sure we agree on all the concepts and behaviour introduced! I updated the description above to a good level of detail, which should allow you to grasp the idea behind this new object 😉.

All tests passing locally with a LAS file I have. Will need to add one to geoutils-data for CI to pass.

@atedstone
Copy link
Member

@rhugonnet , this looks exceptionally well thought through. I read your descriptions and the changed files,but have not tried out the functionality myself as I don't immediately have files lying around that would be a good test case. But from my side, this looks good to progress to finishing the outstanding tasks you listed.

Concerning the bigger possible change about arithmetic ops between Raster and PointCloud: my instinct would be to require the user to decide, so that the comparison is explicit and clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a PointCloud subclass of Vector to consistently deal with points?
2 participants