Use rtree spatial index from PlanetilerConfig bounds by default for Geopackages #635

Open
wants to merge 4 commits into
base: main

bdon
Contributor

@bdon bdon commented Jul 27, 2023

This is a first attempt at addressing #478.

The existing issue is that every Shapefile or Geopackage feature is read on every run, even when the --bounds argument is passed to limit the tile output or the bounds are determined implicitly from the osm.pbf. To reduce the runtime we can:

  • Use a spatial index on the dataset
    • for Shapefile, we need .fbn, .fbx and/or .qix sidecars, depending on what software created the Shapefile. Maybe GeoTools can read all of these, but I'm not familiar with its internals.
    • for Geopackage we can read from one of the rtree_ indexes (a vanilla SQLite feature, no Spatialite required)
  • Only read from a passed list of tables: for example, instead of iterating over every Natural Earth table, we pass only ne_50m_admin_0_boundary_lines_land as the table we care about. IMO, this doesn't reduce the runtime by much relative to spatial indexing, so it may not be a worthwhile optimization.
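For illustration, here is a rough sketch of what querying an rtree_ index directly looks like, written as plain SQL string building in Java so it does not depend on any GeoPackage library. This is not the code in this PR (which goes through FeatureIndexManager). It relies on the GeoPackage convention that the index table is named rtree_<table>_<geometry column> with columns id, minx, maxx, miny, maxy. The class name and the column names "geom" and "fid" used below are assumptions; real files record the geometry column in gpkg_geometry_columns.

```java
// Hypothetical helper, not part of Planetiler or the NGA GeoPackage library.
final class GpkgRtreeSql {

  /** GeoPackage names the index table rtree_<table>_<geometry column>. */
  static String rtreeTable(String table, String geomColumn) {
    return "rtree_" + table + "_" + geomColumn;
  }

  /** Double-quotes an SQL identifier, escaping any embedded quotes. */
  static String quote(String identifier) {
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  /**
   * Builds a query returning the rows whose indexed envelope intersects a box.
   * Bind parameters in order: box minX, box maxX, box minY, box maxY,
   * all expressed in the layer's own coordinate reference system.
   */
  static String bboxQuery(String table, String geomColumn, String pkColumn) {
    return "SELECT t.* FROM " + quote(table) + " t"
      + " JOIN " + quote(rtreeTable(table, geomColumn)) + " r"
      + " ON t." + quote(pkColumn) + " = r.id"
      + " WHERE r.maxx >= ? AND r.minx <= ? AND r.maxy >= ? AND r.miny <= ?";
  }
}
```

Calling GpkgRtreeSql.bboxQuery("ne_50m_admin_0_boundary_lines_land", "geom", "fid") yields a statement that only touches rows overlapping the bound box, which is where the runtime saving comes from.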

My proposal here is to not change any APIs, but change the default behavior of Geopackage to use the determined PlanetilerConfig bounds and apply the spatial index. Remaining todos:

  • determine if rtree_ indexes will always exist on gpkg or are optional
  • determine the correct behavior with GPKGs in different projections. Bounds in PlanetilerConfig are always LatLong, so if the GPKG data is LatLong, Web Mercator or another conformal projection we can use the index. Otherwise we must ignore the index.
  • Ultimately we need antimeridian handling. We can accomplish this by detecting a crossing and splitting it into two bboxes, but support for this would need to be pushed up to PlanetilerConfig itself.
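A minimal sketch of the antimeridian split mentioned in the last todo, assuming a crossing is encoded as a box whose western edge is numerically greater than its eastern edge (for example 170 to -170). The Bounds record and the split method are hypothetical; nothing like this exists in PlanetilerConfig today, and the current bounds type may not be able to express a crossing at all.

```java
import java.util.List;

// Hypothetical sketch, not an existing Planetiler API.
final class AntimeridianSplit {

  /** Longitude/latitude box in degrees; minLon > maxLon marks an antimeridian crossing (assumed convention). */
  record Bounds(double minLon, double minLat, double maxLon, double maxLat) {}

  /** Returns the box unchanged, or two boxes that meet at +/-180 when it crosses the antimeridian. */
  static List<Bounds> split(Bounds b) {
    if (b.minLon() <= b.maxLon()) {
      return List.of(b); // no crossing: a single index query is enough
    }
    return List.of(
      new Bounds(b.minLon(), b.minLat(), 180.0, b.maxLat()),   // western edge up to +180
      new Bounds(-180.0, b.minLat(), b.maxLon(), b.maxLat())); // -180 up to eastern edge
  }
}
```

Each returned box would then be run against the spatial index separately and the results concatenated.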

Another relevant format would be FlatGeobuf, which has an optional spatial index as part of the single file, but I don't see key advantages of FGB over GPKG in most Planetiler use cases where data lives on disk.

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: 100.0%
Duplication: 0.0%

@github-actions

github-actions bot commented Jul 27, 2023

This Branch 866cab4 Base 4ecb02d
0:01:05 DEB [archive] - Tile stats:
0:01:05 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (157k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:85k)
2. 9/154/190 (144k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (135k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:71k)
5. 14/4941/6092 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:65k)
6. 14/4941/6093 (112k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (101k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (95k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:05 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   409   544   872   332   437   552   802  1.6k    2k  6.9k  6.2k  5.6k  4.5k  6.9k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   640   714    1k  1.6k  3.1k  5.8k  3.4k  1.7k   803   948  5.8k
            landuse    0     0     0     0   549   695  1.6k  6.7k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   313   776  1.2k    4k  5.7k   17k   13k   17k   62k   47k   33k   62k
           waterway    0     0     0     0   112   119     0     0     0    3k  2.3k    2k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.3k  4.3k  9.7k   18k   13k  8.2k  3.7k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   287   364  1.1k  1.9k  5.5k  4.7k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.9k   29k   85k   71k   81k   53k   30k   25k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   328   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.8k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   568   565   85k   85k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.4k  3.7k    6k   20k   41k   82k  195k  181k  134k  113k  127k  247k  247k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   59k  144k  136k   98k   83k   91k  157k  157k
0:01:05 DEB [archive] -    Max tile: 247k (gzipped: 157k)
0:01:05 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:05 DEB [archive] -     # tiles: 4,115,039
0:01:05 DEB [archive] -  # features: 5,519,419
0:01:05 INF [archive] - Finished in 19s cpu:1m11s avg:3.7
0:01:05 INF [archive] -   read    1x(3% 0.6s wait:18s done:1s)
0:01:05 INF [archive] -   encode  4x(56% 11s wait:2s)
0:01:05 INF [archive] -   write   1x(22% 4s wait:13s)
0:01:05 INF [archive] - Finished in 1m6s cpu:3m36s gc:1s avg:3.3
0:01:05 INF [archive] - FINISHED!
0:01:05 INF [archive] - 
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - data errors:
0:01:05 INF [archive] - 	render_snap_fix_input	16,734
0:01:05 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:05 INF [archive] - 	osm_boundary_missing_way	55
0:01:05 INF [archive] - 	merge_snap_fix_input	12
0:01:05 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:05 INF [archive] - 	render_snap_fix_input2	1
0:01:05 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:05 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:05 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - 	overall          1m6s cpu:3m36s gc:1s avg:3.3
0:01:05 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.2
0:01:05 INF [archive] - 	  read     1x(20% 0.5s done:2s)
0:01:05 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:05 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:05 INF [archive] - 	water_polygons   15s cpu:41s avg:2.7
0:01:05 INF [archive] - 	  read     1x(41% 6s done:7s)
0:01:05 INF [archive] - 	  process  4x(26% 4s wait:4s done:6s)
0:01:05 INF [archive] - 	  write    1x(3% 0.5s wait:9s done:5s)
0:01:05 INF [archive] - 	natural_earth    6s cpu:13s avg:2
0:01:05 INF [archive] - 	  read     1x(95% 6s)
0:01:05 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:05 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:05 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.1
0:01:05 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:05 INF [archive] - 	  parse    4x(32% 0.6s)
0:01:05 INF [archive] - 	  process  1x(71% 1s)
0:01:05 INF [archive] - 	osm_pass2        19s cpu:1m14s avg:4
0:01:05 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:05 INF [archive] - 	  process  4x(75% 14s)
0:01:05 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:05 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:05 INF [archive] - 	boundaries       0s cpu:0s avg:2.8
0:01:05 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:05 INF [archive] - 	sort             1s cpu:3s avg:2.5
0:01:05 INF [archive] - 	  worker  1x(52% 0.7s)
0:01:05 INF [archive] - 	archive          19s cpu:1m11s avg:3.7
0:01:05 INF [archive] - 	  read    1x(3% 0.6s wait:18s done:1s)
0:01:05 INF [archive] - 	  encode  4x(56% 11s wait:2s)
0:01:05 INF [archive] - 	  write   1x(22% 4s wait:13s)
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - 	archive	108MB
0:01:05 INF [archive] - 	features	284MB
-rw-r--r-- 1 runner docker 87M Feb  4 08:27 run.jar
0:01:06 DEB [archive] - Tile stats:
0:01:06 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (157k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:85k)
2. 9/154/190 (144k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (135k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:71k)
5. 14/4941/6092 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:65k)
6. 14/4941/6093 (112k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (101k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (95k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:06 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   409   544   872   332   437   552   802  1.6k    2k  6.9k  6.2k  5.6k  4.5k  6.9k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   640   714    1k  1.6k  3.1k  5.8k  3.4k  1.7k   803   948  5.8k
            landuse    0     0     0     0   549   695  1.6k  6.7k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   313   776  1.2k    4k  5.7k   17k   13k   17k   62k   47k   33k   62k
           waterway    0     0     0     0   112   119     0     0     0    3k  2.3k    2k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.3k  4.3k  9.7k   18k   13k  8.2k  3.7k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   287   364  1.1k  1.9k  5.5k  4.7k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.9k   29k   85k   71k   81k   53k   30k   25k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   328   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.8k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   568   565   85k   85k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.4k  3.7k    6k   20k   41k   82k  195k  181k  134k  113k  127k  247k  247k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   59k  144k  136k   98k   83k   91k  157k  157k
0:01:06 DEB [archive] -    Max tile: 247k (gzipped: 157k)
0:01:06 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:06 DEB [archive] -     # tiles: 4,115,039
0:01:06 DEB [archive] -  # features: 5,519,419
0:01:06 INF [archive] - Finished in 19s cpu:1m11s avg:3.6
0:01:06 INF [archive] -   read    1x(3% 0.5s wait:18s done:1s)
0:01:06 INF [archive] -   encode  4x(56% 11s wait:2s)
0:01:06 INF [archive] -   write   1x(22% 4s wait:13s)
0:01:06 INF [archive] - Finished in 1m6s cpu:3m36s gc:1s avg:3.3
0:01:06 INF [archive] - FINISHED!
0:01:06 INF [archive] - 
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - data errors:
0:01:06 INF [archive] - 	render_snap_fix_input	16,734
0:01:06 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:06 INF [archive] - 	osm_boundary_missing_way	55
0:01:06 INF [archive] - 	merge_snap_fix_input	12
0:01:06 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:06 INF [archive] - 	render_snap_fix_input2	1
0:01:06 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:06 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:06 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - 	overall          1m6s cpu:3m36s gc:1s avg:3.3
0:01:06 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.3
0:01:06 INF [archive] - 	  read     1x(21% 0.5s done:2s)
0:01:06 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:06 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:06 INF [archive] - 	water_polygons   15s cpu:41s avg:2.7
0:01:06 INF [archive] - 	  read     1x(41% 6s done:8s)
0:01:06 INF [archive] - 	  process  4x(27% 4s wait:4s done:5s)
0:01:06 INF [archive] - 	  write    1x(3% 0.5s wait:10s done:5s)
0:01:06 INF [archive] - 	natural_earth    7s cpu:13s avg:2
0:01:06 INF [archive] - 	  read     1x(95% 6s)
0:01:06 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:06 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:06 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.4
0:01:06 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:06 INF [archive] - 	  parse    4x(34% 0.7s)
0:01:06 INF [archive] - 	  process  1x(69% 1s)
0:01:06 INF [archive] - 	osm_pass2        19s cpu:1m14s avg:4
0:01:06 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:06 INF [archive] - 	  process  4x(76% 14s)
0:01:06 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:06 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:06 INF [archive] - 	boundaries       0s cpu:0s avg:2.4
0:01:06 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:06 INF [archive] - 	sort             1s cpu:4s avg:2.8
0:01:06 INF [archive] - 	  worker  1x(48% 0.7s)
0:01:06 INF [archive] - 	archive          19s cpu:1m11s avg:3.6
0:01:06 INF [archive] - 	  read    1x(3% 0.5s wait:18s done:1s)
0:01:06 INF [archive] - 	  encode  4x(56% 11s wait:2s)
0:01:06 INF [archive] - 	  write   1x(22% 4s wait:13s)
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - 	archive	108MB
0:01:06 INF [archive] - 	features	284MB
-rw-r--r-- 1 runner docker 87M Feb  4 08:29 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/13131232735

Comment on lines 152 to 154
FeatureIndexManager indexer = new FeatureIndexManager(geoPackage,
features);
indexer.setIndexLocation(FeatureIndexType.RTREE);
Contributor

It's been a while, but if I remember correctly, this will actually build the index if it doesn't already exist in the DB before running the query.

Looking at the PR where I initially added this, should be possible to check for the presence of the index before trying to use it: #413 (comment)

Contributor Author

Modifying the database in place seems like awkward and unexpected behavior. Can we disable that? If the file is being unzipped every time, it could be slow for a large geopackage.

Maybe detect the absence of the index and ignore?
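One read-only way to detect a missing index could look like the sketch below. It uses only java.sql and assumes a JDBC connection to the GeoPackage file is available (the SQLite driver itself is not included here). It checks sqlite_master for a table following the GeoPackage rtree_<table>_<geometry column> naming convention, so nothing is written to the database. The class and method names are hypothetical, and this is not how FeatureIndexManager itself is used in this PR.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of Planetiler or the NGA GeoPackage library.
final class RtreeIndexCheck {

  /** Virtual tables, including rtree indexes, are listed in sqlite_master with type 'table'. */
  static final String EXISTS_SQL =
    "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?";

  /** GeoPackage names the index table rtree_<table>_<geometry column>. */
  static String rtreeTable(String table, String geomColumn) {
    return "rtree_" + table + "_" + geomColumn;
  }

  /** Returns true if the rtree table already exists; never creates or modifies anything. */
  static boolean hasRtreeIndex(Connection conn, String table, String geomColumn)
    throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(EXISTS_SQL)) {
      ps.setString(1, rtreeTable(table, geomColumn));
      try (ResultSet rs = ps.executeQuery()) {
        return rs.next();
      }
    }
  }
}
```

When hasRtreeIndex returns false, the reader could fall back to scanning every feature as it does today instead of letting the library build the index.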

Contributor Author

(nvm, I see you mentioned exactly the behavior I suggested.) So the next step is addressing possible projection mismatches; we need to find some geopackages in the wild that are neither WGS84 nor Web Mercator.
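To make the projection concern concrete: the rtree stores envelopes in the layer's own coordinates, so for a Web Mercator (EPSG:3857) layer the lat/long bounds would have to be converted before they are compared against the index. A sketch of that conversion using the standard spherical Mercator formulas follows. The class and method names are hypothetical, and clamping latitude to about 85.05 degrees is an assumption made to keep the result finite.

```java
// Hypothetical sketch, not an existing Planetiler API.
final class MercatorBounds {

  static final double EARTH_RADIUS_M = 6378137.0;    // WGS84 semi-major axis used by EPSG:3857
  static final double MAX_LAT_DEG = 85.0511287798066; // latitude where y equals the max x

  /** Longitude in degrees to EPSG:3857 x in meters. */
  static double x(double lonDeg) {
    return EARTH_RADIUS_M * Math.toRadians(lonDeg);
  }

  /** Latitude in degrees to EPSG:3857 y in meters, clamped to the valid Mercator range. */
  static double y(double latDeg) {
    double clamped = Math.max(-MAX_LAT_DEG, Math.min(MAX_LAT_DEG, latDeg));
    return EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(clamped) / 2));
  }

  /** Converts a lat/long box to {minX, minY, maxX, maxY} in meters. */
  static double[] toMercator(double minLon, double minLat, double maxLon, double maxLat) {
    // A lat/long-aligned rectangle stays axis-aligned in Web Mercator,
    // so converting the two corners is sufficient.
    return new double[] {x(minLon), y(minLat), x(maxLon), y(maxLat)};
  }
}
```

For other projections a lat/long rectangle generally does not map to an axis-aligned rectangle, which is why those cases need separate handling or have to skip the index.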

@bdon bdon force-pushed the bdon/gpkg-indexing branch from c8d3fee to 0b2c894 Compare February 4, 2025 03:33
@bdon bdon marked this pull request as ready for review February 4, 2025 07:59

sonarqubecloud bot commented Feb 4, 2025
