Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] canonical vector tile polygon representations for deduplication #1170

Closed
bdon opened this issue Feb 7, 2025 · 9 comments · Fixed by #1173
Closed

[FEATURE] canonical vector tile polygon representations for deduplication #1170

bdon opened this issue Feb 7, 2025 · 9 comments · Fixed by #1173

Comments

@bdon
Copy link
Contributor

bdon commented Feb 7, 2025

Is your feature request related to a problem? Please describe.
Deduplication of repetitive tiles makes all of the tile archive formats more efficient, usually this works by hashing the tile contents to determine if they are byte-for-byte identical. These tiles are almost always oceans or large areal features where there is a single square covering the whole tile.

Right now, it is possible for the tiling to emit polygons that are geometrically identical but have different vertex representations, such as these two geometries from -128,-128 to 4224, 4224:

geometry:
      RING[count=5](-128 -128,4224 -128,4224 4224,-128 4224,-128 -128)[OUTER]
geometry:
  RING[count=9](-128 4224,-128 -128,4093 -128,4099 -128,4224 -128,4224 4224,4099 4224,4093 4224,-128 4224)[OUTER]

Describe the solution you'd like
If all of the features in the tile are squares of the full extent of the tile, apply Douglas-Peucker with a tolerance of 0 to get a polygon with 5 vertices, then ensure that the start/end vertices are the smallest values.

Additional context
Maybe it already works this way and I have a bug in my implementation of this, but you can check the tiles 9/254/85 and 9/255/85 from a build on https://maps.protomaps.com/builds/:

pmtiles tile https://demo-bucket.protomaps.com/v4.pmtiles 9 254 85 > 9_254_85.pbf.gz

and then look at it with vtzero

@bdon
Copy link
Contributor Author

bdon commented Feb 7, 2025

We could limit this code path to only the existing condition containsOnlyFillsOrEdges

@msbarry
Copy link
Contributor

msbarry commented Feb 7, 2025

Is this only a concern for whole-tile filled polygons? If so I'm wondering how these are appearing in the first place? Planetiler's large geometry tile-slicer always generates fill polygons using this method that creates identical outputs:

public static VectorGeometry encodeFill(double buffer) {
int min = (int) Math.round(EXTENT * buffer / 256d);
int width = EXTENT + min + min;
return new VectorGeometry(new int[]{
CommandEncoder.commandAndLength(Command.MOVE_TO, 1),
zigZagEncode(-min), zigZagEncode(-min),
CommandEncoder.commandAndLength(Command.LINE_TO, 3),
zigZagEncode(width), 0,
0, zigZagEncode(width),
zigZagEncode(-width), 0,
CommandEncoder.commandAndLength(Command.CLOSE_PATH, 1)
}, GeometryType.POLYGON, 0);
}

@bdon
Copy link
Contributor Author

bdon commented Feb 7, 2025

@msbarry at what stage does that happen? The input is the pre-tiled OSM water polygons for ocean, and they only fill whole tiles after mergeOverlappingPolygons

@msbarry
Copy link
Contributor

msbarry commented Feb 7, 2025

That happens during initial feature processing. But then during tile post-processing in each archive worker it does tile-post-processing which merges ocean polygons, here's the relevant code

for (int i = 0; i < batch.in.size(); i++) {
  FeatureGroup.TileFeatures tileFeatures = batch.in.get(i);
  featuresProcessed.incBy(tileFeatures.getNumFeaturesProcessed());
  byte[] bytes, encoded;
  List<TileSizeStats.LayerStats> layerStats;
  Long tileDataHash;
  if (tileFeatures.hasSameContents(last)) {
    bytes = lastBytes;
    encoded = lastEncoded;
    tileDataHash = lastTileDataHash;
    layerStats = lastLayerStats;
    memoizedTiles.inc();
  } else {
    VectorTile tile = tileFeatures.getVectorTile(layerAttrStatsUpdater); // this invokes tile-post-processing which merges ocean polygons

For openmaptiles water layer, it does this for post-processing which merges those overlapping OSM water polygons into a full-tile fill:

  @Override
  public List<VectorTile.Feature> postProcess(int zoom, List<VectorTile.Feature> items) throws GeometryException {
    return items.size() > 1 ? FeatureMerge.mergeOverlappingPolygons(items, config.minFeatureSize(zoom)) : items;
  }

then later on it computes tileDataHash which is used to extend runs when tiles are the same using the output of tile-post-processing

if (archive.deduplicates() && tile.likelyToBeDuplicated() && bytes != null) {
  tileDataHash = generateContentHash(bytes);
} else {
  tileDataHash = null;
}

tile.likelyToBeDuplicated() does return true for that polygon you shared, but generateContentHash returns something different. So I think possible paths forward here are:

  1. Make FeatureMerge merge polygons utility simplify/normalize the output either all the time or just when it's likely to be duplicated
  2. Make generateContentHash normalize features first, either all the time or when it's likely to be duplicated

Or maybe there's another workaround here too ... ?

@bdon
Copy link
Contributor Author

bdon commented Feb 7, 2025

I think 1) is the right solution. In the polygons above, the one with 5 vertices is strictly better than the one with 9, and in 2) it's sensitive to which one it sees first so the 9 will get saved into the archive.

@msbarry
Copy link
Contributor

msbarry commented Feb 7, 2025

Got it, 1 sounds good. It's a bit dangerous though to touch polygons after they've gone through snapAndFix since it could introduce a robustness issue so I'm kind of leaning toward a limited fix only that only detects fill polygons. It needs to handle rotating the points so they start in the same corner too...

@bdon
Copy link
Contributor Author

bdon commented Feb 8, 2025

Fill polygons only is fine, that likely covers 99% of cases, though we should handle more than 1 fill polygon in a tile.

@msbarry
Copy link
Contributor

msbarry commented Feb 8, 2025

One shortcut we might be able to use would be in FeatureMerge.extractPolygons which runs after polygon merging, the first thing it does is get the area of the exterior ring. Maybe we could use that to see if it's bigger then a full tile, check that it's a square and use sqrt(area) to determine how big of a buffer to give the new outer ring.

private static void extractPolygons(Geometry geom, List<Polygon> result, double minArea, double minHoleArea) {
if (geom instanceof Polygon poly) {
if (Area.ofRing(poly.getExteriorRing().getCoordinateSequence()) > minArea) {
int innerRings = poly.getNumInteriorRing();
if (minHoleArea > 0 && innerRings > 0) {
List<LinearRing> rings = new ArrayList<>(innerRings);
for (int i = 0; i < innerRings; i++) {
LinearRing innerRing = poly.getInteriorRingN(i);
if (Area.ofRing(innerRing.getCoordinateSequence()) > minArea) {
rings.add(innerRing);
}
}
if (rings.size() != innerRings) {
poly = GeoUtils.createPolygon(poly.getExteriorRing(), rings);
}
}
result.add(poly);
}
} else if (geom instanceof GeometryCollection) {
for (int i = 0; i < geom.getNumGeometries(); i++) {
extractPolygons(geom.getGeometryN(i), result, minArea, minHoleArea);
}
}
}

@msbarry
Copy link
Contributor

msbarry commented Feb 8, 2025

Here's a proposed fix to narrowly handle this case when merged polygons end up as a fill: #1173

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants