Merge pull request #18 from OSMCha/jlow/readme
Refine high-level overview in README.md
jake-low authored Aug 21, 2024
2 parents 5bce929 + 9f6fa51 commit f12a0a0
Showing 1 changed file with 15 additions and 17 deletions.
32 changes: 15 additions & 17 deletions README.md
@@ -1,22 +1,28 @@
# osm-adiff-service

Read the minutely replication files published by OpenStreetMap planet, and query changesets on Overpass to create full representations of changesets. It also posts the tag changes summary to the OSMCha API.
This service reads the minutely replication files published by OpenStreetMap, and builds JSON documents which describe each changeset in detail (including information which is not included in the replication file). It publishes these JSON files to S3, and also POSTs a summary of tag changes to the [OSMCha API](https://osmcha.org/api-docs/).

# Real OpenStreetMap Changesets

When a changeset is pushed to OSM, this stack builds a representation of the exact change that happened:
Each changeset JSON contains complete information about the changeset:

* Changeset metadata - username, id, timestamp, comment etc.
* Elements - each feature that was added, modified, or deleted in the changeset.
* For each element, the current and previous version including geometry and metadata.
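Because each element carries both its current and previous version, a tag-change summary can be derived by comparing the two. The sketch below is illustrative only, not the service's implementation: the function name `diffTags` and the plain-object tag maps it takes are assumptions, and the real JSON field names may differ.

```javascript
// Hypothetical helper: compare the tag maps of an element's previous and
// current versions. Input shapes are assumed for illustration.
function diffTags(oldTags = {}, newTags = {}) {
  const changes = { added: {}, removed: {}, modified: {} };

  for (const [key, value] of Object.entries(newTags)) {
    if (!Object.hasOwn(oldTags, key)) {
      // Tag exists only in the current version.
      changes.added[key] = value;
    } else if (oldTags[key] !== value) {
      // Tag exists in both versions with different values.
      changes.modified[key] = { from: oldTags[key], to: value };
    }
  }

  for (const [key, value] of Object.entries(oldTags)) {
    if (!Object.hasOwn(newTags, key)) {
      // Tag exists only in the previous version.
      changes.removed[key] = value;
    }
  }

  return changes;
}
```

A newly created element has no previous version, so passing `undefined` as `oldTags` reports every tag as added.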

#### Details
## What is this, and why?

[OSMCha](https://osmcha.org)'s purpose is to let users view a changeset in its entirety, including metadata about the changeset and the "before" and "after" versions of every OSM element that was changed.

The OSM API publishes minutely [replication files](https://wiki.openstreetmap.org/wiki/Planet.osm/diffs) in [`.osc` format](https://wiki.openstreetmap.org/wiki/OsmChange) that contain some information about each edit that is made to OSM, but these files are optimized for small size and don't contain all of the details required by OSMCha. Specifically:

- they do not include old ("before") versions of elements that changed
- they don't include way geometries at all unless the geometry itself was edited (not just the tags)
- they don't include bounding boxes

A richer diff format called [augmented diff](https://wiki.openstreetmap.org/wiki/Overpass_API/Augmented_Diffs) addresses these limitations. [Overpass](https://wiki.openstreetmap.org/wiki/Overpass_API) is capable of producing this type of diff. The `osm-adiff-service` can be used to process a replication file from the OSM API, retrieve additional data about each change by getting an augmented diff from Overpass, and republish the resulting info as JSON.
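The first step of that pipeline, locating a minutely replication file, can be sketched as follows. This is a minimal illustration, not the service's actual code: the function name `replicationUrl` is hypothetical, and it assumes the directory layout used on planet.openstreetmap.org, where the zero-padded nine-digit sequence number is split into three path segments.

```javascript
// Assumed base URL of the public minutely replication feed.
const REPLICATION_BASE = "https://planet.openstreetmap.org/replication/minute";

// Hypothetical helper: map a replication sequence number to its .osc.gz URL.
function replicationUrl(sequenceNumber) {
  if (
    !Number.isInteger(sequenceNumber) ||
    sequenceNumber < 0 ||
    sequenceNumber > 999999999
  ) {
    throw new RangeError(`invalid sequence number: ${sequenceNumber}`);
  }
  // Pad to nine digits, then split into three groups of three.
  const padded = String(sequenceNumber).padStart(9, "0");
  const segments = [padded.slice(0, 3), padded.slice(3, 6), padded.slice(6, 9)];
  return `${REPLICATION_BASE}/${segments.join("/")}.osc.gz`;
}
```

Fetching the augmented diff from Overpass and converting it to JSON are separate steps that this sketch does not cover.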

* New changesets are pushed to `https://s3.amazonaws.com/mapbox/real-changesets/production/<changeset-id>.json`
* Augmented Diffs generated by Overpass are pushed to `https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/<state-id.osc>`.
* The latest state id is published here `https://s3-ap-northeast-1.amazonaws.com/overpass-db-ap-northeast-1/augmented-diffs/latest`
These JSON artifacts are called [real-changesets](https://github.com/osmus/osmcha-charter-project/blob/main/real-changesets-docs.md). OSMCha's data pipeline currently publishes them to an [AWS Open Data S3 Bucket](https://registry.opendata.aws/real-changesets/). OSMCha uses the `real-changesets` files to visualize changesets for users; the component that renders them in the browser is [changeset-map](https://github.com/osmlab/changeset-map).
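Since each changeset is published as `<changeset-id>.json`, a consumer can build the file's URL from the changeset ID alone. The sketch below is an assumption-laden illustration: `realChangesetUrl` is a hypothetical name, and the default base URL is the one quoted in the older README text, which may not match the current bucket, so it is left as a parameter.

```javascript
// Base URL taken from the older README text; treat it as an assumption and
// override it if the files are hosted elsewhere.
const DEFAULT_BASE = "https://s3.amazonaws.com/mapbox/real-changesets/production";

// Hypothetical helper: build the URL of a published changeset JSON file.
function realChangesetUrl(changesetId, base = DEFAULT_BASE) {
  const id = Number(changesetId);
  if (!Number.isInteger(id) || id <= 0) {
    throw new TypeError(`invalid changeset id: ${changesetId}`);
  }
  // Drop any trailing slashes so the result has exactly one separator.
  return `${base.replace(/\/+$/, "")}/${id}.json`;
}
```

The resulting URL can then be passed to any HTTP client to download the JSON.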

#### Example
#### Example JSON changeset output

```json
// 20170309131154
@@ -102,14 +108,6 @@ When a changeset is pushed to OSM, this stack builds a representation of the exa
}
```

## What is this, and why?

A lot of processes around inspecting and searching for potentially bad edits on OpenStreetMap depend on being able to view a "changeset" in its entirety. This helps in gauging the context of an edit, seeing similar edits by the same user, and seeing edits in their "finished" state (i.e. not partway through a changeset).

Our primary tool for visualizing changesets has been [changeset-map](http://osmlab.github.io/changeset-map/). We depend on [augmented diffs](http://wiki.openstreetmap.org/wiki/Overpass_API/Augmented_Diffs) generated by Overpass to generate these changeset representations and visualizations.

Augmented Diffs contain complete representations of changes in OSM for every minute. One can also query for a custom time range, and filter by bounding box or other attributes. These queries can be extremely slow, especially for large changesets, and were a major bottleneck in scaling up changeset reviewing processes.

### How to run

#### JS library