Download and extract metadata from a Wikidata JSON dump file. This project is mostly meant to be used with a Pelias stack, in order to complement data coming from other sources (see below). However, all it does is extract data from the dump, so you might find other purposes for it.
OpenStreetMap comes with a lot of useful information about places. However, sometimes you need more. This importer brings in data from Wikidata and prepares it for import into Pelias along with other importers (such as openstreetmap or whosonfirst).
Warning: it is meant to complement other sources and is not a place source on its own. In other words, extracted data will often go into the addendum field.
Only items that meet at least one of the following conditions are extracted (a sketch of this check follows the list):
- has a coordinate location
- has an OpenStreetMap relation id
- has a Who's on First id
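For illustration, the check roughly amounts to the sketch below. This is not the importer's actual code; it only assumes a parsed Wikidata entity object and uses the standard Wikidata property ids for coordinate location (P625), OpenStreetMap relation id (P402) and Who's on First id (P6766).

// Hypothetical sketch of the extraction condition for a parsed Wikidata entity.
// The importer's real check may differ in detail.
function shouldExtract (entity) {
  const claims = entity.claims || {};
  return Boolean(
    claims.P625 ||   // coordinate location
    claims.P402 ||   // OpenStreetMap relation id
    claims.P6766     // Who's on First id
  );
}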
Currently, this project extracts the following fields:
- Translated names (labels)
OSM is usually not very complete or up to date when it comes to translated place names. Translated names of known places will be extracted from Wikidata and will complement those imported from OSM.
- Translated aliases
Sometimes, places have several names (or aliases). For example, the Eiffel Tower is sometimes called La dame de fer in French. This information is often not present in OpenStreetMap. However, here again Wikidata tends to be more complete.
- Images
It is often nice to be able to show an image of a place to the user when they use the autocomplete. Images can be extracted from Wikidata (when available). Only the first image is extracted.
Important note about images: images from Wikidata are usually under a Creative Commons license. However, most of them require attribution. This importer does not extract attribution information (license type or author). You are responsible for extracting it by other means (for example, using the Wikimedia API, as sketched below) and showing it to your users accordingly.
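For example, here is a minimal sketch of fetching attribution metadata from the Wikimedia Commons API (prop=imageinfo with iiprop=extmetadata). It assumes Node 18+ (global fetch) and a Commons file title taken from the extracted image field; it is not part of this importer.

// Hypothetical helper: fetch author and license for a Commons image.
// Assumes Node 18+ (global fetch); error handling is omitted.
async function fetchAttribution (fileTitle) {
  const url = 'https://commons.wikimedia.org/w/api.php'
    + '?action=query&format=json&prop=imageinfo&iiprop=extmetadata'
    + '&titles=' + encodeURIComponent(fileTitle);
  const body = await (await fetch(url)).json();
  const pages = body.query.pages;
  const meta = pages[Object.keys(pages)[0]].imageinfo[0].extmetadata;
  return {
    author: meta.Artist && meta.Artist.value,                     // may contain HTML
    license: meta.LicenseShortName && meta.LicenseShortName.value
  };
}
// usage: fetchAttribution('File:Example.jpg').then(console.log);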
imports.wikidata.datapath
This is the directory where the wikidata importer will look for files to import. It will also download files to this location.
imports.wikidata.download.sourceURL
URL of the source file (download phase). It has to be a gzip-compressed JSON file.
imports.wikidata.import.filename
The name of the gzipped JSON file to use in the extract phase.
imports.wikidata.import.langs
If configured, the languages to keep in the extract phase (for labels). By default, all languages are kept.
Example:
{
  "imports": {
    "wikidata": {
      "datapath": "/path/to/wikidata",
      "download": {
        "sourceURL": "http://dumps.wikimedia.your.org/wikidatawiki/entities/latest-all.json.gz"
      },
      "import": {
        "filename": "latest-all.json.gz",
        "langs": ["fr", "en", "es"]
      }
    }
  }
}
The easiest and recommended way to use the importer is through Docker. The image tamaringo/pelias-wikidata can be found on Docker Hub:
docker pull tamaringo/pelias-wikidata
# download wikidata json dump
docker run --rm -it tamaringo/pelias-wikidata bin/download
# extract the fields into the sqlite3 database
docker run --rm -it tamaringo/pelias-wikidata bin/extract
It takes about 9 hours to do a full extract.
After generating an sqlite3 file containing the information, you'll need to extend Pelias in order to use it with your OSM importer.
You can find an example of how to use the data in the example directory.
You will need to inject the stream into Pelias' existing importPipeline.js (you can do it right before streams.dbMapper()).
Example:
streams.extend = require('path/to/extend_osm');

streams.import = function(){
  streams.pbfParser()
    .pipe( streams.docConstructor() )
    .pipe( streams.tagMapper() )
    .pipe( streams.addressExtractor() )
    .pipe( streams.blacklistStream() )
    .pipe( streams.categoryMapper( categoryDefaults ) )
    .pipe( streams.addendumMapper() )
    .pipe( streams.popularityMapper() )
    .pipe( streams.adminLookup() )
    .pipe( streams.extend() )
    .pipe( streams.dbMapper() )
    .pipe( streams.elasticsearch({name: 'openstreetmap'}) );
};
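For reference, an extend stream along these lines might look like the sketch below. This is not the code from the example directory: the sqlite table and column names (wikidata, osm_id, lang, label, image) and the database path are made up for illustration, and it assumes the through2 and better-sqlite3 packages together with the pelias-model document methods setName, setAddendum and getSourceId.

// Hypothetical sketch of path/to/extend_osm. Schema and path are assumptions;
// check the sqlite3 file produced by bin/extract and the example directory.
const through = require('through2');
const Database = require('better-sqlite3');

module.exports = function () {
  const db = new Database('/path/to/wikidata/wikidata.sqlite3', { readonly: true });
  const lookup = db.prepare('SELECT lang, label, image FROM wikidata WHERE osm_id = ?');

  return through.obj(function (doc, enc, next) {
    try {
      lookup.all(doc.getSourceId()).forEach(function (row) {
        if (row.label) { doc.setName(row.lang, row.label); }                    // translated names
        if (row.image) { doc.setAddendum('wikidata', { image: row.image }); }   // addendum field
      });
    } catch (e) {
      // on lookup errors, pass the document through unchanged
    }
    next(null, doc);
  });
};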