Map translation #22

Open
tbodt opened this issue Sep 27, 2024 · 34 comments

@tbodt
Contributor

tbodt commented Sep 27, 2024

What would it take to support rendering the map in any language which has translations in OSM?

@tbodt
Contributor Author

tbodt commented Sep 27, 2024

I see planetiler has a default list of languages: https://github.com/onthegomap/planetiler/blob/169627dea9b024f4b64f53039c302778c1c273bf/planetiler-core/src/main/java/com/onthegomap/planetiler/Planetiler.java#L106. This unfortunately doesn't include the language I want to use. I can imagine including all the languages would mean significantly bigger tile files...

@hyperknot
Owner

Luckily it's not related to planetiler and is very simple to achieve.

  1. Download the JSON from the style you prefer, for example this JSON:
    https://tiles.openfreemap.org/styles/liberty

  2. Do a search and replace for "name_en" with "name_de" or whichever you prefer. Upload the JSON somewhere and simply point your style to this JSON instead of the default ones.
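
As a minimal sketch of step 2, the search and replace can be scripted. The function names and the output file name are illustrative and not part of OpenFreeMap, and the sketch assumes the style refers to labels through the literal string "name_en", as described above.

```python
import urllib.request

STYLE_URL = "https://tiles.openfreemap.org/styles/liberty"


def switch_language(style_text: str, lang: str) -> str:
    """Replace every "name_en" reference in the style with "name_<lang>"."""
    return style_text.replace("name_en", f"name_{lang}")


def download_style(url: str = STYLE_URL) -> str:
    """Fetch the style JSON as text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def write_localized_style(lang: str, out_path: str) -> None:
    """Download the style, switch its label language, and save the result."""
    localized = switch_language(download_style(), lang)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(localized)
```

Calling write_localized_style("de", "liberty-de.json") would then give you a file to upload and point your map at.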

A dynamic JS snippet which does this is the following:

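// Assumes `style` is the parsed style JSON and `langCode` is the target
// language code (for example 'de'). `isEqual` is a deep-equality helper,
// such as the one from lodash.
for (const layer of style.layers) {
  if (!layer.layout) continue

  const textField = layer.layout['text-field']
  if (!textField) continue

  // Skip layers that show reference numbers (highway numbers, etc.)
  if (isEqual(textField, ['to-string', ['get', 'ref']])) continue

  // Line-like labels stay on one line; other labels get a line break
  const id = layer.id
  const separator = id.includes('line') || id.includes('highway') ? ' ' : '\n'

  // Preferred language first, falling back to the default name
  const parts = [
    ['get', `name_${langCode}`],
    ['get', `name:${langCode}`],
    ['get', 'name'],
  ]

  layer.layout['text-field'] = [
    'case',
    ['has', 'name:nonlatin'],
    ['concat', ['get', 'name:latin'], separator, ['get', 'name:nonlatin']],
    ['coalesce', ...parts],
  ]
}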

@tbodt
Contributor Author

tbodt commented Sep 28, 2024

This works decently well, but there are two problems:

  • Fiddling with the style JSON this way seems very brittle. Surely it would be better if the style could include a variable that gets filled in at runtime with the correct language?
  • It doesn't include every language that has data in OSM/Wikidata. For example, Europe has names in 82 languages on OSM and 343 on Wikidata, but if you download and parse the PBF file for tile 0/0/0 you only find 76... and somehow a bunch of these 76 are not on OSM? How did that happen?
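
For anyone who wants to reproduce that count, here is a rough sketch. It assumes the tile has already been decoded into one property dict per feature (by whatever vector tile decoder you prefer); the function name and the list of excluded suffixes are my own assumptions, not part of any library.

```python
import re

# Matches keys such as "name:de" or "name_en" and captures the code part.
NAME_KEY = re.compile(r"^name[:_]([A-Za-z][A-Za-z0-9_-]*)$")

# Suffixes that mark script variants or derived fields, not languages
# (an assumed list, adjust as needed).
NOT_LANGUAGES = {"latin", "nonlatin", "int"}


def languages_in_features(features):
    """Return the sorted language codes found in a list of property dicts."""
    found = set()
    for properties in features:
        for key in properties:
            match = NAME_KEY.match(key)
            if match and match.group(1) not in NOT_LANGUAGES:
                found.add(match.group(1))
    return sorted(found)
```

The length of the returned list is the number of distinct languages present in the decoded tile.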

@hyperknot
Owner

This is basically the official solution. Actually the official solution is way more basic, I've spent a lot of time polishing it till I got to the version I posted. Have a look at the official example:
https://maplibre.org/maplibre-gl-js/docs/examples/language-switch/

About fiddling with JSON, basically all Mapbox/MapLibre styles are just that, JSON. I might set up an nginx function for this, but at the end of the day it'll just be a JSON with strings.

Now, about the mismatch between OSM and OpenMapTiles, it's outside the scope of this project. You can see exactly what's in the data by going to the inspector mode of Maputnik:
https://maputnik.github.io/editor?style=https://tiles.openfreemap.org/styles/bright


@tbodt
Contributor Author

tbodt commented Sep 29, 2024

Thanks for the link to maputnik, it's definitely easier to use than manually deserializing pbfs.

What I don't understand, though, is the difference between OSM and OpenMapTiles, and why they would get out of sync. AIUI this project, OpenFreeMap, runs Planetiler, and then Planetiler fetches its data from OSM to generate all the tiles. Where does OpenMapTiles come in? What actually is OpenMapTiles?

@hyperknot
Owner

Wrote a document for debugging international names:
https://github.com/hyperknot/openfreemap/blob/main/docs/debugging_names.md

@tbodt
Contributor Author

tbodt commented Sep 29, 2024

Turns out the list of languages in OpenMapTiles is equivalent to the list of languages in OpenFreeMap, so that explains where it's coming from. What I don't understand is, why does planetiler use OpenMapTiles and not OSM directly?

@tbodt
Contributor Author

tbodt commented Sep 29, 2024

Would it be feasible to simply include all languages with data in wikidata in the tileset, or would that cost too much disk space?

@hyperknot
Owner

So for your questions:

Why does planetiler use OpenMapTiles and not OSM directly?

Because OSM is just a database dump, it's not usable on its own. You need a schema which describes the geometries in the vector tiles. One such schema is OpenMapTiles. Another is https://shortbread-tiles.org/

Would it be feasible to simply include all languages with data in wikidata in the tileset, or would that cost too much disk space?

I don't know the answer to this; you should ask in the Planetiler repo.

@tbodt
Contributor Author

tbodt commented Sep 29, 2024

Makes sense. Going to figure out what would need to change in Planetiler.

@hyperknot
Owner

I've opened a ticket to render a full planet with the other two OpenMapTiles implementations, so we can compare them. This way we can see whether something is Planetiler-specific or is present in the other two implementations as well.
#25

@tbodt
Contributor Author

tbodt commented Sep 29, 2024

My assessment is:

  • The OpenMapTiles reference implementation has a list of languages defined in openmaptiles.yaml, and a surprising number of dependencies on that list (e.g. using it to generate a list of columns for SQL statements) that would seem to make it hard to convince it to fetch labels for all languages.
  • tilemaker just doesn't fetch labels in any other language by default, and has no support for wikidata - doesn't seem very useful without significant work.
  • Planetiler's default task is to run with a configuration that matches OMT reference as closely as possible, apparently implemented as a Java file autogenerated from the aforementioned openmaptiles.yaml, thus inheriting the list of languages. There's no flag that will make it include all languages. But this is implemented by first fetching all labels for all languages from OSM and Wikidata, then filtering them - so it would be easy to make it just not, modulo bikeshedding over command line flags.
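
To illustrate the "fetch everything, then filter" step in the last bullet, here is a small sketch. It is not Planetiler's actual code (which is Java); the function name and the sample tags are made up for illustration.

```python
def select_names(tags, languages=None):
    """Keep "name" plus "name:<code>" tags, optionally limited to `languages`.

    Passing languages=None keeps every translation, which corresponds to
    skipping the filter step entirely.
    """
    selected = {}
    for key, value in tags.items():
        if key == "name":
            selected[key] = value
        elif key.startswith("name:"):
            code = key[len("name:"):]
            if languages is None or code in languages:
                selected[key] = value
    return selected
```

With a language list, only the listed translations survive; with no list, every translation that was fetched ends up in the output.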

I'm currently looking into the feasibility of these Planetiler changes, hopefully I'll be able to do a full planet run to test the difference in output size.

@hyperknot
Owner

Sounds great, thank you for digging into this!

These are the command line options I'm using for planetiler on a 128 GB machine. It takes about 5 hours to run:

command = [
    'java',
    f'-Xmx{java_memory_gb}g',
    '-jar',
    config.planetiler_path,
    f'--area={area}',
    '--download',
    '--download-threads=10',
    '--download-chunk-size-mb=1000',
    '--fetch-wikidata',
    '--output=tiles.mbtiles',
    '--storage=mmap',
    '--force',
]
if area == 'planet':
    command.append('--nodemap-type=array')
    command.append('--bounds=planet')
if area == 'monaco':
    command.append('--nodemap-type=sortedtable')

@tbodt
Contributor Author

tbodt commented Oct 1, 2024

Looks like I can do a run on my 8GB machine in about 20 hours. Surprisingly this is fast enough for me for the moment since I have other things to do.

@tbodt
Contributor Author

tbodt commented Oct 3, 2024

So, the planetiler output without the language filter is about 2GB larger.

-rw-r--r--  1 tbodt  staff  93248712704 Oct  3 11:26 data/planet-all-langs.mbtiles
-rw-r--r--  1 tbodt  staff  91267948544 Oct  1 04:15 data/planet-osm-langs.mbtiles

I think this is worth it, what do you think? I'll work on the pull request for planetiler soon.
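
For reference, the exact difference can be worked out from the two file sizes in the listing above. The variable names are just for illustration.

```python
all_langs = 93248712704      # bytes, data/planet-all-langs.mbtiles
default_langs = 91267948544  # bytes, data/planet-osm-langs.mbtiles

difference = all_langs - default_langs
growth_percent = 100 * difference / default_langs

# Prints: 1.98 GB larger (2.17%)
print(f"{difference / 1e9:.2f} GB larger ({growth_percent:.2f}%)")
```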

@hyperknot
Owner

That's great work, thank you! I think you should definitely open a PR in Planetiler and let them decide.

But before that, I'd make a very clear example of before-after for a few items, to understand what's missing from the old version.

@tbodt
Contributor Author

tbodt commented Oct 3, 2024

In onthegomap/planetiler#1043 they've said that it would make sense to have a flag, but they wouldn't make it the default since the default should match OMT as closely as possible. The question for you is whether you would set this flag for OFM.

I can post some before/afters here for you if you like.

@hyperknot
Owner

Yes please. I mean especially include languages which you believe should be included. I'm not sure the right choice is to have hundreds of languages if no one would use them.

Also, could you compare the size totals of the 4-6 tiles which are normally loaded for some popular view, say London or New York? I'm afraid that the 2 GB of size growth isn't distributed evenly: rather than every tile being 2% bigger, some popular areas might be 10-15% bigger. But that's just a guess on my side.
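
A rough sketch of how such a comparison could be done with only the Python standard library. It assumes both renders are MBTiles files exposing the standard `tiles` table or view, and it sums a 3x3 block of tiles around a point as a stand-in for "the tiles loaded for a view". The function names are hypothetical.

```python
import math
import sqlite3


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 lon/lat to XYZ (slippy map) tile coordinates."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tile_size(conn, zoom, x, y):
    """Return the stored size in bytes of one tile, or 0 if it is missing."""
    tms_row = (2 ** zoom - 1) - y  # MBTiles stores rows in TMS order (flipped y)
    row = conn.execute(
        "SELECT length(tile_data) FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (zoom, x, tms_row),
    ).fetchone()
    return (row[0] or 0) if row else 0


def view_size(path, lon, lat, zoom, radius=1):
    """Sum tile sizes in a (2*radius+1)^2 block of tiles around a point."""
    x0, y0 = lonlat_to_tile(lon, lat, zoom)
    conn = sqlite3.connect(path)
    try:
        return sum(
            tile_size(conn, zoom, x, y)
            for x in range(x0 - radius, x0 + radius + 1)
            for y in range(y0 - radius, y0 + radius + 1)
        )
    finally:
        conn.close()
```

For example, comparing view_size("data/planet-all-langs.mbtiles", -0.1276, 51.5072, 12) with the same call on data/planet-osm-langs.mbtiles would show the growth around central London at zoom 12. Note that the stored tiles are typically gzip-compressed, so these are compressed sizes.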

@tbodt
Contributor Author

tbodt commented Oct 3, 2024

I believe we should include every language indiscriminately - if a language has data, that means someone cared enough about it to type in labels.

@hyperknot
Owner

Before making a decision, I'd be curious what other map platforms have chosen to do here.

@hyperknot
Owner

hyperknot commented Oct 3, 2024

Maptiler offers these languages:

  1. English
  2. Local
  3. Albanian
  4. Amharic
  5. Arabic
  6. Armenian
  7. Azerbaijani
  8. Basque
  9. Belarusian
  10. Bengali
  11. Bosnian
  12. Breton
  13. Bulgarian
  14. Catalan
  15. Chinese
  16. Corsican
  17. Croatian
  18. Czech
  19. Danish
  20. Dutch
  21. English (listed twice)
  22. Esperanto
  23. Estonian
  24. Finnish
  25. French
  26. Georgian
  27. German
  28. Greek
  29. Hebrew
  30. Hindi
  31. Hungarian
  32. Icelandic
  33. Indonesian
  34. Irish
  35. Italian
  36. Japanese
  37. Japanese (Kana)
  38. Japanese (Latin 2018)
  39. Japanese (Latin)
  40. Japanese Hiragana form
  41. Kannada
  42. Kazakh
  43. Korean
  44. Korean (Latin)
  45. Kurdish
  46. Latin
  47. Latvian
  48. Lithuanian
  49. Luxembourgish
  50. Macedonian
  51. Malayalam
  52. Maltese
  53. Norwegian
  54. Occitan
  55. Polish
  56. Portuguese
  57. Romanian
  58. Romansh
  59. Russian
  60. Scottish Gaelic
  61. Serbian (Cyrillic)
  62. Serbian (Latin)
  63. Slovak
  64. Slovene
  65. Spanish
  66. Swedish
  67. Tamil
  68. Telugu
  69. Thai
  70. Turkish
  71. Ukrainian
  72. Vietnamese
  73. Welsh
  74. Western Frisian

@tbodt
Contributor Author

tbodt commented Oct 4, 2024

A list of maps with localization is at https://wiki.openstreetmap.org/wiki/Map_internationalization. Of those my language "tok" is only supported by Wikimedia maps, which seems to include every language indiscriminately, but the terms of use allow use only for Wikimedia projects.

@ImreSamu

ImreSamu commented Oct 4, 2024

Although there are thousands of unique name: tags in OpenStreetMap (OSM),
many of them do not match the correct language code definitions.
https://taginfo.openstreetmap.org/search?q=name%3A#keys

Wikimedia recognizes around 710 language codes ( https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all ), so ideally, this number of languages should be included to ensure all known languages are represented.
By clicking on the "WDQS" query, you can download the language codes in CSV format, which can be used after some cleaning.
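
A small sketch of how the cleaned CSV could be used to separate recognized language tags from other name: keys. The column name "code" and both function names are assumptions; the actual WDQS export may use different headers.

```python
import csv
import io


def load_codes(csv_text, column="code"):
    """Read language codes from CSV text; the column name is an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().lower() for row in reader if row.get(column)}


def split_name_keys(keys, known_codes):
    """Split "name:<suffix>" keys into (recognized, unrecognized) lists."""
    recognized, unrecognized = [], []
    for key in keys:
        if not key.startswith("name:"):
            continue
        suffix = key[len("name:"):].lower()
        (recognized if suffix in known_codes else unrecognized).append(key)
    return recognized, unrecognized
```

Keys such as name:left would land in the unrecognized list, since "left" is not a language code.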

name:tok - "Toki Pona"

@tbodt :

my language "tok" is only supported by Wikimedia maps,
which seems to include every language indiscriminately, but the terms of use allow use only for Wikimedia projects.

name:tok ( https://en.wikipedia.org/wiki/Toki_Pona )
Has only 323 objects in the OpenStreetMap : https://taginfo.openstreetmap.org/keys/name%3Atok#overview

The hard part is - creating a perfect map with 'Sitelen Pona' labels won't be easy due to the ~120 hieroglyphic characters.

Transliteration isn't straightforward, as many non-Latin scripts (Arabic, Chinese, Japanese, Hebrew, etc.) first need to be converted into Latin characters before the Toki Pona transliteration can work.


@tbodt
Contributor Author

tbodt commented Oct 4, 2024

Happily, transliteration to sitelen pona is not really necessary. Latin script is more commonly used for toki pona anyway, so I would be happy with that on the map. If toki pona with sitelen pona was an option it would be its own language code with its own labels defined, just like how there are multiple language codes for Japanese. Automatically transliterating everything is not really worth doing.

@hyperknot
Owner

Thank you for the research @ImreSamu. I thought about this and would like to go forward with the following decision:

  • I'd like to stick with OpenMapTiles decisions for default languages
  • But if there is individual need for adding a single language, I'm happy to include "tok" for example.

@tbodt if you submit a PR to planetiler to add individual languages and not all, then I'm happy to include "tok" in OpenFreeMap.

For your proposal of including all languages, why don't you convert your render to PMTiles and host that on Cloudflare? I mean, you can make an EveryLanguageMap or similar, I think it'd be a very interesting project!

@tbodt
Contributor Author

tbodt commented Oct 4, 2024

Ultimately I'm asking to add every language here instead of creating my own because it's much easier for you to do than for me, regarding the cost of serving and generation: a few % extra for you, an entire new project to maintain for me. I don't really understand why not. Yes, I'm here to make a Toki Pona map, but in the process it started to look like it would be just as easy to fix this for every language instead of just mine, and I like the idea of not leaving anyone out.

That said, the idea of adding individual languages suggests a good idea for designing the planetiler flags, I'll see what I can implement there.

@hyperknot
Owner

hyperknot commented Oct 4, 2024

I understand your point, but there are two big reasons why I think it's not a good idea for this project:

  1. Once we add something we have to support it. For example, have a look at #24 ("Some Japanese place names are transliterated to their alphabetical form using Chinese phonetic readings"). What if we include languages that no one has used in OpenMapTiles before, and someone opens a ticket saying that language x is displayed incorrectly? We cannot just say sorry, we don't care; we'd need to invest time into trying to solve that issue for that particular language.
  2. I believe the size growth is not 2% universally, but close to 0% in most of the world and up to 10% in some dense areas. Making the map load 10% slower in popular areas is not a good idea.

Your map could be a perfect candidate for PMTiles + Cloudflare R2 hosting, you can literally host it for free. And about generation, you've just made a full planet run, if you don't want to update it frequently then you are basically done! I really mean you can set up your EveryLanguageMap in like a few hours and it'd be a super nice open source project.

@tbodt
Contributor Author

tbodt commented Oct 4, 2024

Thanks for the explanation. Indeed the cost is doing map updates - to keep the map up to date essentially requires scheduling and monitoring reruns forever.

I do plan to look at which tiles grow the most, will let you know what I find.

@1ec5

1ec5 commented Nov 3, 2024

By the way, the OSMUS Tile Service (which powers OSM Americana, among other things) passes a very long list of languages into Planetiler (unfortunately not including tok, but feel free to open an issue about that). It would be great to not have to pass in language codes explicitly.

@tbodt
Contributor Author

tbodt commented Nov 25, 2024

I extended the languages flag in planetiler to allow adding specific languages to the default, among other things: onthegomap/planetiler#1111

@tbodt
Contributor Author

tbodt commented Nov 25, 2024

It looks like this project builds planetiler from source at HEAD, so the updated flag can be used immediately. Is that right?

c.sudo(
    f'git clone --recurse-submodules https://github.com/onthegomap/planetiler.git {PLANETILER_SRC}'
)

@hyperknot
Owner

hyperknot commented Dec 3, 2024

Thanks for the PR!

It looks like this project builds planetiler from source at HEAD, so the updated flag can be used immediately. Is that right?

It's pinned to a commit in the next line:

sudo_cmd(c, f'cd {PLANETILER_SRC} && git checkout {PLANETILER_COMMIT}')

@tbodt
Contributor Author

tbodt commented Dec 3, 2024

Ah, I see. I've submitted #46 to bump the pin and add toki pona.
