Pronunciation for names and destinations (valhalla#3132)
* added GetPronunciationsMap() and only_pronunciations to GetTaggedNames and pronunciation logic.

* updated getsign logic for pronunciations

* updated call to addedgeinfo

* added logic to process pronunciation_file and pronunciations.

* updated call to .GetNamesAndTypes

* updated call to addedgeinfo

* added logic to write out pronunciations.  Using tagged bit on signs now.

* updated call to addedgeinfo

* added logic to process pronunciations for ways.

* added logic to parse pronunciations.  Nodes data saved to existing node file.  Way data is saved to new pronunciation file

* updated call to addedgeinfo

* updated call to addedgeinfo

* just temp printing out pronunciations

* updated call to addedgeinfo

* added new pronunciation file

* added new kVerbal type.

* tagged bit is now in use.

* added is_tagged_, has_phoneme_, phoneme_start_index_, and phoneme_count_.  A sign can have multiple phoneme types.

* updated comment

* added pronunciations to AddEdgeInfo

* added pronunciation indexes.

* added pronunciation_file

* init copy

* init copy

* lint

* Added verbal type and value to street name and sign element

* clean up temp files

* added Jeita.

* updated GetPronunciationsMap().  Data is now separated by a #.  Updated GetNamesAndTypes to include tagged types but not verbals

* updated to kJeita.

* getsigns should have returned an unordered_multimap.  delimiter is now #.

* delimiter is now #.

* added name and ref set/get for jeita.

* added logic for jeita.

* added jeita indexes and accessor functions.

* added logic to set the jeita information.

* added types

* added logic to parse jeita information.

* clean up

* added logic to reference tagged names from the pronunciation map

* updated to use new function signatures.

* allow name nodes for netherlands only for testing

* updated to use new function signatures.

* new tests.

* removed verbal info from sign

* Added new Pronunciation message
Added Pronunciation to StreetName and TripSignElement

* refactored naming to match

* format fix

* refactored for name consistency

* name refactor

* rename GetPronunciationsMulitMap

* missed one

* added GetPronunciationsMap

* updated to use edgeinfo.GetPronunciationsMap

* reordered.

* refactor

* Assign the street names pronunciations if they exist

* Updated StreetNamesToString and StreetNamesToParameterString

* updated test routes

* Added the GraphTile::GetSigns method using unordered_map for the index_pronunciation_map

* Changed the name of the internal GetSigns method to ProcessSigns

* update comment

* fix conflict

* Added the PopulateSignElement method which will populate sign pronunciation values
Refactored to use the PopulateSignElement method

* Added debug output for sign_element pronunciation

* GetPronunciationsMultiMap() no longer needed.  emplace does not allow updates.

* added logic for signs_on_node check.

* fixed bug where destination_ref_to were not getting added with phonemes

* updated and added more tests.

* updated comment

* route num = true means phoneme is for the node.

* fixed xsampa bug.

* lint

* changed to nt-sampa

* lint

* Added pronunciation class
Added pronunciation class to the street name and sign classes

* added maneuver street name debug output

* Assigned pronunciation to the signs

* update for street names debug

* Assigned pronunciation to street names in the maneuvers builder

* clang-tidy updates.

* clang tidy updates.

* clang tidy updates.

* Updated the DirectionsBuilder to assign the street name and sign pronunciations
Added gurka test to verify the street name and sign pronunciations

* Added another RAD test

* lint

* Update graphtilebuilder.cc

* Update graphtilebuilder.cc

* jeita --> x-jeita and katakana --> x-katakana

* Added MarkupFormatter class
Added phoneme markup for street names

* fixed format

* lint

* added config options for markup_formatter

* Added markup support for signs

* Verify toward sign pronunciation instructions
Also, fixed location for right side of street driving

* Test exit number, exit onto street, and street name pronunciation instructions

* Test ref and destination:ref pronunciation

* Added tests for destination:street:to and destination:ref:to

* added the CheckGuideSigns test

* Added the CheckJunctionName test

* fixed typo

* cleanup refactor

* Updated the quotes format and gurka tests

* refactor for multicue

* refactor markup formatter

* refactor FormatPhonemeElement

* Added 3132 to changelog

* update submodules

* example for how to handle nuls in strings by computing the width of each name info entry

* eagle eye duane

* so much simpler

* debugging wrong data in hierarchybuilder

* added logic to get the pronunciations

* updated init of the struct

* updated output.

* added pronunciation to the osrm serializer

* fixed issue where we were adding extra null chars

* added kNone.

* moved language to the top as kNone is set to 1.

* added kNone for PronunciationAlphabet Language

* format

* fixed debug output

* removed kNone from proto pronunciation alphabet

* only read to end of the pronunciation using the header.length_

* refactored to use new structure for phonemes

* save data out in new phoneme structure.

* updated to support new structure

* removed name.size() > 1 logic and clean up

* added comment

* reduced some code.  moved logic to AddPronunciation

* cleanup

* added AddPronunciation and renamed BuildPronunciations

* lint

* lint

* should start at IPA

* refactor PronunciationAlphabetToString

* Refactor Pronunciation_Alphabet_Name

* account for empty phonemes

* cp paste fail

* counts and begin indexes were off for pronunciations for signs when there are blank pronunciations.../valhalla/mjolnir/graphbuilder.h

* added count

* added tests for missing pronunciations for signs.

* removed unused code.

* adding more name check for blank names

* added TODO and addressed index issue.

* removed debug statements.

* added blank name testing.

* updated to include OSM pronunciation tags

* updated.

* cleanup

Co-authored-by: Duane Gearhart <[email protected]>
Co-authored-by: Kevin Kreiser <[email protected]>
3 people authored Oct 7, 2021
1 parent ac824e5 commit 3652aca
Showing 89 changed files with 6,520 additions and 618 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@
* **Removed**
* **Bug Fix**
* **Enhancement**
* CHANGED: Pronunciation for names and destinations [#3132](https://github.com/valhalla/valhalla/pull/3132)

## Release Date: 2021-10-07 Valhalla 3.1.4
* **Removed**
24 changes: 13 additions & 11 deletions proto/sign.proto
@@ -1,21 +1,23 @@
 syntax = "proto2";
 option optimize_for = LITE_RUNTIME;
 package valhalla;
 import public "tripcommon.proto";

 message TripSignElement {
-  optional string text = 1; // The actual sign element text, examples: I 95 North or Derry Street
-  optional bool is_route_number = 2; // true if sign element is a reference route number such as: I 81 South or US 322 West
-  optional uint32 consecutive_count = 3; // The frequency of this sign element within a set a consecutive signs
+  optional string text = 1; // The actual sign element text, examples: I 95 North or Derry Street
+  optional bool is_route_number = 2; // true if sign element is a reference route number such as: I 81 South or US 322 West
+  optional uint32 consecutive_count = 3; // The frequency of this sign element within a set a consecutive signs
+  optional Pronunciation pronunciation = 4; // The pronunciation associated with this sign element
 }

 message TripSign {
-  repeated TripSignElement exit_numbers = 1; // The list of exit numbers, example: 67B
-  repeated TripSignElement exit_onto_streets = 2; // The list of exit branch street names, examples: I 95 North or Baltimore-Washington Parkway
-  repeated TripSignElement exit_toward_locations = 3; // The list of exit toward locations, examples: New York or I 395 South
-  repeated TripSignElement exit_names = 4; // The list of exit names - not used much in US, example: Gettysburg Pike
-  repeated TripSignElement guide_onto_streets = 5; // The list of guide branch street names, examples: US 22 West or Baltimore-Washington Parkway
-  repeated TripSignElement guide_toward_locations = 6; // The list of guide toward locations, examples: Lewistown or US 15
-  repeated TripSignElement junction_names = 7; // The list of junction names, examples: 万年橋東 or Mannenbashi East
-  repeated TripSignElement guidance_view_junctions = 8; // The list of guidance view junctions, examples: AB12345;1 or AB12345;E
+  repeated TripSignElement exit_numbers = 1; // The list of exit numbers, example: 67B
+  repeated TripSignElement exit_onto_streets = 2; // The list of exit branch street names, examples: I 95 North or Baltimore-Washington Parkway
+  repeated TripSignElement exit_toward_locations = 3; // The list of exit toward locations, examples: New York or I 395 South
+  repeated TripSignElement exit_names = 4; // The list of exit names - not used much in US, example: Gettysburg Pike
+  repeated TripSignElement guide_onto_streets = 5; // The list of guide branch street names, examples: US 22 West or Baltimore-Washington Parkway
+  repeated TripSignElement guide_toward_locations = 6; // The list of guide toward locations, examples: Lewistown or US 15
+  repeated TripSignElement junction_names = 7; // The list of junction names, examples: 万年橋東 or Mannenbashi East
+  repeated TripSignElement guidance_view_junctions = 8; // The list of guidance view junctions, examples: AB12345;1 or AB12345;E
+  repeated TripSignElement guidance_view_signboards = 9; // The list of guidance view signboards, examples: SI_721701166;1 or SI_721701166;2
 }
16 changes: 14 additions & 2 deletions proto/tripcommon.proto
@@ -129,9 +129,21 @@ message TransitPlatformInfo {
   optional string station_name = 9; // The station name of the platform
 }

+message Pronunciation {
+  enum Alphabet {
+    kIpa = 1;
+    kXKatakana = 2;
+    kXJeita = 3;
+    kNtSampa = 4;
+  }
+  optional Alphabet alphabet = 1 [default = kIpa];
+  optional string value = 2;
+}
+
 message StreetName {
-  optional string value = 1; // The actual street name value, examples: I 95 North or Derry Street
-  optional bool is_route_number = 2; // true if the street name is a reference route number such as: I 81 South or US 322 West
+  optional string value = 1; // The actual street name value, examples: I 95 North or Derry Street
+  optional bool is_route_number = 2; // true if the street name is a reference route number such as: I 81 South or US 322 West
+  optional Pronunciation pronunciation = 3; // The pronunciation associated with this street name
 }

 message TurnLane {
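For readers who want to experiment with the new `Pronunciation` message outside of protobuf, the following is a minimal Python sketch that mirrors its shape. The enum values and the `kIpa` default come from the proto above. The `ALPHABET_NAMES` table and the `alphabet_to_string` helper are hypothetical: the commit log mentions "x-jeita", "x-katakana" and "nt-sampa", and the "ipa" spelling is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Alphabet(IntEnum):
    """Mirrors the Pronunciation.Alphabet enum values from tripcommon.proto."""
    kIpa = 1
    kXKatakana = 2
    kXJeita = 3
    kNtSampa = 4


# Hypothetical display names. "x-katakana", "x-jeita" and "nt-sampa" appear in
# the commit log; "ipa" is an assumption made for this sketch.
ALPHABET_NAMES = {
    Alphabet.kIpa: "ipa",
    Alphabet.kXKatakana: "x-katakana",
    Alphabet.kXJeita: "x-jeita",
    Alphabet.kNtSampa: "nt-sampa",
}


@dataclass
class Pronunciation:
    """Plain-Python stand-in for the proto message (not generated code)."""
    value: str = ""
    alphabet: Alphabet = Alphabet.kIpa  # same default as the proto field


def alphabet_to_string(pronunciation: Pronunciation) -> str:
    """Return the display name of the pronunciation's alphabet."""
    return ALPHABET_NAMES[pronunciation.alphabet]
```

A `StreetName` or `TripSignElement` would then carry at most one such value next to its text, as in the optional `pronunciation` fields added by the diff.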
8 changes: 8 additions & 0 deletions scripts/valhalla_build_config
@@ -106,6 +106,10 @@ config = {
},
'service': {
'proxy': 'ipc:///tmp/odin'
},
'markup_formatter': {
'markup_enabled': False,
'phoneme_format': '<TEXTUAL_STRING> (<span class=<QUOTES>phoneme<QUOTES>>/<VERBAL_STRING>/</span>)'
}
},
'meili': {
@@ -368,6 +372,10 @@ help_text = {
},
'service': {
'proxy': 'IPC linux domain socket file location'
},
'markup_formatter': {
'markup_enabled': 'Boolean flag to use markup formatting',
'phoneme_format': 'The phoneme format string that will be used by street names and signs'
}
},
'meili': {
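The `phoneme_format` default above is a template with three placeholders: `<TEXTUAL_STRING>`, `<QUOTES>` and `<VERBAL_STRING>`. The `MarkupFormatter` implementation is not part of the files shown here, so the following Python sketch only illustrates one plausible expansion: plain token substitution, with `<QUOTES>` assumed to become a double quote. The function name `format_phoneme` and its parameters are hypothetical.

```python
import re

# Default template copied from the config diff above.
PHONEME_FORMAT = (
    "<TEXTUAL_STRING> (<span class=<QUOTES>phoneme<QUOTES>>/<VERBAL_STRING>/</span>)"
)

# Only these three tokens are treated as placeholders; other angle-bracket
# text such as <span ...> passes through untouched.
_TOKEN = re.compile(r"<(TEXTUAL_STRING|QUOTES|VERBAL_STRING)>")


def format_phoneme(text, phoneme, markup_enabled=True,
                   phoneme_format=PHONEME_FORMAT, quotes='"'):
    """Expand the template in a single pass (assumed behavior, not Valhalla's code)."""
    if not markup_enabled or not phoneme:
        # Assumption: with markup disabled or no phoneme, the plain text is used.
        return text
    values = {"TEXTUAL_STRING": text, "QUOTES": quotes, "VERBAL_STRING": phoneme}
    return _TOKEN.sub(lambda match: values[match.group(1)], phoneme_format)
```

Substituting in a single regex pass, rather than chaining `str.replace` calls, keeps a street name that happens to contain a placeholder-like token from being expanded a second time.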
1 change: 1 addition & 0 deletions src/baldr/CMakeLists.txt
@@ -54,6 +54,7 @@ set(sources
location.cc
pathlocation.cc
predictedspeeds.cc
pronunciation.cc
tilehierarchy.cc
turn.cc
shortcut_recovery.h
128 changes: 104 additions & 24 deletions src/baldr/edgeinfo.cc
@@ -8,6 +8,7 @@ using namespace valhalla::baldr;
namespace {

// should return true for any tags which we should consider "named"
// do not return TaggedValue::kPronunciation
bool IsNameTag(char ch) {
static const std::unordered_set<TaggedValue> kNameTags = {TaggedValue::kBridge,
TaggedValue::kTunnel};
@@ -78,34 +79,68 @@ NameInfo EdgeInfo::GetNameInfo(uint8_t index) const {

// Get a list of names
std::vector<std::string> EdgeInfo::GetNames() const {
return GetTaggedValuesOrNames(false);
}
// Get each name
std::vector<std::string> names;
names.reserve(name_count());
const NameInfo* ni = name_info_list_;
for (uint32_t i = 0; i < name_count(); i++, ni++) {
if (ni->tagged_)
continue;

std::vector<std::string> EdgeInfo::GetTaggedValues() const {
return GetTaggedValuesOrNames(true);
if (ni->name_offset_ < names_list_length_) {
names.push_back(names_list_ + ni->name_offset_);
} else {
throw std::runtime_error("GetNames: offset exceeds size of text list");
}
}
return names;
}

std::vector<std::string> EdgeInfo::GetTaggedValuesOrNames(bool only_tagged_values) const {
// Get a list of tagged names
std::vector<std::string> EdgeInfo::GetTaggedValues(bool only_pronunciations) const {
// Get each name
std::vector<std::string> names;
names.reserve(name_count());
const NameInfo* ni = name_info_list_;
for (uint32_t i = 0; i < name_count(); i++, ni++) {
if ((only_tagged_values && !ni->tagged_) || (!only_tagged_values && ni->tagged_)) {
if (!ni->tagged_)
continue;
}

if (ni->name_offset_ < names_list_length_) {
names.emplace_back(names_list_ + ni->name_offset_);
const auto* name = names_list_ + ni->name_offset_;
try {
TaggedValue tv = static_cast<baldr::TaggedValue>(name[0]);
if (tv == baldr::TaggedValue::kPronunciation) {
if (!only_pronunciations)
continue;

size_t pos = 1;
while (pos < strlen(name)) {
const auto& header = *reinterpret_cast<const linguistic_text_header_t*>(name + pos);
pos += 3;
names.emplace_back((std::string(reinterpret_cast<const char*>(&header), 3) +
std::string((name + pos), header.length_)));

pos += header.length_;
}

} else if (!only_pronunciations) {
names.push_back(name);
}
} catch (const std::invalid_argument& arg) {
LOG_DEBUG("invalid_argument thrown for name: " + std::string(name));
}
} else {
throw std::runtime_error("GetNames: offset exceeds size of text list");
throw std::runtime_error("GetTaggedNames: offset exceeds size of text list");
}
}
return names;
}

// Get a list of names
std::vector<std::pair<std::string, bool>>
EdgeInfo::GetNamesAndTypes(bool include_tagged_values) const {
EdgeInfo::GetNamesAndTypes(std::vector<uint8_t>& types, bool include_tagged_values) const {

// Get each name
std::vector<std::pair<std::string, bool>> name_type_pairs;
name_type_pairs.reserve(name_count());
@@ -118,17 +153,19 @@ EdgeInfo::GetNamesAndTypes(bool include_tagged_values) const {
if (ni->tagged_) {
if (ni->name_offset_ < names_list_length_) {
std::string name = names_list_ + ni->name_offset_;
if (name.size() > 1 && IsNameTag(name[0])) {
try {
try {
if (IsNameTag(name[0])) {
name_type_pairs.push_back({name.substr(1), false});
} catch (const std::invalid_argument& arg) {
LOG_DEBUG("invalid_argument thrown for name: " + name);
types.push_back(static_cast<uint8_t>(name.at(0)));
}
} catch (const std::invalid_argument& arg) {
LOG_DEBUG("invalid_argument thrown for name: " + name);
}
} else
throw std::runtime_error("GetNamesAndTypes: offset exceeds size of text list");
} else if (ni->name_offset_ < names_list_length_) {
name_type_pairs.push_back({names_list_ + ni->name_offset_, ni->is_route_num_});
types.push_back(0);
} else {
throw std::runtime_error("GetNamesAndTypes: offset exceeds size of text list");
}
@@ -149,27 +186,70 @@ const std::multimap<TaggedValue, std::string>& EdgeInfo::GetTags() const {
if (ni->tagged_) {
if (ni->name_offset_ < names_list_length_) {
std::string name = names_list_ + ni->name_offset_;
if (name.size() > 1) {
uint8_t num = 0;
try {
num = static_cast<uint8_t>(name.at(0));
tag_cache_.emplace(static_cast<TaggedValue>(num), name.substr(1));
} catch (const std::logic_error& arg) {
LOG_DEBUG("logic_error thrown for name: " + name);
}
}
try {
TaggedValue tv = static_cast<baldr::TaggedValue>(name[0]);
if (tv != baldr::TaggedValue::kPronunciation)
tag_cache_.emplace(tv, name.substr(1));
} catch (const std::logic_error& arg) { LOG_DEBUG("logic_error thrown for name: " + name); }
} else {
throw std::runtime_error("GetTags: offset exceeds size of text list");
}
}
}

tag_cache_ready_ = true;
if (tag_cache_.size())
tag_cache_ready_ = true;
}

return tag_cache_;
}

std::unordered_map<uint8_t, std::pair<uint8_t, std::string>> EdgeInfo::GetPronunciationsMap() const {
std::unordered_map<uint8_t, std::pair<uint8_t, std::string>> index_pronunciation_map;
index_pronunciation_map.reserve(name_count());
const NameInfo* ni = name_info_list_;
for (uint32_t i = 0; i < name_count(); i++, ni++) {
if (!ni->tagged_)
continue;

if (ni->name_offset_ < names_list_length_) {
const auto* name = names_list_ + ni->name_offset_;
try {
TaggedValue tv = static_cast<baldr::TaggedValue>(name[0]);
if (tv == baldr::TaggedValue::kPronunciation) {
size_t pos = 1;
while (pos < strlen(name)) {
const auto& header = *reinterpret_cast<const linguistic_text_header_t*>(name + pos);
pos += 3;
std::unordered_map<uint8_t, std::pair<uint8_t, std::string>>::iterator iter =
index_pronunciation_map.find(header.name_index_);

if (iter == index_pronunciation_map.end())
index_pronunciation_map.emplace(
std::make_pair(header.name_index_,
std::make_pair(header.phonetic_alphabet_,
std::string((name + pos), header.length_))));
else {
if (header.phonetic_alphabet_ > (iter->second).first) {
iter->second = std::make_pair(header.phonetic_alphabet_,
std::string((name + pos), header.length_));
}
}

pos += header.length_;
}
}
} catch (const std::invalid_argument& arg) {
LOG_DEBUG("invalid_argument thrown for name: " + std::string(name));
}
} else {
throw std::runtime_error("GetPronunciationsMap: offset exceeds size of text list");
}
}

return index_pronunciation_map;
}

// Get the types. Are these names route numbers or not?
uint16_t EdgeInfo::GetTypes() const {
// Get the types.
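The selection rule in `EdgeInfo::GetPronunciationsMap()` above can be summarized as: walk the pronunciation record header by header, and for each name index keep the entry whose `phonetic_alphabet_` value is highest. The Python sketch below models that walk on a byte string. The definition of `linguistic_text_header_t` is not shown in this diff, so the three-byte layout used here (name index, alphabet, length, one byte each) is a simplifying assumption, and the function name `pronunciations_map` is hypothetical. The input is taken to be the bytes after the leading `TaggedValue::kPronunciation` tag byte.

```python
from typing import Dict, Tuple

HEADER_SIZE = 3  # the C++ code advances by 3 bytes per header


def pronunciations_map(payload: bytes) -> Dict[int, Tuple[int, bytes]]:
    """Map name index -> (alphabet, pronunciation bytes).

    Assumed layout per entry (hypothetical, for illustration only):
    [name_index][alphabet][length] followed by `length` bytes of text.
    """
    result: Dict[int, Tuple[int, bytes]] = {}
    pos = 0
    while pos + HEADER_SIZE <= len(payload):
        name_index, alphabet, length = payload[pos], payload[pos + 1], payload[pos + 2]
        pos += HEADER_SIZE
        text = payload[pos:pos + length]  # read only `length` bytes, as the C++ does
        pos += length
        current = result.get(name_index)
        # First entry for this index wins by default; later entries replace it
        # only when their alphabet value is strictly greater.
        if current is None or alphabet > current[0]:
            result[name_index] = (alphabet, text)
    return result
```

With the enum values from `tripcommon.proto` (and assuming the stored alphabet values follow the same ordering), this rule would prefer `kNtSampa` (4) over `kIpa` (1) when a name carries both.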