Closest Match for Punjabi (Pakistan) Not Resolving Match #59

joe-sciame-wm · 2022-11-28T00:07:23Z

I'm attempting to match a language code 'pa' with another language code 'pa-PK'.

from langcodes import closest_match

def test_language_less_than(self):
    spoken_language_1 = 'pa'
    spoken_language_2 = 'pa-PK'
    match = closest_match(spoken_language_1, [spoken_language_2])
    print(match)
    self.assertEqual(0, match[1])

def test_language_more_than(self):
    spoken_language_1 = 'pa-PK'
    spoken_language_2 = 'pa'
    match = closest_match(spoken_language_1, [spoken_language_2])
    print(match)
    self.assertEqual(0, match[1])

This returns

('und', 1000)
('und', 1000)

I would expect this to return a match rather than ('und', 1000). When I debug the library, I see that the tuple_distance_cached function returns 54 for the following triples:

desired_triple = ('pa', 'Arab', 'PK')
supported_triple = ('pa', 'Guru', 'IN')
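
The ('und', 1000) result is what a thresholded closest-match search produces when every candidate is farther away than the cutoff. The following is a minimal sketch of that logic, not the library's code: `pick_closest` and the `distances` mapping are hypothetical names, and the cutoff of 25 is an assumption about the default `max_distance`.

```python
from typing import Dict, Tuple


def pick_closest(distances: Dict[str, int], max_distance: int = 25) -> Tuple[str, int]:
    """Return the (tag, distance) pair with the smallest distance within the cutoff.

    Hypothetical helper for illustration only. Falls back to ('und', 1000)
    when no candidate is within max_distance.
    """
    best_tag, best_distance = "und", 1000
    for tag, distance in distances.items():
        if distance <= max_distance and distance < best_distance:
            best_tag, best_distance = tag, distance
    return best_tag, best_distance


# 54 is the distance reported in this issue for pa-Arab-PK vs. pa-Guru-IN.
print(pick_closest({"pa": 54}))                   # ('und', 1000)
print(pick_closest({"pa": 54}, max_distance=60))  # ('pa', 54)
```

Under these assumptions, a distance of 54 is rejected by the default cutoff, and only a larger cutoff lets the candidate through.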


joe-sciame-wm · 2022-11-28T01:59:48Z

I believe the issue here is that the maximize() function resolves pa and pa-PK to different maximized tags. I'm not a linguistics expert, so I don't know whether this is correct.

'pa': 'pa-Guru-IN',
'pa-PK': 'pa-Arab-PK',
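
To make the effect of maximization concrete, here is a small self-contained sketch. It does not call the library: `LIKELY_SUBTAGS` is a toy table containing only the two entries quoted above, and `maximize_tag` is a hypothetical helper.

```python
from typing import Tuple

# Toy likely-subtags table holding only the two entries quoted in this issue.
LIKELY_SUBTAGS = {
    "pa": "pa-Guru-IN",
    "pa-PK": "pa-Arab-PK",
}


def maximize_tag(tag: str) -> Tuple[str, str, str]:
    """Expand a tag to a (language, script, territory) triple via the toy table."""
    language, script, territory = LIKELY_SUBTAGS[tag].split("-")
    return language, script, territory


desired = maximize_tag("pa-PK")
supported = maximize_tag("pa")
print(desired)    # ('pa', 'Arab', 'PK')
print(supported)  # ('pa', 'Guru', 'IN')

# Same language, but the script subtags differ after maximization.
print(desired[0] == supported[0], desired[1] == supported[1])  # True False
```

The two tags share a language subtag, yet once they are expanded, both the script and the territory differ, which is consistent with the triples seen in the debugger.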

BrightXiaoHan · 2023-03-08T06:40:11Z

Similar issue here.

In [4]: langcodes.get("ko").language_name()
Out[4]: 'Korean'

In [5]: langcodes.get("kor_Hang").language_name()
Out[5]: 'Korean'

In [6]: langcodes.closest_match("ko", ["kor_Hang"])
Out[6]: ('und', 1000)

georgkrause · 2024-04-08T11:38:26Z

@BrightXiaoHan @joe-sciame-wm Thank you for the input! There is likely something to improve here. If I had to guess, I think the reason for this commit was exactly the problem you are describing: georgkrause@59326f8

A formal note: I took over the package and am working on updating it here: https://github.com/georgkrause/langcodes
Sadly, I cannot move issues, so I created a new one; maybe we can continue the discussion there.

zhu · 2024-04-11T02:10:26Z

I think the script tag is unnecessary when matching spoken languages.
Maybe add an ignore_script argument to the closest_match function?
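
A rough sketch of what script-insensitive matching could look like, comparing only the primary language subtag. This is not the library's implementation of any such argument: `language_subtag` and `match_ignoring_script` are hypothetical helpers, and the comparison is deliberately simplistic.

```python
from typing import List, Optional


def language_subtag(tag: str) -> str:
    """Return the lowercased primary language subtag of a BCP 47-style tag."""
    return tag.replace("_", "-").split("-")[0].lower()


def match_ignoring_script(desired: str, supported: List[str]) -> Optional[str]:
    """Return the first supported tag that shares the desired language subtag."""
    wanted = language_subtag(desired)
    for candidate in supported:
        if language_subtag(candidate) == wanted:
            return candidate
    return None


print(match_ignoring_script("pa", ["en", "pa-PK"]))  # pa-PK
print(match_ignoring_script("pa-PK", ["pa"]))        # pa
print(match_ignoring_script("ko", ["kor_Hang"]))     # None
```

The last call still fails because this sketch does not normalize three-letter codes such as kor to ko; a real solution would need to standardize the tags first.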

georgkrause mentioned this issue Apr 8, 2024

Closest Match for Punjabi (Pakistan) Not Resolving Match georgkrause/langcodes#5

Closed

mtrd3v mentioned this issue Jul 27, 2024

Added ignore_script and tested it. georgkrause/langcodes#17

Merged
