Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LikelySubtags: - minimizeFavorScript --> unsupported. Fixes many errors in ICU4C, ICU4J, and Node #371

Merged
merged 11 commits into from
Jan 7, 2025

Conversation

sven-oly
Copy link
Collaborator

@sven-oly sven-oly commented Jan 2, 2025

Reclassifying any with favor script to unsupported. Also setting result for all "qaa" locale items to "FAIL" because these are note valid language tags.

Note that there are still errors reported in #205 that need to be examined.

Also, ICU4X still has issues.

@sven-oly
Copy link
Collaborator Author

sven-oly commented Jan 2, 2025

#205

@sven-oly sven-oly requested review from echeran and sffc January 3, 2025 00:29
@sven-oly
Copy link
Collaborator Author

sven-oly commented Jan 3, 2025

Note that this now fixes all likely subtags in ICU4X 2.0, and almost all in ICU4X 1.3 - 1.5.

Please take a look!

// Check for special case of reserved codes for local use
const char* lang_code = displayLocale.getLanguage();
const string lang_code_string(lang_code);
if (lang_code_string >= "qaa" && lang_code_string <= "qtz") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: It's not correct for the executor to specifically sniff for qaa. There are many subtags that could cause this type of behavior, not just qaa. The executor should sniff for some other signal of a failure status

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll change the test generator to remove this data range.

@@ -1441,7 +1441,7 @@ und-Latn-MQ ; fr-Latn-MQ ; fr-MQ ;
und-Latn-MR ; fr-Latn-MR ; fr-MR ;
und-Latn-MS ; en-Latn-MS ; en-MS ;
und-Latn-MT ; mt-Latn-MT ; mt ;
und-Latn-MU ; en-Latn-MU ; en-MU ;
und-Latn-MU ; mfe-Latn-MU ; mfe ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why did this data change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data had not been updated correctly to maint/maint-76. This is now the correct test file.

@echeran
Copy link
Collaborator

echeran commented Jan 6, 2025

If qaa and other language codes are reserved, then we should consider whether they are valid test data to begin with. Given that the original source file with test data coming from CLDR (via ICU) just says FAIL in the place of the expected value when the input contains qaa, it's a sign that an error is expected. And I think it doesn't make sense to expect real world data about scripts and regions to interact with a reserved language code, we should remove that from the test data, I think.

@sven-oly
Copy link
Collaborator Author

sven-oly commented Jan 6, 2025

OK, I agree that we could/should removed the non-legal language codes in the test generator.

@sven-oly
Copy link
Collaborator Author

sven-oly commented Jan 6, 2025

Another question about this: should we mark minimize preferring script as "unsupported" for C and J?

There's no public API way to support this, although ICU4X does indeed allow preferring the script or the region.

@sven-oly
Copy link
Collaborator Author

sven-oly commented Jan 7, 2025

Note the qaa codes just generate the same data as the input. This is one way to fix this problem.

Comment on lines +40 to +42
output.error_type = "unsupported";
output.unsupported = e.getMessage();
output.error_detail = e.getMessage();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: don't mark a test as unsupported without constraining the error type

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I've updated the catch to be specific.

@sven-oly
Copy link
Collaborator Author

sven-oly commented Jan 7, 2025

PTAL

@sven-oly sven-oly merged commit cd5604f into unicode-org:main Jan 7, 2025
8 checks passed
@sven-oly sven-oly linked an issue Jan 10, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error handling for expected likely subtag errors
3 participants