Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datetime date from CLDR data #329

Merged
merged 13 commits into from
Nov 15, 2024

Conversation

sven-oly
Copy link
Collaborator

@sven-oly sven-oly commented Oct 11, 2024

Ready for review.

@sven-oly sven-oly marked this pull request as draft October 11, 2024 18:46
@sven-oly
Copy link
Collaborator Author

sven-oly commented Oct 11, 2024

I need some help converting these date/time data into UTC "Z" format. Then we can compute the offset seconds too!

Right now, testing this in ICU75 causes all ICU4C, ICU4J and NodeJS tests to fail or have errors.

@sven-oly
Copy link
Collaborator Author

Note that the CLDR expected test results may be incorrect in some cases:

  1. date/time styles full/full have "at" between the date and time output in ICU4C. The expected result isn't correct.

Expected: July 2, 2001, 5:44:15 PM GMT+4:30
Actual: July 2, 2001 at 5:44:15 PM GMT+4:30

{
"dateLength": "long",
"calendar": "gregorian",
"locale": "en-US",
"input": "2001-07-02T17:44:15+04:30[Asia/Tehran]",
"expected": "July 2, 2001, 5:44:15 PM GMT+4:30"
},

  1. In some cases , there's GMT in the expected
    March 7, 2024, 11:30:01 AM GMT+3:30
    March 7, 2024 at 11:30 AM

{
"dateLength": "long",
"calendar": "gregorian",
"locale": "en-US",
"input": "2024-03-07T11:30:01+03:30[Asia/Tehran]",
"expected": "March 7, 2024, 11:30:01 AM GMT+3:30"
},

{
"timeLength": "long",
"calendar": "gregorian",
"locale": "en-US",
"input": "2024-03-07T11:30:01+03:30[Asia/Tehran]",
"expected": "March 7, 2024, 11:30:01 AM GMT+3:30"
},

@sven-oly sven-oly marked this pull request as ready for review November 14, 2024 01:26
@sven-oly
Copy link
Collaborator Author

This now uses simple logic to identify the "known issue" of " at" replacing "," in some cases. Similarly, some other substitutions are caught.

Note also that the test data shows an ASCII comma as expected instead of the Arabic comma. These show up as test failures, but maybe it's a bug in the CLDR data.

replacements = {',': ' at', '،': ' في',
}

sm = SequenceMatcher(None, expected, actual)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much more complicated known-issue catcher than I was expecting. I was expecting that we would put a blanket "known issue" on all failing inputs where the dateTimeFormatType is "standard".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll take a look at the failures to see if this will capture the substitution of "," with " at" and similar replacements in other languages.

@sven-oly
Copy link
Collaborator Author

Categorize failing date time tests with 'dateTimeFormatType" == 'standard' to be a "Known Issue".

@sven-oly
Copy link
Collaborator Author

Ready for another look!

@sven-oly
Copy link
Collaborator Author

This classifies most of the date time format problems as "known issue" except for the different beween ASCII comma and arabic comma.

@@ -731,7 +731,8 @@ def characterize_datetime_tests(self, test_list, results):
if 'skeleton' in test['input_data']:
skeleton_str = 'skeleton: ' + test['input_data']['skeleton']
results.setdefault(skeleton_str, []).append(label)

if 'dateTimeFormatType' in test:
results.setdefault('dateTimeFormatType: ' + test['dateTimeFormatType'], [])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to chain a .append(label) to this operation? otherwise, as it is, .setdefault() doesn't actually do anything to the map, it just gives a default value operate on when the key doesn't yet exist

Copy link
Collaborator

@echeran echeran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sven-oly sven-oly merged commit 2300fe3 into unicode-org:main Nov 15, 2024
6 checks passed
@sven-oly sven-oly deleted the datetime_date_from_cldr branch November 15, 2024 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants