Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not add "-" and "." as universal date separators #152

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

idpaterson
Copy link
Collaborator

In response to #150, the "-" and "." characters are no longer added to the locale date separator list for all of the expressions that make use of the date separators. The "-" character is still used for the yyyy-mm-dd pattern only, which as discussed previously is an international format.

I noticed that "." is a date separator in the base locale so this confusion between decimals and dates will continue to affect most users. That is really a domain issue – if the data you're processing is likely to always contain a date you would want to avoid false negatives, while someone parsing a date out of freeform text would prefer to avoid false positives. Fortunately after this change the '.' separator can be disabled by the developer if necessary.

This also required the removal of a few tests in the Australian locale since it does not define "." as a date separator.

Review on Reviewable

This character is not defined as a date separator in the locale config.
Restricts addition of "-" to yyyy-mm-dd format only.
@idpaterson idpaterson self-assigned this Dec 24, 2015
@idpaterson
Copy link
Collaborator Author

It seems like the simple datetime tests also fail under ICU since "." is not a date separator according to ICU. There are many countries that use "." as a date separator so I understand its inclusion in the base locale. However, the ICU tests are using the US locale which does not use that as a date separator.

Any thoughts on the best way to proceed here?

@bear
Copy link
Owner

bear commented Dec 28, 2015

I've always thought of ICU as the definitive answer to what should be considered the default for processing date/time text -- if "." is not shown at all to be a separator by ICU then we should not have it in the base.

It also sounds like we should change the tests to use something non US for the "with ICU" tests.

Note - i'm coming off of being sick over the holidays so apologies if I am mangling the context of your thoughts above

@idpaterson
Copy link
Collaborator Author

I see that there is a separate en_US locale that inherits from the base locale, so I updated it to use / only as the date separator. I think there is value in leaving . in the base locale since there seems to be about a 50/50 split (not accounting for relative population) of countries that use . and /. My non-definitive data source on this topic does not clarify whether these symbols are used only in full yyyy + mm + dd formatting or also in "shorthand" dates.

Perhaps the date separators are not universal enough to consider a single category of date separators. We use the same dateSep to match both full dates like yyyy-mm-dd and shorthand dates like mm/dd. The main problem is with the shorthand format so maybe having both a dateSep and shorthandDateSep? I don't know if that would be beneficial for any locales or if removing . from base is better. Not feeling so well here either 😞 sorry if this does not make sense.

@bear
Copy link
Owner

bear commented Dec 28, 2015

I am +1 for having a shorthandDateSep -- it gives the maximum flexibility to anyone adding to the locale support and doesn't add any more complexity IMO (heck, pdt is complex enough by default ;)

thanks for looking into this!

@bear
Copy link
Owner

bear commented Mar 17, 2016

@idpaterson i've lost track on what is needed for the PR to move forward to be honest - when you get some free time can you give it a quick look

@idpaterson idpaterson added this to the 3.0 milestone Nov 27, 2016
@idpaterson
Copy link
Collaborator Author

This needs to be revisited for v3 once the tests are sorted out. I think that the best way to move forward would be to use this as implemented above but rebased on v3.0, then separately consider whether the base locale should be modified to be less favorable toward US English and more toward worldwide majority formats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants