Parsing zero-padded numbers #169

Open
kklingenberg opened this issue Mar 17, 2024 · 6 comments

Comments

@kklingenberg
Contributor

kklingenberg commented Mar 17, 2024

While parsing zero-padded numbers I came across this minor issue. This is a minimal example:

$ echo "0012" | jaq .
0
0
12

Whereas jq yields just 12.

My guess is that this is serde_json at work, which in turn is probably following the JSON spec. Here is another view of the issue:

$ echo "0012" | jaq -R fromjson 
Error: cannot parse 0012 as JSON: end of file expected
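For comparison, here is a minimal Python sketch of the same kind of strictness. It uses Python's standard json module, not jaq's parser, so it only illustrates the general idea: a strict single-value parse reads the first 0 as a complete number and then fails on the leftover input. The helper name strict_fromjson is hypothetical.

```python
import json

def strict_fromjson(text: str):
    """Parse exactly one JSON value; leftover input is an error."""
    return json.loads(text)

try:
    strict_fromjson("0012")
except json.JSONDecodeError as err:
    # The first "0" is a complete number, so the rest counts as trailing data.
    print(f"cannot parse 0012 as JSON: {err.msg}")

print(strict_fromjson("12"))
```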

jaq's lexer rejects these numbers too (which is fine, and consistent with its JSON parser). jq is likewise consistent with its own lenient parser:

$ jaq -n '0012'
Error: Unexpected token, expected as, *, +=, /=, %=, >=, /, ?, %, and, =, or, +, -, |, [, end of input, ==, -=, |=, //, *=, <=, ., !=, ,, >, <
   ╭─[<unknown>:1:2]
   │
 1 │ 0012
   │  ┬  
   │  ╰── Unexpected token 0
───╯

$ jq -n '0012'
12

Anyway, while attempting to work with these numbers one could hope to use the tonumber filter, but that's also implemented in terms of fromjson, so no luck there.

My suggestion is to either:

  • document the non-leniency of the JSON parser, and the difference with jq's
  • provide a tonumber filter that's more tolerant
@kklingenberg
Contributor Author

An example of another side-effect of the current implementation of tonumber:

$ echo '"{}"' | jaq tonumber
{}

kklingenberg changed the title from "Parsing of zero-padded numbers" to "Parsing zero-padded numbers" on Mar 17, 2024
@wader
Contributor

wader commented Mar 17, 2024

Related: jqlang/jq#3055. jq used to allow whitespace in the input to tonumber, but no longer does.

@kklingenberg
Contributor Author

Interesting. So yet another side effect of tonumber just being fromjson is that it tolerates whitespace:

$ echo ' 12 ' | jaq -Rc '[., tonumber]'
[" 12 ",12]
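Both side effects follow from treating tonumber as "parse any JSON value". The Python sketch below models that with the standard json module (an analogy, not jaq's code) and contrasts it with a hypothetical stricter variant that first checks the whole string against the number grammar of the JSON specification. The names tonumber_via_fromjson and tonumber_checked are invented for this illustration.

```python
import json
import re

# Number grammar from the JSON specification (RFC 8259, section 6).
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

def tonumber_via_fromjson(text: str):
    """Models the behaviour described above: any JSON value is accepted."""
    return json.loads(text)

def tonumber_checked(text: str):
    """Hypothetical stricter variant: the whole string must be a JSON number."""
    if JSON_NUMBER.fullmatch(text) is None:
        raise ValueError(f"cannot parse {text!r} as a number")
    return json.loads(text)

print(tonumber_via_fromjson("{}"))    # an object slips through
print(tonumber_via_fromjson(" 12 "))  # surrounding whitespace is tolerated
print(tonumber_checked("12"))         # a bare number is still accepted
```

With the check in place, "{}", " 12 " and "0012" all raise ValueError instead of being silently accepted or misread.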

@pkoppstein

@kklingenberg - Good catch re jaq -n '"{}"|tonumber'. That's a bug that needs fixing.

Since different dialects of jq have and will probably continue to have very different implementations of tonumber,
I think it would be good if jaq could lead the way with respect to a non-strict version, and in that spirit
I'd like to propose that tonumber(regex) be defined using match/1, perhaps along the following lines:

def tonumber(regex): match(regex).string | sub("^00*"; "0") | strict_tonumber;

it being understood that strict_tonumber is a strict version of tonumber, i.e. it would result in an error if its string input does not conform to the JSON specification of a number.
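To make the shape of this proposal concrete, here is a rough Python sketch of the same three steps: extract a match, normalise leading zeros, then parse strictly. It is an illustration under assumptions, not a jaq implementation. In particular, the sketch strips all redundant leading zeros rather than collapsing them to a single 0, because a string such as "012" would still be rejected by a strict JSON number parser; that choice belongs to this sketch, not to the proposal above.

```python
import json
import re

def _reject_constant(name: str):
    # Python's json module accepts NaN and Infinity; a strict parser should not.
    raise ValueError(f"{name} is not a JSON number")

def strict_tonumber(text: str):
    """Hypothetical strict step: accept only a bare JSON number."""
    if text != text.strip():
        raise ValueError(f"unexpected whitespace in {text!r}")
    value = json.loads(text, parse_constant=_reject_constant)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{text!r} is not a number")
    return value

def tonumber(text: str, regex: str):
    """Lenient variant: take the first regex match, drop zero padding, parse strictly."""
    m = re.search(regex, text)
    if m is None:
        raise ValueError(f"{text!r} does not match {regex!r}")
    # Remove leading zeros that are followed by another digit, keeping any sign.
    candidate = re.sub(r"^(-?)0+(?=[0-9])", r"\1", m.group(0))
    return strict_tonumber(candidate)

print(tonumber("0012", r"[0-9]+"))
print(tonumber(" 12 ", r"-?[0-9.]+"))
```

The caller decides how lenient to be through the regex, while the final step stays strict, so an input like "{}" fails at the match step instead of being returned as an object.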

@01mf02
Owner

01mf02 commented Apr 9, 2024

Regarding the "weird" number parsing behaviour for "0012": This is unfortunate, I agree, but it stems from the fact that sequences of JSON values are not standardised (I believe). First, JSON numbers cannot have a leading 0 followed by further digits, as we can see from the JSON spec, so as soon as a leading 0 is not followed by [.eE], we know that we are dealing with just the number 0, and everything after it is part of a new value. Next, jq allows values to be concatenated without whitespace, such as [1][2]. So I generalised this to allow concatenation of any JSON values without whitespace. That includes numbers, and this is responsible for the behaviour seen when parsing "0012": the input is read as the three values 0, 0 and 12.
I'm not saying that this behaviour is very intuitive. But I think that it is consistent.
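The effect of allowing values to follow each other without whitespace can be reproduced with a small Python sketch. It relies on json.JSONDecoder.raw_decode from the standard library, which parses one value and reports where it stopped; this is an analogy for the behaviour described above, not jaq's actual parser, and the function name parse_concatenated is made up for the example.

```python
import json

def parse_concatenated(text: str):
    """Yield successive JSON values, with or without whitespace between them."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip JSON whitespace between values.
        while pos < len(text) and text[pos] in " \t\n\r":
            pos += 1
        if pos == len(text):
            return
        # Parse one value starting at pos and continue right after it.
        value, pos = decoder.raw_decode(text, pos)
        yield value

print(list(parse_concatenated("0012")))    # [0, 0, 12]
print(list(parse_concatenated("[1][2]")))  # [[1], [2]]
```

Each 0 is a complete number on its own because the grammar does not let further digits follow it, so the parser restarts at the next character.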

@01mf02
Owner

01mf02 commented Apr 9, 2024

Regarding tonumber, I still have to think a bit about how to do this best ...


4 participants