v.univar: add JSON support #3784
Conversation
Special naming of keys would be hard to use from another tool. It might be better to have a list, but since we need both the key and the value, it would need a mapping, so something like this?
{
...
"percentile": {
"5": 0.536736,
"90": 0.7237272,
"98": 0.863662721,
"99.5": 0.916363
},
...
}
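As a side note, here is a hedged Python sketch of how another tool might consume this mapping form (the JSON string simply mirrors the example above). One subtlety it shows: JSON object keys are strings, so "99.5" has to be converted back to a number if the consumer needs it numerically.

import json

output = """
{
  "percentile": {
    "5": 0.536736,
    "90": 0.7237272,
    "98": 0.863662721,
    "99.5": 0.916363
  }
}
"""

# Keys arrive as strings; convert them back to numbers when needed.
percentiles = json.loads(output)["percentile"]
for key, value in percentiles.items():
    print(f"p{float(key)}: {value}")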
A good exercise to check whether the JSON output is really usable from another tool (that is, in an automated way) is to try to write a real JSON schema for it. If the format is too hard to describe with a JSON schema, then it probably isn't usable, and it can't be validated against such a schema.
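As a rough illustration of that exercise (not part of this PR), here is a minimal Python sketch validating a captured v.univar JSON output against a trimmed-down schema using the jsonschema package. The file name and the reduced set of required keys are assumptions made for brevity.

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Trimmed-down schema: only a handful of the keys, for brevity.
SCHEMA = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "type": "object",
    "required": ["n", "min", "max", "mean"],
    "properties": {
        "n": {"type": "integer"},
        "min": {"type": "number"},
        "max": {"type": "number"},
        "mean": {"type": "number"},
    },
}

# Hypothetical file holding captured v.univar JSON output.
with open("v_univar_output.json") as f:
    data = json.load(f)

try:
    validate(instance=data, schema=SCHEMA)
    print("output matches the schema")
except ValidationError as err:
    print(f"schema violation: {err.message}")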
I tried an online tool, https://jsonschema.net/app/schemas/456991, to see what it looks like (by generating the skeleton of a JSON schema). I first had to change your example to be valid JSON; I used:
{
"n": 55,
"missing": 55,
"nnull": 55,
"min": 1.234,
"max": 1.2345,
"range": 1.23456,
"sum": 1.234569,
"mean": 1.234567,
"mean_abs": 1.234568,
"population_stddev": 2.3456,
"population_variance": 2.34567,
"population_coeff_variation": 2.345678,
"sample_stddev": 4.5556,
"sample_variance": 6.7777,
"kurtosis": 3.4455,
"skewness": 6.8888,
"first_quartile": 8.8888,
"median": 7.777,
"third_quartile": 8.8882,
"percentile_90": 6.7776
}

This gives the following JSON schema:

{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"$id": "http://example.com/example.json",
"type": "object",
"default": {},
"title": "Root Schema",
"required": [
"n",
"missing",
"nnull",
"min",
"max",
"range",
"sum",
"mean",
"mean_abs",
"population_stddev",
"population_variance",
"population_coeff_variation",
"sample_stddev",
"sample_variance",
"kurtosis",
"skewness",
"first_quartile",
"median",
"third_quartile",
"percentile_90"
],
"properties": {
"n": {
"type": "integer",
"default": 0,
"title": "The n Schema",
"examples": [
55
]
},
"missing": {
"type": "integer",
"default": 0,
"title": "The missing Schema",
"examples": [
55
]
},
"nnull": {
"type": "integer",
"default": 0,
"title": "The nnull Schema",
"examples": [
55
]
},
"min": {
"type": "number",
"default": 0.0,
"title": "The min Schema",
"examples": [
1.234
]
},
"max": {
"type": "number",
"default": 0.0,
"title": "The max Schema",
"examples": [
1.2345
]
},
"range": {
"type": "number",
"default": 0.0,
"title": "The range Schema",
"examples": [
1.23456
]
},
"sum": {
"type": "number",
"default": 0.0,
"title": "The sum Schema",
"examples": [
1.234569
]
},
"mean": {
"type": "number",
"default": 0.0,
"title": "The mean Schema",
"examples": [
1.234567
]
},
"mean_abs": {
"type": "number",
"default": 0.0,
"title": "The mean_abs Schema",
"examples": [
1.234568
]
},
"population_stddev": {
"type": "number",
"default": 0.0,
"title": "The population_stddev Schema",
"examples": [
2.3456
]
},
"population_variance": {
"type": "number",
"default": 0.0,
"title": "The population_variance Schema",
"examples": [
2.34567
]
},
"population_coeff_variation": {
"type": "number",
"default": 0.0,
"title": "The population_coeff_variation Schema",
"examples": [
2.345678
]
},
"sample_stddev": {
"type": "number",
"default": 0.0,
"title": "The sample_stddev Schema",
"examples": [
4.5556
]
},
"sample_variance": {
"type": "number",
"default": 0.0,
"title": "The sample_variance Schema",
"examples": [
6.7777
]
},
"kurtosis": {
"type": "number",
"default": 0.0,
"title": "The kurtosis Schema",
"examples": [
3.4455
]
},
"skewness": {
"type": "number",
"default": 0.0,
"title": "The skewness Schema",
"examples": [
6.8888
]
},
"first_quartile": {
"type": "number",
"default": 0.0,
"title": "The first_quartile Schema",
"examples": [
8.8888
]
},
"median": {
"type": "number",
"default": 0.0,
"title": "The median Schema",
"examples": [
7.777
]
},
"third_quartile": {
"type": "number",
"default": 0.0,
"title": "The third_quartile Schema",
"examples": [
8.8882
]
},
"percentile_90": {
"type": "number",
"default": 0.0,
"title": "The percentile_90 Schema",
"examples": [
6.7776
]
}
},
"examples": [{
"n": 55,
"missing": 55,
"nnull": 55,
"min": 1.234,
"max": 1.2345,
"range": 1.23456,
"sum": 1.234569,
"mean": 1.234567,
"mean_abs": 1.234568,
"population_stddev": 2.3456,
"population_variance": 2.34567,
"population_coeff_variation": 2.345678,
"sample_stddev": 4.5556,
"sample_variance": 6.7777,
"kurtosis": 3.4455,
"skewness": 6.8888,
"first_quartile": 8.8888,
"median": 7.777,
"third_quartile": 8.8882,
"percentile_90": 6.7776
}]
}

From this, we see that having hardcoded percentile keys probably isn't what we want. What about percentile 99.5 (a common percentile in environmental legislation)?
I was actually thinking I had used two lists in db.univar, but it turns out I used whatever was in the shell-script-style format, so a list of keys which depends on the input. Since it depends on the input, you can figure the keys out; I think that was my logic there. It is terrible for writing a schema, and that's a valid criticism. I think we can change that in a way which would be backwards compatible and, if we decide to go in a different direction, make it consistent across all percentile outputs at the same time.
Was the output of db.univar with the two lists released yet? If not, it's not a breaking change yet.
Yes, it was, in 8.3.0 (120f198). I think the situation now is: 1) we should do whatever is right here so that we don't spread a bad design from one tool to another, and 2) we might be able to add the better percentile output to v.db.univar, depending on the keys we choose here. So, let's decide the percentiles here. I guess the following is lengthy, but easy to express in a schema:

{
"percentiles": [
{"percentile": 95, "value": 200.03},
{"percentile": 99.9, "value": 220.01}
]
}
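A short hedged sketch of how a consumer could read this list-of-mappings form (the JSON string mirrors the example above; the lookup dictionary is just one possible way to index it):

import json

output = """
{
  "percentiles": [
    {"percentile": 95, "value": 200.03},
    {"percentile": 99.9, "value": 220.01}
  ]
}
"""

data = json.loads(output)
# The list preserves order and any duplicates exactly as requested.
for entry in data["percentiles"]:
    print(f"p{entry['percentile']}: {entry['value']}")
# If a plain lookup is wanted (later duplicates would overwrite earlier ones):
lookup = {entry["percentile"]: entry["value"] for entry in data["percentiles"]}
print(lookup[99.9])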
It’s lengthy, yes, but I like the fact that a list is used here: you can repeat the same percentile if needed, and it keeps the same order of percentiles as requested in the input.
I checked v.db.univar/db.univar again and I used percentiles as two lists, not a mapping (I must have looked at a wrong piece of code before), nested together with everything else under statistics. They give:
{
"statistics": {
"percentiles": [95, 99.9],
"percentile_values": [200.03, 220.01]
}
}

I still prefer to do it right here rather than doing the same as in v.db.univar. Is the right solution one list of mappings rather than two lists? It seems easier to just say there is a list than to say there are two lists of the same length.
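For contrast, here is a hedged sketch of consuming the two-parallel-lists form above; the explicit length check and the zip are exactly the extra bookkeeping that a single list of mappings avoids.

import json

output = """
{
  "statistics": {
    "percentiles": [95, 99.9],
    "percentile_values": [200.03, 220.01]
  }
}
"""

stats = json.loads(output)["statistics"]
percentiles = stats["percentiles"]
values = stats["percentile_values"]
# The equal-length invariant lives outside the data model, so check it by hand.
if len(percentiles) != len(values):
    raise ValueError("percentiles and percentile_values differ in length")
pairs = list(zip(percentiles, values))  # [(95, 200.03), (99.9, 220.01)]
print(pairs)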
What do you think, @kritibirda26? Does the list of dictionaries look good to you?
Hi @wenzeslaus and @echoix,
Sorry for the delay in the response. The dictionaries for percentiles make sense to me. I'll update the format.
Are you referring to the schema suggested by @echoix?
This is the approach I think we should take. |
Looking good. Please update the percentiles, tests, and docs.
Yes, on it.
Use parson to add json output format support to the v.univar module.
Looks good!
@cwhite911 Should I make similar changes to the PR updating r.univar as well?
@wenzeslaus Hi! Can you also review this PR so that it can be merged?
* Add JSON support to v.univar: use parson to add JSON output format support to the v.univar module.
* Update percentile format.
* Rename test file.
Use parson to add json output format support to the v.univar module.
Expected JSON schema:
Root is a JSON object. The percentile option allows the user to request specific percentiles, each of which is written as a percentile_%d key in the JSON object.
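A hedged usage sketch of reading the new JSON output from a Python script running inside a GRASS session; the map name, column name, and the format=json option name are assumptions for illustration, not confirmed by this PR's text.

import json

import grass.script as gs

# Hypothetical vector map and column; -e requests extended statistics
# (quartiles, median, and the requested percentile).
raw = gs.read_command(
    "v.univar",
    map="hospitals",
    column="beds",
    percentile=90,
    flags="e",
    format="json",  # assumed option name for the JSON output added here
)
stats = json.loads(raw)
print(stats["n"], stats["mean"], stats["median"])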