v.univar: add JSON support #3784

kritibirda26 · 2024-06-07T14:44:43Z

Use parson to add json output format support to the v.univar module.

Expected JSON schema:
The root is a JSON object. The percentile option lets the user request a specific percentile, which is written to the JSON object under a percentile_%d key.

{
    "n": <int>,
    "missing": <int>,
    "nnull": <int>,
    "min": <double>,
    "max": <double>,
    "range": <double>,
    "sum": <double>,
    "mean": <double>,
    "mean_abs": <double>,
    "population_stddev": <double>,
    "population_variance": <double>,
    "population_coeff_variation": <double>,
    "sample_stddev": <double>,
    "sample_variance": <double>,
    "kurtosis": <double>,
    "skewness": <double>,
    "first_quartile": <double>,
    "median": <double>,
    "third_quartile": <double>,
    "percentile_90": <double>
}

echoix · 2024-06-07T15:04:55Z

Specially named keys would be hard to use from another tool. A list might be better, but since we need both the key and the value, it would have to be a mapping, so something like this:

{
  ...
  "percentile": { 
    "5":  0.536736,
    "90": 0.7237272,
    "98": 0.863662721,
    "99.5": 0.916363
   },
  ...
}
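
To make the difference concrete, here is a minimal Python sketch of how a consumer might read percentiles from each shape. Both sample documents and both helper functions are hypothetical and only illustrate the two layouts discussed above; they are not output produced by v.univar.

```python
import json

# Hypothetical output using the flat "percentile_<p>" key naming.
flat = json.loads('{"n": 55, "median": 7.777, "percentile_90": 6.7776}')

# Hypothetical output using the nested mapping layout.
nested = json.loads(
    '{"n": 55, "median": 7.777, "percentile": {"90": 6.7776, "99.5": 0.916363}}'
)


def percentiles_from_flat(doc, prefix="percentile_"):
    """Recover {percentile: value} by parsing the key names themselves."""
    return {
        float(key[len(prefix):]): value
        for key, value in doc.items()
        if key.startswith(prefix)
    }


def percentiles_from_nested(doc):
    """Read the mapping directly; JSON object keys are always strings."""
    return {float(key): value for key, value in doc["percentile"].items()}


print(percentiles_from_flat(flat))
print(percentiles_from_nested(nested))
```

With the flat layout the consumer has to scan every key and know the prefix convention. With the mapping the percentiles live in one place, although the keys still arrive as strings and need converting.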

echoix · 2024-06-07T15:07:31Z

A good exercise for checking whether the JSON output is really usable from a tool (that is, in an automated way) is to try to write a real JSON Schema for it. If the format is too hard to describe with a JSON Schema, it probably isn't usable, and the output can't be validated against a schema.

echoix · 2024-06-07T15:21:06Z

I tried an online tool, https://jsonschema.net/app/schemas/456991, to see what it looks like (by generating the skeleton of a JSON Schema). I first had to change your example into valid JSON. I used:

{
    "n": 55,
    "missing": 55,
    "nnull": 55,
    "min": 1.234,
    "max": 1.2345,
    "range": 1.23456,
    "sum": 1.234569,
    "mean": 1.234567,
    "mean_abs": 1.234568,
    "population_stddev": 2.3456,
    "population_variance": 2.34567,
    "population_coeff_variation": 2.345678,
    "sample_stddev": 4.5556,
    "sample_variance": 6.7777,
    "kurtosis": 3.4455,
    "skewness": 6.8888,
    "first_quartile": 8.8888,
    "median": 7.777,
    "third_quartile": 8.8882,
    "percentile_90": 6.7776
}

This gives the JSON Schema:

{
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "http://example.com/example.json",
    "type": "object",
    "default": {},
    "title": "Root Schema",
    "required": [
        "n",
        "missing",
        "nnull",
        "min",
        "max",
        "range",
        "sum",
        "mean",
        "mean_abs",
        "population_stddev",
        "population_variance",
        "population_coeff_variation",
        "sample_stddev",
        "sample_variance",
        "kurtosis",
        "skewness",
        "first_quartile",
        "median",
        "third_quartile",
        "percentile_90"
    ],
    "properties": {
        "n": {
            "type": "integer",
            "default": 0,
            "title": "The n Schema",
            "examples": [
                55
            ]
        },
        "missing": {
            "type": "integer",
            "default": 0,
            "title": "The missing Schema",
            "examples": [
                55
            ]
        },
        "nnull": {
            "type": "integer",
            "default": 0,
            "title": "The nnull Schema",
            "examples": [
                55
            ]
        },
        "min": {
            "type": "number",
            "default": 0.0,
            "title": "The min Schema",
            "examples": [
                1.234
            ]
        },
        "max": {
            "type": "number",
            "default": 0.0,
            "title": "The max Schema",
            "examples": [
                1.2345
            ]
        },
        "range": {
            "type": "number",
            "default": 0.0,
            "title": "The range Schema",
            "examples": [
                1.23456
            ]
        },
        "sum": {
            "type": "number",
            "default": 0.0,
            "title": "The sum Schema",
            "examples": [
                1.234569
            ]
        },
        "mean": {
            "type": "number",
            "default": 0.0,
            "title": "The mean Schema",
            "examples": [
                1.234567
            ]
        },
        "mean_abs": {
            "type": "number",
            "default": 0.0,
            "title": "The mean_abs Schema",
            "examples": [
                1.234568
            ]
        },
        "population_stddev": {
            "type": "number",
            "default": 0.0,
            "title": "The population_stddev Schema",
            "examples": [
                2.3456
            ]
        },
        "population_variance": {
            "type": "number",
            "default": 0.0,
            "title": "The population_variance Schema",
            "examples": [
                2.34567
            ]
        },
        "population_coeff_variation": {
            "type": "number",
            "default": 0.0,
            "title": "The population_coeff_variation Schema",
            "examples": [
                2.345678
            ]
        },
        "sample_stddev": {
            "type": "number",
            "default": 0.0,
            "title": "The sample_stddev Schema",
            "examples": [
                4.5556
            ]
        },
        "sample_variance": {
            "type": "number",
            "default": 0.0,
            "title": "The sample_variance Schema",
            "examples": [
                6.7777
            ]
        },
        "kurtosis": {
            "type": "number",
            "default": 0.0,
            "title": "The kurtosis Schema",
            "examples": [
                3.4455
            ]
        },
        "skewness": {
            "type": "number",
            "default": 0.0,
            "title": "The skewness Schema",
            "examples": [
                6.8888
            ]
        },
        "first_quartile": {
            "type": "number",
            "default": 0.0,
            "title": "The first_quartile Schema",
            "examples": [
                8.8888
            ]
        },
        "median": {
            "type": "number",
            "default": 0.0,
            "title": "The median Schema",
            "examples": [
                7.777
            ]
        },
        "third_quartile": {
            "type": "number",
            "default": 0.0,
            "title": "The third_quartile Schema",
            "examples": [
                8.8882
            ]
        },
        "percentile_90": {
            "type": "number",
            "default": 0.0,
            "title": "The percentile_90 Schema",
            "examples": [
                6.7776
            ]
        }
    },
    "examples": [{
        "n": 55,
        "missing": 55,
        "nnull": 55,
        "min": 1.234,
        "max": 1.2345,
        "range": 1.23456,
        "sum": 1.234569,
        "mean": 1.234567,
        "mean_abs": 1.234568,
        "population_stddev": 2.3456,
        "population_variance": 2.34567,
        "population_coeff_variation": 2.345678,
        "sample_stddev": 4.5556,
        "sample_variance": 6.7777,
        "kurtosis": 3.4455,
        "skewness": 6.8888,
        "first_quartile": 8.8888,
        "median": 7.777,
        "third_quartile": 8.8882,
        "percentile_90": 6.7776
    }]
}

From this, we see that hardcoded percentile keys probably aren't what we want. What about percentile 99.5 (a common percentile in environmental legislation)?

wenzeslaus · 2024-06-07T20:04:44Z

Special naming of keys would be hard to use from another tool. It might be better to have a list, but since we need the value and the key, it would need a mapping...

I was actually thinking I used two lists in db.univar, but it turns out I used whatever was in the shell script style format, so a list of keys that depends on the input. Because the keys depend on the input, you can figure them out; I think that was my logic there. It is terrible for writing a schema, and that's a valid criticism. I think we can change that in a way that is backwards compatible and, at the same time, make it consistent across all percentile outputs if we decide to go in a different direction.

echoix · 2024-06-07T20:12:58Z

Was the output of db.univar with the two lists released yet? If not, it's not a breaking change yet.

wenzeslaus · 2024-06-11T00:43:41Z

Was the output of db.univar with the two lists released yet?

Yes, it was, in 8.3.0. 120f198

I think the situation now is: 1) We should do whatever is right here not to spread a bad design from one tool to another. 2) We might be able to add the better percentile output to v.db.univar depending on the keys we choose here.

So, let's decide the percentiles here.

I guess the following is lengthy, but easy to express in a schema:

{
"percentiles": [
    {"percentile": 95, "value": 200.03},
    {"percentile": 99.9, "value": 220.01}
]
}
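
As a rough illustration of why this shape is easy to describe, here is a sketch of a JSON Schema fragment for it, written as a Python dict, together with a small hand-rolled check. The 0 to 100 bounds on "percentile" are an assumption added for illustration, and `matches` is a hypothetical helper that covers only this fragment; it is not a general JSON Schema validator.

```python
# Sketch of a schema for the list-of-objects layout (bounds are an assumption).
PERCENTILES_SCHEMA = {
    "type": "object",
    "required": ["percentiles"],
    "properties": {
        "percentiles": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["percentile", "value"],
                "properties": {
                    "percentile": {"type": "number", "minimum": 0, "maximum": 100},
                    "value": {"type": "number"},
                },
            },
        }
    },
}


def is_number(x):
    # bool is a subclass of int in Python, but it is not a JSON number.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def matches(doc):
    """Check a parsed document against the constraints in PERCENTILES_SCHEMA."""
    items = doc.get("percentiles") if isinstance(doc, dict) else None
    if not isinstance(items, list):
        return False
    return all(
        isinstance(item, dict)
        and is_number(item.get("percentile"))
        and 0 <= item["percentile"] <= 100
        and is_number(item.get("value"))
        for item in items
    )


example = {
    "percentiles": [
        {"percentile": 95, "value": 200.03},
        {"percentile": 99.9, "value": 220.01},
    ]
}
print(matches(example))
print(matches({"percentiles": [{"percentile": 95}]}))
```

The schema needs no pattern matching on key names: every entry has the same two fixed keys, whatever percentiles were requested.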

echoix · 2024-06-11T00:50:39Z

It’s lengthy, yes, but I like that a list is used here: you can repeat the same percentile if needed, and it keeps the percentiles in the order requested in the input.
I didn’t have an idea how to keep the ordering in my first example.
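
The ordering and repetition point can be checked with a short Python sketch. Both documents below are made-up samples, not v.univar output. The sketch relies on the behaviour of Python's `json` module, which keeps the last value when an object repeats a key; other parsers may handle repeated keys differently.

```python
import json

# List layout: the request order (99.9, 95, 95) and the repeat both survive.
as_list = json.loads(
    '{"percentiles": ['
    '{"percentile": 99.9, "value": 220.01},'
    '{"percentile": 95, "value": 200.03},'
    '{"percentile": 95, "value": 200.03}]}'
)

# Mapping layout: a repeated key collapses into a single entry.
as_mapping = json.loads(
    '{"percentile": {"99.9": 220.01, "95": 200.03, "95": 200.03}}'
)

requested = [item["percentile"] for item in as_list["percentiles"]]
mapping_keys = list(as_mapping["percentile"])

print(requested)
print(mapping_keys)
```

Python dicts happen to preserve insertion order, but JSON objects are specified as unordered, so a consumer in another language could not rely on the mapping's key order at all.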

wenzeslaus · 2024-06-11T01:21:13Z

I checked v.db.univar/db.univar again and I used percentiles as two lists, not a mapping (I must have looked at the wrong piece of code before), nested together with everything else under statistics. They give:

{
  "statistics": {
    "percentiles": [95, 99.9],
    "percentile_values": [200.03, 220.01]
  }
}

I still prefer to do it right rather than the same as in v.db.univar. Is the right solution one list of mappings rather than two lists? It seems that it is easier to just say there is a list rather than saying there are two lists of the same length.
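
For comparison, here is a minimal Python sketch of what a consumer of the two-list layout has to do to get the list-of-mappings form. `pair_percentiles` is a hypothetical helper, not part of GRASS. The explicit length check is exactly the constraint that a plain schema cannot easily state for two separate lists.

```python
def pair_percentiles(statistics):
    """Turn parallel percentile/value lists into a list of mappings."""
    percentiles = statistics["percentiles"]
    values = statistics["percentile_values"]
    # The two lists are only meaningful together if they have equal length.
    if len(percentiles) != len(values):
        raise ValueError("percentiles and percentile_values differ in length")
    return [
        {"percentile": p, "value": v} for p, v in zip(percentiles, values)
    ]


two_lists = {
    "statistics": {
        "percentiles": [95, 99.9],
        "percentile_values": [200.03, 220.01],
    }
}
print(pair_percentiles(two_lists["statistics"]))
```

With one list of mappings, each percentile and its value travel together, so this pairing step and its failure mode disappear.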

wenzeslaus · 2024-06-11T02:21:19Z

What do you think, @kritibirda26? Does the list of dictionaries look good to you?

kritibirda26 · 2024-06-11T20:26:44Z

Hi @wenzeslaus and @echoix, Sorry for the delay in the response. The dictionaries for percentiles make sense to me. I'll update the format.

cwhite911 · 2024-06-15T09:24:45Z

Are you referring to the schema suggested by @echoix?

{
"percentiles": [
    {"percentile": 95, "value": 200.03},
    {"percentile": 99.9, "value": 220.01}
]
}

This is the approach I think we should take.

cwhite911

Looking good. Please update the percentiles, tests, and docs.

vector/v.univar/main.c

vector/v.univar/v.univar.html

kritibirda26 · 2024-06-17T14:09:17Z

Yes, on it.

cwhite911

Looks good!

kritibirda26 · 2024-06-20T18:42:01Z

@cwhite911 Should I make similar changes to PR updating r.univar as well?

kritibirda26 · 2024-06-28T15:23:49Z

@wenzeslaus Hi! Can you also review this PR so that it can be merged?

vector/v.univar/testsuite/v_univar_test.py

* Add JSON support to v.univar Use parson to add json output format support to the v.univar module. * update percentile format * rename test file

github-actions bot added the vector, Python, C, HTML, module, docs, and tests labels on Jun 7, 2024

cwhite911 mentioned this pull request Jun 15, 2024

Add JSON and YAML C library dependency #3020

Closed

46 tasks

cwhite911 suggested changes Jun 17, 2024

vector/v.univar/main.c (outdated)

vector/v.univar/v.univar.html (outdated)

kritibirda26 added 2 commits June 17, 2024 20:12

Add JSON support to v.univar

89aa2cd

Use parson to add json output format support to the v.univar module.

update percentile format

9d2cb81

kritibirda26 force-pushed the v.univar branch from 332fce8 to 9d2cb81 Compare June 17, 2024 19:21

cwhite911 approved these changes Jun 19, 2024

Merge branch 'main' into v.univar

f315010

cwhite911 suggested changes Jul 1, 2024

vector/v.univar/testsuite/v_univar_test.py (outdated)

rename test file

d58c77e

kritibirda26 requested a review from cwhite911 July 2, 2024 12:05

Merge branch 'main' into v.univar

f3b4a59

cwhite911 approved these changes Jul 3, 2024

echoix approved these changes Jul 3, 2024

echoix merged commit cc75269 into OSGeo:main Jul 3, 2024
26 checks passed

echoix added this to the 8.5.0 milestone Jul 4, 2024

kritibirda26 deleted the v.univar branch July 6, 2024 13:19

a0x8o pushed a commit to a0x8o/grass that referenced this pull request Jul 23, 2024

v.univar: add JSON support (OSGeo#3784)

74fb85b

* Add JSON support to v.univar Use parson to add json output format support to the v.univar module. * update percentile format * rename test file
