v.univar: add JSON support #3784

Merged
merged 5 commits into from
Jul 3, 2024

kritibirda26
Contributor

@kritibirda26 kritibirda26 commented Jun 7, 2024

Use parson to add json output format support to the v.univar module.

Expected JSON schema:
The root is a JSON object. The percentile option lets the user request a specific percentile, which is written to the JSON object under a percentile_%d key.

{
    "n": <int>,
    "missing": <int>,
    "nnull": <int>,
    "min": <double>,
    "max": <double>,
    "range": <double>,
    "sum": <double>,
    "mean": <double>,
    "mean_abs": <double>,
    "population_stddev": <double>,
    "population_variance": <double>,
    "population_coeff_variation": <double>,
    "sample_stddev": <double>,
    "sample_variance": <double>,
    "kurtosis": <double>,
    "skewness": <double>,
    "first_quartile": <double>,
    "median": <double>,
    "third_quartile": <double>,
    "percentile_90": <double>
}
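
For illustration, a consumer of this first format would have to recover the requested percentile from the key name itself. Below is a minimal Python sketch, assuming the keys follow the percentile_%d pattern described above. The sample values and the helper name `extract_percentiles` are invented for this example and are not part of the PR.

```python
import json
import re

# Hypothetical sample output; the values are invented for illustration.
sample = '{"n": 10, "median": 5.5, "percentile_90": 9.1}'

# Assumed key pattern, based on the percentile_%d naming described above.
PERCENTILE_KEY = re.compile(r"^percentile_(\d+)$")


def extract_percentiles(stats):
    """Return {percentile: value} recovered from percentile_<int> keys."""
    result = {}
    for key, value in stats.items():
        match = PERCENTILE_KEY.match(key)
        if match:
            result[int(match.group(1))] = value
    return result


percentiles = extract_percentiles(json.loads(sample))
print(percentiles)
```

The consumer has to know the naming rule in advance and parse a number out of a string, which is the usability concern raised in the replies that follow.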

@github-actions github-actions bot added the vector, Python, C, HTML, module, docs, and tests labels Jun 7, 2024
@echoix
Member

echoix commented Jun 7, 2024

Special naming of keys would be hard to use from another tool. It might be better to have a list, but since we need both the key and the value, it would need a mapping, so something like this?

{
  ...
  "percentile": { 
    "5":  0.536736,
    "90": 0.7237272,
    "98": 0.863662721,
    "99.5": 0.916363
   },
  ...
}
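
As a rough sketch of how a tool might read this mapping form in Python: JSON object keys are always strings, so the percentile has to be converted back to a number explicitly. The fragment below reuses three of the example values above and is only an illustration.

```python
import json

# Hypothetical fragment in the mapping form; values taken from the example above.
sample = '{"percentile": {"5": 0.536736, "90": 0.7237272, "99.5": 0.916363}}'

stats = json.loads(sample)

# JSON object keys are always strings, so convert them to numbers explicitly.
by_percentile = {float(key): value for key, value in stats["percentile"].items()}

print(by_percentile[99.5])
```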

@echoix
Member

echoix commented Jun 7, 2024

A good exercise for checking whether the JSON output is really usable from a tool (that is, in an automated way) is to try writing a real JSON Schema for it. If the format is too hard to describe with a JSON Schema, it probably isn't usable and can't be validated against one.

@echoix
Member

echoix commented Jun 7, 2024

I tried an online tool, https://jsonschema.net/app/schemas/456991, to see what it looks like (by generating the skeleton of a JSON Schema). I first had to change your example into valid JSON; I used:

{
    "n": 55,
    "missing": 55,
    "nnull": 55,
    "min": 1.234,
    "max": 1.2345,
    "range": 1.23456,
    "sum": 1.234569,
    "mean": 1.234567,
    "mean_abs": 1.234568,
    "population_stddev": 2.3456,
    "population_variance": 2.34567,
    "population_coeff_variation": 2.345678,
    "sample_stddev": 4.5556,
    "sample_variance": 6.7777,
    "kurtosis": 3.4455,
    "skewness": 6.8888,
    "first_quartile": 8.8888,
    "median": 7.777,
    "third_quartile": 8.8882,
    "percentile_90": 6.7776
}

This gives the JSON Schema:

{
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "http://example.com/example.json",
    "type": "object",
    "default": {},
    "title": "Root Schema",
    "required": [
        "n",
        "missing",
        "nnull",
        "min",
        "max",
        "range",
        "sum",
        "mean",
        "mean_abs",
        "population_stddev",
        "population_variance",
        "population_coeff_variation",
        "sample_stddev",
        "sample_variance",
        "kurtosis",
        "skewness",
        "first_quartile",
        "median",
        "third_quartile",
        "percentile_90"
    ],
    "properties": {
        "n": {
            "type": "integer",
            "default": 0,
            "title": "The n Schema",
            "examples": [
                55
            ]
        },
        "missing": {
            "type": "integer",
            "default": 0,
            "title": "The missing Schema",
            "examples": [
                55
            ]
        },
        "nnull": {
            "type": "integer",
            "default": 0,
            "title": "The nnull Schema",
            "examples": [
                55
            ]
        },
        "min": {
            "type": "number",
            "default": 0.0,
            "title": "The min Schema",
            "examples": [
                1.234
            ]
        },
        "max": {
            "type": "number",
            "default": 0.0,
            "title": "The max Schema",
            "examples": [
                1.2345
            ]
        },
        "range": {
            "type": "number",
            "default": 0.0,
            "title": "The range Schema",
            "examples": [
                1.23456
            ]
        },
        "sum": {
            "type": "number",
            "default": 0.0,
            "title": "The sum Schema",
            "examples": [
                1.234569
            ]
        },
        "mean": {
            "type": "number",
            "default": 0.0,
            "title": "The mean Schema",
            "examples": [
                1.234567
            ]
        },
        "mean_abs": {
            "type": "number",
            "default": 0.0,
            "title": "The mean_abs Schema",
            "examples": [
                1.234568
            ]
        },
        "population_stddev": {
            "type": "number",
            "default": 0.0,
            "title": "The population_stddev Schema",
            "examples": [
                2.3456
            ]
        },
        "population_variance": {
            "type": "number",
            "default": 0.0,
            "title": "The population_variance Schema",
            "examples": [
                2.34567
            ]
        },
        "population_coeff_variation": {
            "type": "number",
            "default": 0.0,
            "title": "The population_coeff_variation Schema",
            "examples": [
                2.345678
            ]
        },
        "sample_stddev": {
            "type": "number",
            "default": 0.0,
            "title": "The sample_stddev Schema",
            "examples": [
                4.5556
            ]
        },
        "sample_variance": {
            "type": "number",
            "default": 0.0,
            "title": "The sample_variance Schema",
            "examples": [
                6.7777
            ]
        },
        "kurtosis": {
            "type": "number",
            "default": 0.0,
            "title": "The kurtosis Schema",
            "examples": [
                3.4455
            ]
        },
        "skewness": {
            "type": "number",
            "default": 0.0,
            "title": "The skewness Schema",
            "examples": [
                6.8888
            ]
        },
        "first_quartile": {
            "type": "number",
            "default": 0.0,
            "title": "The first_quartile Schema",
            "examples": [
                8.8888
            ]
        },
        "median": {
            "type": "number",
            "default": 0.0,
            "title": "The median Schema",
            "examples": [
                7.777
            ]
        },
        "third_quartile": {
            "type": "number",
            "default": 0.0,
            "title": "The third_quartile Schema",
            "examples": [
                8.8882
            ]
        },
        "percentile_90": {
            "type": "number",
            "default": 0.0,
            "title": "The percentile_90 Schema",
            "examples": [
                6.7776
            ]
        }
    },
    "examples": [{
        "n": 55,
        "missing": 55,
        "nnull": 55,
        "min": 1.234,
        "max": 1.2345,
        "range": 1.23456,
        "sum": 1.234569,
        "mean": 1.234567,
        "mean_abs": 1.234568,
        "population_stddev": 2.3456,
        "population_variance": 2.34567,
        "population_coeff_variation": 2.345678,
        "sample_stddev": 4.5556,
        "sample_variance": 6.7777,
        "kurtosis": 3.4455,
        "skewness": 6.8888,
        "first_quartile": 8.8888,
        "median": 7.777,
        "third_quartile": 8.8882,
        "percentile_90": 6.7776
    }]
}

From this, we see that having hardcoded percentile keys probably isn't what we want. What about percentile 99.5 (a common percentile in environmental legislation)?

@wenzeslaus
Member

Special naming of keys would be hard to use from another tool. It might be better to have a list, but since we need the value and the key, it would need a mapping...

I was actually thinking I used two lists in db.univar, but it turns out I used whatever was in the shell script style format, so a list of keys which depends on the input. Because it depends on the input, you can figure the keys out; I think that was my logic there. It is terrible for writing a schema, and that's a valid criticism. I think we can change that in a way which would be backwards compatible and make it consistent across all percentile outputs at the same time if we decide to go in a different direction.

@echoix
Member

echoix commented Jun 7, 2024

I was actually thinking I used two lists in db.univar, but it turns out I used whatever was in the shell script style format, so a list of keys which depends on the input. Because it depends on the input, you can figure the keys out; I think that was my logic there. It is terrible for writing a schema, and that's a valid criticism. I think we can change that in a way which would be backwards compatible and make it consistent across all percentile outputs at the same time if we decide to go in a different direction.

Was the output of db.univar with the two lists released yet? If not, it's not a breaking change yet.

@wenzeslaus
Member

Was the output of db.univar with the two lists released yet?

Yes, it was, in 8.3.0. 120f198

I think the situation now is: 1) We should do whatever is right here not to spread a bad design from one tool to another. 2) We might be able to add the better percentile output to v.db.univar depending on the keys we choose here.

So, let's decide the percentiles here.

I guess the following is lengthy, but easy to express in a schema:

{
"percentiles": [
    {"percentile": 95, "value": 200.03},
    {"percentile": 99.9, "value": 220.01}
]
}
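
To illustrate why this form is easy to describe, here is a sketch of a JSON Schema fragment for it, written as a Python dictionary, together with a small hand-written check that uses only the standard library. The schema is an illustration, not one published by the project, and `check_percentiles` is a hypothetical helper that mirrors the schema rather than a general validator.

```python
import json

# Illustrative JSON Schema fragment for the list-of-objects form
# (an assumption for this sketch, not a schema published by the project).
PERCENTILES_SCHEMA = {
    "type": "object",
    "properties": {
        "percentiles": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["percentile", "value"],
                "properties": {
                    "percentile": {"type": "number"},
                    "value": {"type": "number"},
                },
            },
        }
    },
}


def is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def check_percentiles(doc):
    """Hypothetical hand-written check that mirrors the schema above."""
    items = doc.get("percentiles", [])
    if not isinstance(items, list):
        return False
    return all(
        isinstance(item, dict)
        and is_number(item.get("percentile"))
        and is_number(item.get("value"))
        for item in items
    )


doc = json.loads(
    '{"percentiles": [{"percentile": 95, "value": 200.03},'
    ' {"percentile": 99.9, "value": 220.01}]}'
)
print(check_percentiles(doc))
```

The schema never needs to mention a specific percentile, so it stays the same whatever the user requests.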

@echoix
Member

echoix commented Jun 11, 2024

Was the output of db.univar with the two lists released yet?

Yes, it was, in 8.3.0. 120f198

I think the situation now is: 1) We should do whatever is right here not to spread a bad design from one tool to another. 2) We might be able to add the better percentile output to v.db.univar depending on the keys we choose here.

So, let's decide the percentiles here.

I guess the following is lengthy, but easy to express in a schema:

{
"percentiles": [
    {"percentile": 95, "value": 200.03},
    {"percentile": 99.9, "value": 220.01}
]
}

It’s lengthy, yes, but I like the fact that a list is used here: you can repeat the same percentile if needed, and it keeps the percentiles in the order they were requested in the input.
I didn’t have an idea how to keep the ordering in my first example.
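
The ordering and repetition point can be sketched in Python as follows. The values are invented. Note that RFC 8259 describes a JSON object as an unordered collection, and that Python's `json` module keeps only the last value when a key is repeated.

```python
import json

# List form: order and repeated percentiles survive parsing.
as_list = json.loads(
    '{"percentiles": [{"percentile": 99, "value": 3.0},'
    ' {"percentile": 5, "value": 1.0},'
    ' {"percentile": 99, "value": 3.0}]}'
)
requested = [item["percentile"] for item in as_list["percentiles"]]
print(requested)

# Mapping form: Python's json module keeps only the last value for a repeated key,
# so the repeated request for percentile 99 collapses into one entry.
as_mapping = json.loads('{"percentile": {"99": 3.0, "5": 1.0, "99": 3.0}}')
print(len(as_mapping["percentile"]))
```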

@wenzeslaus
Member

I checked v.db.univar/db.univar again: I used two lists for the percentiles, not a mapping (I must have looked at the wrong piece of code before), nested together with everything else under statistics. They give:

{
  "statistics": {
    "percentiles": [95, 99.9],
    "percentile_values": [200.03, 220.01]
  }
}

I still prefer to do it right rather than the same as in v.db.univar. Is the right solution one list of mappings rather than two lists? It seems that it is easier to just say there is a list rather than saying there are two lists of the same length.
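
As a sketch of the difference, converting the two-list form into a list of mappings in Python requires an explicit length check that the single-list form does not need. The sample reuses the values shown above.

```python
import json

sample = (
    '{"statistics": {"percentiles": [95, 99.9],'
    ' "percentile_values": [200.03, 220.01]}}'
)
stats = json.loads(sample)["statistics"]

keys = stats["percentiles"]
values = stats["percentile_values"]

# The two-list form relies on both lists having the same length;
# nothing in the structure itself enforces that, so check it here.
if len(keys) != len(values):
    raise ValueError("percentiles and percentile_values differ in length")

paired = [{"percentile": p, "value": v} for p, v in zip(keys, values)]
print(json.dumps(paired))
```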

@wenzeslaus
Member

What do you think, @kritibirda26? Does the list of dictionaries look good to you?

@kritibirda26
Contributor Author

kritibirda26 commented Jun 11, 2024

Hi @wenzeslaus and @echoix, Sorry for the delay in the response. The dictionaries for percentiles make sense to me. I'll update the format.

@cwhite911
Contributor

Hi @wenzeslaus and @echoix, Sorry for the delay in the response. The dictionaries for percentiles make sense to me. I'll update the format.

Are you referring to the schema suggested by @echoix?

{
"percentiles": [
    {"percentile": 95, "value": 200.03},
    {"percentile": 99.9, "value": 220.01}
]
}

This is the approach I think we should take.

Contributor

@cwhite911 cwhite911 left a comment

Looking good. Please update the percentiles, tests, and docs.

vector/v.univar/main.c (outdated, resolved)
vector/v.univar/v.univar.html (outdated, resolved)
@kritibirda26
Contributor Author

Yes, on it.

Use parson to add json output format support to the v.univar module.
Contributor

@cwhite911 cwhite911 left a comment

Looks good!

@kritibirda26
Contributor Author

@cwhite911 Should I make similar changes to the PR updating r.univar as well?

@kritibirda26
Contributor Author

@wenzeslaus Hi! Can you also review this PR so that it can be merged?

@kritibirda26 kritibirda26 requested a review from cwhite911 July 2, 2024 12:05
@echoix echoix merged commit cc75269 into OSGeo:main Jul 3, 2024
26 checks passed
@echoix echoix added this to the 8.5.0 milestone Jul 4, 2024
@kritibirda26 kritibirda26 deleted the v.univar branch July 6, 2024 13:19
a0x8o pushed a commit to a0x8o/grass that referenced this pull request Jul 23, 2024
* Add JSON support to v.univar

Use parson to add json output format support to the v.univar module.

* update percentile format

* rename test file
Labels
C, docs, HTML, module, Python, tests, vector
4 participants