New command: statistics #41

Open
jpmckinney opened this issue May 5, 2023 · 3 comments
Labels
enhancement New feature or request
Milestone

Comments

@jpmckinney
Member

It can be useful to report some order statistics and distributions that are relevant to indicators. For example:

  • Distribution of procurementMethod codes, so that the user can evaluate whether the distribution of open, selective, limited and direct procedures conforms to their knowledge of the procurement market
    • From user research: "A methodology should also come with clear risk warnings for instance the use of certain fields. Are there some fields that we know are problematic when it comes to bias in the data? (e.g. Could there be a bias toward using 'selective' instead of 'limited' in procurementMethod?)"
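
A minimal sketch of such a distribution report, assuming the input is a list of OCDS compiled releases already loaded as Python dicts. The function name `procurement_method_distribution` and the `(missing)` bucket are hypothetical and are not part of the library:

```python
from collections import Counter


def procurement_method_distribution(releases):
    """Return {code: (count, share)} for tender.procurementMethod.

    Assumption: each release is a dict shaped like an OCDS compiled release.
    Releases without the field are counted under "(missing)".
    """
    counts = Counter(
        (release.get("tender") or {}).get("procurementMethod") or "(missing)"
        for release in releases
    )
    total = sum(counts.values())
    # most_common() orders codes from most to least frequent.
    return {code: (n, n / total) for code, n in counts.most_common()}


if __name__ == "__main__":
    # Illustrative data only, not real results.
    sample = [
        {"tender": {"procurementMethod": "open"}},
        {"tender": {"procurementMethod": "open"}},
        {"tender": {"procurementMethod": "selective"}},
        {"tender": {}},
    ]
    for code, (n, share) in procurement_method_distribution(sample).items():
        print(f"{code:<12} {n:>5} {share:>7.1%}")
```

Reporting the share of missing values next to the known codes lets the user see both the distribution and how often the field is absent.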

We might also consider reporting:

  • Some priority quality issues (e.g. incoherent dates).
  • Outliers. If there is demand, we can also change the indicators command to ignore outliers.
  • Order statistics (possibly per procurement method) to assist the user in setting threshold values.
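
As a rough illustration of the last two items, the sketch below computes quartiles per procurement method and flags outliers. It makes several assumptions: the value of interest is `tender.value.amount`, the input is a list of OCDS compiled releases as dicts, and outliers are defined by Tukey's 1.5 × IQR fences, which is one common convention and not something this issue has decided on. The function name is hypothetical.

```python
import statistics
from collections import defaultdict


def order_statistics_by_method(releases):
    """Return per-method min, quartiles, max and outliers of tender.value.amount."""
    groups = defaultdict(list)
    for release in releases:
        tender = release.get("tender") or {}
        amount = (tender.get("value") or {}).get("amount")
        if amount is not None:
            method = tender.get("procurementMethod") or "(missing)"
            groups[method].append(amount)

    summary = {}
    for method, amounts in groups.items():
        if len(amounts) < 2:
            continue  # too few data points to compute quartiles
        q1, median, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
        iqr = q3 - q1
        # Assumption: Tukey's fences define what counts as an outlier.
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        summary[method] = {
            "min": min(amounts),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(amounts),
            "outliers": sorted(a for a in amounts if a < low or a > high),
        }
    return summary
```

A user could compare the quartiles per method against a candidate threshold before setting it in the configuration.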
@jpmckinney
Member Author

jpmckinney commented May 5, 2023

Edit: Integrated this comment into #55


Thinking through how users will edit the configuration file while preparing the data:

  1. Edit configuration file with known fixes.
  2. Run prepare command.
  3. See remaining warnings in the output.
  4. Repeat steps 1-3 until the number of warnings is acceptable.

So, some (or all) of this command should be part of the prepare command.

@jpmckinney jpmckinney added the robustness and documentation labels, then removed the robustness label, on May 18, 2023
@jpmckinney
Member Author

Having worked on and thought through the prepare command some more, we want to keep it focused on reporting issues that the user can correct via configuration.

So, part of this issue can be solved via documentation (e.g. the example from the user research). Otherwise, it's not yet clear whether these statistics should be computed in the library, in supporting notebooks, in BI, etc. So, setting the milestone to Post-MVP.

@jpmckinney jpmckinney added this to the Post-MVP milestone May 18, 2023
@jpmckinney jpmckinney removed the documentation Improvements or additions to documentation label May 27, 2023
@jpmckinney
Member Author

Partly documented in the general workflow, so removing the documentation label. More can be added later.
