add ability to generate basic statistics for the individual datasets #217

Open
ymahlich opened this issue Oct 2, 2024 · 1 comment

Comments

ymahlich commented Oct 2, 2024

Create a suite of scripts (maybe as part of coderdata.utils) that generate basic statistics on the datasets, specifically for metrics that are vital to balancing train/test/validate splits.

Should output basic info as tables and plots.

@ymahlich ymahlich self-assigned this Oct 2, 2024
@ymahlich ymahlich added enhancement New feature or request package labels Oct 2, 2024
@ymahlich ymahlich moved this to In progress in CoderData Oct 2, 2024

ymahlich commented Oct 3, 2024

Two new functions have been implemented in a new module called utils.stats:

  • summarize_response_metric
  • plot_response_metric

Both are helper functions that operate on the experiments portion of a CoderData object.
For example usage, see the new Jupyter notebook that has been added to /notebooks.
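
For readers without the notebook at hand, here is a rough sketch of the kind of per-metric summary described above. It is not the utils.stats implementation: the record layout, the field names (`metric`, `value`), and the `summarize_metric` helper are all assumptions made up for illustration, and only the Python standard library is used.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical experiment records; the field names are assumptions
# for illustration and may not match the real experiments table.
experiments = [
    {"sample": "s1", "drug": "d1", "metric": "auc", "value": 0.6},
    {"sample": "s2", "drug": "d1", "metric": "auc", "value": 0.7},
    {"sample": "s3", "drug": "d2", "metric": "auc", "value": 0.8},
    {"sample": "s1", "drug": "d1", "metric": "ic50", "value": 1.0},
    {"sample": "s2", "drug": "d2", "metric": "ic50", "value": 3.0},
]


def summarize_metric(records):
    """Group response values by metric and compute basic statistics."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["metric"]].append(rec["value"])

    summary = {}
    for metric, values in grouped.items():
        summary[metric] = {
            "count": len(values),
            "mean": mean(values),
            "median": median(values),
            # The sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary


summary = summarize_metric(experiments)

# Print the summary as a small plain-text table, one row per metric.
print(f"{'metric':<8}{'count':>6}{'mean':>8}{'median':>8}{'min':>6}{'max':>6}")
for metric, stats in sorted(summary.items()):
    print(
        f"{metric:<8}{stats['count']:>6}{stats['mean']:>8.3f}"
        f"{stats['median']:>8.3f}{stats['min']:>6.2f}{stats['max']:>6.2f}"
    )
```

Running the same summary separately on each candidate train/test/validate split and comparing the rows is one simple way to check whether a split is roughly balanced for a given metric.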
