Stats Timing #986
Conversation
Current implementation on commit d19392d builds upon the existing code base of … It takes a similar approach to …
An example use case:
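As a rough stand-in, here is a minimal sketch of how a timing decorator of this shape might be applied; the decorator body is stubbed with a print, and all names are placeholders rather than the PR's actual code:

```python
# Hypothetical sketch of the decorator-based use case; the decorator body is
# stubbed with a print, and load_dashboard_data is a placeholder function.
import time
import functools

def store_dashboard_time(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # stand-in for writing a stats/dashboard_time entry to the db
            print(f"{func.__name__} took {time.monotonic() - start:.3f}s")
    return wrapper

@store_dashboard_time
def load_dashboard_data():
    time.sleep(0.1)  # placeholder for real dashboard work

load_dashboard_data()
```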
Concerns: I am now unsure how to change the query from user_id to function_id, or whether to make a new db collection for it within the timeseries; the metadata is right but the schema is not. I cannot access it in the notebook due to a KeyError. Output of the find: …
Questions: How can I update the location, or avoid using the same one as the pipeline time, or is it ok as-is? Do I need extra steps for aggregate? Is this what you had envisioned? I believe the payload/schema should be different, but I'm not sure what form you'd suggest.
@TeachMeTW let us please add this to the project board so that I can see it
I am not sure why we are focusing on instrumenting one function at a time using a decorator. I believe that the issue that @JGreenlee brought up was that we don't always want to instrument a whole function, but may want to instrument parts of code within a function. That's why I suggested that we use the … It seems like you have created a new … Maybe it would be easier if I could write down what I expect you to do in greater detail.
You do not need to edit the existing pipeline.
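For reference, timing only part of a function is commonly done with a context manager; the sketch below is a generic stand-in (timed_section is a made-up helper), not the specific utility referred to in the comment above:

```python
# Generic stand-in showing how a block inside a function can be timed without
# instrumenting the whole function; timed_section is a hypothetical helper.
import time
from contextlib import contextmanager

@contextmanager
def timed_section(name):
    start = time.monotonic()
    try:
        yield
    finally:
        print(f"{name} took {time.monotonic() - start:.3f}s")

def process_dashboard_request(values):
    cleaned = [v for v in values if v is not None]  # cheap, not timed
    with timed_section("aggregation"):              # time just this block
        total = sum(cleaned)
    return total

process_dashboard_request([1, None, 2, 3])
```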
@TeachMeTW @JGreenlee please see my comments in e-mission/op-admin-dashboard#141 (review) for an example of how to use the newly created function.
To clarify, the suggestions below are intended to be guidelines to show you what I expect. I have not tested them, and I don't promise that they will work without any changes.
Co-authored-by: K. Shankari <[email protected]>
@TeachMeTW not sure if you consider these changes done. To clarify, the suggestions above are intended to be guidelines to show you what I expect. I have not tested them, and I don't promise that they will work without any changes.
@shankari No, not done yet; I just added your suggestions -- I'm still missing the main part of the implementation. And sounds good, I will move it back to ready for review once done.
Dashboard time now is in Stage_analysis_timeseries; not too sure how to get it to stats/dashboard/time
I've updated the test and some code; see the latest commit.
Dashboard time now is in Stage_analysis_timeseries; not too sure how to get it to stats/dashboard/time
I am not sure what that statement means; let's unpack it a little.
- Dashboard time should not be in the analysis_timeseries after this change. I do not see any references to the analysis_timeseries here. Have you cleared out the database from previous runs?
- I don't understand what stats/dashboard/time you are referring to. Can you clarify where you expect to see that?
I added the non_user_timeseries as part of #509 to support storing an aggregate random forest model across all the users in a program. We abandoned that approach, in favor of the GIS-based mode detection, so we currently don't generate and store any aggregate models (only user specific models). However, we may resurrect it when it comes to the behavior/demographic based models. So the question is: should we store the dashboard timing data into non_user_timeseries?
What we really need is non_user_raw and non_user_analysed timeseries. However, I will point out that we already store non-user (although not really aggregate) raw data into the timeseries. The disadvantage of this is that we cannot make any assertions that the timeseries data will always have a user_id. So let's just store this in the timeseries for now; it will also require less plumbing from @TeachMeTW.
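A rough sketch of what storing such an aggregate entry directly in the timeseries with user_id = None could look like; the connection details and the database/collection names are assumptions, not tested against this code base:

```python
# Hedged sketch: writing an aggregate (non-user) stats entry into the main
# timeseries with user_id = None. Database/collection names are assumptions.
import time
import pymongo

client = pymongo.MongoClient("localhost")
timeseries_db = client.Stage_database.Stage_timeseries

entry = {
    "user_id": None,  # aggregate data, per the discussion above
    "metadata": {"key": "stats/dashboard_time", "write_ts": time.time()},
    "data": {"name": "generate_plots", "ts": time.time(), "reading": 1.23},
}
timeseries_db.insert_one(entry)
```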
After the revelations in #986 (comment) we are not going to use non_user_timeseries_db after all; instead we will store in timeseries_db with user_id = None. So you can revert the changes listed here.
@shankari @JGreenlee Since we are no longer using non_user_timeseries_db, I have reverted those changes.
I also verified with: …
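The verification itself is missing above; a query of roughly this shape (same assumed names as before) is one way to confirm that the entries landed in the main timeseries without a user_id:

```python
# Hedged sketch of a verification query; database/collection names assumed.
import pymongo

client = pymongo.MongoClient("localhost")
timeseries_db = client.Stage_database.Stage_timeseries

# Count dashboard timing entries stored with user_id = None
print(timeseries_db.count_documents(
    {"user_id": None, "metadata.key": "stats/dashboard_time"}))
```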
There are also some unaddressed changes from before.
Co-authored-by: Jack Greenlee <[email protected]>
Tests now run like this: …
Lines cut from 400 to 200.
Simpler stats timing
@TeachMeTW After you have adjusted comments and docstrings, can you provide an overall summary of the changes in this PR?
@shankari @JGreenlee Add function timing and error logging with unit tests
LGTM, great work!
I think the comments are still more verbose than I would have written, but I'll get off my soapbox and let @shankari state a preference (if any)
Yes, you're right, it does seem a bit trivial on the comments side; looking back now it seems repetitive. I am open to removing some of them based on @shankari's preference.
I just merged this, but didn't squash before merging. Argh!
@shankari for squash message:
Add Functionality and Unit Tests for Function Timing and Error Handling
Overview
Introduces new functionality for tracking execution times and error handling in the dashboard through the store_dashboard_time and store_dashboard_error decorators. These functions store timing and error data for various operations. Additionally, unit tests are added to verify that execution times and errors are correctly logged.
Changes
1. New Functionality in stats_query.py
store_dashboard_time: Stores timing data under the stats/dashboard_time key; uses a Timer object and stores it via the store_stats_entry method.
store_dashboard_error: Stores error data under the stats/dashboard_error key. (A hedged sketch of both is shown below.)
2. New in TestStatsQuery
New Methods:
setUpClass: Initializes the timeseries database before any tests are executed.
tearDown: Ensures that database entries related to the test are cleared after each run to maintain isolation.
verify_stats_entries: Verifies that expected entries (execution times or errors) are stored correctly in the database.
Unit Tests:
test_single_function_timing: checks that the execution time of a single decorated function is recorded.
test_multiple_functions_timing: checks that timing entries are recorded for multiple functions.
test_faulty_function_timing: checks that an error from a faulty function is logged as an error entry.
(A sketch of this test pattern is shown below.)
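A hedged sketch of the test structure described above, with the timeseries replaced by an in-memory list; the class and method names follow the summary, but the plumbing is invented:

```python
# Hedged sketch of the test pattern summarized above; the timeseries database
# is replaced with an in-memory list, so all plumbing here is an assumption.
import unittest

STORED = []  # stand-in for the timeseries database

def store_stats_entry(user_id, key, name, ts, reading):
    STORED.append({"user_id": user_id, "key": key, "name": name})

class TestStatsQuery(unittest.TestCase):
    def tearDown(self):
        STORED.clear()  # clear test entries to keep each run isolated

    def verify_stats_entries(self, expected_names, key):
        names = [e["name"] for e in STORED if e["key"] == key]
        self.assertEqual(names, expected_names)

    def test_single_function_timing(self):
        store_stats_entry(None, "stats/dashboard_time", "fn_a", 0, 0.1)
        self.verify_stats_entries(["fn_a"], "stats/dashboard_time")

    def test_faulty_function_timing(self):
        store_stats_entry(None, "stats/dashboard_error", "fn_bad", 0, 0.0)
        self.verify_stats_entries(["fn_bad"], "stats/dashboard_error")

if __name__ == "__main__":
    unittest.main()
```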
:Impact