Stats Timing #986

TeachMeTW · 2024-10-02T20:10:45Z

Add Functionality and Unit Tests for Function Timing and Error Handling

Overview

Introduces new functionality for tracking execution times and error handling in the dashboard through the store_dashboard_time and store_dashboard_error decorators. These functions store timing and error data for various operations. Additionally, unit tests are added to verify that execution times and errors are correctly logged.

Changes

1. New Functionality in `stats_query.py`

store_dashboard_time:
- This function records the execution time of a code fragment or function and stores it in the database under the key stats/dashboard_time.
- It captures the timestamp and the elapsed time using a Timer object and stores it via the store_stats_entry method.
store_dashboard_error:
- This function captures and logs the time at which an error occurs during the execution of a code fragment or function.
- It stores the error data in the database under the key stats/dashboard_error.

2. New in `TestStatsQuery`

New Methods:
- setUpClass: Initializes the timeseries database before any tests are executed.
- tearDown: Ensures that database entries related to the test are cleared after each run to maintain isolation.
- verify_stats_entries: Verifies that expected entries (execution times or errors) are stored correctly in the database.
Unit Tests:
- test_single_function_timing:
  - Times the execution of a single function and stores the timing data in the database.
  - Verifies that the correct execution time is logged.
- test_multiple_functions_timing:
  - Times and verifies two separate functions and their combined execution time.
  - Checks that individual and combined timings are accurately logged.
- test_faulty_function_timing:
  - Simulates a faulty function that raises an exception.
  - Ensures that even when a function fails, the timing and error data are correctly logged as an error in the database.

Impact

This PR introduces essential functionality to log and track the performance and failures of code fragments or functions.
The added unit tests ensure that these features are working correctly and provide robust coverage for timing and error handling.

TeachMeTW · 2024-10-05T17:18:32Z

@shankari

Current implementation on commit d19392d builds upon existing code base of store_stats_entry where store_function_time calls store_stats_entry(user_id, "stats/function_time", stage_string, ts, reading)

It takes a similar approach to run_intake_pipeline from intake_stage.py but instead of user_id; it takes a function:

def run_function_pipeline_step(func, skip_if_no_new_data):
    """
    Execute a single step in the function pipeline.

    :param func: The function to execute
    :param skip_if_no_new_data: Flag to determine if the function should be skipped based on custom logic
    :return: None
    """
    func_name = func.__name__

    with ect.Timer() as timer:
        logging.info(f"********** Function {func_name}: Starting execution **********")
        print(f"{arrow.now()} ********** Function {func_name}: Starting execution **********")
        result = func()

    # Store the execution time
    esds.store_function_time(func_name, "EXECUTION",
                             time.time(), timer.elapsed)

    if skip_if_no_new_data and not result:
        print(f"No new data for {func_name}, and skip_if_no_new_data = {skip_if_no_new_data}, skipping the rest of the pipeline")
        return
    else:
        print(f"Function {func_name} executed with result = {result} and skip_if_no_new_data = {skip_if_no_new_data}, continuing")

    logging.info(f"********** Function {func_name}: Completed execution **********")

An example use case:

# Define test functions
def test_function_1():
    logging.info("Executing test_function_1")
    time.sleep(1)  # Simulate processing time
    return True  # Indicate successful execution

def test_function_2():
    logging.info("Executing test_function_2")
    time.sleep(1)
    return True

def test_function_faulty():
    logging.info("Executing test_function_faulty")
    time.sleep(1)
    raise ValueError("Simulated error in test_function_faulty")

def test_function_3():
    logging.info("Executing test_function_3")
    time.sleep(1)
    return True

if __name__ == "__main__":
    # Ensure the logs directory exists
    os.makedirs("logs", exist_ok=True)
    
    # Define the list of test functions, including the faulty one
    function_list = [
        test_function_1,
        test_function_2,
        test_function_faulty,  # This will raise an exception
        test_function_3  # This should execute normally after the faulty function
    ]
    
    # Run the pipeline with process number 1 and skip_if_no_new_data set to True
    run_function_pipeline(process_number=1, function_list=function_list, skip_if_no_new_data=True)

Concerns:

Now I am unsure how to change the query from user_id to function_id/make a new db collection for it within timeseries; the metadata is right but the schema is not. I cannot access it on the notebook due to a keyerror: KeyError: 'stats/function_time' but I can access it on the mongo via: db.Stage_timeseries.find({ "user_id": "test_function_1" }).pretty()

Output of the Find:

{
        "_id" : ObjectId("6700f706b37230e8be74d445"),
        "user_id" : "test_function_1",
        "metadata" : {
                "key" : "stats/function_time",
                "platform" : "server",
                "write_ts" : 1728116486.4357288,
                "time_zone" : "America/Los_Angeles",
                "write_local_dt" : {
                        "year" : 2024,
                        "month" : 10,
                        "day" : 5,
                        "hour" : 1,
                        "minute" : 21,
                        "second" : 26,
                        "weekday" : 5,
                        "timezone" : "America/Los_Angeles"
                },
                "write_fmt_time" : "2024-10-05T01:21:26.435729-07:00"
        },
        "data" : {
                "name" : "EXECUTION",
                "ts" : 1728116486.4328,
                "reading" : 1.001198583
        }
}

Questions:

How can I update the location/not use the same as pipeline time or is it ok? Do I need extra steps for aggregate? Is this what you had envisioned?

I believe the payload/schema should be different, but I'm not too sure in what form you'd suggest

shankari · 2024-10-09T14:21:42Z

@TeachMeTW let us please add this to the project board so that I can see it

shankari · 2024-10-12T04:10:39Z

@TeachMeTW

I am not sure why we are focusing on instrumenting one function at a time using a decorator. I believe that the issue that @JGreenlee brought up was that we don't always want to instrument a whole function, but may want to parts of code that are part of a function. That's why I suggested that we use the with... pattern and pointed to the pipeline as an example.

It seems like you have created a new time_functions.py module that is similar to the pipeline module? I am not quite sure what the new function is supposed to do. Are you suggesting that we should replace the existing pipeline function with this code?

Maybe it would be easier if I could write down what I expect you to do in greater detail.

You should create a new function called store_dashboard_time in emission/storage/decorations/stats_queries.py
You should use it to create stats/dashboard/time entries in the database
You should use that function in the dashboard code for timing using the with pattern

You do not need to edit the existing pipeline.
Let me see if a detailed code review makes this any easier.

shankari

@TeachMeTW @JGreenlee please see my comments in e-mission/op-admin-dashboard#141 (review) for an example of how to use the newly created function.

To clarify, the suggestions below are intended to be guidelines to show you what I expect. I have not tested them, and I don't promise that they will work without any changes.

conf/log/function_pipeline.conf

emission/core/wrapper/entry.py

emission/functions/time_functions.py

emission/pipeline/intake_stage.py

emission/storage/decorations/stats_queries.py

emission/storage/timeseries/builtin_timeseries.py

emission/tests/funcTests/TestFunctionTiming.py

emission/storage/decorations/stats_queries.py

Co-authored-by: K. Shankari <[email protected]>

shankari · 2024-10-13T16:26:34Z

@TeachMeTW not sure if you consider these changes done.
If you think that they are done, please move them back to "Ready for review"
Note also that the tests are failing so I don't consider them as actually done
Before you move to "Ready for review" please indicate testing done for the new functionality

To clarify, the suggestions above are intended to be guidelines to show you what I expect. I have not tested them, and I don't promise that they will work without any changes.

TeachMeTW · 2024-10-13T17:21:29Z

@shankari No not done yet; just added your suggestions -- I'm still missing the main part of the implementation. And sounds good I will move back to ready for review once done.

TeachMeTW · 2024-10-13T20:51:32Z

@shankari

Dashboard time now is in Stage_analysis_timeseries; not too sure how to get it to stats/dashboard/time

> db.Stage_analysis_timeseries.find({ "metadata.key": "stats/dashboard_time" }).pretty()
{
	"_id" : ObjectId("670c31a0ff3d9aa6f3653e41"),
	"user_id" : null,
	"metadata" : {
		"key" : "stats/dashboard_time",
		"platform" : "server",
		"write_ts" : 1728852384.882551,
		"time_zone" : "America/Los_Angeles",
		"write_local_dt" : {
			"year" : 2024,
			"month" : 10,
			"day" : 13,
			"hour" : 13,
			"minute" : 46,
			"second" : 24,
			"weekday" : 6,
			"timezone" : "America/Los_Angeles"
		},
		"write_fmt_time" : "2024-10-13T13:46:24.882551-07:00"
	},
	"data" : {
		"name" : "test_function_1",
		"ts" : 1728852383.862375,
		"reading" : 1005.1576660000001
	}
}
{
	"_id" : ObjectId("670c31a3ff3d9aa6f3653e43"),
	"user_id" : null,
	"metadata" : {
		"key" : "stats/dashboard_time",
		"platform" : "server",
		"write_ts" : 1728852387.351635,
		"time_zone" : "America/Los_Angeles",
		"write_local_dt" : {
			"year" : 2024,
			"month" : 10,
			"day" : 13,
			"hour" : 13,
			"minute" : 46,
			"second" : 27,
			"weekday" : 6,
			"timezone" : "America/Los_Angeles"
		},
		"write_fmt_time" : "2024-10-13T13:46:27.351635-07:00"
	},
	"data" : {
		"name" : "test_function_2",
		"ts" : 1728852385.34658,
		"reading" : 2004.730167
	}
}
{
	"_id" : ObjectId("670c31a6ff3d9aa6f3653e44"),
	"user_id" : null,
	"metadata" : {
		"key" : "stats/dashboard_time",
		"platform" : "server",
		"write_ts" : 1728852390.402445,
		"time_zone" : "America/Los_Angeles",
		"write_local_dt" : {
			"year" : 2024,
			"month" : 10,
			"day" : 13,
			"hour" : 13,
			"minute" : 46,
			"second" : 30,
			"weekday" : 6,
			"timezone" : "America/Los_Angeles"
		},
		"write_fmt_time" : "2024-10-13T13:46:30.402445-07:00"
	},
	"data" : {
		"name" : "test_function_3",
		"ts" : 1728852387.399441,
		"reading" : 3002.8265420000002
	}
}

I've updated the test and some code; see latest commit

shankari

Dashboard time now is in Stage_analysis_timeseries; not too sure how to get it to stats/dashboard/time

I am not sure what that statement means; let's unpack it a little.

Dashboard time should not be in the analysis_timeseries after this change. I do not see any references to the analysis_timeseries here. Have you cleared out the database from previous runs?
I don't understand what stats/dashboard/time you are referring to. Can you clarify where you expect to see that?

emission/storage/decorations/stats_queries.py

emission/storage/timeseries/builtin_timeseries.py

emission/storage/timeseries/non_user_timeseries.py

emission/tests/funcTests/TestFunctionTiming.py

shankari · 2024-10-14T20:07:01Z

I added the non_user_timeseries as part of #509 to support storing an aggregate random forest model across all the users in a program. We abandoned that approach, in favor of the GIS-based mode detection, so we currently don't generate and store any aggregate models (only user specific models). However, we may resurrect it when it comes to the behavior/demographic based models.

So the question is: should we store the dashboard timing data into non_user_timeseries?

Pro: it is not associated with a user
Con: we will be storing analysed data in the non_user_timeseries, and that is what it was designed for, so we will end up with a mixture of raw and analysed data in the non_user_timeseries.

What we really need is non_user_raw and non_user_analysed timeseries.

However, I will point out that we already store non-user (although not really aggregate) raw data into the timeseries.
Timing data from the API for calls through the webservice are stored with user_id = None
So at least for now, we can be consistent with this, and just store the dashboard timing stats with user_id = None

The disadvantage of this is that we cannot make any assertions that the timeseries data will always have a user_id
So for clarity, it may make sense to pull out the non user data into a separate collection.
But we can do that later in a cleanup phase and do it consistently for both the API calls and these timing calls.

So let's just store this in the timeseries for now; it will also require less plumbing from @TeachMeTW

shankari

After the revelations in #986 (comment) we are not going to use non_user_timeseries_db after all, instead we will store in timeseries_db with user_id = None.

So you can revert the changes listed here.

emission/storage/decorations/stats_queries.py

emission/storage/timeseries/non_user_timeseries.py

emission/tests/funcTests/TestFunctionTiming.py

TeachMeTW · 2024-10-14T20:48:04Z

@shankari @JGreenlee Since we are no longer using non_user_timeseries, I revered those changes and fixed the code that was requested. I also modified the logging and added verification. It is now stored in Stage_timeseries. Am I in the right direction?

> db.Stage_timeseries.find({ "metadata.key": "stats/dashboard_time" }).pretty()
{
	"_id" : ObjectId("670d8267fb380e467d11478f"),
	"user_id" : null,
	"metadata" : {
		"key" : "stats/dashboard_time",
		"platform" : "server",
		"write_ts" : 1728938599.210308,
		"time_zone" : "America/Los_Angeles",
		"write_local_dt" : {
			"year" : 2024,
			"month" : 10,
			"day" : 14,
			"hour" : 13,
			"minute" : 43,
			"second" : 19,
			"weekday" : 0,
			"timezone" : "America/Los_Angeles"
		},
		"write_fmt_time" : "2024-10-14T13:43:19.210308-07:00"
	},
	"data" : {
		"name" : "test_function_1",
		"ts" : 1728938598.1843588,
		"reading" : 1005.0427499999998
	}
}

I also verified with:

total_documents = gdb.get_timeseries_db().count_documents({
    "metadata.key": "stats/dashboard_time"
})
print(f"Total documents in Stage_timeseries with metadata.key 'stats/dashboard_time': {total_documents}")

shankari

There are some also unaddressed changes from before.

emission/storage/timeseries/non_user_timeseries.py

emission/tests/funcTests/TestFunctionTiming.py

Co-authored-by: Jack Greenlee <[email protected]>

TeachMeTW · 2024-10-16T22:42:41Z

Tests now run like this:

########### Starting test_code_block_instrumentation ###########
Starting timing for 'sample_code_block'.
Executing sample_code_block with simulated operations
Step 1: Data retrieval from mock service
Data retrieved: {'user_id': 123, 'action': 'login'}
Step 2: Data processing
Processed data: {'user_id': '123', 'action': 'LOGIN'}
Step 3: Storing processed data
Processed data stored successfully
'sample_code_block' executed successfully in 2014.81 ms.
Verifying entry for 'sample_code_block' under key 'stats/dashboard_time'.
Stored Document for 'sample_code_block': reading=2014.8075409999997 ms, name=sample_code_block
'sample_code_block': Expected elapsed time = 2014.81 ms, Stored reading = 2014.8075409999997 ms.
########### Finished test_code_block_instrumentation ###########

Cleaned up database entries after test based on metadata keys.
.
########### Starting test_faulty_function_timing ###########
Starting timing for 'faulty_function'.
Executing faulty_function
'faulty_function' failed after 1001.40 ms with error: Simulated error in faulty_function
Verifying entry for 'faulty_function' under key 'stats/dashboard_error'.
Stored Document for 'faulty_function': reading=1001.4026669999998 ms, name=faulty_function
'faulty_function': Expected elapsed time = 1001.40 ms, Stored reading = 1001.4026669999998 ms.
########### Finished test_faulty_function_timing ###########

Cleaned up database entries after test based on metadata keys.
.
########### Starting test_multiple_functions_timing ###########
Starting timing for 'function_one'.
Executing function_one
'function_one' executed successfully in 1004.34 ms.
Verifying entry for 'function_one' under key 'stats/dashboard_time'.
Stored Document for 'function_one': reading=1004.3434590000002 ms, name=function_one
'function_one': Expected elapsed time = 1004.34 ms, Stored reading = 1004.3434590000002 ms.
Starting timing for 'function_two'.
Executing function_two
'function_two' executed successfully in 1500.24 ms.
Verifying entry for 'function_two' under key 'stats/dashboard_time'.
Stored Document for 'function_two': reading=1500.2447080000004 ms, name=function_two
'function_two': Expected elapsed time = 1500.24 ms, Stored reading = 1500.2447080000004 ms.
########### Finished test_multiple_functions_timing ###########

Cleaned up database entries after test based on metadata keys.
.
########### Starting test_single_function_timing ###########
Starting timing for 'sample_function'.
Executing sample_function
'sample_function' executed successfully in 2005.10 ms.
Verifying entry for 'sample_function' under key 'stats/dashboard_time'.
Stored Document for 'sample_function': reading=2005.0953330000007 ms, name=sample_function
'sample_function': Expected elapsed time = 2005.10 ms, Stored reading = 2005.0953330000007 ms.
########### Finished test_single_function_timing ###########

Cleaned up database entries after test based on metadata keys.
.
----------------------------------------------------------------------
Ran 4 tests in 7.690s

OK

Lines cut from 400 -> 200

JGreenlee · 2024-10-17T14:52:55Z

TeachMeTW@a24b03b

Simpler stats timing

JGreenlee · 2024-10-17T19:50:20Z

@TeachMeTW After you have adjusted comments and docstrings, can you provide an overall summary of the changes in this PR?
Since this PR is 40+ commits I fully anticipate that we will squash merge. Shankari can then copy your summary into the commit message of the squash so that we have a clean and meaningful git history.

TeachMeTW · 2024-10-17T21:06:05Z

@shankari @JGreenlee
Modified the PR description and heres an overview:

Add function timing and error logging with unit tests

Implemented store_dashboard_time and store_dashboard_error functions to log execution times and errors in the database.
Added unit tests in TestStatsQuery to verify correct timing and error logging for individual, multiple, and faulty function executions.

JGreenlee

LGTM, great work!

I think the amount of comments is still more verbose than I would have done, but I'll get off my soapbox and let @shankari state a preference (if any)

TeachMeTW · 2024-10-17T21:48:55Z

Yes your right it does seem a bit trivial on the comments side; looking back now it seems repetitive; i am open to remove some of them based on @shankari's preference

shankari · 2024-10-18T18:27:41Z

I just merged this, but didn't squash before merging. Argh!
Trying to fix this now by modifying the git history (since I don't want to have commit messages with resolve

JGreenlee · 2024-10-18T18:33:09Z

@shankari for squash message

Add function timing and error logging with unit tests

Implemented store_dashboard_time and store_dashboard_error functions to log execution times and errors in the database.

Added unit tests in TestStatsQuery to verify correct timing and error logging for individual, multiple, and faulty function executions.

Typo Fix

acf7f3e

TeachMeTW mentioned this pull request Oct 2, 2024

Add Logging Statements for Performance Analysis e-mission/op-admin-dashboard#137

Closed

TeachMeTW added 2 commits October 4, 2024 11:57

Added Func Time Collection?

553a45d

Fixed per feedback, removed decorator

d19392d

TeachMeTW marked this pull request as ready for review October 9, 2024 15:36

shankari requested changes Oct 12, 2024

View reviewed changes

shankari mentioned this pull request Oct 12, 2024

Performance Improvements in Home Page Processing e-mission/op-admin-dashboard#126

Closed

TeachMeTW and others added 6 commits October 12, 2024 13:44

Update emission/storage/decorations/stats_queries.py

05bac58

Co-authored-by: K. Shankari <[email protected]>

Update emission/storage/decorations/stats_queries.py

e23d4b6

Co-authored-by: K. Shankari <[email protected]>

Update emission/storage/decorations/stats_queries.py

7fc3f81

Co-authored-by: K. Shankari <[email protected]>

Update emission/storage/timeseries/builtin_timeseries.py

e7e50da

Co-authored-by: K. Shankari <[email protected]>

Update emission/storage/decorations/stats_queries.py

dd253dd

Co-authored-by: K. Shankari <[email protected]>

Review Changes

8957a06

Changes

f428069

shankari requested changes Oct 14, 2024

View reviewed changes

TeachMeTW added 3 commits October 14, 2024 13:16

Reverted some

4f636a9

Added verification

b2e3cb5

resolve

a1ff54e

TeachMeTW added 3 commits October 14, 2024 13:53

Fixed whitespace commits

12d5e86

Removed unused imports

cf5b5a1

Removed unused imports

2b5cefd

shankari requested changes Oct 16, 2024

View reviewed changes

emission/storage/timeseries/non_user_timeseries.py Outdated Show resolved Hide resolved

TeachMeTW added 2 commits October 15, 2024 22:15

Reverted Non_user

4c7b80b

Added more tests

e843169