Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stats Timing #986

Merged
merged 42 commits into from
Oct 18, 2024
Merged

Stats Timing #986

merged 42 commits into from
Oct 18, 2024

Conversation

TeachMeTW
Copy link
Contributor

@TeachMeTW TeachMeTW commented Oct 2, 2024

Add Functionality and Unit Tests for Function Timing and Error Handling

Overview

Introduces new functionality for tracking execution times and error handling in the dashboard through the store_dashboard_time and store_dashboard_error decorators. These functions store timing and error data for various operations. Additionally, unit tests are added to verify that execution times and errors are correctly logged.

Changes

1. New Functionality in stats_query.py

  • store_dashboard_time:

    • This function records the execution time of a code fragment or function and stores it in the database under the key stats/dashboard_time.
    • It captures the timestamp and the elapsed time using a Timer object and stores it via the store_stats_entry method.
  • store_dashboard_error:

    • This function captures and logs the time at which an error occurs during the execution of a code fragment or function.
    • It stores the error data in the database under the key stats/dashboard_error.

2. New in TestStatsQuery

  • New Methods:

    • setUpClass: Initializes the timeseries database before any tests are executed.
    • tearDown: Ensures that database entries related to the test are cleared after each run to maintain isolation.
    • verify_stats_entries: Verifies that expected entries (execution times or errors) are stored correctly in the database.
  • Unit Tests:

    • test_single_function_timing:

      • Times the execution of a single function and stores the timing data in the database.
      • Verifies that the correct execution time is logged.
    • test_multiple_functions_timing:

      • Times and verifies two separate functions and their combined execution time.
      • Checks that individual and combined timings are accurately logged.
    • test_faulty_function_timing:

      • Simulates a faulty function that raises an exception.
      • Ensures that even when a function fails, the timing and error data are correctly logged as an error in the database.

Impact

  • This PR introduces essential functionality to log and track the performance and failures of code fragments or functions.
  • The added unit tests ensure that these features are working correctly and provide robust coverage for timing and error handling.

@TeachMeTW
Copy link
Contributor Author

TeachMeTW commented Oct 5, 2024

@shankari

Current implementation on commit d19392d builds upon existing code base of store_stats_entry where store_function_time calls store_stats_entry(user_id, "stats/function_time", stage_string, ts, reading)

It takes a similar approach to run_intake_pipeline from intake_stage.py but instead of user_id; it takes a function:

def run_function_pipeline_step(func, skip_if_no_new_data):
    """
    Execute a single step in the function pipeline.

    :param func: The function to execute
    :param skip_if_no_new_data: Flag to determine if the function should be skipped based on custom logic
    :return: None
    """
    func_name = func.__name__

    with ect.Timer() as timer:
        logging.info(f"********** Function {func_name}: Starting execution **********")
        print(f"{arrow.now()} ********** Function {func_name}: Starting execution **********")
        result = func()

    # Store the execution time
    esds.store_function_time(func_name, "EXECUTION",
                             time.time(), timer.elapsed)

    if skip_if_no_new_data and not result:
        print(f"No new data for {func_name}, and skip_if_no_new_data = {skip_if_no_new_data}, skipping the rest of the pipeline")
        return
    else:
        print(f"Function {func_name} executed with result = {result} and skip_if_no_new_data = {skip_if_no_new_data}, continuing")

    logging.info(f"********** Function {func_name}: Completed execution **********")

An example use case:

# Define test functions
def test_function_1():
    logging.info("Executing test_function_1")
    time.sleep(1)  # Simulate processing time
    return True  # Indicate successful execution

def test_function_2():
    logging.info("Executing test_function_2")
    time.sleep(1)
    return True

def test_function_faulty():
    logging.info("Executing test_function_faulty")
    time.sleep(1)
    raise ValueError("Simulated error in test_function_faulty")

def test_function_3():
    logging.info("Executing test_function_3")
    time.sleep(1)
    return True

if __name__ == "__main__":
    # Ensure the logs directory exists
    os.makedirs("logs", exist_ok=True)
    
    # Define the list of test functions, including the faulty one
    function_list = [
        test_function_1,
        test_function_2,
        test_function_faulty,  # This will raise an exception
        test_function_3  # This should execute normally after the faulty function
    ]
    
    # Run the pipeline with process number 1 and skip_if_no_new_data set to True
    run_function_pipeline(process_number=1, function_list=function_list, skip_if_no_new_data=True)

Concerns:

Now I am unsure how to change the query from user_id to function_id/make a new db collection for it within timeseries; the metadata is right but the schema is not. I cannot access it on the notebook due to a keyerror: KeyError: 'stats/function_time' but I can access it on the mongo via: db.Stage_timeseries.find({ "user_id": "test_function_1" }).pretty()

Output of the Find:

{
        "_id" : ObjectId("6700f706b37230e8be74d445"),
        "user_id" : "test_function_1",
        "metadata" : {
                "key" : "stats/function_time",
                "platform" : "server",
                "write_ts" : 1728116486.4357288,
                "time_zone" : "America/Los_Angeles",
                "write_local_dt" : {
                        "year" : 2024,
                        "month" : 10,
                        "day" : 5,
                        "hour" : 1,
                        "minute" : 21,
                        "second" : 26,
                        "weekday" : 5,
                        "timezone" : "America/Los_Angeles"
                },
                "write_fmt_time" : "2024-10-05T01:21:26.435729-07:00"
        },
        "data" : {
                "name" : "EXECUTION",
                "ts" : 1728116486.4328,
                "reading" : 1.001198583
        }
}

Questions:

How can I update the location/not use the same as pipeline time or is it ok? Do I need extra steps for aggregate? Is this what you had envisioned?

I believe the payload/schema should be different, but I'm not too sure in what form you'd suggest

@shankari
Copy link
Contributor

shankari commented Oct 9, 2024

@TeachMeTW let us please add this to the project board so that I can see it

@TeachMeTW TeachMeTW marked this pull request as ready for review October 9, 2024 15:36
@shankari
Copy link
Contributor

@TeachMeTW

I am not sure why we are focusing on instrumenting one function at a time using a decorator. I believe that the issue that @JGreenlee brought up was that we don't always want to instrument a whole function, but may want to parts of code that are part of a function. That's why I suggested that we use the with... pattern and pointed to the pipeline as an example.

It seems like you have created a new time_functions.py module that is similar to the pipeline module? I am not quite sure what the new function is supposed to do. Are you suggesting that we should replace the existing pipeline function with this code?

Maybe it would be easier if I could write down what I expect you to do in greater detail.

  • You should create a new function called store_dashboard_time in emission/storage/decorations/stats_queries.py
  • You should use it to create stats/dashboard/time entries in the database
  • You should use that function in the dashboard code for timing using the with pattern

You do not need to edit the existing pipeline.
Let me see if a detailed code review makes this any easier.

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TeachMeTW @JGreenlee please see my comments in e-mission/op-admin-dashboard#141 (review) for an example of how to use the newly created function.

To clarify, the suggestions below are intended to be guidelines to show you what I expect. I have not tested them, and I don't promise that they will work without any changes.

conf/log/function_pipeline.conf Outdated Show resolved Hide resolved
emission/core/wrapper/entry.py Outdated Show resolved Hide resolved
emission/functions/time_functions.py Outdated Show resolved Hide resolved
emission/pipeline/intake_stage.py Outdated Show resolved Hide resolved
emission/storage/decorations/stats_queries.py Outdated Show resolved Hide resolved
emission/storage/timeseries/builtin_timeseries.py Outdated Show resolved Hide resolved
emission/tests/funcTests/TestFunctionTiming.py Outdated Show resolved Hide resolved
emission/storage/decorations/stats_queries.py Outdated Show resolved Hide resolved
emission/storage/decorations/stats_queries.py Show resolved Hide resolved
emission/storage/decorations/stats_queries.py Outdated Show resolved Hide resolved
@shankari
Copy link
Contributor

shankari commented Oct 13, 2024

@TeachMeTW not sure if you consider these changes done.
If you think that they are done, please move them back to "Ready for review"
Note also that the tests are failing so I don't consider them as actually done
Before you move to "Ready for review" please indicate testing done for the new functionality

To clarify, the suggestions above are intended to be guidelines to show you what I expect. I have not tested them, and I don't promise that they will work without any changes.

@TeachMeTW
Copy link
Contributor Author

TeachMeTW commented Oct 13, 2024

@shankari No not done yet; just added your suggestions -- I'm still missing the main part of the implementation. And sounds good I will move back to ready for review once done.

@TeachMeTW
Copy link
Contributor Author

@shankari

Dashboard time now is in Stage_analysis_timeseries; not too sure how to get it to stats/dashboard/time

> db.Stage_analysis_timeseries.find({ "metadata.key": "stats/dashboard_time" }).pretty()
{
	"_id" : ObjectId("670c31a0ff3d9aa6f3653e41"),
	"user_id" : null,
	"metadata" : {
		"key" : "stats/dashboard_time",
		"platform" : "server",
		"write_ts" : 1728852384.882551,
		"time_zone" : "America/Los_Angeles",
		"write_local_dt" : {
			"year" : 2024,
			"month" : 10,
			"day" : 13,
			"hour" : 13,
			"minute" : 46,
			"second" : 24,
			"weekday" : 6,
			"timezone" : "America/Los_Angeles"
		},
		"write_fmt_time" : "2024-10-13T13:46:24.882551-07:00"
	},
	"data" : {
		"name" : "test_function_1",
		"ts" : 1728852383.862375,
		"reading" : 1005.1576660000001
	}
}
{
	"_id" : ObjectId("670c31a3ff3d9aa6f3653e43"),
	"user_id" : null,
	"metadata" : {
		"key" : "stats/dashboard_time",
		"platform" : "server",
		"write_ts" : 1728852387.351635,
		"time_zone" : "America/Los_Angeles",
		"write_local_dt" : {
			"year" : 2024,
			"month" : 10,
			"day" : 13,
			"hour" : 13,
			"minute" : 46,
			"second" : 27,
			"weekday" : 6,
			"timezone" : "America/Los_Angeles"
		},
		"write_fmt_time" : "2024-10-13T13:46:27.351635-07:00"
	},
	"data" : {
		"name" : "test_function_2",
		"ts" : 1728852385.34658,
		"reading" : 2004.730167
	}
}
{
	"_id" : ObjectId("670c31a6ff3d9aa6f3653e44"),
	"user_id" : null,
	"metadata" : {
		"key" : "stats/dashboard_time",
		"platform" : "server",
		"write_ts" : 1728852390.402445,
		"time_zone" : "America/Los_Angeles",
		"write_local_dt" : {
			"year" : 2024,
			"month" : 10,
			"day" : 13,
			"hour" : 13,
			"minute" : 46,
			"second" : 30,
			"weekday" : 6,
			"timezone" : "America/Los_Angeles"
		},
		"write_fmt_time" : "2024-10-13T13:46:30.402445-07:00"
	},
	"data" : {
		"name" : "test_function_3",
		"ts" : 1728852387.399441,
		"reading" : 3002.8265420000002
	}
}

I've updated the test and some code; see latest commit

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dashboard time now is in Stage_analysis_timeseries; not too sure how to get it to stats/dashboard/time

I am not sure what that statement means; let's unpack it a little.

  1. Dashboard time should not be in the analysis_timeseries after this change. I do not see any references to the analysis_timeseries here. Have you cleared out the database from previous runs?
  2. I don't understand what stats/dashboard/time you are referring to. Can you clarify where you expect to see that?

emission/storage/decorations/stats_queries.py Outdated Show resolved Hide resolved
emission/storage/timeseries/non_user_timeseries.py Outdated Show resolved Hide resolved
emission/storage/timeseries/non_user_timeseries.py Outdated Show resolved Hide resolved
emission/storage/timeseries/non_user_timeseries.py Outdated Show resolved Hide resolved
emission/storage/timeseries/non_user_timeseries.py Outdated Show resolved Hide resolved
emission/tests/funcTests/TestFunctionTiming.py Outdated Show resolved Hide resolved
emission/tests/funcTests/TestFunctionTiming.py Outdated Show resolved Hide resolved
emission/tests/funcTests/TestFunctionTiming.py Outdated Show resolved Hide resolved
@shankari
Copy link
Contributor

I added the non_user_timeseries as part of #509 to support storing an aggregate random forest model across all the users in a program. We abandoned that approach, in favor of the GIS-based mode detection, so we currently don't generate and store any aggregate models (only user specific models). However, we may resurrect it when it comes to the behavior/demographic based models.

So the question is: should we store the dashboard timing data into non_user_timeseries?

  • Pro: it is not associated with a user
  • Con: we will be storing analysed data in the non_user_timeseries, and that is what it was designed for, so we will end up with a mixture of raw and analysed data in the non_user_timeseries.

What we really need is non_user_raw and non_user_analysed timeseries.

However, I will point out that we already store non-user (although not really aggregate) raw data into the timeseries.
Timing data from the API for calls through the webservice are stored with user_id = None
So at least for now, we can be consistent with this, and just store the dashboard timing stats with user_id = None

The disadvantage of this is that we cannot make any assertions that the timeseries data will always have a user_id
So for clarity, it may make sense to pull out the non user data into a separate collection.
But we can do that later in a cleanup phase and do it consistently for both the API calls and these timing calls.

So let's just store this in the timeseries for now; it will also require less plumbing from @TeachMeTW

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the revelations in #986 (comment) we are not going to use non_user_timeseries_db after all, instead we will store in timeseries_db with user_id = None.

So you can revert the changes listed here.

emission/storage/decorations/stats_queries.py Outdated Show resolved Hide resolved
emission/storage/timeseries/non_user_timeseries.py Outdated Show resolved Hide resolved
emission/storage/timeseries/non_user_timeseries.py Outdated Show resolved Hide resolved
emission/tests/funcTests/TestFunctionTiming.py Outdated Show resolved Hide resolved
@TeachMeTW
Copy link
Contributor Author

@shankari @JGreenlee Since we are no longer using non_user_timeseries, I revered those changes and fixed the code that was requested. I also modified the logging and added verification. It is now stored in Stage_timeseries. Am I in the right direction?

> db.Stage_timeseries.find({ "metadata.key": "stats/dashboard_time" }).pretty()
{
	"_id" : ObjectId("670d8267fb380e467d11478f"),
	"user_id" : null,
	"metadata" : {
		"key" : "stats/dashboard_time",
		"platform" : "server",
		"write_ts" : 1728938599.210308,
		"time_zone" : "America/Los_Angeles",
		"write_local_dt" : {
			"year" : 2024,
			"month" : 10,
			"day" : 14,
			"hour" : 13,
			"minute" : 43,
			"second" : 19,
			"weekday" : 0,
			"timezone" : "America/Los_Angeles"
		},
		"write_fmt_time" : "2024-10-14T13:43:19.210308-07:00"
	},
	"data" : {
		"name" : "test_function_1",
		"ts" : 1728938598.1843588,
		"reading" : 1005.0427499999998
	}
}

I also verified with:

total_documents = gdb.get_timeseries_db().count_documents({
    "metadata.key": "stats/dashboard_time"
})
print(f"Total documents in Stage_timeseries with metadata.key 'stats/dashboard_time': {total_documents}")
image image

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some also unaddressed changes from before.

emission/storage/timeseries/non_user_timeseries.py Outdated Show resolved Hide resolved
@TeachMeTW
Copy link
Contributor Author

Tests now run like this:

########### Starting test_code_block_instrumentation ###########
Starting timing for 'sample_code_block'.
Executing sample_code_block with simulated operations
Step 1: Data retrieval from mock service
Data retrieved: {'user_id': 123, 'action': 'login'}
Step 2: Data processing
Processed data: {'user_id': '123', 'action': 'LOGIN'}
Step 3: Storing processed data
Processed data stored successfully
'sample_code_block' executed successfully in 2014.81 ms.
Verifying entry for 'sample_code_block' under key 'stats/dashboard_time'.
Stored Document for 'sample_code_block': reading=2014.8075409999997 ms, name=sample_code_block
'sample_code_block': Expected elapsed time = 2014.81 ms, Stored reading = 2014.8075409999997 ms.
########### Finished test_code_block_instrumentation ###########

Cleaned up database entries after test based on metadata keys.
.
########### Starting test_faulty_function_timing ###########
Starting timing for 'faulty_function'.
Executing faulty_function
'faulty_function' failed after 1001.40 ms with error: Simulated error in faulty_function
Verifying entry for 'faulty_function' under key 'stats/dashboard_error'.
Stored Document for 'faulty_function': reading=1001.4026669999998 ms, name=faulty_function
'faulty_function': Expected elapsed time = 1001.40 ms, Stored reading = 1001.4026669999998 ms.
########### Finished test_faulty_function_timing ###########

Cleaned up database entries after test based on metadata keys.
.
########### Starting test_multiple_functions_timing ###########
Starting timing for 'function_one'.
Executing function_one
'function_one' executed successfully in 1004.34 ms.
Verifying entry for 'function_one' under key 'stats/dashboard_time'.
Stored Document for 'function_one': reading=1004.3434590000002 ms, name=function_one
'function_one': Expected elapsed time = 1004.34 ms, Stored reading = 1004.3434590000002 ms.
Starting timing for 'function_two'.
Executing function_two
'function_two' executed successfully in 1500.24 ms.
Verifying entry for 'function_two' under key 'stats/dashboard_time'.
Stored Document for 'function_two': reading=1500.2447080000004 ms, name=function_two
'function_two': Expected elapsed time = 1500.24 ms, Stored reading = 1500.2447080000004 ms.
########### Finished test_multiple_functions_timing ###########

Cleaned up database entries after test based on metadata keys.
.
########### Starting test_single_function_timing ###########
Starting timing for 'sample_function'.
Executing sample_function
'sample_function' executed successfully in 2005.10 ms.
Verifying entry for 'sample_function' under key 'stats/dashboard_time'.
Stored Document for 'sample_function': reading=2005.0953330000007 ms, name=sample_function
'sample_function': Expected elapsed time = 2005.10 ms, Stored reading = 2005.0953330000007 ms.
########### Finished test_single_function_timing ###########

Cleaned up database entries after test based on metadata keys.
.
----------------------------------------------------------------------
Ran 4 tests in 7.690s

OK

Lines cut from 400 -> 200

@JGreenlee
Copy link
Contributor

TeachMeTW@a24b03b

@JGreenlee
Copy link
Contributor

JGreenlee commented Oct 17, 2024

@TeachMeTW After you have adjusted comments and docstrings, can you provide an overall summary of the changes in this PR?
Since this PR is 40+ commits I fully anticipate that we will squash merge. Shankari can then copy your summary into the commit message of the squash so that we have a clean and meaningful git history.

@TeachMeTW
Copy link
Contributor Author

@shankari @JGreenlee
Modified the PR description and heres an overview:

Add function timing and error logging with unit tests

  • Implemented store_dashboard_time and store_dashboard_error functions to log execution times and errors in the database.
  • Added unit tests in TestStatsQuery to verify correct timing and error logging for individual, multiple, and faulty function executions.

Copy link
Contributor

@JGreenlee JGreenlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work!

I think the amount of comments is still more verbose than I would have done, but I'll get off my soapbox and let @shankari state a preference (if any)

@TeachMeTW
Copy link
Contributor Author

TeachMeTW commented Oct 17, 2024

Yes your right it does seem a bit trivial on the comments side; looking back now it seems repetitive; i am open to remove some of them based on @shankari's preference

@shankari shankari merged commit 1551a30 into e-mission:master Oct 18, 2024
5 checks passed
@shankari
Copy link
Contributor

I just merged this, but didn't squash before merging. Argh!
Trying to fix this now by modifying the git history (since I don't want to have commit messages with resolve

@JGreenlee
Copy link
Contributor

@shankari for squash message

Add function timing and error logging with unit tests

  • Implemented store_dashboard_time and store_dashboard_error functions to log execution times and errors in the database.
  • Added unit tests in TestStatsQuery to verify correct timing and error logging for individual, multiple, and faulty function executions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Tasks completed
Development

Successfully merging this pull request may close these issues.

3 participants