
Temporary Logging #141

Merged 2 commits on Oct 20, 2024

Conversation

TeachMeTW (Contributor)

Description

This pull request introduces a temporary logging setup to enhance debugging and monitor function execution times until the logging mechanism can be fully integrated into the emission server.

Changes

  • Configured the logging system to capture all log levels with detailed formatting.
  • Implemented a log_execution_time decorator to log the start and end times of function executions (a simplified sketch follows).
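
For reference, a minimal sketch of the decorator (simplified and untested; see the diff for the actual implementation):

    import logging
    import time
    from functools import wraps

    def log_execution_time(func):
        """Log when a function starts, when it finishes, and how long it took."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            logging.info(f"Starting '{func.__name__}'")
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.time() - start
                logging.info(f"Finished '{func.__name__}' in {elapsed:.2f} seconds")
        return wrapper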

shankari (Contributor) left a comment

@TeachMeTW

In addition to the decorator here, you can also instrument sections of code. These functions are small, but for bigger functions with multiple steps, you can do something like: https://github.com/e-mission/op-admin-dashboard/pull/141/files#diff-44562706f07395bde42fb1c105fe16e41bbba72b174847e62d3b8f257212717aR204

    (start_date, end_date) = iso_to_date_only(start_date, end_date)
    with ect.Timer() as ctt:
        if not df.empty and has_permission('overview_trips_trend'):
            trend_df = compute_trips_trend(df, date_col = "trip_start_time_str")
    esds.store_dashboard_time("compute_trips_trend", time.time(), ctt.elapsed)
    with ect.Timer() as gbt:
        fig = generate_barplot(trend_df, x = 'date', y = 'count', title = f"Trips trend ({start_date} to {end_date})")
    esds.store_dashboard_time("generate_barplot", time.time(), gbt.elapsed)

Again, these are examples; I don't think this function is complex enough to warrant this level of fine-grained instrumentation. But it is an example for more complex functions, such as the ones in db_utils 😄

@JGreenlee for visibility

To clarify, the suggestions below are intended to be guidelines to show you what I expect. I have not tested them, and I don't promise that they will work without any changes.

(6 review comments on pages/home.py, outdated and resolved)
shankari (Contributor) left a comment

Please:

  • Indicate "Testing done"
  • Clean up imports

Are there not any functions (e.g. in db_utils) that would benefit from instrumenting sub-parts of the function?

I can anticipate the benefits of being able to distinguish between (for example) the DB read time and the processing time, so we can see if (for example) converting a list from strings to UUID objects is really the bottleneck.
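
For example (untested, and the variable names here are made up for illustration), a db_utils function could time the DB read and the UUID conversion separately:

    import time
    from uuid import UUID  # assumes the ect/esds imports already used in this PR

    with ect.Timer() as db_read_timer:
        entries = list(ts.find_entries(["manual/demographic_survey"]))
    esds.store_dashboard_time("db_utils/db_read", time.time(), db_read_timer.elapsed)

    with ect.Timer() as uuid_convert_timer:
        # user_id_strings stands in for whatever string list the function reads
        uuid_objects = [UUID(s) for s in user_id_strings]
    esds.store_dashboard_time("db_utils/str_to_uuid", time.time(), uuid_convert_timer.elapsed)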

(3 review comments on pages/home.py, outdated and resolved)
TeachMeTW (Contributor, Author)

@shankari Cleaned up imports.

Are there not any functions (e.g. in db_utils) that would benefit from instrumenting sub-parts of the function?

I can anticipate the benefits of being able to distinguish between (for example) the DB read time and the processing time, so we can see if (for example) converting a list from strings to UUID objects is really the bottleneck.

We can do that, but I would have to implement things differently; right now I would say it's 'good enough', but I can work on a more precise function logger. I already track individual functions in the callbacks, so that should narrow things down.

TeachMeTW (Contributor, Author) commented Oct 19, 2024

@shankari

I believe I've found a way to make it more precise; shall I proceed with this? What are your thoughts on the design?

@log_execution_time
def query_demographics():
    """
    Queries and processes demographic survey data, returning a dictionary of DataFrames
    keyed by survey IDs. Each DataFrame contains the processed survey responses.
    
    :return: Dictionary where keys are survey IDs and values are corresponding DataFrames.
    """
    
    logging.debug("Querying the demographics for (no date range)")
    
    # Stage 1: Fetch Aggregate Time Series
    with log_stage_time("Fetch Aggregate Time Series"):
        ts = esta.TimeSeries.get_aggregate_time_series()
    
    # Stage 2: Find Demographic Survey Entries
    with log_stage_time("Find Demographic Survey Entries"):
        entries = ts.find_entries(["manual/demographic_survey"])
        data = list(entries)
    
    # Stage 3: Organize Entries by Survey Key
    with log_stage_time("Organize Entries by Survey Key"):
        available_key = {}
        for entry in data:
            survey_key = list(entry['data']['jsonDocResponse'].keys())[0]
            if survey_key not in available_key:
                available_key[survey_key] = []
            available_key[survey_key].append(entry)

via:

from contextlib import contextmanager

@contextmanager  # required for `with log_stage_time(...)` to work
def log_stage_time(stage_name: str):
    """
    Context manager to log and store the execution time of a specific stage within a function.
    Automatically prefixes the stage name with the parent function's name for clarity.

    :param stage_name: The name of the stage being timed.
    """
    parent_func = current_parent_function.get()
    if parent_func:
        full_stage_name = f"{parent_func} - {stage_name}"
    else:
        full_stage_name = stage_name

    logging.debug(f"Starting stage '{full_stage_name}'")
    logging.info(f"********** Starting stage: {full_stage_name} **********")
    timer = ect.Timer()
    timer.__enter__()
    try:
        yield
    finally:
        timer.__exit__()
        esds.store_dashboard_time(full_stage_name, timer)
        logging.info(f"Finished stage '{full_stage_name}' in {timer.elapsed:.2f} seconds")

It now records, stage by stage, exactly which parts are slow:

{
       "_id" : ObjectId("6713dece1b3e03ae8a444db7"),
       "user_id" : null,
       "metadata" : {
               "key" : "stats/dashboard_time",
               "platform" : "server",
               "write_ts" : 1729355470.9133275,
               "time_zone" : "America/Los_Angeles",
               "write_local_dt" : {
                       "year" : 2024,
                       "month" : 10,
                       "day" : 19,
                       "hour" : 9,
                       "minute" : 31,
                       "second" : 10,
                       "weekday" : 5,
                       "timezone" : "America/Los_Angeles"
               },
               "write_fmt_time" : "2024-10-19T09:31:10.913327-07:00"
       },
       "data" : {
               "name" : "query_demographics",
               "ts" : 1729355470.9132683,
               "reading" : 108.23929199250415
       }
}
{
       "_id" : ObjectId("6713dece1b3e03ae8a444db6"),
       "user_id" : null,
       "metadata" : {
               "key" : "stats/dashboard_time",
               "platform" : "server",
               "write_ts" : 1729355470.9114928,
               "time_zone" : "America/Los_Angeles",
               "write_local_dt" : {
                       "year" : 2024,
                       "month" : 10,
                       "day" : 19,
                       "hour" : 9,
                       "minute" : 31,
                       "second" : 10,
                       "weekday" : 5,
                       "timezone" : "America/Los_Angeles"
               },
               "write_fmt_time" : "2024-10-19T09:31:10.911493-07:00"
       },
       "data" : {
               "name" : "query_demographics - Clean and Modify DataFrames",
               "ts" : 1729355470.9114244,
               "reading" : 15.958874981151894
       }
}
{
       "_id" : ObjectId("6713dece1b3e03ae8a444db5"),
       "user_id" : null,
       "metadata" : {
               "key" : "stats/dashboard_time",
               "platform" : "server",
               "write_ts" : 1729355470.894045,
               "time_zone" : "America/Los_Angeles",
               "write_local_dt" : {
                       "year" : 2024,
                       "month" : 10,
                       "day" : 19,
                       "hour" : 9,
                       "minute" : 31,
                       "second" : 10,
                       "weekday" : 5,
                       "timezone" : "America/Los_Angeles"
               },
               "write_fmt_time" : "2024-10-19T09:31:10.894045-07:00"
       },
       "data" : {
               "name" : "query_demographics - Create DataFrames from Organized Entries",
               "ts" : 1729355470.8939605,
               "reading" : 17.122000019298866
       }
}

Each entry's name comes in the form "main function - Stage".

TeachMeTW (Contributor, Author)

Also detected discrepancies in testing already; for example, with trajectories:

    INFO:root:Finished 'query_trajectories' in 0.02 seconds
    INFO:root:Finished 'query_trajectories' in 51.29 seconds

shankari (Contributor) commented Oct 19, 2024

We can do that, but I would have to implement things differently; right now I would say it's 'good enough', but I can work on a more precise function logger. I already track individual functions in the callbacks, so that should narrow things down.

I am not sure why you need a more precise function logger. You can just use the timer directly for any block of code; it does not need to be a function:

    # Stage 1: Fetch Aggregate Time Series
    with ect.Timer() as dfat:
        ts = esta.TimeSeries.get_aggregate_time_series()
    esds.store_pipeline_time("db_utils/fetch_aggregate_time_series", dfat)
    
    # Stage 2: Find Demographic Survey Entries
    with ect.Timer() as dfde:
        entries = ts.find_entries(["manual/demographic_survey"])
        data = list(entries)
    esds.store_pipeline_time("db_utils/find_demographic_survey_entries", dfde)

Your approach works too, but it seems like a bit of overkill, given that you need to use the timer's internal functions and yield.

I would suggest using the timer directly for now, and we can discuss the stage wrapper later.

Note also that you should not use spaces in the names, to avoid weirdness down the road (e.g. "db_utils/find_demographic_survey_entries" rather than "Find Demographic Survey Entries").

TeachMeTW (Contributor, Author) commented Oct 19, 2024

Sounds good. I feel like it is a bit cleaner, as opposed to having different timer names such as dfat and dfde, and it allows for additional logging/info; but yes, I'll use the current timer and we can discuss the stage wrapper at a later date, once we get an MVP online.

TeachMeTW force-pushed the feature/Temporary-Logging branch from cf0d415 to 10d5955 on October 19, 2024 at 21:16
TeachMeTW (Contributor, Author)

@shankari I did what you suggested. I also added missing docstrings and moved comments around, to help my debugging and to get rid of Plotly errors.

TeachMeTW (Contributor, Author)

The db_utils, home, and data pages should be ready for staging.

shankari (Contributor)

Sounds good. I feel like it is a bit cleaner, as opposed to having different timer names such as dfat and dfde, and it allows for additional logging/info; but yes, I'll use the current timer and we can discuss the stage wrapper at a later date, once we get an MVP online.

Agreed and we can discuss this later - see
e-mission/e-mission-server#986 (comment)

shankari (Contributor) left a comment

One final fix

I am fine with this, with one final fix. Since we want to reuse the dashboard stats for both the admin and public dashboards, let's preface all the names with "admin". This will allow us to be forward compatible and support both admin and public dashboards in the future.
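
For example (illustrative only; the exact separator is up for discussion):

    # instead of storing
    #     esds.store_dashboard_time("db_utils/query_demographics", time.time(), timer.elapsed)
    # store
    esds.store_dashboard_time("admin/db_utils/query_demographics", time.time(), timer.elapsed)
    # so that a future public dashboard can write under a "public/..." prefix
    # into the same stats store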

(3 review comments on pages/data.py, outdated and resolved)
TeachMeTW (Contributor, Author)

@shankari I've modified it

shankari (Contributor) left a comment

LGTM!

Squash merging again as usual given the commit churn. Please account for it properly while pulling the changes to build on them.

shankari (Contributor)

Actually, there isn't a lot of commit churn - I guess @TeachMeTW has started rebasing. But still squash merging so I can fix the commit message.
