Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Cleaned up duplicate code, log statements + Refactored export.py
Changes

1. Fetching only loc-like entries from the existing export data logic as the raw timeseries entries.
- I found many references showing that trip and place entries are part of the analysis timeseries database. Almost every place I found uses `data.start_ts` for entries with `analysis/*` metadata keys.

In bin/debug/export_participants_trips_csv.py:
```
ts = esta.TimeSeries.get_time_series(user_id)
trip_time_query = estt.TimeQuery("data.start_ts", start_day_ts, end_day_ts)
ct_df = ts.get_data_df("analysis/confirmed_trip", trip_time_query)
```

In bin/debug/label_stats.py:
```
for t in list(edb.get_analysis_timeseries_db().find({"metadata.key": "analysis/inferred_trip", "user_id": sel_uuid})):
    if t["data"]["inferred_labels"] != []:
        confirmed_trip = edb.get_analysis_timeseries_db().find_one({"user_id": t["user_id"],
                                                                    "metadata.key": "analysis/confirmed_trip",
                                                                    "data.start_ts": t["data"]["start_ts"]})
```

The same holds for `data.entry_ts`.

- On the other hand, `data.ts` is used with timeseries_db, since those queries read entries with `background/*` metadata keys.

In emission/analysis/intake/segmentation/section_segmentation.py:
```
get_loc_for_ts = lambda time: ecwl.Location(ts.get_entry_at_ts("background/filtered_location", "data.ts", time)["data"])
trip_start_loc = get_loc_for_ts(trip_entry.data.start_ts)
trip_end_loc = get_loc_for_ts(trip_entry.data.end_ts)
```

In emission/analysis/intake/segmentation/trip_segmentation.py:
```
untracked_start_loc = ecwe.Entry(ts.get_entry_at_ts("background/filtered_location",
                                                    "data.ts", last_place_entry.data.enter_ts)).data
```

--------------------------------------

2. Refactored emission/export/export.py
- Added a separate function that returns the exported entries so that it can be reused in the purge pipeline code.
- This removed repeated code for re-fetching exported entries.
- Also uses the `databases` parameter to export data from a specific database. For the purge use case, `databases` should only contain 'timeseries_db'.

--------------------------------------

3. Added a `raw_timeseries_only` parameter to load_multi_timeline_for_range.py
- If this argument is set, pipeline_states are not loaded, since we don't want pipeline states to be restored when restoring raw timeseries data.

--------------------------------------

4. Cleaned up tests
- Reduced repetitive code by moving assertions into functions that can be reused for both full and incremental export testing.

--------------------------------------

5. Removed export_timeseries.py and import_timeseries.py
- There is no need to keep duplicate code, since the existing scripts load_multi_timeline_for_range.py and export.py are now used instead.

--------------------------------------
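The split described in item 1 (raw `background/*` entries queried by `data.ts`, analysis trips queried by `data.start_ts`) can be sketched as follows. This is a minimal illustration, assuming entries are plain dicts shaped like the e-mission entries in the snippets above; `query_field` and `select_raw_entries` are hypothetical helpers, not functions in the repository, and the inclusive time bounds are an illustrative choice.

```python
from typing import Any, Dict, List

Entry = Dict[str, Any]


def query_field(key: str) -> str:
    """Return the time field used to query entries with this metadata key.

    Hypothetical helper: covers only the keys seen in the snippets above.
    """
    if key.startswith("background/"):
        return "data.ts"  # raw sensor entries, stored in timeseries_db
    if key in ("analysis/confirmed_trip", "analysis/inferred_trip"):
        return "data.start_ts"  # trip entries, stored in the analysis timeseries
    raise KeyError(f"no known time field for {key!r}")


def select_raw_entries(entries: List[Entry], start_ts: float, end_ts: float) -> List[Entry]:
    """Keep only raw background/* entries whose data.ts lies in [start_ts, end_ts]."""
    selected = []
    for e in entries:
        key = e["metadata"]["key"]
        if not key.startswith("background/"):
            continue  # analysis/* entries are not part of the raw timeseries
        field = query_field(key).split(".", 1)[1]  # "data.ts" -> "ts"
        if start_ts <= e["data"][field] <= end_ts:
            selected.append(e)
    return selected
```

Given a raw location at `ts=100`, a filtered location at `ts=250`, and a confirmed trip starting at 120, `select_raw_entries(entries, 0, 200)` keeps only the first one: the trip is dropped by its key, the filtered location by its timestamp.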
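The export.py refactor in item 2 (one function that returns the exported entries, reused by the purge code, with a `databases` parameter) follows a common pattern, sketched below. Everything here is hypothetical: `get_exported_entries`, `export`, `purge`, and the callable "fetchers" stand in for the real MongoDB queries, and the name `analysis_timeseries_db` is only used for illustration. Only `'timeseries_db'` as the purge database comes from the commit message.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence

Entry = Dict[str, Any]
# Hypothetical: each database name maps to a callable returning that
# database's entries for (user_id, start_ts, end_ts).
Fetcher = Callable[[str, float, float], Iterable[Entry]]


def get_exported_entries(fetchers: Dict[str, Fetcher], user_id: str,
                         start_ts: float, end_ts: float,
                         databases: Sequence[str]) -> List[Entry]:
    """Fetch entries from only the requested databases."""
    unknown = [db for db in databases if db not in fetchers]
    if unknown:
        raise ValueError(f"unknown databases: {unknown}")
    exported: List[Entry] = []
    for db in databases:
        exported.extend(fetchers[db](user_id, start_ts, end_ts))
    return exported


def export(fetchers: Dict[str, Fetcher], user_id: str, start_ts: float, end_ts: float,
           databases: Sequence[str], write: Callable[[List[Entry]], None]) -> List[Entry]:
    """Fetch once, write the entries out, and return the same list to the caller."""
    entries = get_exported_entries(fetchers, user_id, start_ts, end_ts, databases)
    write(entries)
    return entries


def purge(fetchers: Dict[str, Fetcher], user_id: str, start_ts: float, end_ts: float,
          write: Callable[[List[Entry]], None],
          delete: Callable[[List[Entry]], None]) -> int:
    """Reuse the export result instead of re-fetching what was exported."""
    # Only raw timeseries data is purged, so only 'timeseries_db' is exported.
    exported = export(fetchers, user_id, start_ts, end_ts, ["timeseries_db"], write)
    delete(exported)
    return len(exported)
```

Because `purge` deletes exactly the list that `export` returned, there is no window in which a second query could return a different set of entries than the one that was written out.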
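The `raw_timeseries_only` behavior in item 3 amounts to a single branch when deciding what to restore. The sketch below assumes the restore step works from separate lists of timeline files and pipeline-state files; `RestorePlan`, `plan_restore`, and the file names are hypothetical and do not reflect the actual structure of load_multi_timeline_for_range.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestorePlan:
    """Hypothetical summary of what a restore run would load."""
    timeline_files: List[str] = field(default_factory=list)
    pipeline_state_files: List[str] = field(default_factory=list)


def plan_restore(timeline_files: List[str], pipeline_state_files: List[str],
                 raw_timeseries_only: bool = False) -> RestorePlan:
    """Decide which files to load; pipeline states are skipped for raw-only restores."""
    plan = RestorePlan(timeline_files=list(timeline_files))
    if not raw_timeseries_only:
        # Default behavior: load pipeline states alongside the timeline data.
        plan.pipeline_state_files = list(pipeline_state_files)
    # With raw_timeseries_only=True, pipeline states are not restored.
    return plan
```

For example, `plan_restore(["u1_timeline.gz"], ["u1_pipelinestate.gz"], raw_timeseries_only=True)` keeps the timeline file and leaves `pipeline_state_files` empty.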
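The test cleanup in item 4 (one set of assertions shared by full and incremental export tests) can be illustrated with a small `unittest` sketch. `fake_export`, `check_exported`, and the integer timestamps are hypothetical stand-ins, not the repository's test code.

```python
import unittest
from typing import Any, Dict, List

Entry = Dict[str, Any]


def fake_export(entries: List[Entry], start_ts: float, end_ts: float) -> List[Entry]:
    """Hypothetical stand-in for an export: entries with data.ts in [start_ts, end_ts]."""
    return [e for e in entries if start_ts <= e["data"]["ts"] <= end_ts]


class ExportTest(unittest.TestCase):
    def setUp(self):
        self.entries = [{"data": {"ts": float(t)}} for t in range(10)]

    def check_exported(self, exported: List[Entry], start_ts: float, end_ts: float,
                       expected_count: int) -> None:
        """Shared assertions, reused by both the full and incremental tests."""
        self.assertEqual(len(exported), expected_count)
        for e in exported:
            self.assertTrue(start_ts <= e["data"]["ts"] <= end_ts)

    def test_full_export(self):
        self.check_exported(fake_export(self.entries, 0, 9), 0, 9, 10)

    def test_incremental_export(self):
        # Two non-overlapping batches should together match a full export.
        first = fake_export(self.entries, 0, 4)
        second = fake_export(self.entries, 5, 9)
        self.check_exported(first, 0, 4, 5)
        self.check_exported(second, 5, 9, 5)
        self.assertEqual(first + second, fake_export(self.entries, 0, 9))
```

Saved as a module, this runs with `python -m unittest`. Each test only describes its time range and expected count; the checks themselves live in one place.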
Authored and committed by Mahadik, Mukul Chandrakant on Sep 16, 2024
1 parent 23734e5, commit 4703f04
Showing 8 changed files with 228 additions and 642 deletions.