Update clustering.py #37

Merged
merged 21 commits into from
Nov 25, 2023
Changes from 3 commits
Commits
431b33d
Update clustering.py
humbleOldSage Aug 11, 2023
36065b4
moving clustering_examples.ipynb to trip_model
humbleOldSage Aug 16, 2023
97406c4
Removing changes in builtimeseries.py
humbleOldSage Aug 16, 2023
88988d3
Changes to support TRB_Label_Assist
humbleOldSage Aug 20, 2023
3e19b32
suggestions
humbleOldSage Aug 20, 2023
0899ee4
Revert "suggestions"
humbleOldSage Aug 20, 2023
667ab24
Improving readability
humbleOldSage Aug 20, 2023
e2448c5
making `cluster_performance.ipynb`, `generate_figs_for_poster` and `…
humbleOldSage Aug 22, 2023
59c7c64
Unified Interface for fit function
humbleOldSage Aug 26, 2023
a34836f
Fixing `models.py` to support `regenerate_classification_performance_…
humbleOldSage Aug 30, 2023
7eefdb0
[PARTIALLY TESTED] Single database read and Code Cleanup
humbleOldSage Sep 14, 2023
6a641db
Delete TRB_label_assist/first_trial_results/cv results DBSCAN+SVM (de…
humbleOldSage Sep 18, 2023
3a8bdc0
Reverting Notebook
humbleOldSage Sep 18, 2023
7606d3d
[Partially Tested] Handled Whitespaces
humbleOldSage Sep 19, 2023
bb404e9
[Partially Tested] Suggested changes implemented
humbleOldSage Nov 7, 2023
97475ef
Revert "[Partially Tested] Suggested changes implemented"
humbleOldSage Nov 7, 2023
452e454
[Partially Tested] Suggested changes implemented
humbleOldSage Nov 7, 2023
2a39b12
Minor variable fixes
humbleOldSage Nov 10, 2023
e0beb0e
[TESTED] All the notebooks and files are tested
humbleOldSage Nov 16, 2023
c8c3883
Minor Fixes
humbleOldSage Nov 22, 2023
9225572
Minor Fixes in models.py
humbleOldSage Nov 24, 2023
45 changes: 35 additions & 10 deletions TRB_label_assist/clustering.py
@@ -16,8 +16,9 @@
 # our imports
 # NOTE: this requires changing the branch of e-mission-server to
 # eval-private-data-compatibility
-import emission.analysis.modelling.tour_model_extended.similarity as eamts
+# import emission.analysis.modelling.tour_model_extended.similarity as eamts
 import emission.storage.decorations.trip_queries as esdtq
+import emission.analysis.modelling.trip_model.greedy_similarity_binning as eamtg
 
 EARTH_RADIUS = 6371000
 ALG_OPTIONS = [
@@ -28,9 +29,26 @@
     'mean_shift'
 ]
 
+def CleanEntryTypeData(loc_df, loc_entry):
+
+    """
+    Drops from the list of entries those whose rows were removed from the df by
+    esdtq.filter_labeled_trips() and esdtq.expand_userinputs().
+
+    loc_df : dataframe made from entry type data
+    loc_entry : the entry type equivalent of loc_df,
+                which was passed alongside the dataframe while loading the data
+
+    """
+
+    ids_in_df = set(loc_df['_id'])
+    filtered_loc_entry = [entry for entry in loc_entry if entry['_id'] in ids_in_df]
+    return filtered_loc_entry
+
 
 def add_loc_clusters(
     loc_df,
+    loc_entry,
     radii,
     loc_type,
     alg,
@@ -53,6 +71,9 @@ def add_loc_clusters(
     Args:
         loc_df (dataframe): must have columns 'start_lat' and 'start_lon'
             or 'end_lat' and 'end_lon'
+        loc_entry (list of Entry/confirmedTrip): list of all the entries that
+            were loaded. loc_df was obtained from it by converting it to a df,
+            then keeping only labeled trips and expanding user_inputs
         radii (int list): list of radii to run the clustering algs with
         loc_type (str): 'start' or 'end'
         alg (str): 'DBSCAN', 'naive', 'OPTICS', 'SVM', 'fuzzy', or
@@ -98,19 +119,23 @@ def add_loc_clusters(
             loc_df.loc[:, f"{loc_type}_DBSCAN_clusters_{r}_m"] = labels
 
     elif alg == 'naive':
+
+        cleaned_loc_entry = CleanEntryTypeData(loc_df, loc_entry)
+
         for r in radii:
             # this is using a modified Similarity class that bins start/end
             # points separately before creating trip-level bins
-            sim_model = eamts.Similarity(loc_df,
-                                         radius_start=r,
-                                         radius_end=r,
-                                         shouldFilter=False,
-                                         cutoff=False)
-            # we only bin the loc_type points to speed up the alg. avoid
-            # unnecessary binning since this is really slow
-            sim_model.bin_helper(loc_type=loc_type)
-            labels = sim_model.data_df[loc_type + '_bin'].to_list()
 
+            model_config = {
+                "metric": "od_similarity",
+                "similarity_threshold_meters": r,  # meters
+                "apply_cutoff": False,
+                "incremental_evaluation": False
+            }
+
+            sim_model = eamtg.GreedySimilarityBinning(model_config)
+            sim_model.fit(cleaned_loc_entry)
+            labels = [int(l) for l in sim_model.tripLabels]
             # # pd.Categorical converts the type from int to category (so
             # # numerical operations aren't possible)
             # loc_df.loc[:, f"{loc_type}_{alg}_clusters_{r}_m"] = pd.Categorical(
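The id-based filter in CleanEntryTypeData can be tried in isolation. Below is a minimal sketch, assuming plain dicts with an '_id' key as stand-ins for e-mission Entry/confirmedTrip objects (real entries are loaded from the database, not built like this). The names filter_entries_by_df and align_entries_to_df are hypothetical. The second helper covers one point worth checking: the filtered list keeps the order of loc_entry, not of loc_df, so if labels computed from it are later assigned to loc_df as a plain list (as the DBSCAN branch does with labels), they only line up if both orders agree. The reordered loc_df below is a hypothetical scenario, not something the PR claims happens.

```python
import pandas as pd


def filter_entries_by_df(loc_df, loc_entry):
    """Keep only entries whose '_id' still appears in loc_df.

    Same set-membership filter as CleanEntryTypeData; the result
    follows the order of loc_entry, not the row order of loc_df.
    """
    ids_in_df = set(loc_df['_id'])  # set gives O(1) membership checks
    return [entry for entry in loc_entry if entry['_id'] in ids_in_df]


def align_entries_to_df(loc_df, entries):
    """Hypothetical helper: return entries in loc_df's row order so that
    per-entry labels line up positionally with loc_df rows.
    Assumes every '_id' in loc_df is present in entries."""
    by_id = {entry['_id']: entry for entry in entries}
    return [by_id[_id] for _id in loc_df['_id']]


# Toy stand-ins for loaded entries (plain dicts, not real Entry objects).
loc_entry = [{'_id': 'a'}, {'_id': 'b'}, {'_id': 'c'}, {'_id': 'd'}]
# Hypothetical: filtering dropped 'b' and the remaining rows are in another order.
loc_df = pd.DataFrame({'_id': ['d', 'a', 'c']})

cleaned = filter_entries_by_df(loc_df, loc_entry)
print([e['_id'] for e in cleaned])  # ['a', 'c', 'd']
aligned = align_entries_to_df(loc_df, cleaned)
print([e['_id'] for e in aligned])  # ['d', 'a', 'c']
```

If loc_df is known to preserve the order of loc_entry, the alignment step is unnecessary and the plain filter is enough.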
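The naive branch now delegates to GreedySimilarityBinning with metric "od_similarity" and a single similarity_threshold_meters radius. For intuition only, here is a simplified sketch of greedy origin/destination binning; it is not e-mission's implementation. The matching rule (a trip joins the first bin where both its origin and destination are within the threshold of every trip already in that bin), the haversine distance, the dict keys, and all function names are assumptions; only EARTH_RADIUS = 6371000 is taken from clustering.py.

```python
import math

EARTH_RADIUS = 6371000  # meters, same constant as clustering.py


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS * math.asin(math.sqrt(a))


def od_similar(t1, t2, threshold_m):
    """Assumed rule: origins AND destinations are both within threshold_m."""
    start_d = haversine(t1['start_lat'], t1['start_lon'], t2['start_lat'], t2['start_lon'])
    end_d = haversine(t1['end_lat'], t1['end_lon'], t2['end_lat'], t2['end_lon'])
    return start_d <= threshold_m and end_d <= threshold_m


def greedy_bin(trips, threshold_m):
    """Assign each trip, in input order, to the first bin whose members are
    all similar to it; otherwise open a new bin. Returns one int per trip."""
    bins = []    # each bin is a list of member trips
    labels = []
    for trip in trips:
        for bin_id, members in enumerate(bins):
            if all(od_similar(trip, m, threshold_m) for m in members):
                members.append(trip)
                labels.append(bin_id)
                break
        else:  # loop finished without break: no matching bin
            bins.append([trip])
            labels.append(len(bins) - 1)
    return labels


trips = [
    {'start_lat': 37.0, 'start_lon': -122.0, 'end_lat': 37.01, 'end_lon': -122.0},
    # about 11 m from the first trip at both ends
    {'start_lat': 37.0001, 'start_lon': -122.0, 'end_lat': 37.0101, 'end_lon': -122.0},
    # kilometers away from both
    {'start_lat': 37.1, 'start_lon': -122.0, 'end_lat': 37.2, 'end_lon': -122.0},
]
print(greedy_bin(trips, 100))  # [0, 0, 1]
print(greedy_bin(trips, 5))    # [0, 1, 2]
```

Because binning is greedy, the labels can depend on input order, which is one more reason the order of cleaned_loc_entry relative to loc_df matters. The sketch already returns Python ints, whereas the diff casts each entry of sim_model.tripLabels with int(l).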