Adding Clustering via Extended Similarity Metrics #1049

drroe · 2023-09-13T14:33:03Z

In collaboration with @ramirandaq and @lexin-chen, expand the cluster analysis capabilities of cpptraj by adding clustering via extended similarity metrics (and more).

Some background reading:

https://link.springer.com/article/10.1186/s13321-021-00505-3

https://link.springer.com/article/10.1007/s10822-022-00444-7

ramirandaq · 2023-09-22T20:25:05Z

A comment in #1051 says "Calculate extended comparison similarity values for each trajectory frame." Is this the complementary similarity used to then find medoids and outliers in the trajectory?

drroe · 2023-09-25T14:20:04Z

> Is this the complementary similarity used to then find medoids and outliers in the trajectory?

Yes - it's equivalent to the gen_sim_dict routine from src/tools/esim_modules.py in MDANCE.

ramirandaq · 2023-09-25T14:32:25Z

gen_sim_dict will take as an input a set of frames/conformations, and output a number (the extended similarity) for the whole set, not a number for every frame. To calculate the outliers and medoids, the function is calculate_comp_sim (in src/tools/bts.py). The complementary similarity does assign a number to every frame in a set, which can be used to rank the frames from high- to low-density.
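To make the set-level vs. per-frame distinction concrete, here is a minimal, stdlib-only Python sketch for the MSD case. It assumes each frame is a flat list of already-superimposed coordinates. All function names are hypothetical (this is not MDANCE or cpptraj code), and the real routines may normalize differently (for example, per atom). `set_msd` returns one number for the whole set, like `gen_sim_dict`. `comp_msd` returns one number per frame: the MSD of the set with that frame left out, in the spirit of `calculate_comp_sim`. Because MSD is distance-like, leaving out the most central frame (the medoid) gives the largest remainder MSD, and leaving out an outlier gives the smallest. With a similarity index, where larger means more similar, the ordering is reversed.

```python
# Hypothetical sketch (not MDANCE or cpptraj code): extended MSD of a set of
# frames vs. per-frame complementary MSD, used to rank frames by density.
# Assumes each frame is a flat list of M already-superimposed coordinates.
from typing import List, Tuple

Frame = List[float]


def _column_sums(frames: List[Frame]) -> Tuple[List[float], List[float]]:
    """Per-coordinate sums of values and of squared values over all frames."""
    cols = list(zip(*frames))
    return [sum(c) for c in cols], [sum(v * v for v in c) for c in cols]


def _msd_from_sums(n: int, col_sum: List[float], sq_sum: List[float]) -> float:
    """(1/n^2) * sum over all pairs (i, j) of ||x_i - x_j||^2, from column sums."""
    return sum(2.0 * (n * s - c * c) for c, s in zip(col_sum, sq_sum)) / (n * n)


def set_msd(frames: List[Frame]) -> float:
    """One number for the whole set (analogous to what gen_sim_dict reports)."""
    if not frames:
        raise ValueError("need at least one frame")
    col_sum, sq_sum = _column_sums(frames)
    return _msd_from_sums(len(frames), col_sum, sq_sum)


def comp_msd(frames: List[Frame]) -> List[float]:
    """One number per frame: the MSD of the set with that frame left out."""
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames")
    col_sum, sq_sum = _column_sums(frames)
    out = []
    for x in frames:
        # Subtract this frame's contribution instead of re-summing: O(N*M) total.
        c = [cs - v for cs, v in zip(col_sum, x)]
        s = [ss - v * v for ss, v in zip(sq_sum, x)]
        out.append(_msd_from_sums(n - 1, c, s))
    return out


def rank_by_density(frames: List[Frame]) -> List[int]:
    """Frame indices from high to low density (largest complementary MSD first)."""
    comp = comp_msd(frames)
    return sorted(range(len(comp)), key=comp.__getitem__, reverse=True)


def medoid_and_outlier(frames: List[Frame]) -> Tuple[int, int]:
    """Medoid: removal leaves the most spread-out remainder; outlier: the tightest."""
    order = rank_by_density(frames)
    return order[0], order[-1]
```

For the 1-D frames `[[0.0], [1.0], [2.0], [10.0]]`, `set_msd` gives 31.375 and `medoid_and_outlier` returns `(2, 3)`: frame 2 has the smallest summed squared distance to the others, and frame 3 the largest. Working from column sums avoids building an N x N distance matrix.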

drroe · 2023-09-25T15:47:50Z

> gen_sim_dict will take as an input a set of frames/conformations, and output a number (the extended similarity) for the whole set, not a number for every frame.

Yes, I understand that. Let me be more clear.

The ExtendedSimilarity::Comparison() function is most like gen_sim_dict. The ExtendedSimilarity::CalculateCompSim() function, which the extendedcomp command (the Exec_ExtendedComparison class) uses under the hood, is more like calculate_comp_sim. Let me know if you have any more questions.

ramirandaq · 2023-09-25T15:52:57Z

Sounds great! The functionality in bts.py is a bit more general in how it accommodates extended indices and MSD, but this is perfect.

drroe added the "Feature Request" label (Request for new features/functionality.) on Sep 13, 2023
