Adding Clustering via Extended Similarity Metrics #1049

Open
drroe opened this issue Sep 13, 2023 · 5 comments
Labels
Feature Request Request for new features/functionality.

Comments

@drroe
Contributor

drroe commented Sep 13, 2023

In collaboration with @ramirandaq and @lexin-chen, expand the cluster analysis capabilities of cpptraj by adding clustering via extended similarity metrics (and more).

Some background reading:

https://link.springer.com/article/10.1186/s13321-021-00505-3

https://link.springer.com/article/10.1007/s10822-022-00444-7

drroe added the Feature Request label Sep 13, 2023
@ramirandaq

In #1051 (comment), the description says "Calculate extended comparison similarity values for each trajectory frame." Is this the complementary similarity used to then find medoids and outliers in the trajectory?

@drroe
Contributor Author

drroe commented Sep 25, 2023

> Is this the complementary similarity used to then find medoids and outliers in the trajectory?

Yes - it's equivalent to the gen_sim_dict routine from src/tools/esim_modules.py in MDANCE.

@ramirandaq

gen_sim_dict will take as an input a set of frames/conformations, and output a number (the extended similarity) for the whole set, not a number for every frame. The function used to calculate the outliers and medoids is calculate_comp_sim (in src/tools/bts.py). The complementary similarity does assign a number to every frame in a set, and those numbers can be used to rank the frames from high to low density.
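To make the distinction concrete, here is a minimal pure-Python sketch of the two kinds of output: one number for the whole set versus one number per frame. It is not the MDANCE or cpptraj code. The function names `set_msd` and `complementary_msd` are hypothetical, and the mean pairwise squared deviation (MSD) stands in for an extended similarity index as an assumption. Each frame's complementary value is the set-level measure recomputed with that frame left out.

```python
def msd_from_sums(c_sum, sq_sum, n):
    """Mean squared deviation over all ordered pairs of n frames,
    computed from per-column sums and sums of squares."""
    return sum(2.0 * (n * sq - c * c) for c, sq in zip(c_sum, sq_sum)) / (n * n)


def column_sums(frames):
    """Per-coordinate sums and sums of squares over all frames."""
    dim = len(frames[0])
    c_sum = [sum(f[d] for f in frames) for d in range(dim)]
    sq_sum = [sum(f[d] ** 2 for f in frames) for d in range(dim)]
    return c_sum, sq_sum


def set_msd(frames):
    """Set-level measure: ONE number for the whole set of frames."""
    c_sum, sq_sum = column_sums(frames)
    return msd_from_sums(c_sum, sq_sum, len(frames))


def complementary_msd(frames):
    """Per-frame measure: one number PER FRAME, namely the set-level
    value of all the other frames (the complement of that frame)."""
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames")
    c_sum, sq_sum = column_sums(frames)
    values = []
    for f in frames:
        # Remove this frame's contribution from the running sums.
        c_rest = [c - x for c, x in zip(c_sum, f)]
        sq_rest = [s - x * x for s, x in zip(sq_sum, f)]
        values.append(msd_from_sums(c_rest, sq_rest, n - 1))
    return values


# Toy 1-D "trajectory": three nearby frames and one far away.
frames = [[0.0], [1.0], [2.0], [10.0]]
whole_set = set_msd(frames)        # a single float
comp = complementary_msd(frames)   # one float per frame

# With a dissimilarity such as MSD, removing a central frame leaves the
# most spread-out complement, so the largest value marks the medoid-like
# frame and the smallest value marks the outlier.
ranking = sorted(range(len(frames)), key=lambda i: comp[i], reverse=True)
```

In this toy set, `ranking` starts with index 2 (the most central frame) and ends with index 3 (the frame at 10.0). Working from column sums keeps the cost linear in the number of frames, since no pairwise distance matrix is built.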

@drroe
Contributor Author

drroe commented Sep 25, 2023

> gen_sim_dict will take as an input a set of frames/conformations, and output a number (the extended similarity) for the whole set, not a number for every frame.

Yes, I understand that. Let me be clearer.

The ExtendedSimilarity::Comparison() function is most like gen_sim_dict. The ExtendedSimilarity::CalculateCompSim() function, which is what the extendedcomp command (Exec_ExtendedComparison class) uses under the hood, is more like calculate_comp_sim. Let me know if you have any more questions.

@ramirandaq

Sounds great! The functionality in bts.py is a bit more general, since it accommodates both extended indices and MSD, but this is perfect.
