Overview
It is common to use unsupervised learning algorithms to cluster a
trajectory by similarity to discover different states.
A typical approach is to first calculate the pairwise RMSD between all
frames of the trajectory, forming a $T \times T$ symmetric
RMSD matrix where $T$ is the number of frames in the
trajectory. However, any other metric (or distance) can be used instead of
RMSD, provided that it yields a suitable pairwise similarity (or distance) matrix.
One can then use any number of clustering algorithms to partition
the similarity matrix and thus assign different cluster numbers to the
different frames of the trajectory.
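
To make the matrix construction concrete, here is a minimal NumPy sketch. It assumes the frames have already been superimposed on a common reference (no fitting is done here), and the function name `pairwise_rmsd` and the toy coordinates are purely illustrative, not part of any existing API.

```python
import numpy as np


def pairwise_rmsd(coords):
    """Return the T x T RMSD matrix for pre-aligned coordinates.

    coords: array of shape (T, N, 3) with T frames and N atoms.
    Assumption: frames are already superimposed; no fitting is performed.
    """
    coords = np.asarray(coords, dtype=float)
    # Broadcast to all frame pairs: shape (T, T, N, 3).
    diff = coords[:, None, :, :] - coords[None, :, :, :]
    # Squared displacement per atom, averaged over atoms, then square root.
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))


# Toy trajectory: 4 frames, 2 atoms; frames 2 and 3 are shifted by (1, 1, 1).
frames = np.zeros((4, 2, 3))
frames[2:] += 1.0
rmsd = pairwise_rmsd(frames)
```

The resulting matrix is symmetric with a zero diagonal. It is a distance (dissimilarity) matrix, so it can be passed to scikit-learn estimators that accept `metric="precomputed"`, such as `DBSCAN`. Note that the intermediate array grows as $T^2 N$, so a real implementation would likely compute the matrix in blocks.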
The mdaencore MDAKit for Ensemble Similarity Calculations
also implements some clustering methods (namely, Affinity Propagation and
DBSCAN/KMeans, via scikit-learn), as described in its docs.
However, a general-use cluster analysis tool featuring a larger
selection of clustering algorithms would likely be useful to many users.
Clustering methods could include:
- scikit-learn clustering contains many clustering algorithms
  that can be used either with a similarity matrix or directly with
  trajectory data like coordinates.
- The GROMOS clustering algorithm is widely used in biomolecular
  simulations [Daura 1999]. (See Issue #2876.)
The following excerpt from Daura et al. describes the algorithm:
"To find clusters of structures in a trajectory the RMSD of atom
positions between all pairs of structures was determined. For each
structure the number of other structures for which the RMSD was
0.1 nm or less (backbone, residues 2–6) for structure 1 or 0.08
nm or less (backbone, residues 2–5) for structure 2 (neighbor
conformations) was calculated. The structure with the highest
number of neighbors was taken as the center of a cluster, and
formed together with all its neighbors a (first) cluster. The
structures of this cluster were thereafter eliminated from the
pool of structures. The process was repeated until the pool of
structures was empty. In this way, a series of nonoverlapping
clusters of structures was obtained."
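
The procedure in the excerpt can be sketched in a few lines of plain Python operating on a precomputed RMSD matrix. This is only an illustration: the function name `gromos_cluster` is hypothetical, neighbor counts are recomputed within the remaining pool at every iteration, and ties between candidate centers are broken by lowest frame index, which the excerpt does not specify.

```python
def gromos_cluster(dist, cutoff):
    """Cluster frames following the procedure quoted from Daura et al.

    dist:   square, symmetric matrix of pairwise RMSD values
            (nested lists or a NumPy array).
    cutoff: RMSD threshold below which two frames are neighbors.
    Returns a list of (center, members) tuples, largest-first by construction.
    """
    pool = set(range(len(dist)))
    clusters = []
    while pool:
        # Neighbors of each remaining frame, counted within the pool only.
        neighbors = {
            i: [j for j in pool if j != i and dist[i][j] <= cutoff]
            for i in pool
        }
        # Frame with the most neighbors becomes the cluster center.
        # Assumption: ties go to the lowest frame index.
        center = max(sorted(pool), key=lambda i: len(neighbors[i]))
        members = sorted([center] + neighbors[center])
        clusters.append((center, members))
        # Remove the whole cluster from the pool and repeat.
        pool.difference_update(members)
    return clusters


# Toy example: five "frames" placed on a line, distances are |a - b|.
positions = [0.0, 0.05, 0.09, 1.0, 1.04]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
toy_clusters = gromos_cluster(toy_dist, cutoff=0.1)
```

With the toy input, frames 0 to 2 form one cluster and frames 3 and 4 another. Every frame ends up in exactly one cluster, matching the "nonoverlapping clusters" property in the excerpt.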
Objectives
- A `ClusterAnalysis` class that allows the user to run any of
  the scikit-learn clustering algorithms that can work on raw
  data (such as K-means). Use the `AnalysisBase` framework to write
  the analysis class (see the tutorial on writing your own
  trajectory analysis).
- The GROMOS clustering algorithm [Daura 1999] (described above in more detail), made available in `ClusterAnalysis`.
References