Molecular Systems Automated Identification of Cooperativity


MoSAIC is an unsupervised method for correlation analysis that automatically detects collective motion in molecular dynamics (MD) simulation data while simultaneously identifying uncorrelated coordinates as noise. It can therefore be used as a feature selection scheme for Markov state modeling, or simply to obtain a detailed picture of the key coordinates driving a biomolecular process. It is based on the Leiden community detection algorithm, which is used to bring a correlation matrix into block-diagonal form.
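To make the idea of a correlation matrix concrete, here is a minimal plain-Python sketch that computes the absolute Pearson correlation between every pair of features of a trajectory. It assumes that the correlation similarity is |Pearson r|; it is not MoSAIC's implementation, the subsequent Leiden step is not reproduced, and the helpers pearson and abs_correlation_matrix are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # assumption: treat constant features as uncorrelated
    return cov / (std_x * std_y)


def abs_correlation_matrix(X):
    """|Pearson r| between all feature pairs.

    X is a list of frames (n_samples rows), each a list of n_features values.
    """
    n_features = len(X[0])
    columns = [[frame[k] for frame in X] for k in range(n_features)]
    return [
        [abs(pearson(columns[i], columns[j])) for j in range(n_features)]
        for i in range(n_features)
    ]


# Toy trajectory: 4 frames, 3 features.
# Feature 1 moves opposite to feature 0; feature 2 is unrelated to both.
X = [
    [0.0, 6.0, 1.0],
    [1.0, 4.0, -1.0],
    [2.0, 2.0, -1.0],
    [3.0, 0.0, 1.0],
]
C = abs_correlation_matrix(X)
```

Features 0 and 1 form one cooperative pair (|r| = 1), while feature 2 has zero correlation with both, which is the kind of uncorrelated coordinate MoSAIC aims to flag as noise.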

The method was published in:

Correlation-Based Feature Selection to Identify Functional Dynamics in Proteins
G. Diez, D. Nagel, and G. Stock,
J. Chem. Theory Comput. 2022 18 (8), 5079-5088,
doi: 10.1021/acs.jctc.2c00337

If you use this software package, please cite the paper above.

Features

Intuitive usage via the Python module and via the CLI
Sklearn-style API for fast integration into your Python workflow
No magic, only a single parameter which can be optimized via cross-validation
Extensive documentation and detailed discussion in the publication
Step-by-step tutorial to follow
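When clustering with CPM, the single parameter is the resolution parameter gamma. As an illustration, and assuming the standard Constant Potts Model quality function, which sums (A_ij - gamma) over all pairs of features placed in the same cluster, the sketch below shows how gamma decides between two candidate partitions of a toy similarity matrix. It only scores given partitions and does not run Leiden; cpm_quality is a hypothetical helper.

```python
from itertools import combinations


def cpm_quality(matrix, labels, gamma):
    """CPM quality: sum of (A_ij - gamma) over pairs i < j in the same cluster."""
    return sum(
        matrix[i][j] - gamma
        for i, j in combinations(range(len(matrix)), 2)
        if labels[i] == labels[j]
    )


# Toy similarity matrix: features 0/1 and 2/3 are strongly coupled,
# the two groups only weakly.
A = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
one_cluster = [0, 0, 0, 0]
two_blocks = [0, 0, 1, 1]

for gamma in (0.1, 0.5):
    q_one = cpm_quality(A, one_cluster, gamma)
    q_two = cpm_quality(A, two_blocks, gamma)
    winner = "one cluster" if q_one > q_two else "two blocks"
    print(f"gamma={gamma}: one={q_one:.2f}, two={q_two:.2f} -> {winner}")
```

A pair only adds to the score if its similarity exceeds gamma, so larger values of gamma favor smaller, tighter clusters. For this matrix, the single cluster wins at gamma = 0.1 and the two blocks win at gamma = 0.5; the switch happens at gamma = 0.15.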

Installation

The package is called mosaic-clustering and is available via PyPI or conda. To install it, simply call:

python3 -m pip install --upgrade mosaic-clustering

or

conda install -c conda-forge mosaic-clustering

or for the latest dev version

# via ssh key
python3 -m pip install git+ssh://git@github.com/moldyn/MoSAIC.git

# or via password-based login
python3 -m pip install git+https://github.com/moldyn/MoSAIC.git

To use the deprecated UMAPSimilarity or the mosaic umap submodule, install the optional umap extra:

python3 -m pip install --upgrade mosaic-clustering[umap]
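To check whether the optional dependency is available in the current environment, one can ask the import system without actually importing it. This sketch assumes that the umap extra provides a top-level module named umap (as the umap-learn package does); has_optional_dependency is a hypothetical helper.

```python
import importlib.util


def has_optional_dependency(module_name):
    """Return True if the top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# assumption: the umap extra installs a package importable as "umap"
if not has_optional_dependency("umap"):
    print("umap not found; install the 'umap' extra to use UMAPSimilarity")
```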

Shell Completion

For bash, zsh, and fish, click provides an easy way to enable shell completion; see the click documentation for details. For bash, add the following line to your ~/.bashrc:

eval "$(_MOSAIC_COMPLETE=bash_source mosaic)"
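The environment variable follows click's naming convention: the program name in upper case, with dashes replaced by underscores, wrapped as _{NAME}_COMPLETE. The hypothetical helper below builds the corresponding line for each shell supported by click 8. The zsh line goes into ~/.zshrc, while the fish line belongs in ~/.config/fish/completions/mosaic.fish.

```python
def completion_line(prog_name, shell):
    """Build the shell line that enables click completion for `prog_name`."""
    # click convention: _{PROG_NAME}_COMPLETE, upper case, dashes -> underscores
    var = "_" + prog_name.upper().replace("-", "_") + "_COMPLETE"
    if shell in ("bash", "zsh"):
        return f'eval "$({var}={shell}_source {prog_name})"'
    if shell == "fish":
        return f"{var}=fish_source {prog_name} | source"
    raise ValueError(f"unsupported shell: {shell}")


for shell in ("bash", "zsh", "fish"):
    print(shell, "->", completion_line("mosaic", shell))
```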

Usage

In general, MoSAIC can be called either via its entry point, $ mosaic, or as a module, $ python -m mosaic. The latter is preferred because it guarantees that the desired Python environment is used. Shell completion, however, only works with the entry point.
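When driving the CLI from another Python script, the same advantage can be kept by launching the module with the interpreter that runs the script. This is a minimal sketch using only the standard library; mosaic_command and run_mosaic are hypothetical helpers.

```python
import subprocess
import sys


def mosaic_command(*args):
    """Argument list that runs `python -m mosaic` with the current interpreter."""
    return [sys.executable, "-m", "mosaic", *args]


def run_mosaic(*args):
    """Run MoSAIC; raises CalledProcessError if the command exits non-zero."""
    return subprocess.run(mosaic_command(*args), check=True)
```

For example, run_mosaic("--help") prints the top-level help using the active environment.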

CLI - Usage Directly from the Command Line

The module provides a rich CLI built with click. Every command and subcommand comes with detailed help; calling the module without arguments prints the top-level help:

$ python -m mosaic
Usage: python -m mosaic [OPTIONS] COMMAND [ARGS]...

  MoSAIC motion v0.4.1

  Molecular systems automated identification of collective motion, is
  a correlation based feature selection framework for MD data.
  Copyright (c) 2021-2023, Georg Diez and Daniel Nagel

Options:
  --help  Show this message and exit.

Commands:
  clustering  Clustering similarity matrix of coordinates.
  similarity  Creating similarity matrix of coordinates.
  tui         Open Textual TUI for interactive usage.

For details on a specific command, call it with --help (e.g. python -m mosaic similarity --help), or open the terminal user interface with python -m mosaic tui.

A simple workflow example, clustering the file input_file with the correlation metric and the Leiden algorithm using CPM at the default resolution parameter:

# creating correlation matrix
$ python -m mosaic similarity -i input_file -o output_similarity --metric correlation -v

MoSAIC SIMILARITY
~~~ Initialize similarity class
~~~ Load file input_file
~~~ Fit input
~~~ Store similarity matrix in output_similarity

# clustering with CPM and default resolution parameter
# the latter needs to be fine-tuned to each matrix
$ python -m mosaic clustering -i output_similarity -o output_clustering --plot -v

MoSAIC CLUSTERING
~~~ Initialize clustering class
~~~ Load file output_similarity
~~~ Fit input
~~~ Store output
~~~ Plot matrix

This generates the similarity matrix in output_similarity, a plot of the clustered matrix in output_clustering.matrix.pdf, the raw data of the clustered matrix in output_clustering.matrix, and a file in which each row lists the feature indices belonging to one cluster.
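To post-process the cluster file in Python, a minimal parser could look as follows. It assumes whitespace-separated integer indices, one cluster per row, and that lines starting with # are comments; the exact file name and header format are not specified here, so inspect the generated file first. parse_clusters and labels_from_clusters are hypothetical helpers.

```python
def parse_clusters(text):
    """Parse one cluster per row into a list of lists of feature indices."""
    clusters = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment/header lines
        clusters.append([int(tok) for tok in line.split()])
    return clusters


def labels_from_clusters(clusters, n_features):
    """Per-feature cluster label; features not listed in any cluster get -1."""
    labels = [-1] * n_features
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels
```

For example, with open(path) as f: clusters = parse_clusters(f.read()), where path points to the cluster file.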

Module - Inside a Python Script

import numpy as np
import mosaic

# Load file
# X is an np.ndarray of shape (n_samples, n_features)
X = np.loadtxt('input_file')  # assumes a whitespace-separated text file

sim = mosaic.Similarity(
    metric='correlation',  # or 'NMI', 'GY', 'JSD'
)
sim.fit(X)


# Cluster matrix
clust = mosaic.Clustering(
    mode='CPM',  # or 'modularity'
)
clust.fit(sim.matrix_)

clusters = clust.clusters_
sorted_matrix = clust.matrix_  # similarity matrix sorted by cluster
...
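Conceptually, bringing the matrix into block-diagonal form only means reordering the features so that members of the same cluster are adjacent. The sketch below, in plain Python with a hypothetical block_diagonal helper, applies such a permutation to both rows and columns. We assume this is what the sorted clust.matrix_ corresponds to; consult the API documentation before relying on it.

```python
def block_diagonal(matrix, clusters):
    """Reorder rows and columns so that features of the same cluster are adjacent.

    Returns the permuted matrix and the feature order that was applied.
    """
    order = [idx for cluster in clusters for idx in cluster]
    permuted = [[matrix[i][j] for j in order] for i in order]
    return permuted, order


# Features 0/2 and 1/3 are coupled, but interleaved in the original order.
A = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
B, order = block_diagonal(A, [[0, 2], [1, 3]])
# order is [0, 2, 1, 3]; the high values now sit in two 2x2 diagonal blocks
```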
