TODO.md

In Progress

  • Tests
    • Workflow test with example data
    • Trivial examples for each function
    • Unit tests for SSI
    • Unit tests for density features
  • Integrate DiffNets.
    • Lay out module structure in separate branch.
    • Copy core network from DiffNets repo.
    • Try to use existing featurization.
    • Include existing DiffNets featurization and compare.
  • Exploratory analysis via correlation coefficients of the features
    • First tests --> not very promising.
    • Try different metric
    • Find useful application or leave it out.
  • Unified tutorial in documentation. Make one page for each subpackage
    • preprocessing
      • coordinates
      • densities
    • featurization
      • structure features
      • water features
      • atom features
    • comparison
    • dimensionality reduction
    • clusters (show how to cluster on PCs)
    • SSI
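
For the exploratory correlation analysis listed above, the "different metric" to try could be a rank-based coefficient. The following is only a minimal sketch, assuming the features are available as a NumPy array of shape (frames, features). The function names are hypothetical and not part of this package's API.

```python
import numpy as np


def pearson_matrix(features):
    """Pearson correlation between all pairs of feature columns."""
    return np.corrcoef(features, rowvar=False)


def spearman_matrix(features):
    """Spearman rank correlation between all pairs of feature columns.

    Each column is replaced by the ranks of its values before the
    Pearson formula is applied. Ties are broken by order of appearance,
    which is an assumption that is reasonable for continuous features.
    """
    ranks = np.argsort(np.argsort(features, axis=0), axis=0)
    return np.corrcoef(ranks, rowvar=False)


# Toy data: the second feature depends monotonically but
# non-linearly on the first one.
x = np.linspace(0.0, 1.0, 50)
toy_features = np.column_stack([x, np.exp(5.0 * x)])
pearson = pearson_matrix(toy_features)
spearman = spearman_matrix(toy_features)
```

On the toy data, the rank-based coefficient picks up the monotonic relation fully, while the Pearson coefficient is visibly lower. Whether this helps on real features is exactly what the open item would need to test.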

Plans

  • Try using MDAnalysis instead of biotite for water featurization
  • Integrate more options for features from PyEMMA (think carefully about how to make it more flexible)
  • More example tcl scripts for VMD
  • Facilitate calculation of JSD etc. on principal components
  • Facilitate calculation of SSI on results of joint clustering.
  • Weighted PCA/tICA? (to account for varying simulation lengths or uncertainty)
  • Feature comparison of more than two ensembles
    • with respect to the joint ensemble (all metrics)
    • with respect to a reference ensemble (will not always work for KLD)
  • Implement t-distributed Stochastic Neighbor Embedding (t-SNE)
    • Read up on t-SNE for molecular trajectories
    • See if we can import or adapt existing code.
    • First tests with (regular) t-SNE
    • Test time-lagged t-SNE. How to handle time-dependence across simulations/ensembles?
    • write module
    • write unit tests
  • Implement a clustering algorithm designed for structural ensembles
    • Read up about CLoNe
    • First tests
    • write module
    • write unit tests
  • Put shared functionality of PCA and TICA into shared functions.
  • Make file format (png/pdf?) for matplotlib optional.
  • Implement Linear Discriminant Analysis
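
The plan to calculate JSD on principal components could look roughly like the sketch below. It assumes two ensembles given as NumPy arrays of shape (frames, features), a PCA fitted on the joint data, and a histogram estimate of the Jensen-Shannon divergence with bins shared between both ensembles. All function names and the default bin count are hypothetical and not taken from the existing code.

```python
import numpy as np


def project_on_joint_pcs(data_a, data_b, n_components=2):
    """Project both ensembles on the principal components of the joint data."""
    joint = np.vstack([data_a, data_b])
    mean = joint.mean(axis=0)
    # Rows of vt are the principal axes, sorted by explained variance.
    _, _, vt = np.linalg.svd(joint - mean, full_matrices=False)
    components = vt[:n_components]
    return (data_a - mean) @ components.T, (data_b - mean) @ components.T


def js_divergence_1d(a, b, bins=20):
    """Jensen-Shannon divergence (in bits) of two 1D samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shared bin edges so that both histograms are comparable.
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
    p = np.histogram(a, bins=edges)[0] / a.size
    q = np.histogram(b, bins=edges)[0] / b.size
    m = 0.5 * (p + q)

    def kld(x, y):
        mask = x > 0  # empty bins contribute nothing
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


def jsd_per_component(data_a, data_b, n_components=2, bins=20):
    """JSD between the two ensembles along each joint principal component."""
    proj_a, proj_b = project_on_joint_pcs(data_a, data_b, n_components)
    return [js_divergence_1d(proj_a[:, i], proj_b[:, i], bins)
            for i in range(n_components)]
```

With base-2 logarithms the divergence lies between 0 (identical histograms) and 1 (no overlap), so the values are directly comparable across components.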

Ideas

  • Logo
  • Hydrogen bonds as features
  • Contacts as features (can PyEMMA do this?)
  • Position deviations as features (similar to components of RMSD)
  • Estimate thresholds for significance of feature differences
    • Calculate correlation times within trajectories
    • modify p-value of KS test using correlation time
    • modify p-value of KS test using number of simulation runs per ensemble
  • Wasserstein distance to compare ensembles
  • Add options to save and load calculated features
  • Add option to whiten features
  • Featurizers for other molecule types
    • ligands
    • lipids
    • nucleic acids
  • Simplify adding hand-crafted features
  • Implement conformational entropy calculations
    • Read papers, e.g., 1, 2, 3
    • Test implementations, e.g., Xentropy to find the best way to do it.
  • Implement multi-dimensional scaling
  • Try to integrate functional mode analysis.
  • Try to integrate VAMPnets.
  • Try to integrate network analysis.
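
One possible way to approach the idea of modifying the KS test p-value using correlation times is to replace the number of frames by an effective sample size. The sketch below is a heuristic under stated assumptions, not an established part of this package: the integrated autocorrelation time is estimated by summing the autocorrelation function up to its first non-positive value, the effective sample size is the number of frames divided by that time, and the p-value comes from the asymptotic Kolmogorov distribution. All function names are hypothetical.

```python
import numpy as np


def integrated_autocorr_time(x):
    """Estimate tau = 1 + 2 * sum(rho_k), truncated at the first rho_k <= 0."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    norm = np.dot(x, x)
    if norm == 0.0:
        return 1.0  # constant series: no correction possible
    tau = 1.0
    for lag in range(1, x.size):
        rho = np.dot(x[:-lag], x[lag:]) / norm
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of two samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def kolmogorov_sf(lam, terms=100):
    """Survival function of the asymptotic Kolmogorov distribution."""
    if lam < 0.1:
        return 1.0  # equals 1 to machine precision; the series converges slowly here
    k = np.arange(1, terms + 1)
    series = np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * lam**2))
    return float(np.clip(2.0 * series, 0.0, 1.0))


def ks_test_effective(a, b):
    """KS statistic and p-value based on effective sample sizes (heuristic)."""
    d = ks_statistic(a, b)
    n_a = len(a) / integrated_autocorr_time(a)
    n_b = len(b) / integrated_autocorr_time(b)
    lam = np.sqrt(n_a * n_b / (n_a + n_b)) * d
    return d, kolmogorov_sf(lam)
```

Because the estimated correlation time is never smaller than 1, the effective sample size never exceeds the number of frames, so the corrected p-value is at least as large as the uncorrected asymptotic one. The number of simulation runs per ensemble is not considered in this sketch.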

Done ✓

  • Colab Tutorial
    • Put Notebook on Colab and get it to run.
    • Add visualizations.
    • Fix installation via pip.
    • Fix animations (they only show white canvas).
    • Add TICA to Colab tutorial.
  • Include TICA in unit tests
  • Write "getting started" for documentation
  • Refactoring and fixes for release 0.2
    • Restructure modules to subpackages
    • Adapt README
    • Adapt API documentation
    • Include SSI in comparison example script
    • Numbering of principal component trajectories starts with 0, should start with 1
    • Axis labels and legend name for distance matrix plot
    • Function pca_features() does not have labels
    • Function compare_projections() does not have labels or legend
  • Slack channel for all developers and testers, and to provide support for the user community.
  • Implement clustering in principal component space

Abandoned

  • Frame classification via CNN on features
    • Prototype to classify simulation frames --> DiffNets probably more powerful.
    • Interpret weights as relevance of features
    • Write module
    • Write unit tests