Skip to content

First non-container cmd-line version of structman

Compare
Choose a tag to compare
@AlexanderGress AlexanderGress released this 02 Feb 10:54
· 32 commits to main since this release

With this initial release we add the development version history of structman:

Version 1.4.0
- Major update preparing the public version installable outside of a container
- Implemented a installation script to fully setup structman in a conda environment
- New outputs that support the direct usage of structguy after using structman
- Added support for sqlite3 as a local database option, hence removed the lite version
- Integrated MicroMiner into the pipeline (usage requires to configure an executable)
- Switched to ray 2.9.1
- Many smaller bug fixes

Version 1.3.0
- Needed to add an Interface hashing system to the database to retrace different interfaces from the same protein
- Updated the automated Modelling pipeline, improved the template selection to focus on mutated sites
- New functionality: now one can give tags to proteins that annotate a gene identifier (for example: "gene:P53"). Multiple protein isoforms with same gene tag will then lead to an isoform analysis.
- switched to ray 2.2.0
- Added support for Ensembl transcript identifiers

Version 1.2.2
- added the feature to parse HGVS protein mutation identifiers (for example: p.Arg12His)

Version 1.2.1
- updated Uniprot ID mapping service to REST API

Version 1.2.0
- added database support for interfaces, aggregated interface, protein-protein-interaction, position-position-interactions
- switched to ray 1.13.0

Version 1.1.1
- added aggregated contact matrix calculation
- added aggregated interface calculation

Version 1.1.0
- database makes now version checks
- switched to ray 1.11.0
- added model database support (specifically alphafold DB)

Version 1.0.3
- database create command now checks if already an instance exists first
- if database is not properly configured and structman got still called in db mode, now it switches automatically to lite mode

Version 1.0.2
- fasta input parsing checks now for irregular characters
- improved some bug logging
- added a possibility to run the alignment in serial mode without ray for debugging purposes
- switched to ray 1.10.0

Version 1.0.1
- removed some commented code
- slight changes of file structure to prevent circular imports
- multiple small bug fixes

Version 1.0.0
- first main version
- switched to ray 1.9.0
- Multiple smaller bug fixes

Version 0.9.2
- changed how imports are organized
- switched to ray 1.8.0

Version 0.9.1
- Various smaller bug fixes
- Multiple model structures (not NMR) are now fused internally into one model
- obsolete uniprot entries are now supported

Version 0.9.0
- Moved Modeller to version 10.1
- Fixed multiple modelling bugs
- Implemented aggregated indel analysis
- Indel analysis results are now a compressed row in the database
- Added automated script that creates/updates the mapping/sequence database
- mapping/sequence database got a new structure
- default mode now doesn't model indel structures. Can be activated by config or flag
- Updated disclaimers

Version 0.8.1
- The size of the database row for position data and alignment is now bigger
- Fixed a bug that prevented disordered region annotation to happen
- Upgraded disorder prediction method from IUpred2A to IUpred3
- Unpacking of position data in the ouput creation is now paralellized

Version 0.8.0
- Bigger data fields are now stored as binaries in the database

Version 0.7.3
- The update script now addtionally loads the .../data/status directory
- The RINdb update functions checks and writes (into meta.txt in the RINdb main directory) the date of the last update
- The RINdb compares the date of the last update with the dates of structure modifications in the /data/status directory of the PDB
- Added a function to split sequences of structures with unknown portions; to align them individually; and to fuse the alignments afterwards
- Configured custom output format of MMseqs2
- Use sequence length of the structure to better estimate the cost for an alignment

Version 0.7.2
- Added a nested parallelization for the classifcation. Triggers only, when we have many cores available and the classifcation task is large enough.
- updated the docker update script

Version 0.7.1
- Fixed a bug that kept the program using a local instance of the PDB
- Improved the protein splitting scheduling in the alignment parallelization, now works better with few proteins that are mapped to many structures
- Improved the performance of the packaging in the parallelized classification

Version 0.7.0
- Fixed some bugs regarding the parsing of asymmetric unit structures for cases them being given as input
- Parallelization if filter_structures is now chunked
- Improved chunking of alignment tasks
- Ray reinit for each major core loop to prevent memory leaks
- added custom object serialization functions to utils
- Fixed bugs regarding the parsing of PDB-type inputs
- added '--mem_limit [any%]' flag to limit maximal memory usage for testing purposes
- extensive overhaul of the parallelization of the annotation and the classification

Version 0.6.1
- Added output util 'RAS' that Retrieves all Annotated Structures (and their protein interaction partner) from a specific session.

Version 0.6.0:
- Added output util suggest command that ranks structural annotation templates
- Added associated database table Suggestion for rankings
- Add pandas requirement to structman conda environment file

Version 0.5.1
- increased radius for intra chain interface score sphere

Version 0.5.0:
- Location (formerly surface/core) gets now divided into surface/buried/core and into total/mainchain/sidechain
- Classification is based on sidechain location (means also a new class 'buried')
- Changed location and surface values had to be stored in the database -> version change to 0.5.x
- Added all types of locations and surface values to classification and feature table outputs
- Implemented an interface milieu analysis (inspired by core/rim interface distinction, but goes into more detail)

Version 0.4.1:
- Unspecified fasta inputs now add all positions as query

Version 0.4.0:
* Major Changes
- Add input checksums to see if input file has changed, triggering new session if so
- Adds new Checksum column to Session table in DB
- Separate output.py into multiple modules
* Enhancements
- Allow C++ style // line and trailing comments in smlf format
- Search user home folder for config file when no others found
- Added new structman.utils module (less structman specific than structman.lib.sdsc.utils)
* Misc Changes/Fixes
- Fix Mutated Sequence off-by-1 indexing error
- Move several functions from structman.lib.sdsc.utils to structman.utils
- Check that database_source_path exists before creating DB to avoid missing tables and ensuing failure of structman database create
- Delete duplicated sql script from /lib (resides in /scripts)
- Prefer os.path.join to avoid errors with double slashes in path
- Use package paths set in settings.py for ray_init()
- Use surface_threshold from config in database.getClassWT, database.getClass, database.diffSurfs
- Rename class Output_generator to OutputGenerator to conform to pep8
- Fix input/output issues when using relative paths
- Import fixes
- Autopep8 fixes
- Updated Readme

Version 0.3.2:
- Modelling now builds up a database of models that can be used to save computation costs for later runs
- Positions filtered by sanity checks now do not delete the whole protein
- Indel analysis can now be used with stored information from the database
- Added '--only_wt' option to run the pipeline without the generation of mutated proteins
- Multiple smaller bug fixes
- Improved remote classification

Version 0.3.1:
- Improved the store handling of remote classification
- Implemented a modelling output framework. Currently produced one model per input protein and model for each given mutation.
- Multiple smaller bug fixes

Version 0.3.0:
- Begin of changelog
- Acts as first master version of structman as pip package