Releases: sacdallago/biotrainer
v0.9.4
29.10.2024 - Version 0.9.4
Bug fixes
Maintenance
- Updating dependencies: removing Python 3.9 support
- Updating CI workflow to be compatible with Windows
Known problems
- Currently, there are compatibility problems with ONNX on some machines; please refer to issue #111
v0.9.3
v0.9.2
v0.9.1
10.07.2024 - Version 0.9.1
Maintenance
- Fixing error in type checking for device
- Updating dependencies
- Updating inference examples
- Adding hint for version mismatch in inferencer
- Adding class weights to `out.yml` if they are calculated
- Adding contributors file
Features
- Improving fallback mechanism of embedder models. Now, CPU mode is exited once there is enough RAM again for shorter sequences
- Changing model storage format from `.pt` to `.safetensors`. Safetensors is safer for model sharing. The legacy `.pt` format is still supported and can be converted via:

```python
from biotrainer.inference import Inferencer

inferencer, out_file = Inferencer.create_from_out_file(out_file_path="out.yml", allow_torch_pt_loading=True)
inferencer.convert_all_checkpoints_to_safetensors()
```
v0.9.0
16.06.2024 - Version 0.9.0
Maintenance
- Adding more extensive code documentation
- Optimizing imports
- Applying consistent file naming
- Updating dependencies. Note that `jupyter` was removed as a direct optional dependency. You can always add it via `poetry add jupyter`.
- Adding simple differentiation between T5 and ESM tokenizers and models in the `embedders` module
Features
- Adding new `residues_to_value` protocol. Similar to the `residues_to_class` protocol, it predicts a value for each sequence, using per-residue embeddings. It might, in some situations, outperform the `sequence_to_value` protocol.
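A minimal configuration sketch for the new protocol is shown below. Only `protocol: residues_to_value` is taken from this release; the remaining keys and values are illustrative assumptions and may differ from an actual biotrainer configuration.

```python
# Illustrative sketch only: "protocol: residues_to_value" comes from this release;
# all other keys/values are assumptions, not taken from the release notes.
import yaml

config = {
    "protocol": "residues_to_value",                 # new protocol in 0.9.0
    "sequence_file": "sequences.fasta",              # assumed key/value
    "embedder_name": "Rostlab/prot_t5_xl_uniref50",  # assumed key/value
}

# Write the fragment to a YAML config file that biotrainer could consume
with open("config.yml", "w") as config_file:
    yaml.safe_dump(config, config_file)
```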
Bug fixes
- For `huggingface_transformer_embedder.py`, all special tokens are now always deleted from the final embedding (e.g. first/last for ESM-1b, last for T5)
v0.8.4
v0.8.3
04.05.2024 - Version 0.8.3
Maintenance
- Updating dependencies
Features
- Adding mps device for macOS. Use it by setting the following configuration option: `device: mps`. Note that MPS is still under development; use it at your own responsibility.
- Adding flags to the `compute_embedding` method of `EmbeddingService`:
  - `force_output_dir`: Do not change the given output directory within the method
  - `force_recomputing`: Always re-compute the embeddings, even if an existing file is found

These changes are made to make the embedders module of biotrainer easier to use outside the biotrainer pipeline itself (see the usage sketch below).
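A hedged usage sketch of the new flags: `force_output_dir` and `force_recomputing` are from this release, while the import path, the constructor, and the remaining parameter names are assumptions and may not match the actual `EmbeddingService` API.

```python
# Sketch only: force_output_dir and force_recomputing are from this release;
# the import path, constructor and other parameter names are assumptions.
from biotrainer.embedders import EmbeddingService

embedding_service = EmbeddingService()  # construction details are an assumption

embeddings_file = embedding_service.compute_embedding(
    "sequences.fasta",          # input sequences (assumed parameter)
    output_dir="embeddings/",   # assumed parameter name
    force_output_dir=True,      # keep exactly this output directory (new in 0.8.3)
    force_recomputing=True,     # re-compute even if an embeddings file exists (new in 0.8.3)
)
```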
v0.8.2
Maintenance
- Updating dependencies
Features
- Adding option to ignore verification of files in `configurator.py`. This makes it possible to verify a biotrainer configuration independently of the provided files.
- Adding new `compute_embeddings_from_list` function to `embedding_service.py`. This allows computing embeddings directly from sequence strings (see the sketch below).
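A sketch of how the new function might be called; the function name is from this release, while the import path, construction of the service, and the return value handling are assumptions.

```python
# Sketch only: compute_embeddings_from_list is named in this release; the import
# path, constructor and return value handling are assumptions.
from biotrainer.embedders import EmbeddingService

service = EmbeddingService()  # construction details are an assumption

# Compute embeddings directly from sequence strings, without a FASTA file on disk
embeddings = service.compute_embeddings_from_list(["SEQWENCE", "PRTEINS"])
```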
v0.8.1
12.01.2024 - Version 0.8.1
Maintenance
- Updating dependencies after removing bio_embeddings, notably upgrading torch and adding accelerate
- Updating examples, documentation, config and test files for inferencer tests to match the new compile mode
- Replacing the exception with a warning if `dropout_rate` is set for a model that does not support it (e.g. LogReg)
Features
- Enabling PyTorch compile mode. The feature has existed since torch 2.0 and is now available in biotrainer. It can be enabled via `disable_pytorch_compile: False`
v0.8.0
04.01.2024 - Version 0.8.0
Maintenance
- Removing the dependency on bio_embeddings entirely. bio_embeddings is not really maintained anymore (last commit 2 years ago), and being dependent on a specific external module for embedding calculation shrinks the overall capabilities of biotrainer. Now, for example, adding LoRA layers becomes much easier. While bio_embeddings does have its advantages, such as a well-defined pipeline and a lot of utilities, it also provides a lot of functionality that is not used by biotrainer. Therefore, a new `embedders` module was introduced to biotrainer that mimics some aspects of bio_embeddings and takes inspiration from it. However, it is built in a more generic way and enables, in principle, all huggingface transformer embedders to be used by biotrainer.
- Ankh custom embedder was removed, because it can now be used directly in biotrainer: `embedder_name: ElnaggarLab/ankh-large`
- Adding new `use_half_precision` option for transformer embedders
- Adding missing `device` option (see the combined config sketch below)
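Taken together, the options above could be combined in a configuration roughly like the sketch below. `embedder_name`, `use_half_precision` and `device` are named in this release; the remaining keys and values are illustrative assumptions.

```python
# Illustrative sketch: embedder_name, use_half_precision and device are from this
# release; protocol and sequence_file are assumptions for illustration.
import yaml

config = {
    "embedder_name": "ElnaggarLab/ankh-large",  # any huggingface transformer embedder
    "use_half_precision": True,                 # new option for transformer embedders
    "device": "cuda",                           # previously missing device option
    "protocol": "sequence_to_class",            # assumed key/value
    "sequence_file": "sequences.fasta",         # assumed key/value
}

# Print the YAML fragment that would go into a config file
print(yaml.safe_dump(config))
```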
Bug fixes
- Fixed a minor problem with model saving in `Solver.py`: if a new model was trained and did not improve before `early_stop` was triggered, it was not saved as a checkpoint.