In addition to accepting predefined dataset splits via the input FASTA files, it would be a nice feature to support automatic dataset splitting within the biotrainer pipeline. The splitting should not be purely random; it should also check sequence similarity across the resulting splits, using mmseqs.
Steps include:
Create new config file option(s) for automatic dataset splitting
Add a step in the pipeline where the splitting is done
Run mmseqs and split the dataset based on its results
Report the resulting splits in the out.yml file and in the log output
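
To make the splitting step more concrete, here is a minimal sketch of a cluster-aware split in Python. It is not existing biotrainer code: the function names `parse_cluster_tsv` and `split_by_cluster`, the split names, and the default fractions are all assumptions for illustration. It assumes the clustering result is already available as a two-column, tab-separated list of representative and member IDs (the kind of file the MMseqs2 `easy-cluster` workflow writes), and it keeps every cluster inside a single split so that similar sequences are never separated.

```python
import random
from collections import defaultdict


def parse_cluster_tsv(lines):
    """Group member IDs by cluster representative.

    Assumes each non-empty line is "<representative>\t<member>".
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        representative, member = line.split("\t")[:2]
        clusters[representative].append(member)
    return dict(clusters)


def split_by_cluster(
    clusters,
    fractions=(("train", 0.8), ("val", 0.1), ("test", 0.1)),  # hypothetical defaults
    seed=42,
):
    """Assign whole clusters to splits, approximating the target fractions."""
    rng = random.Random(seed)

    # Shuffle first, then sort by size (stable sort), so that clusters of
    # equal size are ordered randomly but reproducibly.
    cluster_ids = sorted(clusters)
    rng.shuffle(cluster_ids)
    cluster_ids.sort(key=lambda cid: len(clusters[cid]), reverse=True)

    total = sum(len(members) for members in clusters.values())
    targets = {name: fraction * total for name, fraction in fractions}
    splits = {name: [] for name, _ in fractions}

    for cid in cluster_ids:
        # Put the cluster into the split that is furthest below its target size.
        name = max(targets, key=lambda n: targets[n] - len(splits[n]))
        splits[name].extend(clusters[cid])
    return splits


if __name__ == "__main__":
    example = ["a\ta", "a\tb", "c\tc", "d\td", "d\te", "d\tf"]
    result = split_by_cluster(parse_cluster_tsv(example))
    for split_name, ids in result.items():
        print(split_name, ids)
```

Because whole clusters are assigned at once, the final split sizes only approximate the requested fractions, especially for small datasets or very large clusters. The actual split sizes would therefore be worth reporting in out.yml and in the log.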