Annif 0.52
This release includes a new MLLM backend, a Python implementation of the Maui-like Lexical Matching algorithm. It was inspired by the Maui algorithm (by Alyona Medelyan) but is not a direct reimplementation. It is meant for long full-text documents and, like Maui, it needs to be trained with a relatively small number (hundreds or thousands) of manually indexed documents so that the algorithm can choose the mix of heuristics that achieves the best results on a particular document collection. See the MLLM Wiki page for more information.
New features include the possibility to configure two project parameters:
token_min_length
can be set in the analyzer parameters; e.g. setting the value to 2 allows the word "UK" to pass to a backend, while with the default value (3) the word is filtered out by the analyzer
lr
can be set in the neural network ensemble project configuration to define the learning rate.
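A minimal sketch of how these two parameters could appear in a projects.cfg file. The project names, vocabulary, and source projects below are illustrative assumptions, not part of the release:

```ini
# Hypothetical project using the new analyzer parameter:
# token_min_length=2 lets two-letter tokens such as "UK" through.
[tfidf-en]
name=TF-IDF English
language=en
backend=tfidf
vocab=my-vocab
analyzer=snowball(english,token_min_length=2)

# Hypothetical neural network ensemble project using the new lr parameter.
[nn-ensemble-en]
name=NN ensemble English
language=en
backend=nn_ensemble
vocab=my-vocab
sources=tfidf-en
lr=0.001
```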
The STWFSA backend has been updated to use a newer version of the stwfsapy library. Old STWFSA models are not compatible with the new version, so any STWFSA projects must be retrained. The release also includes several minor improvements and bug fixes.
New features:
#462 New lexical backend MLLM
#456/#468 Allow configuration of token min length (credit: mo-fu)
#475 Allow configuration of nn ensemble learning rate (credit: mo-fu)
Improvements:
#478/#479 Update stwfsa to 0.2.* (credit: mo-fu)
#472 Cleanup suggestion tests
#480 Optimize check for deprecated subject IDs using a set
Maintenance:
#474 Use GitHub Actions as CI service
Bug fixes:
#470/#471 Make sure suggestion scores are in the range 0.0-1.0
#477 Optimize the optimize command
#481 Backwards compatibility fix for the token_min_length setting
#482 MLLM fix: don't include use_hidden_labels in hyperopt, it won't have any effect
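The fix in #470/#471 guarantees that suggestion scores stay within the range 0.0-1.0. A minimal sketch of that invariant (a hypothetical helper for illustration, not Annif's actual implementation):

```python
def clamp_scores(scores):
    """Clamp each suggestion score into the closed range [0.0, 1.0].

    Hypothetical helper illustrating the score-range invariant from
    #470/#471; backends may otherwise emit scores slightly outside
    the range, e.g. due to floating-point arithmetic.
    """
    return [min(1.0, max(0.0, score)) for score in scores]


print(clamp_scores([-0.2, 0.5, 1.3]))  # → [0.0, 0.5, 1.0]
```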