Annif 1.3

@juhoinkinen released this 10 Feb 09:41
v1.3.0
6291f8d

This release introduces a new EstNLTK analyzer, improves the performance of the MLLM backend and fixes minor bugs.

The key enhancement of this release is a new analyzer that performs lemmatization using EstNLTK, which supports the Estonian language. This analyzer needs to be installed separately; see the Optional features and dependencies page in the Wiki. Note that the indirect dependencies of EstNLTK are quite large, requiring around 500 MB of libraries.
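
As a rough illustration of how the new analyzer might be selected for a project, the sketch below parses a project definition in the INI-style format Annif uses for its project configuration. The analyzer name `estnltk` is an assumption based on the analyzer's name in this release; the section name, project name, and vocabulary identifier are hypothetical placeholders.

```python
import configparser

# Hypothetical project definition. Assumptions: the analyzer is selected
# with "analyzer=estnltk"; "tfidf-et" and "example-vocab" are placeholders.
PROJECT_CFG = """
[tfidf-et]
name=TF-IDF Estonian (example project)
language=et
backend=tfidf
analyzer=estnltk
vocab=example-vocab
"""

# Parse the definition the same way any INI-style file can be read.
config = configparser.ConfigParser()
config.read_string(PROJECT_CFG)
project = config["tfidf-et"]

print(project["analyzer"], project["language"])
```

The analyzer only becomes usable after the optional EstNLTK dependencies have been installed as described in the Wiki.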

Another improvement is the optimization of the ambiguity feature calculation in the MLLM algorithm. Previously, the calculation could be slow when a document produced a large number of matches, which is most likely with a large vocabulary such as GND. The optimization addresses the quadratic nature of the ambiguity calculation and is expected to greatly reduce the processing time of the affected documents.
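
To show why a pairwise ambiguity count grows quadratically with the number of matches, here is a minimal sketch. It is not Annif's actual code: it assumes, for illustration only, that each match is a non-empty set of token IDs and that the ambiguity of a match is the number of other matches containing all of its tokens. The function names are hypothetical. The second function shows one generic way to cut down the comparisons, by collapsing duplicates and using an inverted index.

```python
from collections import Counter, defaultdict


def ambiguity_naive(token_sets):
    """For each set, count the other sets that contain all of its tokens.

    Compares every pair, so the cost grows as O(n^2) in the number of sets.
    """
    return [
        sum(1 for j, other in enumerate(token_sets) if j != i and ts <= other)
        for i, ts in enumerate(token_sets)
    ]


def ambiguity_indexed(token_sets):
    """Same counts, but only compares sets that share a token.

    Assumes every set is a non-empty frozenset.
    """
    # Collapse duplicates so identical sets are compared only once.
    counts = Counter(token_sets)
    # Inverted index: token -> distinct sets containing that token.
    by_token = defaultdict(set)
    for ts in counts:
        for token in ts:
            by_token[token].add(ts)
    result = {}
    for ts in counts:
        # Any set containing ts must also contain the rarest token of ts.
        rarest = min(ts, key=lambda t: len(by_token[t]))
        containing = sum(counts[o] for o in by_token[rarest] if ts <= o)
        result[ts] = containing - 1  # exclude the set itself
    return [result[ts] for ts in token_sets]


matches = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2}), frozenset({3})]
print(ambiguity_naive(matches))
print(ambiguity_indexed(matches))
```

Both functions return the same counts; the indexed version still degrades to quadratic time in the worst case, but does far less work when most matches share few tokens.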

This release also includes maintenance updates and bug fixes. The file permissions issue, where Annif did not respect the umask setting when creating data files, has been resolved, which makes Annif easier to use in multiuser environments.
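
One common way such a permissions issue arises is when files are written through temporary files, which Python's `tempfile.mkstemp` creates with mode 0600 regardless of the umask. The sketch below shows one possible way to apply umask-derived permissions after an atomic save on a POSIX system. It is an illustration under that assumption, not necessarily the fix applied in Annif; the function names are hypothetical.

```python
import os
import tempfile


def current_umask():
    """Read the process umask (os.umask sets a new value and returns the old one)."""
    mask = os.umask(0)
    os.umask(mask)  # restore the original value immediately
    return mask


def save_respecting_umask(path, data):
    """Write bytes atomically via a temp file, then apply umask-based permissions."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp always creates the file with mode 0o600, ignoring the umask.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Give the file the mode a plain open() would have produced.
        os.chmod(tmp_path, 0o666 & ~current_umask())
        # Atomically move the finished file into place.
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With a umask of 022, a file saved this way ends up with mode 0644, so other users on the same system can read it.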

Supported Python versions:

  • 3.9, 3.10, 3.11, and 3.12

Backward compatibility:

  • Projects trained with Annif v1.2 continue to work.

Enhancements
#818/#831 Add a new EstNLTK analyzer
#822/#825/#834 Optimize MLLM ambiguity calculation to resolve slow processing of specific documents. Thanks to @RietdorfC (DNB) for reporting the issue and testing the optimized code.
#820 Smarter initialization of optional analyzers

Maintenance
#833 Update dependencies for v1.3 release
#821/#830 Bump the github-actions versions

Bug Fixes
#828 Fix Docker image builds with Poetry 2.0
#832/#829 Ensure file permissions respect the umask setting