Annif 0.50
This release introduces a setting to use only a part of the input text for subject indexing: the new input_limit project parameter truncates the input text to the given number of characters. This can improve the quality of the suggestions, as the beginning of a long document typically includes an abstract and introduction. The default value for input_limit is zero, which means that truncation is not performed.
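The truncation semantics described above can be illustrated with a minimal sketch. This is not Annif's actual implementation: the function name apply_input_limit is hypothetical, and treating negative values the same as zero is an assumption made here for safety.

```python
def apply_input_limit(text: str, input_limit: int = 0) -> str:
    """Return at most input_limit leading characters of text.

    Hypothetical helper for illustration only. A limit of zero (the
    default) means no truncation; this sketch also treats negative
    values as "no truncation", which is an assumption.
    """
    if input_limit <= 0:
        return text
    return text[:input_limit]


document = "Abstract: subject indexing. " + "Body text. " * 1000

# With the default limit the text is passed through unchanged.
unchanged = apply_input_limit(document)

# With a positive limit only the beginning of the document is kept.
head = apply_input_limit(document, 8)
```

A text shorter than the limit is returned as-is, so the setting only affects long documents.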
Improvements include better handling of cached data in nn_ensemble training and optimization of memory usage in evaluation by using sparse matrices for suggested subjects. Many dependencies have been updated and a few minor issues fixed.
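To give a rough idea of why a sparse representation saves memory for suggested subjects, the sketch below compares a dense row of scores with a mapping that stores only the non-zero entries. The vocabulary size, subject indices, and scores are made up for illustration, and the plain dict stands in for a real sparse matrix type; it does not reflect Annif's internal data structures.

```python
# Hypothetical numbers for illustration; not taken from Annif.
VOCAB_SIZE = 10_000
suggestions = {42: 0.91, 1337: 0.55, 9001: 0.12}  # subject index -> score

# Dense representation: one slot per subject in the vocabulary,
# almost all of them zero.
dense_row = [0.0] * VOCAB_SIZE
for idx, score in suggestions.items():
    dense_row[idx] = score

# Sparse representation: only the non-zero scores are stored.
sparse_row = dict(suggestions)


def sparse_get(row: dict, idx: int) -> float:
    """Look up a score, treating missing entries as zero."""
    return row.get(idx, 0.0)


stored_dense = len(dense_row)    # number of values held in the dense row
stored_sparse = len(sparse_row)  # number of values held in the sparse row
```

Both representations answer the same lookups, but the sparse one holds only as many values as there are suggested subjects.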
New features:
#446 Add a backend parameter to limit input characters in suggest
#452 Apply the input_limit backend parameter to texts in train & learn
Improvements:
#441 Sparse subjects (credit @mo-fu)
#443/#444 Allow use of cached data after cancelled training of nn_ensemble backend
Maintenance:
#448 Upgrade dependencies
#445 Upgrade LMDB dependency from 0.98 to 1.0.0
#449 Resolve DeprecationWarning: change warn to warning
Bug fixes:
#447 Fix missing default params in pav and nn ensemble