
Annif 0.50

@juhoinkinen released this 07 Dec 11:19

This release introduces a setting for using only part of the input text in subject indexing: the new input_limit project parameter truncates the input text to the given number of characters. This can improve the quality of the suggestions, because the beginning of a long document typically contains its abstract and introduction. The default value of input_limit is zero, which means that no truncation is performed.
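The truncation rule described above can be illustrated with a minimal Python sketch. The function name truncate_input is hypothetical and this is not Annif's actual code; rejecting negative values is also an assumption made only for this sketch.

```python
def truncate_input(text: str, input_limit: int = 0) -> str:
    """Return at most input_limit characters of text.

    Hypothetical helper illustrating the input_limit semantics:
    a limit of 0 (the default) disables truncation.
    """
    if input_limit < 0:
        # Assumption for this sketch: negative limits are treated as an error.
        raise ValueError("input_limit must be zero or a positive integer")
    if input_limit == 0:
        return text  # zero means "no truncation"
    return text[:input_limit]  # keep only the beginning of the document


doc = "Abstract: We study automated subject indexing. Introduction: ..."
print(truncate_input(doc, 8))  # prints "Abstract"
print(truncate_input(doc) == doc)  # prints True
```

Keeping only the beginning of the text, rather than sampling from the whole document, matches the rationale given above: the abstract and introduction usually appear first.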

Other improvements include better handling of cached data in nn_ensemble training and reduced memory usage in evaluation, achieved by storing the suggested subjects in sparse matrices. Many dependencies have been updated and a few minor issues have been fixed.
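Why sparse matrices save memory in evaluation can be shown with a small sketch. Each document typically receives only a handful of suggestions out of a vocabulary that may contain tens of thousands of subjects, so storing only the non-zero scores is far cheaper than a dense row per document. The following pure-Python sketch builds a compressed sparse row (CSR) layout. The helper names to_csr and row_to_dense are hypothetical, and this is not Annif's implementation; a real implementation would more likely rely on a library such as scipy.sparse.

```python
from typing import List, Tuple


def to_csr(suggestions: List[List[Tuple[int, float]]]):
    """Build CSR arrays (indptr, indices, data) from per-document suggestions.

    suggestions[i] is a list of (subject_id, score) pairs for document i.
    Only the non-zero scores are stored.
    """
    indptr = [0]
    indices: List[int] = []
    data: List[float] = []
    for doc in suggestions:
        for subject_id, score in sorted(doc):  # CSR keeps column indices sorted
            indices.append(subject_id)
            data.append(score)
        indptr.append(len(indices))  # row i spans indptr[i]:indptr[i + 1]
    return indptr, indices, data


def row_to_dense(indptr, indices, data, row: int, n_subjects: int) -> List[float]:
    """Expand one CSR row back into a dense score vector (for illustration)."""
    dense = [0.0] * n_subjects
    for k in range(indptr[row], indptr[row + 1]):
        dense[indices[k]] = data[k]
    return dense


indptr, indices, data = to_csr([[(5, 0.9), (2, 0.4)], [(7, 0.6)]])
print(indptr, indices, data)  # [0, 2, 3] [2, 5, 7] [0.4, 0.9, 0.6]
```

With a vocabulary of n subjects and k suggestions per document, the dense layout stores n values per document, while the CSR layout stores about 2k values plus one row pointer.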

New features:
#446 Add a backend parameter to limit input characters in suggest
#452 Apply the input_limit backend parameter to texts in train & learn

Improvements:
#441 Sparse subjects (credit @mo-fu)
#443/#444 Allow use of cached data after a cancelled training run of the nn_ensemble backend
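The release notes do not describe how the nn_ensemble cache works internally. The following is only a generic sketch of the underlying pattern. Preprocessed training data is written to a cache before the expensive training step, so a run that is cancelled during training can later reuse it instead of preprocessing again. The function load_or_build_cache, the use_cached flag, and the JSON file format are all hypothetical and are not Annif's API.

```python
import json
import os
import tempfile


def load_or_build_cache(cache_path, build, use_cached=False):
    """Return cached training data if allowed and present, otherwise rebuild it.

    Hypothetical helper: `build` is a callable producing JSON-serializable
    preprocessed data. The cache is written before any (possibly cancelled)
    training step that would consume the data.
    """
    if use_cached and os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    data = build()
    # Write to a temporary file first and then atomically move it into place,
    # so an interrupted write never leaves a half-written cache behind.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_path, cache_path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return data
```

The key design choice is the ordering: the cache is persisted as soon as preprocessing finishes. Cancelling the subsequent training step therefore does not discard the preprocessing work.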

Maintenance:
#448 Upgrade dependencies
#445 Upgrade LMDB dependency from 0.98 to 1.0.0
#449 Resolve DeprecationWarning: change warn to warning

Bug fixes:
#447 Fix missing default params in the pav and nn_ensemble backends