Releases: alexklibisz/elastiknn
0.1.0-PRE52
- Bumped Elasticsearch version to 7.10.0.
0.1.0-PRE51
- No substantive changes. Just testing out new release setup.
0.1.0-PRE50
- Bumped Elasticsearch version to 7.9.3.
0.1.0-PRE49
- Fixed the function score query implementation. The first pass was kind of buggy for exact queries and totally wrong for approximate queries.
- Addressed a perplexing edge case that was causing an out-of-bounds exception in the MatchHashesAndScoreQuery.
0.1.0-PRE48
- Added support for function score queries. See the Common Patterns section of the docs.
0.1.0-PRE47
- Improved the Python ElastiknnModel's handling of empty query responses (i.e. no results).
Previously it threw an exception. Now it will just not populate the ID and distance arrays for that particular query.
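The behavior described above can be sketched as follows. This is an illustrative helper, not the actual client code, and the response shape (a list of hits with `_id` and `_score` per query) is an assumption:

```python
def collect_neighbors(responses, k):
    """Gather neighbor IDs and distances from per-query responses.

    Illustrative sketch: a query with no hits leaves its row
    unpopulated (None placeholders) instead of raising an exception.
    The hit shape ("_id", "_score") is an assumption, not the
    actual elastiknn client internals.
    """
    ids = [[None] * k for _ in responses]
    dists = [[None] * k for _ in responses]
    for i, hits in enumerate(responses):
        for j, hit in enumerate(hits[:k]):
            ids[i][j] = hit["_id"]
            dists[i][j] = hit["_score"]
    return ids, dists

# One query with a hit, one query with no results.
ids, dists = collect_neighbors([[{"_id": "a", "_score": 0.9}], []], k=2)
```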
0.1.0-PRE46
- Upgraded to Elasticsearch version 7.9.2. No changes to the API. It did require quite a bit of internal refactoring, mostly to the way vector types are implemented.
- Indices should be backwards compatible; however, if you indexed on an earlier version, I'd recommend re-indexing and setting the `index.elastiknn` setting to `true`.
0.1.0-PRE45
- Adds an index-level setting, `index.elastiknn = true|false`, which defaults to `false`. Setting this to `true` tells Elastiknn to use a non-default storage format for doc values fields. Specifically, Elastiknn will use the latest Lucene formats for all fields except doc values, which will use the `Lucene70DocValuesFormat`. Using this specific doc values format is necessary to disable compression that makes Elastiknn extremely slow when upgraded past Elasticsearch version 7.6.x; without it, upgrading beyond 7.6.x is basically impossible. The root cause is a change made between Lucene 8.4.x and 8.5.x that introduced more aggressive compression on binary doc values. This compression saves space, but becomes an extreme bottleneck for Elastiknn (40-100x slower queries), since Elastiknn stores vectors as binary doc values. Hopefully the Lucene folks will make this compression optional in the future. Read more here: https://issues.apache.org/jira/browse/LUCENE-9378
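For reference, an index-level setting like this is supplied in the body when creating the index. A minimal sketch of the request body (the index name and the use of a raw `PUT` are illustrative, not prescribed by the release notes):

```python
import json

# Illustrative index-creation body enabling Elastiknn's custom
# storage format. The setting name comes from the release notes;
# everything else here (index name, endpoint) is hypothetical.
settings = {
    "settings": {
        "index": {
            "elastiknn": True  # defaults to False
        }
    }
}

body = json.dumps(settings)
# You would then send this body when creating the index, e.g.:
# PUT http://localhost:9200/my-index
```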
0.1.0-PRE44
- Introduces a shorthand alternative format for dense and sparse vectors that makes it easier to work with ES connectors that don't allow nested docs.
  - Dense vectors can be represented as a simple array: `{ "vec": [0.1, 0.2, 0.3, ...] }` is equivalent to `{ "vec": { "values": [0.1, 0.2, 0.3] } }`.
  - Sparse vectors can be represented as an array where the first element is the array of true indices and the second is the total number of indices: `{ "vec": [[1, 3, 5, ...], 100] }` is equivalent to `{ "vec": { "true_indices": [1, 3, 5, ...], "total_indices": 100 } }`.
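The equivalence between the two formats can be sketched as a small normalizer that expands the shorthand into the canonical nested form. This is a hypothetical helper for illustration, not part of the plugin or client:

```python
def expand_vector(vec):
    """Expand Elastiknn's shorthand vector formats into the
    canonical nested form (illustrative helper, not library code)."""
    if isinstance(vec, dict):
        return vec  # already canonical
    if len(vec) == 2 and isinstance(vec[0], list) and isinstance(vec[1], int):
        # sparse shorthand: [true_indices, total_indices]
        return {"true_indices": vec[0], "total_indices": vec[1]}
    # dense shorthand: a flat array of floats
    return {"values": list(vec)}

dense = expand_vector([0.1, 0.2, 0.3])
sparse = expand_vector([[1, 3, 5], 100])
```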
- Added a logger warning when the approximate query matches fewer candidates than the specified number of candidates.
- Subtle modification to the DocIdSetIterator created by the MatchHashesAndScoreQuery to address issues 180 and 181.
The gist of issue 180 is that the binary doc values iterator used to access vectors would attempt to visit the same
document twice, and on the second visit the call to advanceExact would fail.
The gist of the change is that the docID was previously initialized to the smallest candidate docID. Initializing it to -1 seems to be the correct convention, and it makes that problem go away.
- Renamed all exceptions explicitly thrown by Elastiknn to ElastiknnFooException, e.g. ElastiknnIllegalArgumentException. This just makes it a bit more obvious where to look when debugging exceptions and errors.
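The -1 initialization mentioned above mirrors Lucene's DocIdSetIterator contract, where an unpositioned iterator reports docID -1, i.e. "before the first document". A toy Python sketch of the convention (illustrative only, not the actual Lucene/Java code):

```python
NO_MORE_DOCS = 2**31 - 1  # sentinel mirroring Lucene's NO_MORE_DOCS

class CandidateIterator:
    """Toy iterator over sorted candidate doc IDs, following the
    convention of starting at -1 ("before the first doc") so the
    first advance visits the smallest candidate exactly once."""

    def __init__(self, candidates):
        self.candidates = sorted(candidates)
        self.doc_id = -1  # the fix: start before the first candidate

    def next_doc(self):
        # Advance to the first candidate strictly greater than doc_id.
        for c in self.candidates:
            if c > self.doc_id:
                self.doc_id = c
                return c
        self.doc_id = NO_MORE_DOCS
        return self.doc_id

it = CandidateIterator([4, 7, 9])
visited = [it.next_doc(), it.next_doc(), it.next_doc()]
```

Starting at -1, each candidate is visited exactly once; had `doc_id` started at the smallest candidate, that document would be skipped or handled inconsistently on the first advance.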
0.1.0-PRE43
- No longer caching the mapping for the field being queried. Instead, using the internal mapper service to retrieve the mapping.