Add 6.0 parameters to documentation
davidmezzetti committed Aug 10, 2023
1 parent c85c907 commit 37d39ba
Showing 6 changed files with 61 additions and 11 deletions.
29 changes: 25 additions & 4 deletions docs/embeddings/configuration/general.md
@@ -2,16 +2,37 @@

General configuration options that don't fit elsewhere.

## keyword
```yaml
keyword: boolean
```
Enables sparse keyword indexing for this embeddings instance.
## hybrid
```yaml
hybrid: boolean
```
Enables hybrid (sparse + dense) indexing for this embeddings instance.
## indexes
```yaml
indexes: dict
```
Key-value pairs defining subindexes for this embeddings instance. Each key is the index name and the value is the full configuration. This configuration can use any of the options available to a standard embeddings instance.
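
As a rough sketch of how the `keyword`, `hybrid` and `indexes` options fit together, the same settings can be written as plain Python dictionaries. The subindex names (`keyword`, `dense`) and the model path are hypothetical placeholders chosen for illustration, and `path` is the vectors option documented separately, not part of this page.

```python
# Sparse keyword index only
sparse_config = {"keyword": True}

# Hybrid index: sparse keyword + dense vector index together
hybrid_config = {"hybrid": True}

# Subindexes: each key is an index name, each value is a full configuration.
# The names "keyword" and "dense" and the model path are illustrative assumptions.
subindex_config = {
    "indexes": {
        "keyword": {"keyword": True},
        "dense": {"path": "sentence-transformers/all-MiniLM-L6-v2"},
    }
}

for name, settings in subindex_config["indexes"].items():
    print(name, settings)
```

Assuming txtai is installed, a dictionary like one of these would typically be passed to an embeddings instance as its configuration.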
## autoid
```yaml
autoid: int|uuid function
```
Sets the auto id generation method. When this is not set, an autogenerated numeric sequence is used. This also supports [UUID generation functions](https://docs.python.org/3/library/uuid.html#uuid.uuid1). For example, setting this value to `uuid4` will generate random UUIDs. Setting this to `uuid5` will generate deterministic UUIDs for each input data row.
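
The difference between random and deterministic ids can be seen with Python's standard `uuid` module alone. This is a minimal sketch, not txtai's implementation: the namespace (`uuid.NAMESPACE_DNS`) and the use of the raw row text as the name are assumptions, since the text above does not say what is hashed for `uuid5`.

```python
import uuid

# Hypothetical input rows, used only for illustration
rows = ["first document", "second document"]

# uuid4 is random: every call produces a new id
random_ids = [str(uuid.uuid4()) for _ in rows]

def deterministic_id(row):
    # uuid5 hashes a namespace plus a name with SHA-1, so equal input gives an equal id.
    # Assumption: NAMESPACE_DNS and the row text are stand-ins for whatever txtai uses.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, row))

stable_ids = [deterministic_id(row) for row in rows]

print(random_ids)
print(stable_ids)
```

Re-running the script changes `random_ids` but leaves `stable_ids` unchanged, which is the property that makes `uuid5` useful when the same data is indexed more than once.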

## format
```yaml
format: pickle|json
```

Sets the configuration storage format. Defaults to `pickle`.
4 changes: 4 additions & 0 deletions docs/embeddings/configuration/index.md
@@ -42,6 +42,10 @@ General configuration that doesn't fit elsewhere.

An accompanying graph index can be created with an embeddings database. This enables topic modeling, path traversal and more. [NetworkX](https://github.com/networkx/networkx) is the default graph index.

## [Scoring](./scoring)

Sparse keyword indexing and word vector term weighting.

## [Vectors](./vectors)

Vector search is enabled by converting text and other binary data into embeddings vectors. These vectors are then stored in an ANN index. The vector model is optional and a default model is used when not provided.
30 changes: 30 additions & 0 deletions docs/embeddings/configuration/scoring.md
@@ -0,0 +1,30 @@
# Scoring

An embeddings instance can optionally have an associated scoring instance. This scoring instance can serve two purposes, depending on the settings.

One use case is building sparse/keyword indexes. This occurs when the `terms` parameter is set to `True`.

The other use case is with word vector term weighting. This feature has been available since the initial version but isn't quite as common anymore.

The following covers the available options.

## method
```yaml
method: bm25|tfidf|sif
```
Sets the scoring method.
## terms
```yaml
terms: boolean
```
Enables term frequency sparse arrays for a scoring instance. This is the backend for sparse keyword indexes.
## normalize
```yaml
normalize: boolean
```
Enables normalized scoring (ranging from 0 to 1). When enabled, statistics from the index will be used to calculate normalized scores.
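
To make the `method` and `normalize` options more concrete, here is a small, self-contained sketch of BM25 scoring followed by scaling into the 0 to 1 range. It is not txtai's implementation: the whitespace tokenizer, the `k1` and `b` values (common textbook defaults) and the choice to normalize by dividing by the top score are all assumptions made for illustration. The text above only states that index statistics are used for normalization.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Scores each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    total = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / total

    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (total - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

def normalize(scores):
    """Scales scores into [0, 1] by dividing by the top score (an assumed scheme)."""
    top = max(scores, default=0.0)
    return [score / top if top > 0 else 0.0 for score in scores]

documents = ["the quick brown fox", "lazy dog sleeps", "quick quick fox jumps"]
raw = bm25_scores("quick fox", documents)
normalized = normalize(raw)
print(raw)
print(normalized)
```

Raw BM25 scores are unbounded and depend on corpus statistics, which is why a normalized variant is convenient when scores need to be compared with, or combined with, similarity values that already fall between 0 and 1.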
7 changes: 0 additions & 7 deletions docs/embeddings/configuration/vectors.md
@@ -39,13 +39,6 @@ storevectors: boolean

Enables copying of a vectors model set in path into the embeddings models output directory on save. This option enables a fully encapsulated index with no external file dependencies.

#### scoring
```yaml
scoring: bm25|tfidf|sif
```

A scoring model builds weighted averages of word vectors for a given sentence. Supports BM25, TF-IDF and SIF (smooth inverse frequency) methods. If a scoring method is not provided, mean sentence embeddings are built.

#### pca
```yaml
pca: int
1 change: 1 addition & 0 deletions docs/examples.md
@@ -28,6 +28,7 @@ Build semantic/similarity/vector/neural search applications.
| [Topic Modeling with BM25](https://github.com/neuml/txtai/blob/master/examples/39_Classic_Topic_Modeling_with_BM25.ipynb) | Topic modeling backed by a BM25 index | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/39_Classic_Topic_Modeling_with_BM25.ipynb) |
| [Embeddings in the Cloud](https://github.com/neuml/txtai/blob/master/examples/43_Embeddings_in_the_Cloud.ipynb) | Load and use an embeddings index from the Hugging Face Hub | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/43_Embeddings_in_the_Cloud.ipynb) |
| [Customize your own embeddings database](https://github.com/neuml/txtai/blob/master/examples/45_Customize_your_own_embeddings_database.ipynb) | Ways to combine vector indexes with relational databases | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/45_Customize_your_own_embeddings_database.ipynb) |
| [What's new in txtai 6.0](https://github.com/neuml/txtai/blob/master/examples/46_Whats_new_in_txtai_6_0.ipynb) | Sparse, hybrid and subindexes for embeddings, LLM improvements | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuml/txtai/blob/master/examples/46_Whats_new_in_txtai_6_0.ipynb) |

## LLM

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -64,6 +64,7 @@ nav:
- Database: embeddings/configuration/database.md
- General: embeddings/configuration/general.md
- Graph: embeddings/configuration/graph.md
- Scoring: embeddings/configuration/scoring.md
- Vectors: embeddings/configuration/vectors.md
- Index Guide: embeddings/indexing.md
- Methods: embeddings/methods.md
