Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve document docs #2359

Merged
merged 1 commit into from
Apr 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/functional_test.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

use crate::indexer::index_writer::MEMORY_BUDGET_NUM_BYTES_MIN;
use crate::schema::*;
#[allow(deprecated)]
use crate::{doc, schema, Index, IndexSettings, IndexSortByField, IndexWriter, Order, Searcher};

fn check_index_content(searcher: &Searcher, vals: &[u64]) -> crate::Result<()> {
Expand Down Expand Up @@ -80,9 +81,9 @@

let mut index_builder = Index::builder().schema(schema);
index_builder = index_builder.settings(IndexSettings {
sort_by_field: Some(IndexSortByField {

Check warning on line 84 in src/functional_test.rs

View workflow job for this annotation

GitHub Actions / clippy

use of deprecated struct `index::index_meta::IndexSortByField`: We plan to remove index sorting in `0.23`. If you need index sorting, please comment on the related issue https://github.com/quickwit-oss/tantivy/issues/2352 and explain your use case.

warning: use of deprecated struct `index::index_meta::IndexSortByField`: We plan to remove index sorting in `0.23`. If you need index sorting, please comment on the related issue https://github.com/quickwit-oss/tantivy/issues/2352 and explain your use case. --> src/functional_test.rs:84:29 | 84 | sort_by_field: Some(IndexSortByField { | ^^^^^^^^^^^^^^^^ | = note: `#[warn(deprecated)]` on by default
field: "id".to_string(),

Check warning on line 85 in src/functional_test.rs

View workflow job for this annotation

GitHub Actions / clippy

use of deprecated field `index::index_meta::IndexSortByField::field`: We plan to remove index sorting in `0.23`. If you need index sorting, please comment on the related issue https://github.com/quickwit-oss/tantivy/issues/2352 and explain your use case.

warning: use of deprecated field `index::index_meta::IndexSortByField::field`: We plan to remove index sorting in `0.23`. If you need index sorting, please comment on the related issue https://github.com/quickwit-oss/tantivy/issues/2352 and explain your use case. --> src/functional_test.rs:85:13 | 85 | field: "id".to_string(), | ^^^^^^^^^^^^^^^^^^^^^^^
order: Order::Desc,

Check warning on line 86 in src/functional_test.rs

View workflow job for this annotation

GitHub Actions / clippy

use of deprecated field `index::index_meta::IndexSortByField::order`: We plan to remove index sorting in `0.23`. If you need index sorting, please comment on the related issue https://github.com/quickwit-oss/tantivy/issues/2352 and explain your use case.

warning: use of deprecated field `index::index_meta::IndexSortByField::order`: We plan to remove index sorting in `0.23`. If you need index sorting, please comment on the related issue https://github.com/quickwit-oss/tantivy/issues/2352 and explain your use case. --> src/functional_test.rs:86:13 | 86 | order: Order::Desc, | ^^^^^^^^^^^^^^^^^^
}),
..Default::default()
});
Expand Down
26 changes: 15 additions & 11 deletions src/schema/document/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,22 +5,24 @@
//! - [Value] which provides tantivy with a way to access the document's values in a common way
//! without performing any additional allocations.
//! - [DocumentDeserialize] which implements the necessary code to deserialize the document from the
//! doc store.
//! doc store. If you are fine with fetching [TantivyDocument] from the doc store, you can skip
//! implementing this trait for your type.
//!
//! Tantivy provides a few out-of-box implementations of these core traits to provide
//! some simple usage if you don't want to implement these traits on a custom type yourself.
//!
//! # Out-of-box document implementations
//! - [Document] the old document type used by Tantivy before the trait based approach was
//! - [TantivyDocument] the old document type used by Tantivy before the trait based approach was
//! implemented. This type is still valid and provides all of the original behaviour you might
//! expect.
//! - `BTreeMap<Field, Value>` a mapping of field_ids to their relevant schema value using a
//! - `BTreeMap<Field, OwnedValue>` a mapping of field_ids to their relevant schema value using a
//! BTreeMap.
//! - `HashMap<Field, Value>` a mapping of field_ids to their relevant schema value using a HashMap.
//! - `HashMap<Field, OwnedValue>` a mapping of field_ids to their relevant schema value using a
//! HashMap.
Comment on lines -17 to +21
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to explain what these bring to the table, i.e. how the three built-in options compare, e.g. BTreeMap provides associative access but ordered where HashMap has faster associative access, but neither supports multiple values per field if I understand this correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not sure why we have that actually. Seems too niche

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the original doc originally IIRC, but the motive for the impl was because a lot of people have situations where they have single-value fields and just want to deserialize and serialize the data to and from JSON as is without needing to convert the by-default multi-value fields into single-value fields. So this was a utility for that.

//!
//! # Implementing your custom documents
//! Often in larger projects or higher performance applications you want to avoid the extra overhead
//! of converting your own types to the Tantivy [Document] type, this can often save you a
//! of converting your own types to the [TantivyDocument] type, this can often save you a
//! significant amount of time when indexing by avoiding the additional allocations.
//!
//! ### Important Note
Expand All @@ -46,6 +48,7 @@
//!
//! impl Document for MyCustomDocument {
//! // The value type produced by the `iter_fields_and_values` iterator.
//! // tantivy already implements the Value trait for serde_json::Value.
//! type Value<'a> = &'a serde_json::Value;
//! // The iterator which is produced by `iter_fields_and_values`.
//! // Often this is a simple new-type wrapper unless you like super long generics.
Expand Down Expand Up @@ -94,20 +97,21 @@
//! implementation for.
//!
//! ## Implementing custom values
//! Internally, Tantivy only works with `ReferenceValue` which is an enum that tries to borrow
//! as much data as it can, in order to allow documents to return custom types, they must implement
//! the `Value` trait which provides a way for Tantivy to get a `ReferenceValue` that it can then
//! In order to allow documents to return custom types, they must implement
//! the [Value] trait which provides a way for Tantivy to get a `ReferenceValue` that it can then
//! index and store.
//! Internally, Tantivy only works with `ReferenceValue` which is an enum that tries to borrow
//! as much data as it can
//!
//! Values can just as easily be customised as documents by implementing the `Value` trait.
//!
//! The implementor of this type should not own the data it's returning, instead it should just
//! hold references of the data held by the parent [Document] which can then be passed
//! on to the [ReferenceValue].
//!
//! This is why `Value` is implemented for `&'a serde_json::Value` and `&'a
//! tantivy::schema::Value` but not for their owned counterparts, as we cannot satisfy the lifetime
//! bounds necessary when indexing the documents.
//! This is why [Value] is implemented for `&'a serde_json::Value` and
//! [&'a tantivy::schema::document::OwnedValue](OwnedValue) but not for their owned counterparts, as
//! we cannot satisfy the lifetime bounds necessary when indexing the documents.
//!
//! ### A note about returning values
//! The custom value type does not have to be the type stored by the document, instead the
Expand Down
Loading