From e45dd59aa5fdb447c31228e59989a5d915c528ee Mon Sep 17 00:00:00 2001 From: Theo Nam Truong Date: Tue, 4 Apr 2023 10:49:07 -0600 Subject: [PATCH] Updated USER_GUIDE.md and Added 6 Guides (#162) Signed-off-by: Theo Truong --- USER_GUIDE.md | 48 ++------ guides/advanced_index_actions.md | 90 ++++++++++++++ guides/bulk.md | 141 ++++++++++++++++++++++ guides/document_lifecycle.md | 133 +++++++++++++++++++++ guides/index_lifecycle.md | 144 +++++++++++++++++++++++ guides/index_template.md | 178 ++++++++++++++++++++++++++++ guides/search.md | 196 +++++++++++++++++++++++++++++++ 7 files changed, 893 insertions(+), 37 deletions(-) create mode 100644 guides/advanced_index_actions.md create mode 100644 guides/bulk.md create mode 100644 guides/document_lifecycle.md create mode 100644 guides/index_lifecycle.md create mode 100644 guides/index_template.md create mode 100644 guides/search.md diff --git a/USER_GUIDE.md b/USER_GUIDE.md index 7a72d51d3..9a7895ca1 100644 --- a/USER_GUIDE.md +++ b/USER_GUIDE.md @@ -1,8 +1,7 @@ - [User Guide](#user-guide) - [Setup](#setup) - - [Sample code](#sample-code) - - [Basic Usage](#basic-usage) - - [Point in Time](#point-in-time) + - [Basic Usage](#basic-usage) + - [Guides by Topics](#guides-by-topics) - [Amazon OpenSearch Service](#amazon-opensearch-service) # User Guide @@ -25,16 +24,14 @@ Import the client: `require 'opensearch'` -## Sample code - -### Basic Usage +## Basic Usage ```ruby require 'opensearch' client = OpenSearch::Client.new( host: 'https://localhost:9200', user: 'admin', - password: 'admin' + password: 'admin', transport_options: { ssl: { verify: false } } # For testing only. Use certificate for validation. ) @@ -109,36 +106,13 @@ response = client.indices.delete( puts response ``` -### Point in Time -Refer to OpenSearch [documentation](https://opensearch.org/docs/latest/point-in-time-api/) for more information on point in time. 
-```ruby -require 'opensearch-ruby' -client = OpenSearch::Client.new({ host: 'localhost' }) -index = :movies -client.indices.create(index: 'movies') - -# CREATE 3 PITS -client.create_pit index: index, keep_alive: '1m' -client.create_pit index: index, keep_alive: '1m' -client.create_pit index: index, keep_alive: '1m' - -# GET ALL PITS -pits = client.get_all_pits -puts pits - -# DELETE FIRST PIT -client.delete_pit body: { pit_id: [pits.dig('pits', 0, 'pit_id')] } - -# ALL PITS SEGMENTS -puts client.cat.all_pit_segments - -# SEGMENTS FOR A SPECIFIC PIT -puts client.cat.pit_segments body: { pit_id: [pits.dig('pits', 1, 'pit_id')] } - - -# DELETE ALL PITS -puts client.delete_all_pits -``` +## Guides by Topics +- [Index Lifecycle](guides/index_lifecycle.md) +- [Document Lifecycle](guides/document_lifecycle.md) +- [Search](guides/search.md) +- [Bulk](guides/bulk.md) +- [Advanced Index Actions](guides/advanced_index_actions.md) +- [Index Templates](guides/index_template.md) ## Amazon OpenSearch Service diff --git a/guides/advanced_index_actions.md b/guides/advanced_index_actions.md new file mode 100644 index 000000000..98df7c2bb --- /dev/null +++ b/guides/advanced_index_actions.md @@ -0,0 +1,90 @@ +# Advanced Index Actions +In this guide, we will look at some advanced index actions that are not covered in the [Index Lifecycle](index_lifecycle.md) guide. + + +## Setup +Let's create a client instance, and an index named `movies`: +```ruby +require 'opensearch-ruby' +client = OpenSearch::Client.new( + host: 'https://admin:admin@localhost:9200', + transport_options: { ssl: { verify: false } }) +client.indices.create(index: :movies) +``` +## API Actions +### Clear index cache +You can clear the cache of an index or indices by using the `indices.clear_cache` API action. The following example clears the cache of the `movies` index: + +```ruby +client.indices.clear_cache(index: :movies) +``` + +By default, the `indices.clear_cache` API action clears all types of cache. 
To clear specific types of cache, pass the `query`, `fielddata`, or `request` parameter to the API action: + +```ruby +client.indices.clear_cache(index: :movies, query: true) +client.indices.clear_cache(index: :movies, fielddata: true, request: true) +``` + +### Flush index +Sometimes you might want to flush an index or indices to make sure that all data in the transaction log is persisted to the index. To flush an index or indices, use the `indices.flush` API action. The following example flushes the `movies` index: + +```ruby +client.indices.flush(index: :movies) +``` + +### Refresh index +You can refresh an index or indices to make sure that all changes are available for search. To refresh an index or indices, use the `indices.refresh` API action: + +```ruby +client.indices.refresh(index: :movies) +``` + +### Open/Close index +You can close an index to prevent read and write operations on the index. A closed index does not have to maintain certain data structures that an open index requires, reducing the memory and disk space required by the index. The following example closes and reopens the `movies` index: + +```ruby +client.indices.close(index: :movies) +client.indices.open(index: :movies) +``` +### Force merge index +You can force merge an index or indices to reduce the number of segments in the index. This can be useful if you have a large number of small segments in the index. Merging segments reduces the memory footprint of the index. Note that this action is resource-intensive and is recommended only for read-only indices. The following example force merges the `movies` index: + +```ruby +client.indices.forcemerge(index: :movies) +``` + +### Clone index +You can clone an index to create a new index with the same mappings, data, and most of the settings. The source index must be in a read-only state for cloning.
The following example blocks write operations on the `movies` index, clones it to create a new index named `movies_clone`, then re-enables writes: + +```ruby +client.indices.add_block(index: :movies, block: :write) +client.indices.clone(index: :movies, target: :movies_clone) +client.indices.put_settings(index: :movies, body: { index: { blocks: { write: false } } }) +``` + +### Split index +You can split an index into another index with more primary shards. The source index must be in a read-only state for splitting. The following example creates the read-only `books` index with 30 routing shards and 5 shards (5 is a factor of 30), splits it into `bigger_books` with 10 shards (10 is a multiple of 5 and also a factor of 30), then re-enables writes: + +```ruby +client.indices.create( + index: :books, + body: { settings: { + index: { number_of_shards: 5, + number_of_routing_shards: 30, + blocks: { write: true } } } }) + +client.indices.split( + index: :books, + target: :bigger_books, + body: { settings: { index: { number_of_shards: 10 } } }) + +client.indices.put_settings(index: :books, body: { index: { blocks: { write: false } } }) +``` + +## Cleanup + +Let's delete all the indices we created in this guide: +```ruby +client.indices.delete(index: %i[movies books movies_clone bigger_books]) +``` diff --git a/guides/bulk.md b/guides/bulk.md new file mode 100644 index 000000000..d3383fb22 --- /dev/null +++ b/guides/bulk.md @@ -0,0 +1,141 @@ +# Bulk + +In this guide, you'll learn how to use the OpenSearch Ruby Client API to perform bulk operations. You'll learn how to index, update, and delete multiple documents in a single request. 
+ +## Setup +First, create a client instance with the following code: + +```ruby +require 'opensearch-ruby' +client = OpenSearch::Client.new({ host: 'localhost' }) +``` + +Next, create an index named `movies` and another named `books` with the default settings: + +```ruby +movies = 'movies' +books = 'books' +client.indices.create(index: movies) unless client.indices.exists?(index: movies) +client.indices.create(index: books) unless client.indices.exists?(index: books) +``` + + +## Bulk API + +The `bulk` API action allows you to perform document operations in a single request. The body of the request is an array of objects that contains the bulk operations and the target documents to index, create, update, or delete. + +### Indexing multiple documents +The following code creates two documents in the `movies` index and one document in the `books` index: + +```ruby +client.bulk( + body: [ + { index: { _index: movies, _id: 1 } }, + { title: 'Beauty and the Beast', year: 1991 }, + { index: { _index: movies, _id: 2 } }, + { title: 'Beauty and the Beast - Live Action', year: 2017 }, + { index: { _index: books, _id: 1 } }, + { title: 'The Lion King', year: 1994 } + ] +) +``` +As you can see, each bulk operation is comprised of two objects. The first object contains the operation type and the target document's `_index` and `_id`. The second object contains the document's data. As a result, the body of the request above contains six objects for three index actions. + +Alternatively, the `bulk` method can accept an array of hashes where each hash represents a single operation. 
The following code is equivalent to the previous example: + +```ruby +client.bulk( + body: [ + { index: { _index: movies, _id: 1, data: { title: 'Beauty and the Beast', year: 1991 } } }, + { index: { _index: movies, _id: 2, data: { title: 'Beauty and the Beast - Live Action', year: 2017 } } }, + { index: { _index: books, _id: 1, data: { title: 'The Lion King', year: 1994 } } } + ] +) +``` + +We will use this format for the rest of the examples in this guide. + +### Creating multiple documents + +Similarly, instead of calling the `create` method for each document, you can use the `bulk` API to create multiple documents in a single request. The following code creates three documents in the `movies` index and one in the `books` index: + +```ruby +client.bulk( + index: movies, + body: [ + { create: { data: { title: 'Beauty and the Beast 2', year: 2030 } } }, + { create: { data: { title: 'Beauty and the Beast 3', year: 2031 } } }, + { create: { data: { title: 'Beauty and the Beast 4', year: 2049 } } }, + { create: { _index: books, data: { title: 'The Lion King 2', year: 1998 } } } + ] +) +``` +Note that we specified only the `_index` for the last document in the request body. This is because the `bulk` method accepts an `index` parameter that specifies the default `_index` for all bulk operations in the request body. Moreover, we omit the `_id` for each document and let OpenSearch generate them for us in this example, just like we can with the `create` method. + +### Updating multiple documents +```ruby +client.bulk( + index: movies, + body: [ + { update: { _id: 1, data: { doc: { year: 1992 } } } }, + { update: { _id: 2, data: { doc: { year: 2018 } } } } + ] +) +``` +Note that the updated data is specified in the `doc` field of the `data` object. 
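
As a small illustration of the update format described above, the operations can be generated from a plain hash of changes. This is only a sketch: `bulk_update_operations` is a hypothetical helper, not part of the `opensearch-ruby` gem, and it assumes the compact `data: { doc: ... }` form used in this guide.

```ruby
# Hypothetical helper (not part of opensearch-ruby): turns a hash of
# { document_id => changed_fields } into compact bulk `update` operations.
def bulk_update_operations(changes)
  changes.map do |id, fields|
    # Each partial update goes under `doc`, nested inside `data`.
    { update: { _id: id, data: { doc: fields } } }
  end
end

operations = bulk_update_operations({ 1 => { year: 1992 }, 2 => { year: 2018 } })
puts operations.inspect
```

The resulting array has the same shape as the `body` in the update example, so it could be passed as `client.bulk(index: movies, body: operations)`.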
+ + +### Deleting multiple documents +```ruby +client.bulk( + index: movies, + body: [ + { delete: { _id: 1 } }, + { delete: { _id: 2 } } + ] +) +``` + +### Mix and match operations +You can mix and match the different operations in a single request. The following code creates two documents, updates one document, and deletes another document: + +```ruby +client.bulk( + index: movies, + body: [ + { create: { data: { title: 'Beauty and the Beast 5', year: 2050 } } }, + { create: { data: { title: 'Beauty and the Beast 6', year: 2051 } } }, + { update: { _id: 3, data: { doc: { year: 2052 } } } }, + { delete: { _id: 4 } } + ] +) +``` + +### Handling errors +The `bulk` API returns an array of responses for each operation in the request body. Each response contains a `status` field that indicates whether the operation was successful or not. If the operation was successful, the `status` field is set to a `2xx` code. Otherwise, the response contains an error message in the `error` field. + +The following code shows how to look for errors in the response: + +```ruby +response = client.bulk( + index: movies, + body: [ + { create: { _id: 1, data: { title: 'Beauty and the Beast', year: 1991 } } }, + { create: { _id: 2, data: { title: 'Beauty and the Beast 2', year: 2030 } } }, + { create: { _id: 1, data: { title: 'Beauty and the Beast 3', year: 2031 } } }, # document already exists error + { create: { _id: 2, data: { title: 'Beauty and the Beast 4', year: 2049 } } } # document already exists error + ] +) + +response['items'].each do |item| + next if item.dig('create', 'status').between?(200, 299) + puts item.dig('create', 'error', 'reason') +end +``` + +## Cleanup +To clean up the resources created in this guide, delete the `movies` and `books` indices: + +```ruby +client.indices.delete(index: [movies, books]) +``` diff --git a/guides/document_lifecycle.md b/guides/document_lifecycle.md new file mode 100644 index 000000000..64f6c19d4 --- /dev/null +++ 
b/guides/document_lifecycle.md @@ -0,0 +1,133 @@ +# Document Lifecycle +This guide covers OpenSearch Ruby Client API actions for Document Lifecycle. You'll learn how to create, read, update, and delete documents in your OpenSearch cluster. Whether you're new to OpenSearch or an experienced user, this guide provides the information you need to manage your document lifecycle effectively. + +## Setup +Assuming you have OpenSearch running locally on port 9200, you can create a client instance +with the following code: +```ruby +require 'opensearch-ruby' +client = OpenSearch::Client.new({ host: 'localhost' }) +``` +Next, create an index named `movies` with the default settings: +```ruby +index = 'movies' +client.indices.create(index: index) unless client.indices.exists?(index: index) +``` + +## Document API Actions +### Create a new document with specified ID +To create a new document, use the `create` or `index` API action. The following code creates two new documents with IDs of `1` and `2`: +```ruby +client.create(index: index, id: 1, body: { title: 'Beauty and the Beast', year: 1991 }) +client.create(index: index, id: 2, body: { title: 'Beauty and the Beast - Live Action', year: 2017 }) +``` +Note that the `create` action is NOT idempotent. If you try to create a document with an ID that already exists, the request will fail: + +```ruby +begin + client.create(index: index, id: 1, body: { title: 'Just Another Movie' }) +rescue StandardError => e + puts e.message +end +``` + +The `index` action, on the other hand, is idempotent. If you try to index a document with an existing ID, the request will succeed and overwrite the existing document. Note that no new document will be created in this case. 
You can think of the `index` action as an upsert: + +```ruby +client.index(index: index, id: 2, body: { title: 'Updated Title' }) +client.index(index: index, id: 2, body: { title: 'The Lion King', year: 1994 }) +``` + +### Create a new document with auto-generated ID +You can also create a new document with an auto-generated ID by omitting the `id` parameter. The following code creates a document with an auto-generated ID in the `movies` index: +```ruby +puts client.create(index: index, body: { title: 'The Lion King 2', year: 1998 }) +# OR client.index(index: index, body: { title: 'The Lion King 2', year: 1998 }) +``` +In this case, the ID of the created document is returned in the `_id` field of the response body: +```ruby +{ + "_index" => "movies", + "_type" => "_doc", + "_id" => "1", + "_version" => 1, + "result" => "created", + "_shards" => { + "total" => 2, + "successful" => 1, + "failed" => 0 + }, + "_seq_no" => 0, + "_primary_term" => 1 +} +``` + +### Get a document +To get a document, use the `get` API action. The following code gets the document with ID `1` from the `movies` index: +```ruby +puts client.get(index: index, id: 1)['_source'] +# OUTPUT: {"title"=>"Beauty and the Beast", "year"=>1991} +``` +You can also use the `_source_includes` and `_source_excludes` parameters to specify which fields to include or exclude in the response: +```ruby +puts client.get(index: index, id: 1, _source_includes: ['title'])['_source'] +# OUTPUT: {"title"=>"Beauty and the Beast"} +puts client.get(index: index, id: 1, _source_excludes: ['title'])['_source'] +# OUTPUT: {"year"=>1991} +``` + +### Get multiple documents +To get multiple documents, use the `mget` API action: +```ruby +puts client.mget(index: index, body: { docs: [{ _id: 1 }, { _id: 2 }] })['docs'].map { |doc| doc['_source'] } +``` + +### Check if a document exists +To check if a document exists, use the `exists` API action.
The following code checks if the document with ID `1` exists in the `movies` index: +```ruby +puts client.exists(index: index, id: 1) +``` + +### Update a document +To update a document, use the `update` API action. The following code updates the `year` field of the document with ID `1` in the `movies` index: +```ruby +client.update(index: index, id: 1, body: { doc: { year: 1995 } }) +``` +Alternatively, you can use the `script` field of the request body to update a document using a script. The following code increments the `year` field of the document with ID `1` by 5 using a Painless script (Painless is the default scripting language in OpenSearch): +```ruby +client.update(index: index, id: 1, body: { script: { source: 'ctx._source.year += 5' } }) +``` +Note that while both `update` and `index` actions perform updates, they are not the same. The `update` action is a partial update, while the `index` action is a full update. The `update` action only updates the fields that are specified in the request body, while the `index` action overwrites the entire document with the new document. + +### Update multiple documents by query +To update documents that match a query, use the `update_by_query` API action. The following code decreases the `year` field by 1 for all documents with `year` greater than 2023: +```ruby +client.update_by_query(index: index, body: { + script: { source: 'ctx._source.year -= 1' }, + query: { range: { year: { gt: 2023 } } } +}) +``` + +### Delete a document +To delete a document, use the `delete` API action. The following code deletes the document with ID `1`: +```ruby +client.delete(index: index, id: 1) +``` +By default, the `delete` action is not idempotent. If you try to delete a document that does not exist, or delete the same document twice, you will run into a Not Found (404) error.
You can make the `delete` action idempotent by setting the `ignore` parameter to `404`: +```ruby +client.delete(index: index, id: 1, ignore: 404) +``` + +### Delete multiple documents by query +To delete documents that match a query, use the `delete_by_query` API action. The following code deletes all documents with `year` greater than 2023: +```ruby +client.delete_by_query(index: index, body: { + query: { range: { year: { gt: 2023 } } } +}) +``` + +## Cleanup +To clean up the resources created in this guide, delete the `movies` index: +```ruby +client.indices.delete(index: index) +``` diff --git a/guides/index_lifecycle.md b/guides/index_lifecycle.md new file mode 100644 index 000000000..3cb3adaba --- /dev/null +++ b/guides/index_lifecycle.md @@ -0,0 +1,144 @@ +# Index Lifecycle +This guide covers OpenSearch Ruby Client API actions for Index Lifecycle. You'll learn how to create, read, update, and delete indices in your OpenSearch cluster. We will also leverage index templates to create default settings and mappings for indices of certain patterns. + +## Setup + +In this guide, we will need an OpenSearch cluster with more than one node. Let's use the sample [docker-compose.yml](https://opensearch.org/samples/docker-compose.yml) to start a cluster with two nodes. The cluster's API will be available at `localhost:9200` with basic authentication enabled with default username and password of `admin:admin`. 
+ +To start the cluster, run the following command: + +```bash +cd /path/to/docker-compose.yml +docker-compose up -d +``` + +Let's create a client instance to access this cluster: + +```ruby +require 'opensearch-ruby' + +client = OpenSearch::Client.new( + host: 'https://admin:admin@localhost:9200', + transport_options: { ssl: { verify: false } }) + +puts client.info # Check server info and make sure the client is connected +``` + +## Index API Actions + +### Create a new index +You can quickly create an index with default settings and mappings by using the `indices.create` API action. The following example creates an index named `paintings` with default settings and mappings: + +```ruby +client.indices.create(index: :paintings) +``` +To specify settings and mappings, you can pass them as the `body` of the request. The following example creates an index named `movies` with custom settings and mappings: + +```ruby +client.indices.create( + index: :movies, + body: { + settings: { + index: { + number_of_shards: 2, + number_of_replicas: 1 + } + }, + mappings: { + properties: { + title: { type: 'text' }, + year: { type: 'integer' } + } + } + } +) +``` +When you create a new document for an index, OpenSearch will automatically create the index if it doesn't exist: + +```ruby +puts client.indices.exists?(index: :burner) # => false +client.create(index: :burner, body: { lorem: 'ipsum' }) +puts client.indices.exists?(index: :burner) # => true +``` + + +### Update an Index +You can update an index's settings and mappings by using the `indices.put_settings` and `indices.put_mapping` API actions. 
+ +The following example updates the `movies` index's number of replicas to `0`: + +```ruby +client.indices.put_settings( + index: :movies, + body: { + index: { + number_of_replicas: 0 + } + } +) +``` +The following example updates the `movies` index's mappings to add a new field named `director`: + +```ruby +client.indices.put_mapping( + index: :movies, + body: { + properties: { + director: { type: 'text' } + } + } +) +``` + +### Get Metadata for an Index +Let's check if the index's settings and mappings have been updated by using the `indices.get` API action: + +```ruby +puts client.indices.get(index: :movies) +``` +The response body contains the index's settings and mappings: + +```ruby +{ + "movies" => { + "aliases" => {}, + "mappings" => { + "properties" => { + "title" => { "type" => "text" }, + "year" => { "type" => "integer" }, + "director" => { "type" => "text" } + } + }, + "settings" => { + "index" => { + "creation_date" => "1680297372024", + "number_of_shards" => "2", + "number_of_replicas" => "0", + "uuid" => "FEDWXgmhSLyrCqWa8F_aiA", + "version" => { "created" => "136277827" }, + "provided_name" => "movies" + } + } + } +} +``` +### Delete an Index +Let's delete the `movies` index by using the `indices.delete` API action: + +```ruby +client.indices.delete(index: :movies) +``` +We can also delete multiple indices at once: + +```ruby +client.indices.delete(index: [:movies, :paintings, :burner], ignore: 404) +``` +Notice that we are passing `ignore: 404` to the request. This tells the client to ignore the `404` error if the index doesn't exist for deletion. Without it, the above `delete` request will throw an error because the `movies` index has already been deleted in the previous example. + +## Cleanup + +All resources created in this guide are automatically deleted when the cluster is stopped. 
You can stop the cluster by running the following command: + +```bash +docker-compose down +``` diff --git a/guides/index_template.md b/guides/index_template.md new file mode 100644 index 000000000..5e76148fb --- /dev/null +++ b/guides/index_template.md @@ -0,0 +1,178 @@ +# Index Template +Index templates are a convenient way to define settings, mappings, and aliases for one or more indices when they are created. In this guide, you'll learn how to create an index template and apply it to an index. + +## Setup + +Assuming you have OpenSearch running locally on port 9200, you can create a client instance +with the following code: +```ruby +require 'opensearch-ruby' +client = OpenSearch::Client.new({ host: 'localhost' }) +``` + +## Index Template API Actions + +### Create an Index Template +You can create an index template to define default settings and mappings for indices of certain patterns. The following example creates an index template named `books` with default settings and mappings for indices of the `books-*` pattern: + +```ruby +client.indices.put_index_template( + name: :books, + body: { + index_patterns: ['books-*'], + template: { + settings: { + index: { + number_of_shards: 3, + number_of_replicas: 0 + } + }, + mappings: { + properties: { + title: { type: 'text' }, + author: { type: 'text' }, + published_on: { type: 'date' }, + pages: { type: 'integer' } + } + } + } + } +) +``` + +Now, when you create an index that matches the `books-*` pattern, OpenSearch will automatically apply the template's settings and mappings to the index. +Let's create an index named `books-nonfiction` and verify that its settings and mappings match those of the template: + +```ruby +client.indices.create(index: 'books-nonfiction') +puts client.indices.get(index: 'books-nonfiction') +``` + +### Multiple Index Templates +If multiple index templates match the index's name, OpenSearch will apply the template with the highest priority. 
The following example creates two index templates named `books` and `books-fiction`, matching the `books-*` and `books-fiction-*` patterns respectively, with different settings: + +```ruby +client.indices.put_index_template( + name: 'books', + body: { + index_patterns: ['books-*'], + priority: 0, # default priority + template: { + settings: { + index: { + number_of_shards: 3, + number_of_replicas: 0 + } + } + } + } +) + +client.indices.put_index_template( + name: 'books-fiction', + body: { + index_patterns: ['books-fiction-*'], + priority: 1, # higher priority than the `books` template + template: { + settings: { + index: { + number_of_shards: 1, + number_of_replicas: 1 + } + } + } + } +) +``` + +When we create an index named `books-fiction-romance`, OpenSearch will apply the `books-fiction` template's settings to the index: + +```ruby +client.indices.create(index: 'books-fiction-romance') +puts client.indices.get(index: 'books-fiction-romance') +``` + +### Composable Index Templates +Composable index templates are a new type of index template that allows you to define multiple component templates and compose them into a final template.
The following example creates a component template named `books_mappings` with default mappings, then composes it into the `books` and `books-fiction` index templates for the `books-*` and `books-fiction-*` patterns: + +```ruby +client.cluster.put_component_template( + name: 'books_mappings', + body: { + template: { + mappings: { + properties: { + title: { type: 'text' }, + author: { type: 'text' }, + published_on: { type: 'date' }, + pages: { type: 'integer' } + } + } + } + } +) + +client.indices.put_index_template( + name: 'books', + body: { + index_patterns: ['books-*'], + composed_of: ['books_mappings'], # use the `books_mappings` component template + priority: 0, + template: { + settings: { + index: { + number_of_shards: 3, + number_of_replicas: 0 + } + } + } + } +) + +client.indices.put_index_template( + name: 'books-fiction', + body: { + index_patterns: ['books-fiction-*'], + composed_of: ['books_mappings'], # use the `books_mappings` component template + priority: 1, + template: { + settings: { + index: { + number_of_shards: 1, + number_of_replicas: 1 + } + } + } + } +) +``` + +When we create an index named `books-fiction-horror`, OpenSearch will apply the `books-fiction` template's settings and the `books_mappings` component template's mappings to the index: + +```ruby +client.indices.create(index: 'books-fiction-horror') +puts client.indices.get(index: 'books-fiction-horror') +``` + +### Get an Index Template +You can get an index template with the `get_index_template` API action: + +```ruby +puts client.indices.get_index_template(name: 'books') +``` + +### Delete an Index Template +You can delete an index template with the `delete_index_template` API action: + +```ruby +client.indices.delete_index_template(name: 'books') +``` + +## Cleanup +Let's delete all resources created in this guide: + +```ruby +client.indices.delete(index: 'books-*') +client.indices.delete_index_template(name: 'books-fiction') +client.cluster.delete_component_template(name: 'books_mappings') +``` diff --git a/guides/search.md b/guides/search.md new file mode 100644 index
000000000..2d305a69e --- /dev/null +++ b/guides/search.md @@ -0,0 +1,196 @@ +# Search +OpenSearch provides a powerful search API that allows you to search for documents in an index. The search API supports a number of parameters that allow you to customize the search operation. In this guide, we will explore the search API and its parameters. + +## Setup +Let's start by creating an index and adding some documents to it: + +```ruby +require 'opensearch-ruby' +client = OpenSearch::Client.new({ host: 'localhost' }) +client.indices.create(index: 'movies') + + + +10.times do |i| + client.index( + index: 'movies', + id: i, + body: { + title: "The Dark Knight #{i}", + director: 'Christopher Nolan', + year: 2008 + i + } + ) +end + +client.index( + index: 'movies', + body: { + title: 'The Godfather', + director: 'Francis Ford Coppola', + year: 1972 + } +) +client.index( + index: 'movies', + body: { + title: 'The Shawshank Redemption', + director: 'Frank Darabont', + year: 1994 + } +) + +client.indices.refresh(index: 'movies') # refresh the index to make the documents searchable +``` + +## Search API + +### Basic Search + +The search API allows you to search for documents in an index. The following example searches for ALL documents in the `movies` index and prints the total number of hits: + +```ruby +puts client.search(index: 'movies').dig('hits', 'total', 'value') +``` + +You can also search for documents that match a specific query. The following example searches for documents that match the query `dark knight`: + +```ruby +puts client.search( + index: 'movies', + body: { + query: { + match: { + title: 'dark knight' + } + } + } +).dig('hits', 'hits') +``` + +The OpenSearch query DSL allows you to specify complex queries. Check out the [OpenSearch query DSL documentation](https://opensearch.org/docs/latest/query-dsl/) for more information. + +### Basic Pagination + +The search API allows you to paginate through the search results.
The following example searches for documents that match the query `dark knight`, sorted by `year` in ascending order, and returns the first 2 results after skipping the first 5 results: + +```ruby +search_body = { + query: { + match: { + title: 'dark knight' + } + }, + sort: [ + { + year: { + order: 'asc' + } + } + ] +} + +puts client.search( + index: 'movies', + size: 2, + from: 5, + body: search_body +).dig('hits', 'hits') +``` + +With sorting, you can also use the `search_after` parameter to paginate through the search results. Let's say you have already displayed the first page of results, and you want to display the next page. You can use the `search_after` parameter to paginate through the search results. The following example will demonstrate how to get the first 3 pages of results using the search query of the previous example: + +```ruby +page_1 = client.search( + index: 'movies', + size: 2, + body: search_body +).dig('hits', 'hits') + +page_2 = client.search( + index: 'movies', + size: 2, + body: search_body.merge(search_after: page_1.last['sort']) +).dig('hits', 'hits') + +page_3 = client.search( + index: 'movies', + size: 2, + body: search_body.merge(search_after: page_2.last['sort']) +).dig('hits', 'hits') +``` + +### Pagination with scroll + +When retrieving large amounts of non-real-time data, you can use the `scroll` parameter to paginate through the search results. + +```ruby +page_1 = client.search( + index: 'movies', + scroll: '1m', + size: 2, + body: search_body +) + +page_2 = client.scroll( + scroll_id: page_1['_scroll_id'], + scroll: '1m' +) + +page_3 = client.scroll( + scroll_id: page_2['_scroll_id'], + scroll: '1m' +) +``` + +### Pagination with Point in Time + +The scroll example above has one weakness: if the index is updated while you are scrolling through the results, they will be paginated inconsistently. To avoid this, you should use the "Point in Time" feature. 
The following example demonstrates how to create a point in time with `create_pit` and pass its `pit_id` in the `pit` field of the search body to paginate through the search results: + +```ruby +# create a point in time +pit = client.create_pit( + index: 'movies', + keep_alive: '1m' +) + +# Include pit info in the search body +pit_search_body = search_body.merge( + pit: { + id: pit['pit_id'], + keep_alive: '1m' + }) + +# Get the first 3 pages of results +page_1 = client.search( + size: 2, + body: pit_search_body +).dig('hits', 'hits') + +page_2 = client.search( + size: 2, + body: pit_search_body.merge( + search_after: page_1.last.dig('sort')) +).dig('hits', 'hits') + +page_3 = client.search( + size: 2, + body: pit_search_body.merge( + search_after: page_2.last.dig('sort')) +).dig('hits', 'hits') + +# Print out the titles of the first 3 pages of results +puts page_1.map { |hit| hit.dig('_source', 'title') } +puts page_2.map { |hit| hit.dig('_source', 'title') } +puts page_3.map { |hit| hit.dig('_source', 'title') } + +# delete the point in time +client.delete_pit(body: { pit_id: pit['pit_id'] }) +``` +Note that a point in time is associated with an index or a set of indices. So, when performing a search with a point in time, you DO NOT specify the index in the search. + +## Cleanup + +```ruby +client.indices.delete(index: 'movies') +```
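
As a follow-up to the `search_after` examples in this guide, the page-by-page pattern can be written as a loop instead of one call per page. The sketch below is an illustration under assumptions: `collect_pages` is a hypothetical helper, not part of `opensearch-ruby`, and the search is replaced by a stand-in lambda over in-memory hits so the snippet runs without a cluster.

```ruby
# Hypothetical helper (not part of opensearch-ruby): collects up to `max_pages`
# pages by feeding the last hit's `sort` values back in as `search_after`.
# `search` is any callable that takes a body hash and returns an array of hits.
def collect_pages(search, body, max_pages:)
  pages = []
  current = body
  max_pages.times do
    hits = search.call(current)
    break if hits.empty? # no more results

    pages << hits
    current = body.merge(search_after: hits.last['sort'])
  end
  pages
end

# Stand-in for the cluster: five hits pre-sorted by year, served two at a time.
all_hits = (2008..2012).map { |year| { '_source' => { 'year' => year }, 'sort' => [year] } }
fake_search = lambda do |body|
  after = body[:search_after]&.first
  all_hits.select { |hit| after.nil? || hit['sort'].first > after }.first(2)
end

pages = collect_pages(fake_search, { sort: [{ year: { order: 'asc' } }] }, max_pages: 10)
puts pages.map { |page| page.map { |hit| hit.dig('_source', 'year') } }.inspect
```

Against a real cluster, the stand-in could be swapped for something like `->(body) { client.search(size: 2, body: body).dig('hits', 'hits') }`, with the point-in-time search body passed as `body`.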