Star Tree Search changes related to new Aggregations supported #9163

Merged
merged 22 commits into from
Feb 11, 2025
Changes from 7 commits
16 changes: 13 additions & 3 deletions _field-types/supported-field-types/star-tree.md
@@ -37,7 +37,8 @@ PUT logs
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0,
"index.composite_index": true
"index.composite_index": true,
"index.append_only.enabled": true
},
"mappings": {
"composite": {
@@ -54,6 +55,9 @@ PUT logs
},
{
"name": "port"
},
{
"name": "method"
}
],
"metrics": [
@@ -89,6 +93,9 @@ PUT logs
"request_size": {
"type": "integer"
},
"method" : {
"type": "keyword"
},
"latency": {
"type": "scaled_float",
"scaling_factor": 10
@@ -118,9 +125,12 @@ When using the `ordered_dimensions` parameter, follow these best practices:

- The order of dimensions matters. You can define the dimensions ordered from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
- Currently, fields supported by the `ordered_dimensions` parameter are all [numeric field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
- Support for other field types, such as `keyword` and `ip`, will be added in future versions. For more information, see [GitHub issue #16232](https://github.com/opensearch-project/OpenSearch/issues/16232).
- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index.
- The `ordered_dimensions` parameter supports the following field types:
  - All numeric field types excluding `unsigned_long` and `scaled_float`.
  - `keyword`
  - `object`
- Support for other field types, such as `date` and `ip`, will be added in future versions. For more information, see [GitHub issue #13875](https://github.com/opensearch-project/OpenSearch/issues/13875).

The `ordered_dimensions` parameter supports the following property.

66 changes: 60 additions & 6 deletions _search-plugins/star-tree-index.md
@@ -26,7 +26,7 @@

Star-tree indexes have the following limitations:

- A star-tree index should only be enabled on indexes whose data is not updated or deleted because updates and deletions are not accounted for in a star-tree index.
- A star-tree index should only be enabled on indexes whose data is not updated or deleted because updates and deletions are not accounted for in a star-tree index. To enforce this policy and use star-tree indexes, set the `index.append_only.enabled` setting to `true`.
- A star-tree index can be used for aggregation queries only if the queried fields are a subset of the star-tree's dimensions and the aggregated fields are a subset of the star-tree's metrics.
- After a star-tree index is enabled, it cannot be disabled. In order to disable a star-tree index, the data in the index must be reindexed without the star-tree mapping. Furthermore, changing a star-tree configuration will also require a reindex operation.
- [Multi-values/array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported.
@@ -68,6 +68,7 @@
- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [Configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
- Set the `index.composite_index` index setting to `true` during index creation.
- Set the `index.append_only.enabled` index setting to `true` during index creation.
- Ensure that the `doc_values` parameter is enabled for the `dimensions` and `metrics` fields used in your star-tree mapping.


@@ -81,7 +82,8 @@
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0,
"index.composite_index": true
"index.composite_index": true,
"index.append_only.enabled": true
},
"mappings": {
"composite": {
@@ -94,6 +96,9 @@
},
{
"name": "port"
},
{
"name": "method"
}
],
"metrics": [
@@ -123,6 +128,9 @@
"size": {
"type": "integer"
},
"method" : {
"type": "keyword"
},
"latency": {
"type": "scaled_float",
"scaling_factor": 10
@@ -140,14 +148,18 @@

### Supported queries

The following queries are supported as of OpenSearch 2.18:
The following queries are supported as of OpenSearch 2.19:

- [Term query]({{site.url}}{{site.baseurl}}/query-dsl/term/term/)
- [Terms query]({{site.url}}{{site.baseurl}}/query-dsl/term/terms/)
- [Match all docs query]({{site.url}}{{site.baseurl}}/query-dsl/match-all/)
- [Range query]({{site.url}}{{site.baseurl}}/query-dsl/term/range/)

To use a query with a star-tree index, the query's fields must be present in the `ordered_dimensions` section of the star-tree configuration. Queries must also be paired with a supported aggregation.
To use a query in supported aggregations with a star-tree index, the query's fields must be present in the `ordered_dimensions` section of the star-tree configuration. Queries without aggregations are not supported; each query must be paired with a supported aggregation.
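
As a minimal sketch of pairing a supported query with a supported aggregation, the following request combines a range query with a `sum` aggregation. It assumes that `status` is a numeric field listed in `ordered_dimensions` and that `size` is listed in `metrics` with `sum` among its `stats`, as in the example mapping; the range bounds are illustrative only:

```json
# Assumption: `status` is a numeric dimension and `size` is a metric with `sum` in its stats
POST /logs/_search
{
  "size": 0,
  "query": {
    "range": {
      "status": {
        "gte": 500,
        "lt": 600
      }
    }
  },
  "aggs": {
    "sum_size": {
      "sum": {
        "field": "size"
      }
    }
  }
}
```

Because both the queried field and the aggregated field are covered by the star-tree configuration, a request of this shape should be eligible for star-tree resolution.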

### Supported aggregations

The following aggregations can be resolved using a star-tree index.

#### Metric aggregations

The following metric aggregations are supported as of OpenSearch 2.18:
- [Sum]({{site.url}}{{site.baseurl}}/aggregations/metric/sum/)
@@ -156,12 +168,12 @@
- [Value count]({{site.url}}{{site.baseurl}}/aggregations/metric/value-count/)
- [Average]({{site.url}}{{site.baseurl}}/aggregations/metric/average/)

To use aggregations:
To make aggregations searchable using a star-tree index, ensure the following:


- The fields must be present in the `metrics` section of the star-tree configuration.
- The metric aggregation type must be part of the `stats` parameter.

### Aggregation example
##### Example

The following example gets the sum of all the values in the `size` field for all error logs with `status=500`, using the [example mapping](#example-mapping):

@@ -185,6 +197,48 @@

With a star-tree index, the query traverses to the `status=500` node and retrieves the result from a single aggregated document instead of scanning all of the matching documents. This results in lower query latency.

### Date histogram with metric aggregations

Suggested change
### Date histogram with metric aggregations
### Date histograms with metric aggregations


You can use [date histograms]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/) on calendar intervals with metric sub-aggregations.

To use date histogram aggregations and make them searchable in the star-tree index, ensure that the following requirements are met:

- The calendar intervals in the star-tree mapping configuration must include either the requested calendar interval or a finer-granularity calendar interval. For example, an aggregation that uses the `month` interval can be resolved from the `day` interval if `day` is present in the star-tree mapping.
- A metric sub-aggregation must be part of the aggregation request.
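
As a sketch of the first requirement, the following excerpt shows how a date dimension might be declared inside the star-tree `config` object. The `date_dimension` and `calendar_intervals` names do not appear in this diff and are assumptions based on the star-tree field type documentation, so verify them against your OpenSearch version:

```json
# Assumption: excerpt of the star-tree `config` object, not a complete request
{
  "date_dimension": {
    "name": "@timestamp",
    "calendar_intervals": ["month", "day"]
  }
}
```

With a configuration like this, a date histogram that requests `month` or `day` could be resolved from the star-tree index, while a request for a finer interval such as `hour` could not.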

#### Example

The following example gets the sum of all the values in the `size` field aggregated for each calendar month, for all error logs with `status=500`:

```json
POST /logs/_search
{
"query": {
Is this a valid query ? @sandeshkr419 just double checking. I don't see term/terms etc

@bharath-techie: I updated the query to add terms while keeping the method.

This is still not a valid query , we don't support range on timestamp. Lets reword this @Naarcha-AWS

@bharath-techie: Can you suggest a valid query?

"term": {
"status": "500"
Can we use range query or keyword term query since term is already used in the above example ? Maybe keyword term query will be a good example.

@bharath-techie: I'll update the example to use a range query.

}
},
"size": 0,
"aggs": {
"by_month": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "month"
},
"aggs": {
"sum_size": {
"sum": {
"field": "size"
}
}
}
}
}
}
```
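
One review comment suggests a keyword term query as an alternative example. A minimal sketch of that variant, assuming `method` is the `keyword` dimension added in this change and using `GET` as a hypothetical value, might look like the following:

```json
# Assumption: `method` is a keyword dimension; "GET" is a hypothetical value
POST /logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "method": "GET"
    }
  },
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_size": {
          "sum": {
            "field": "size"
          }
        }
      }
    }
  }
}
```

This keeps the query on a dimension field rather than on `@timestamp`, which the reviewers note is not supported in a range query.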



## Using queries without a star-tree index

Set the `indices.composite_index.star_tree.enabled` setting to `false` to run queries without using a star-tree index.
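
As a minimal sketch, assuming the setting can be updated dynamically through the cluster settings API (the prerequisites link to the static settings instructions, so some versions may require changing it in `opensearch.yml` instead), the request might look like this:

```json
# Assumption: the setting is dynamically updatable on your OpenSearch version
PUT /_cluster/settings
{
  "persistent": {
    "indices.composite_index.star_tree.enabled": false
  }
}
```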