From 5d3f5a92bc958653c2737a5d3fd412cd8abe4890 Mon Sep 17 00:00:00 2001
From: Charlie Le
Date: Sun, 10 Nov 2024 11:23:54 -0800
Subject: [PATCH] docs: proofread guides

Signed-off-by: Charlie Le
---
 docs/guides/alert-manager-configuration.md | 15 ++--
 .../authentication-and-authorisation.md | 13 +--
 docs/guides/capacity-planning.md | 79 ++++++++++---------
 docs/guides/encryption-at-rest.md | 7 +-
 docs/guides/encryption-at-rest.template | 7 +-
 docs/guides/glossary.md | 6 +-
 docs/guides/gossip-ring-getting-started.md | 43 +++++-----
 docs/guides/ha-pair-handling.md | 19 ++---
 docs/guides/ingesters-rolling-updates.md | 9 ++-
 docs/guides/ingesters-scaling-up-and-down.md | 9 ++-
 docs/guides/limitations.md | 5 +-
 docs/guides/overrides-exporter.md | 10 +--
 docs/guides/running-cortex-on-kubernetes.md | 18 ++--
 docs/guides/sharded_ruler.md | 15 ++--
 docs/guides/shuffle-sharding.md | 63 +++++++--------
 docs/guides/tls.md | 10 +--
 docs/guides/tracing.md | 30 +++----
 docs/guides/zone-replication.md | 15 ++--
 18 files changed, 193 insertions(+), 180 deletions(-)

diff --git a/docs/guides/alert-manager-configuration.md b/docs/guides/alert-manager-configuration.md
index cb0037f6a7..d8e01b1349 100644
--- a/docs/guides/alert-manager-configuration.md
+++ b/docs/guides/alert-manager-configuration.md
@@ -7,30 +7,30 @@ slug: alertmanager-configuration

 ## Context

-Cortex Alertmanager notification setup follow mostly the syntax of Prometheus Alertmanager since it is based on the same codebase. The following is a description on how to load the configuration setup so that Alertmanager can use for notification when an alert event happened.
+Cortex Alertmanager notification setup mostly follows the syntax of Prometheus Alertmanager since it is based on the same codebase. The following is a description of how to load the configuration so that Alertmanager can use it for notifications when an alert event happens.

### Configuring the Cortex Alertmanager storage backend

-With the introduction of Cortex 1.8 the storage backend config option shifted to the new pattern [#3888](https://github.com/cortexproject/cortex/pull/3888). You can find the new configuration [here](../configuration/config-file-reference.md#alertmanager_storage_config)
+With the introduction of Cortex 1.8, the storage backend config option shifted to the new pattern [#3888](https://github.com/cortexproject/cortex/pull/3888). You can find the new configuration [here](../configuration/config-file-reference.md#alertmanager_storage_config).

 Note that when using `-alertmanager.sharding-enabled=true`, the following storage backends are not supported: `local`, `configdb`.

-When using the new configuration pattern it is important that any of the old configuration pattern flags are unset (`-alertmanager.storage`), as well as `-.configs.url`. This is because the old pattern still takes precedence over the new one. The old configuration pattern (`-alertmanager.storage`) is marked as deprecated and will be removed by Cortex version 1.11. However this change doesn't apply to `-alertmanager.storage.path` and `-alertmanager.storage.retention`.
+When using the new configuration pattern, it is important that any of the old configuration pattern flags are unset (`-alertmanager.storage`), as well as `-.configs.url`. This is because the old pattern still takes precedence over the new one. The old configuration pattern (`-alertmanager.storage`) is marked as deprecated and will be removed by Cortex version 1.11. However, this change doesn't apply to `-alertmanager.storage.path` and `-alertmanager.storage.retention`.

 ### Cortex Alertmanager configuration

 Cortex Alertmanager can be uploaded via Cortex [Set Alertmanager configuration API](../api/_index.md#set-alertmanager-configuration) or using [Cortex Tools](https://github.com/cortexproject/cortex-tools).

-Follow the instruction at the `cortextool` link above to download or update to the latest version of the tool. +Follow the instructions at the `cortextool` link above to download or update to the latest version of the tool. To obtain the full help of how to use `cortextool` for all commands and flags, use `cortextool --help-long`. The following example shows the steps to upload the configuration to Cortex `Alertmanager` using `cortextool`. -#### 1. Create the Alertmanager configuration `yml` file. +#### 1. Create the Alertmanager configuration YAML file. -The following is `amconfig.yml`, an example of a configuration for Cortex `Alertmanager` to send notification via email: +The following is `amconfig.yml`, an example of a configuration for Cortex `Alertmanager` to send notifications via email: ``` global: @@ -50,7 +50,7 @@ receivers: - to: 'someone@localhost' ``` -[Example on how to setup Slack](https://grafana.com/blog/2020/02/25/step-by-step-guide-to-setting-up-prometheus-alertmanager-with-slack-pagerduty-and-gmail/#:~:text=To%20set%20up%20alerting%20in,to%20receive%20notifications%20from%20Alertmanager.) to support receiving Alertmanager notification. +[Example on how to set up Slack](https://grafana.com/blog/2020/02/25/step-by-step-guide-to-setting-up-prometheus-alertmanager-with-slack-pagerduty-and-gmail/#:~:text=To%20set%20up%20alerting%20in,to%20receive%20notifications%20from%20Alertmanager.) to support receiving Alertmanager notifications. #### 2. Upload the Alertmanager configuration @@ -76,3 +76,4 @@ cortextool alertmanager get \ --id=100 \ --key= ``` + diff --git a/docs/guides/authentication-and-authorisation.md b/docs/guides/authentication-and-authorisation.md index 7a742b89c6..e5affdaf3f 100644 --- a/docs/guides/authentication-and-authorisation.md +++ b/docs/guides/authentication-and-authorisation.md @@ -9,14 +9,14 @@ All Cortex components take the tenant ID from a header `X-Scope-OrgID` on each request. 
A tenant (also called "user" or "org") is the owner of a set of series written to and queried from Cortex. All Cortex components trust this value completely: if you need to protect your Cortex installation -from accidental or malicious calls then you must add an additional layer +from accidental or malicious calls, then you must add an additional layer of protection. -Typically this means you run Cortex behind a reverse proxy, and you must +Typically, this means you run Cortex behind a reverse proxy, and you must ensure that all callers, both machines sending data over the `remote_write` interface and humans sending queries from GUIs, supply credentials which identify them and confirm they are authorised. When configuring the -`remote_write` API in Prometheus, the user and password fields of http Basic +`remote_write` API in Prometheus, the user and password fields of HTTP Basic auth, or Bearer token, can be used to convey the tenant ID and/or credentials. See the [Cortex-Tenant](#cortex-tenant) section below for one way to solve this. @@ -34,7 +34,7 @@ To disable the multi-tenant functionality, you can pass the argument to the string `fake` for every request. Note that the tenant ID that is used to write the series to the datastore -should be the same as the one you use to query the data. If they don't match +should be the same as the one you use to query the data. If they don't match, you won't see any data. As of now, you can't see series from other tenants. For more information regarding the tenant ID limits, refer to: [Tenant ID limitations](./limitations.md#tenant-id-naming) @@ -48,6 +48,7 @@ It can be placed between Prometheus and Cortex and will search for a predefined label and use its value as `X-Scope-OrgID` header when proxying the timeseries to Cortex. This can help to run Cortex in a trusted environment where you want to separate your metrics -into distinct namespaces by some criteria (e.g. teams, applications, etc). 
+into distinct namespaces by some criteria (e.g. teams, applications, etc.). + +Be advised that **cortex-tenant** is a third-party community project and it's not maintained by the Cortex team. -Be advised that **cortex-tenant** is a third-party community project and it's not maintained by Cortex team. diff --git a/docs/guides/capacity-planning.md b/docs/guides/capacity-planning.md index 50c70c0a28..c7c5a2b5ed 100644 --- a/docs/guides/capacity-planning.md +++ b/docs/guides/capacity-planning.md @@ -15,52 +15,53 @@ sent to Cortex. Some key parameters are: - 1. The number of active series. If you have Prometheus already you - can query `prometheus_tsdb_head_series` to see this number. - 2. Sampling rate, e.g. a new sample for each series every minute - (the default Prometheus [scrape_interval](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)). - Multiply this by the number of active series to get the - total rate at which samples will arrive at Cortex. - 3. The rate at which series are added and removed. This can be very - high if you monitor objects that come and go - for example if you run - thousands of batch jobs lasting a minute or so and capture metrics - with a unique ID for each one. [Read how to analyse this on - Prometheus](https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality). - 4. How compressible the time-series data are. If a metric stays at - the same value constantly, then Cortex can compress it very well, so - 12 hours of data sampled every 15 seconds would be around 2KB. On - the other hand if the value jumps around a lot it might take 10KB. - There are not currently any tools available to analyse this. - 5. How long you want to retain data for, e.g. 1 month or 2 years. +1. The number of active series. If you have Prometheus already, you + can query `prometheus_tsdb_head_series` to see this number. +2. Sampling rate, e.g. 
a new sample for each series every minute + (the default Prometheus [scrape_interval](https://prometheus.io/docs/prometheus/latest/configuration/configuration/)). + Multiply this by the number of active series to get the + total rate at which samples will arrive at Cortex. +3. The rate at which series are added and removed. This can be very + high if you monitor objects that come and go - for example, if you run + thousands of batch jobs lasting a minute or so and capture metrics + with a unique ID for each one. [Read how to analyse this on + Prometheus](https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality). +4. How compressible the time-series data are. If a metric stays at + the same value constantly, then Cortex can compress it very well, so + 12 hours of data sampled every 15 seconds would be around 2KB. On + the other hand, if the value jumps around a lot, it might take 10KB. + There are not currently any tools available to analyse this. +5. How long you want to retain data for, e.g. 1 month or 2 years. Other parameters which can become important if you have particularly high values: - 6. Number of different series under one metric name. - 7. Number of labels per series. - 8. Rate and complexity of queries. +6. Number of different series under one metric name. +7. Number of labels per series. +8. Rate and complexity of queries. Now, some rules of thumb: - 1. Each million series in an ingester takes 15GB of RAM. Total number - of series in ingesters is number of active series times the - replication factor. This is with the default of 12-hour chunks - RAM - required will reduce if you set `-ingester.max-chunk-age` lower - (trading off more back-end database IO). - There are some additional considerations for planning for ingester memory usage. - 1. Memory increases during write ahead log (WAL) replay, [See Prometheus issue #6934](https://github.com/prometheus/prometheus/issues/6934#issuecomment-726039115). 
If you do not have enough memory for WAL replay, the ingester will not be able to restart successfully without intervention. - 2. Memory temporarily increases during resharding since timeseries are temporarily on both the new and old ingesters. This means you should scale up the number of ingesters before memory utilization is too high, otherwise you will not have the headroom to account for the temporary increase. - 2. Each million series (including churn) consumes 15GB of chunk - storage and 4GB of index, per day (so multiply by the retention - period). - 3. The distributors CPU utilization depends on the specific Cortex cluster - setup, while they don't need much RAM. Typically, distributors are capable - to process between 20,000 and 100,000 samples/sec with 1 CPU core. It's also - highly recommended to configure Prometheus `max_samples_per_send` to 1,000 - samples, in order to reduce the distributors CPU utilization given the same - total samples/sec throughput. +1. Each million series in an ingester takes 15GB of RAM. The total number + of series in ingesters is the number of active series times the + replication factor. This is with the default of 12-hour chunks - RAM + required will reduce if you set `-ingester.max-chunk-age` lower + (trading off more back-end database I/O). + There are some additional considerations for planning for ingester memory usage. + 1. Memory increases during write-ahead log (WAL) replay, [See Prometheus issue #6934](https://github.com/prometheus/prometheus/issues/6934#issuecomment-726039115). If you do not have enough memory for WAL replay, the ingester will not be able to restart successfully without intervention. + 2. Memory temporarily increases during resharding since timeseries are temporarily on both the new and old ingesters. This means you should scale up the number of ingesters before memory utilization is too high, otherwise you will not have the headroom to account for the temporary increase. +2. 
Each million series (including churn) consumes 15GB of chunk + storage and 4GB of index, per day (so multiply by the retention + period). +3. The distributors’ CPU utilization depends on the specific Cortex cluster + setup, while they don't need much RAM. Typically, distributors are capable + of processing between 20,000 and 100,000 samples/sec with 1 CPU core. It's also + highly recommended to configure Prometheus `max_samples_per_send` to 1,000 + samples, in order to reduce the distributors’ CPU utilization given the same + total samples/sec throughput. If you turn on compression between distributors and ingesters (for -example to save on inter-zone bandwidth charges at AWS/GCP) they will use -significantly more CPU (approx 100% more for distributor and 50% more +example, to save on inter-zone bandwidth charges at AWS/GCP), they will use +significantly more CPU (approx. 100% more for distributor and 50% more for ingester). + diff --git a/docs/guides/encryption-at-rest.md b/docs/guides/encryption-at-rest.md index 0f73b78e6e..d713c821fc 100644 --- a/docs/guides/encryption-at-rest.md +++ b/docs/guides/encryption-at-rest.md @@ -48,8 +48,8 @@ The alertmanager S3 server-side encryption can be configured similarly to the bl ### Per-tenant config overrides -The S3 client used by the blocks storage, ruler and alertmanager supports S3 SSE config overrides on a per-tenant basis, using the [runtime configuration file](../configuration/arguments.md#runtime-configuration-file). -The following settings can ben overridden for each tenant: +The S3 client used by the blocks storage, ruler, and alertmanager supports S3 SSE config overrides on a per-tenant basis, using the [runtime configuration file](../configuration/arguments.md#runtime-configuration-file). +The following settings can be overridden for each tenant: - **`s3_sse_type`**
S3 server-side encryption type. It must be set to enable the SSE config override for a given tenant.
@@ -60,4 +60,5 @@ The following settings can ben overridden for each tenant:

 ## Other storages

-Other storage backends may support encryption at rest configuring it directly at the storage level.
+Other storage backends may support encryption at rest by configuring it directly at the storage level.
+
diff --git a/docs/guides/encryption-at-rest.template b/docs/guides/encryption-at-rest.template
index 48396c5698..63c88ebab3 100644
--- a/docs/guides/encryption-at-rest.template
+++ b/docs/guides/encryption-at-rest.template
@@ -30,8 +30,8 @@ The alertmanager S3 server-side encryption can be configured similarly to the bl

 ### Per-tenant config overrides

-The S3 client used by the blocks storage, ruler and alertmanager supports S3 SSE config overrides on a per-tenant basis, using the [runtime configuration file](../configuration/arguments.md#runtime-configuration-file).
-The following settings can ben overridden for each tenant:
+The S3 client used by the blocks storage, ruler, and alertmanager supports S3 SSE config overrides on a per-tenant basis, using the [runtime configuration file](../configuration/arguments.md#runtime-configuration-file).
+The following settings can be overridden for each tenant:

- **`s3_sse_type`**
S3 server-side encryption type. It must be set to enable the SSE config override for a given tenant.
@@ -42,4 +42,5 @@ The following settings can ben overridden for each tenant:

 ## Other storages

-Other storage backends may support encryption at rest configuring it directly at the storage level.
+Other storage backends may support encryption at rest by configuring it directly at the storage level.
+
diff --git a/docs/guides/glossary.md b/docs/guides/glossary.md
index 1fadbd0fe4..62af54dfda 100644
--- a/docs/guides/glossary.md
+++ b/docs/guides/glossary.md
@@ -21,7 +21,7 @@ A single chunk contains timestamp-value pairs for several series.

 Churn is the frequency at which series become idle.

-A series become idle once it's not exported anymore by the monitored targets. Typically, series become idle when the monitored target itself disappear (eg. the process or node gets terminated).
+A series becomes idle once it's no longer exported by the monitored targets. Typically, series become idle when the monitored target itself disappears (e.g. the process or node gets terminated).

 ### Flushing

@@ -35,7 +35,7 @@ For more information, please refer to the guide "[Config for sending HA Pairs da

 ### Hash ring

-The hash ring is a distributed data structure used by Cortex for sharding, replication and service discovery. The hash ring data structure gets shared across Cortex replicas via gossip or a key-value store.
+The hash ring is a distributed data structure used by Cortex for sharding, replication, and service discovery. The hash ring data structure gets shared across Cortex replicas via gossip or a key-value store.

 For more information, please refer to the [Architecture](../architecture.md#the-hash-ring) documentation.

@@ -94,6 +94,6 @@ _See [Tenant](#tenant)._

 ### WAL

-The Write-Ahead Log (WAL) is an append only log stored on disk used by ingesters to recover their in-memory state after the process gets restarted, either after a clear shutdown or an abruptly termination. 
+The Write-Ahead Log (WAL) is an append-only log stored on disk used by ingesters to recover their in-memory state after the process gets restarted, either after a clear shutdown or an abrupt termination. For more information, please refer to [Ingesters with WAL](../blocks-storage/_index.md#the-write-path). diff --git a/docs/guides/gossip-ring-getting-started.md b/docs/guides/gossip-ring-getting-started.md index 3fe2797359..13cf2c493b 100644 --- a/docs/guides/gossip-ring-getting-started.md +++ b/docs/guides/gossip-ring-getting-started.md @@ -5,16 +5,16 @@ weight: 4 slug: getting-started-with-gossiped-ring --- -Cortex requires Key-Value (KV) store to store the ring. It can use traditional KV stores like Consul or Etcd, -but it can also build its own KV store on top of memberlist library using a gossip algorithm. +Cortex requires a Key-Value (KV) store to store the ring. It can use traditional KV stores like Consul or Etcd, +but it can also build its own KV store on top of the memberlist library using a gossip algorithm. -This short guide shows how to start Cortex in [single-binary mode](../architecture.md) with memberlist-based ring. -To reduce number of required dependencies in this guide, it will use [blocks storage](../blocks-storage/_index.md) with no shipping to external stores. -Storage engine and external storage configuration are not dependent on the ring configuration. +This short guide shows how to start Cortex in [single-binary mode](../architecture.md) with a memberlist-based ring. +To reduce the number of required dependencies in this guide, it will use [blocks storage](../blocks-storage/_index.md) with no shipping to external stores. +The storage engine and external storage configuration are not dependent on the ring configuration. ## Single-binary, two Cortex instances -For simplicity and to get started, we'll run it as two instances of Cortex on local computer. 
+For simplicity and to get started, we'll run two instances of Cortex on the local computer.

 We will use prepared configuration files ([file 1](../../configuration/single-process-config-blocks-gossip-1.yaml), [file 2](../../configuration/single-process-config-blocks-gossip-2.yaml)), with no external
 dependencies.

@@ -36,7 +36,7 @@ remote_write:
   - url: http://localhost:9109/api/v1/push
 ```

-After starting Prometheus, it will now start pushing data to Cortex. Distributor component in Cortex will
+Once started, Prometheus will begin pushing data to Cortex. The Distributor component in Cortex will
 distribute incoming samples between the two instances.

 To query that data, you can configure your Grafana instance to use http://localhost:9109/prometheus (first Cortex)
 as a Prometheus data source.

@@ -57,27 +57,27 @@ memberlist:

 This tells memberlist to listen on port 7946, and connect to localhost:7947, which is the second instance.
 Port numbers are reversed in the second configuration file.

-We also need to configure `node_name` and also ingester ID (`ingester.lifecycler.id` field), because default to hostname,
+We also need to configure `node_name` and the ingester ID (`ingester.lifecycler.id` field), because they default to the hostname,
 but we are running both Cortex instances on the same host.

 To make sure that both ingesters generate unique tokens, we configure `join_after` and `observe_period` to 10 seconds.
-First option tells Cortex to wait 10 seconds before joining the ring. This option is normally used to tell Cortex ingester
-how long to wait for a potential tokens and data transfer from leaving ingester, but we also use it here to increase
+The first option tells Cortex to wait 10 seconds before joining the ring. This option is normally used to tell a Cortex ingester
+how long to wait for a potential token and data transfer from a leaving ingester, but we also use it here to increase
 the chance of finding other gossip peers. 
When Cortex joins the ring, it generates tokens and writes them to the ring.
 If multiple Cortex instances do this at the same time, they can generate conflicting tokens. This can be a problem
-when using gossiped ring (instances may simply not see each other yet), so we use `observe_period` to watch the ring for token conflicts.
-If conflict is detected, new tokens are generated instead of conflicting tokens, and observe period is restarted.
-If no conflict is detected within the observe period, ingester switches to ACTIVE state.
+when using a gossiped ring (instances may simply not see each other yet), so we use `observe_period` to watch the ring for token conflicts.
+If a conflict is detected, new tokens are generated instead of the conflicting tokens, and the observe period is restarted.
+If no conflict is detected within the observe period, the ingester switches to the ACTIVE state.

 We are able to observe ring state on [http://localhost:9109/ring](http://localhost:9109/ring) and [http://localhost:9209/ring](http://localhost:9209/ring).
-The two instances may see slightly different views (eg. different timestamps), but should converge to a common state soon, with both instances
+The two instances may see slightly different views (e.g. different timestamps), but they should converge to a common state soon, with both instances
 being ACTIVE and ready to receive samples.

 ## How to add another instance?

 To add another Cortex to the small cluster, copy `docs/configuration/single-process-config-blocks-gossip-1.yaml` to a new file,
-and make following modifications. We assume that third Cortex will run on the same machine again, so we change node name and ingester ID as well. Here
-is annotated diff:
+and make the following modifications. We assume that the third Cortex will run on the same machine again, so we change the node name and ingester ID as well. Here
+is the annotated diff:

 ```diff
 ... 
@@ -124,11 +124,12 @@ is annotated diff: ``` We don't need to change or add `memberlist.join_members` list. This new instance will simply join to the second one (listening on port 7947), and -will discover other peers through it. When using kubernetes, suggested setup is to have a headless service pointing to all pods -that want to be part of gossip cluster, and then point `join_members` to this headless service. +will discover other peers through it. When using Kubernetes, the suggested setup is to have a headless service pointing to all pods +that want to be part of the gossip cluster, and then point `join_members` to this headless service. -We also don't need to change `/tmp/cortex/storage` directory in `blocks_storage.filesystem.dir` field. This is directory where all ingesters will -"upload" finished blocks. This can also be an S3 or GCP storage, but for simplicity, we use local filesystem in this example. +We also don't need to change `/tmp/cortex/storage` directory in the `blocks_storage.filesystem.dir` field. This is the directory where all ingesters will +"upload" finished blocks. This can also be an S3 or GCP storage, but for simplicity, we use the local filesystem in this example. After these changes, we can start another Cortex instance using the modified configuration file. This instance will join the ring -and will start receiving samples after it enters into ACTIVE state. +and will start receiving samples after it enters the ACTIVE state. + diff --git a/docs/guides/ha-pair-handling.md b/docs/guides/ha-pair-handling.md index 91b499a4bd..dbc248e3d9 100644 --- a/docs/guides/ha-pair-handling.md +++ b/docs/guides/ha-pair-handling.md @@ -7,21 +7,21 @@ slug: ha-pair-handling ## Context -You can have more than a single Prometheus monitoring and ingesting the same metrics for redundancy. Cortex already does replication for redundancy and it doesn't make sense to ingest the same data twice. 
So in Cortex, we made sure we can dedupe the data we receive from HA Pairs of Prometheus. We do this via the following:
+You can have more than a single Prometheus monitoring and ingesting the same metrics for redundancy. Cortex already does replication for redundancy, and it doesn't make sense to ingest the same data twice. So in Cortex, we made sure we can dedupe the data we receive from HA Pairs of Prometheus. We do this via the following:

-Assume that there are two teams, each running their own Prometheus, monitoring different services. Let's call the Prometheis T1 and T2. Now, if the teams are running HA pairs, let's call the individual Prometheis, T1.a, T1.b and T2.a and T2.b.
+Assume that there are two teams, each running their own Prometheus, monitoring different services. Let's call the Prometheus servers T1 and T2. Now, if the teams are running HA pairs, let's call the individual replicas T1.a, T1.b, T2.a, and T2.b.

-In Cortex we make sure we only ingest from one of T1.a and T1.b, and only from one of T2.a and T2.b. We do this by electing a leader replica for each cluster of Prometheus. For example, in the case of T1, let it be T1.a. As long as T1.a is the leader, we drop the samples sent by T1.b. And if Cortex sees no new samples from T1.a for a short period (30s by default), it'll switch the leader to be T1.b.
+In Cortex, we make sure we only ingest from one of T1.a and T1.b, and only from one of T2.a and T2.b. We do this by electing a leader replica for each cluster of Prometheus. For example, in the case of T1, let it be T1.a. As long as T1.a is the leader, we drop the samples sent by T1.b. And if Cortex sees no new samples from T1.a for a short period (30s by default), it'll switch the leader to be T1.b.

-This means if T1.a goes down for a few minutes Cortex's HA sample handling will have switched and elected T1.b as the leader. 
This failover timeout is what enables us to only accept samples from a single replica at a time, but ensure we don't drop too much data in case of issues. Note that with the default scrape period of 15s, and the default timeouts in Cortex, in most cases you'll only lose a single scrape of data in the case of a leader election failover. For any rate queries the rate window should be at least 4x the scrape period to account for any of these failover scenarios, for example with the default scrape period of 15s then you should calculate rates over at least 1m periods.
+This means if T1.a goes down for a few minutes, Cortex's HA sample handling will have switched and elected T1.b as the leader. This failover timeout is what enables us to only accept samples from a single replica at a time, but ensure we don't drop too much data in case of issues. Note that with the default scrape period of 15s, and the default timeouts in Cortex, in most cases, you'll only lose a single scrape of data in the case of a leader election failover. For any rate queries, the rate window should be at least 4x the scrape period to account for any of these failover scenarios; for example, with the default scrape period of 15s, you should calculate rates over at least 1m periods.

-Now we do the same leader election process T2.
+Now we do the same leader election process for T2.

 ## Config

 ### Client Side

-So for Cortex to achieve this, we need 2 identifiers for each process, one identifier for the cluster (T1 or T2, etc) and one identifier to identify the replica in the cluster (a or b). The easiest way to do with is by setting external labels, the default labels are `cluster` and `__replica__`. For example:
+So for Cortex to achieve this, we need 2 identifiers for each process, one identifier for the cluster (T1 or T2, etc.) and one identifier to identify the replica in the cluster (a or b). 
The easiest way to do this is by setting external labels; the default labels are `cluster` and `__replica__`. For example:

 ```
 cluster: prom-team1
@@ -37,9 +37,9 @@ __replica__: replica2

 Note: These are external labels and have nothing to do with remote_write config.

-These two label names are configurable per-tenant within Cortex, and should be set to something sensible. For example, cluster label is already used by some workloads, and you should set the label to be something else but uniquely identifies the cluster. Good examples for this label-name would be `team`, `cluster`, `prometheus`, etc.
+These two label names are configurable per-tenant within Cortex and should be set to something sensible. For example, the cluster label is already used by some workloads, and you should set the label to be something else that uniquely identifies the cluster. Good examples for this label name would be `team`, `cluster`, `prometheus`, etc.

-The replica label should be set so that the value for each prometheus is unique in that cluster. Note: Cortex drops this label when ingesting data, but preserves the cluster label. This way, your timeseries won't change when replicas change.
+The replica label should be set so that the value for each Prometheus is unique in that cluster. Note: Cortex drops this label when ingesting data but preserves the cluster label. This way, your timeseries won't change when replicas change.

 ### Server Side

@@ -47,7 +47,7 @@ The minimal configuration requires:

 * Enabling the HA tracker via `-distributor.ha-tracker.enable=true` CLI flag (or its YAML config option)
 * Configuring the KV store for the ring (See: [Ring/HA Tracker Store](../configuration/arguments.md#ringha-tracker-store)). Only Consul and etcd are currently supported. Multi should be used for migration purposes only. 
-* Setting the limits configuration to accept samples via `-distributor.ha-tracker.enable-for-all-users` (or its YAML config option) +* Setting the limits configuration to accept samples via `-distributor.ha-tracker.enable-for-all-users` (or its YAML config option). The following configuration snippet shows an example of the HA tracker config via YAML config file: @@ -108,3 +108,4 @@ labels to the query. In this case, if it asks for the `__replica__` label, Cortex will not return any data. Therefore, the `__replica__` label should only be added for remote write. + diff --git a/docs/guides/ingesters-rolling-updates.md b/docs/guides/ingesters-rolling-updates.md index 178be3f780..f87f17599d 100644 --- a/docs/guides/ingesters-rolling-updates.md +++ b/docs/guides/ingesters-rolling-updates.md @@ -6,10 +6,11 @@ slug: ingesters-rolling-updates --- Cortex [ingesters](architecture.md#ingester) are semi-stateful. -A running ingester holds several hours of time series data in memory, before they're flushed to the long-term storage. -When an ingester shutdowns, because of a rolling update or maintenance, the in-memory data must not be discarded in order to avoid any data loss. +A running ingester holds several hours of time series data in memory before they're flushed to the long-term storage. +When an ingester shuts down because of a rolling update or maintenance, the in-memory data must not be discarded in order to avoid any data loss. The Cortex [blocks storage](../blocks-storage/_index.md) requires ingesters to run with a persistent disk where the TSDB WAL and blocks are stored (eg. a StatefulSet when deployed on Kubernetes). -During a rolling update, the leaving ingester closes the open TSDBs, synchronize the data to disk (`fsync`) and releases the disk resources. -The new ingester, which is expected to reuse the same disk of the leaving one, will replay the TSDB WAL on startup in order to load back in memory the time series that have not been compacted into a block yet. 
+During a rolling update, the leaving ingester closes the open TSDBs, synchronizes the data to disk (`fsync`), and releases the disk resources.
+The new ingester, which is expected to reuse the same disk as the leaving one, will replay the TSDB WAL on startup to load back into memory the time series that have not been compacted into a block yet.
+
diff --git a/docs/guides/ingesters-scaling-up-and-down.md b/docs/guides/ingesters-scaling-up-and-down.md
index 3c8846c493..ff11e7ca3a 100644
--- a/docs/guides/ingesters-scaling-up-and-down.md
+++ b/docs/guides/ingesters-scaling-up-and-down.md
@@ -7,22 +7,22 @@ slug: ingesters-scaling-up-and-down

This guide explains how to scale up and down ingesters.

-_If you're looking how to run ingesters rolling updates, please refer to the [dedicated guide](./ingesters-rolling-updates.md)._
+_If you're looking for how to run ingesters rolling updates, please refer to the [dedicated guide](./ingesters-rolling-updates.md)._

## Scaling up

Adding more ingesters to a Cortex cluster is considered a safe operation. When a new ingester starts, it will register to the [hash ring](../architecture.md#the-hash-ring) and the distributors will reshard received series accordingly. Ingesters that were previously receiving those series will see data stop arriving and will consider those series "idle".

-If you run with `-distributor.shard-by-all-labels=false` (the default), before adding a second ingester you have to wait until data has migrated from idle series to the back-end store, otherwise you will see gaps in queries. This will happen after the next "head compaction" (typically every 2 hours).
-If you have set `-querier.query-store-after` then that is also a minimum time you have to wait before adding a second ingester.
+If you run with `-distributor.shard-by-all-labels=false` (the default), before adding a second ingester, you have to wait until data has migrated from idle series to the back-end store; otherwise, you will see gaps in queries.
This will happen after the next "head compaction" (typically every 2 hours). +If you have set `-querier.query-store-after`, then that is also a minimum time you have to wait before adding a second ingester. If you run with `-distributor.shard-by-all-labels=true`, no special care is required to take when scaling up ingesters. ## Scaling down -A running ingester holds several hours of time series data in memory, before they're flushed to the long-term storage. When an ingester shuts down, because of a scale down operation, the in-memory data must not be discarded in order to avoid any data loss. +A running ingester holds several hours of time series data in memory before they're flushed to the long-term storage. When an ingester shuts down because of a scale down operation, the in-memory data must not be discarded in order to avoid any data loss. Ingesters don't flush series to blocks at shutdown by default. However, Cortex ingesters expose an API endpoint [`/shutdown`](../api/_index.md#shutdown) that can be called to flush series to blocks and upload blocks to the long-term storage before the ingester terminates. @@ -44,3 +44,4 @@ The ingesters scale down is deemed an infrequent operation and no automation is 2. Wait until the HTTP call returns successfully or "finished flushing and shipping TSDB blocks" is logged 3. Terminate the ingester process (the `/shutdown` will not do it) 4. Before proceeding to the next ingester, wait 2x the maximum between `-blocks-storage.bucket-store.sync-interval` and `-compactor.cleanup-interval` + diff --git a/docs/guides/limitations.md b/docs/guides/limitations.md index 5951371f74..357f4e526a 100644 --- a/docs/guides/limitations.md +++ b/docs/guides/limitations.md @@ -7,7 +7,7 @@ slug: limitations ## Tenant ID naming -The tenant ID (also called "user ID" or "org ID") is the unique identifier of a tenant within a Cortex cluster. 
The tenant ID is an opaque information to Cortex, which doesn't make any assumption on its format/content, but its naming has two limitations: +The tenant ID (also called "user ID" or "org ID") is the unique identifier of a tenant within a Cortex cluster. The tenant ID is opaque information to Cortex, which doesn't make any assumptions on its format/content, but its naming has two limitations: 1. Supported characters 2. Length @@ -24,7 +24,7 @@ The following character sets are generally **safe for use in the tenant ID**: - Exclamation point (`!`) - Hyphen (`-`) - Underscore (`_`) - - Single Period (`.`), but the tenant IDs `.` and `..` is considered invalid + - Single Period (`.`), but the tenant IDs `.` and `..` are considered invalid - Asterisk (`*`) - Single quote (`'`) - Open parenthesis (`(`) @@ -35,3 +35,4 @@ All other characters are not safe to use. In particular, slashes `/` and whitesp ### Length The tenant ID length should not exceed 150 bytes/characters. + diff --git a/docs/guides/overrides-exporter.md b/docs/guides/overrides-exporter.md index 95ed81ebe6..9cfa7e00f0 100644 --- a/docs/guides/overrides-exporter.md +++ b/docs/guides/overrides-exporter.md @@ -14,15 +14,15 @@ to their limits tenants are, the `overrides-exporter` module can expose limits a To update configuration without restarting, Cortex allows operators to supply a `runtime_config` file that will be periodically reloaded. This file can be specified under the `runtime_config` section of the main [configuration file](../configuration/arguments.md#runtime-configuration-file) or using the `-runtime-config.file` -command line flag. This file is used to apply tenant-specific limits. +command-line flag. This file is used to apply tenant-specific limits. ## Example -The `overrides-exporter` is not enabled by default, it must be explicitly enabled. We recommend +The `overrides-exporter` is not enabled by default; it must be explicitly enabled. 
We recommend only running a single instance of it in your cluster due to the cardinality of the metrics emitted. -With a `runtime.yaml` file given below +With a `runtime.yaml` file given below: [embedmd]:# (./overrides-exporter-runtime.yaml) ```yaml @@ -40,13 +40,13 @@ overrides: max_series_per_query: 100000 ``` -The `overrides-exporter` is configured to run as follows +The `overrides-exporter` is configured to run as follows: ``` cortex -target overrides-exporter -runtime-config.file runtime.yaml -server.http-listen-port=8080 ``` -After the `overrides-exporter` starts, you can to use `curl` to inspect the tenant overrides. +After the `overrides-exporter` starts, you can use `curl` to inspect the tenant overrides. ```text curl -s http://localhost:8080/metrics | grep cortex_overrides diff --git a/docs/guides/running-cortex-on-kubernetes.md b/docs/guides/running-cortex-on-kubernetes.md index 13016e7986..f13fadbb90 100644 --- a/docs/guides/running-cortex-on-kubernetes.md +++ b/docs/guides/running-cortex-on-kubernetes.md @@ -17,7 +17,7 @@ this, but this is not mandatory. If using Kubernetes, each container should specify resource requests so that the scheduler can place them on a node with sufficient capacity. -For example an ingester might request: +For example, an ingester might request: ``` resources: @@ -27,19 +27,19 @@ For example an ingester might request: ``` The specific values here should be adjusted based on your own -experiences running Cortex - they are very dependent on rate of data +experiences running Cortex - they are very dependent on the rate of data arriving and other factors such as series churn. ### Take extra care with ingesters Ingesters hold hours of timeseries data in memory; you can configure -Cortex to replicate the data but you should take steps to avoid losing +Cortex to replicate the data, but you should take steps to avoid losing all replicas at once: - - Don't run multiple ingesters on the same node. 
- - Don't run ingesters on preemptible/spot nodes. - - Spread out ingesters across racks / availability zones / whatever - applies in your datacenters. +- Don't run multiple ingesters on the same node. +- Don't run ingesters on preemptible/spot nodes. +- Spread out ingesters across racks / availability zones / whatever + applies in your datacenters. You can ask Kubernetes to avoid running on the same node like this: @@ -59,13 +59,13 @@ You can ask Kubernetes to avoid running on the same node like this: ``` Give plenty of time for an ingester to hand over or flush data to -store when shutting down; for Kubernetes this looks like: +store when shutting down; for Kubernetes, this looks like: ``` terminationGracePeriodSeconds: 2400 ``` -Ask Kubernetes to limit rolling updates to one ingester at a time, and +Ask Kubernetes to limit rolling updates to one ingester at a time and signal the old one to stop before the new one is ready: ``` diff --git a/docs/guides/sharded_ruler.md b/docs/guides/sharded_ruler.md index 88a61904ab..826e35e05f 100644 --- a/docs/guides/sharded_ruler.md +++ b/docs/guides/sharded_ruler.md @@ -7,31 +7,31 @@ slug: ruler-sharding ## Context -One option to scale the ruler is by scaling it horizontally. However, with multiple ruler instances running they will need to coordinate to determine which instance will evaluate which rule. Similar to the ingesters, the rulers establish a hash ring to divide up the responsibilities of evaluating rules. +One option to scale the ruler is by scaling it horizontally. However, with multiple ruler instances running, they will need to coordinate to determine which instance will evaluate which rule. Similar to the ingesters, the rulers establish a hash ring to divide up the responsibilities of evaluating rules. 
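The hash-ring division of work described above can be sketched in a few lines. This is a hypothetical illustration of hash-ring ownership, not Cortex's actual token scheme or ring implementation:

```python
import hashlib
from bisect import bisect_right

# Hypothetical sketch (not Cortex's real implementation): each ruler registers
# several tokens on a ring, and a rule group is evaluated by the ruler owning
# the first token at or after the group's hash.

def ring_hash(key: str) -> int:
    # Use the first 4 bytes of SHA-256 as a position on a 32-bit ring.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

class Ring:
    def __init__(self, rulers, tokens_per_ruler=128):
        self.tokens = sorted(
            (ring_hash(f"{ruler}-{i}"), ruler)
            for ruler in rulers
            for i in range(tokens_per_ruler)
        )

    def owner(self, tenant: str, group: str) -> str:
        position = ring_hash(f"{tenant}/{group}")
        # The first token at or after the hash owns the group; wrap around the ring.
        idx = bisect_right(self.tokens, (position,)) % len(self.tokens)
        return self.tokens[idx][1]

ring = Ring(["ruler-0", "ruler-1", "ruler-2"])
# Every ruler computes the same answer, so each rule group is evaluated by
# exactly one ruler without a central coordinator.
print(ring.owner("team-a", "cpu_alerts"))
```

Because ownership is derived purely from hashes and the shared ring state, every ruler can compute the assignment independently.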
## Config -In order to enable sharding in the ruler the following flag needs to be set: +In order to enable sharding in the ruler, the following flag needs to be set: ``` -ruler.enable-sharding=true ``` -In addition the ruler requires it's own ring to be configured, for instance: +In addition, the ruler requires its own ring to be configured, for instance: ``` -ruler.ring.consul.hostname=consul.dev.svc.cluster.local:8500 ``` -The only configuration that is required is to enable sharding and configure a key value store. From there the rulers will shard and handle the division of rules automatically. +The only configuration that is required is to enable sharding and configure a key-value store. From there, the rulers will shard and handle the division of rules automatically. Unlike ingesters, rulers do not hand over responsibility: all rules are re-sharded randomly every time a ruler is added to or removed from the ring. ## Ruler Storage -The ruler supports six kinds of storage (configdb, azure, gcs, s3, swift, local). Most kinds of storage work with the sharded ruler configuration in an obvious way. i.e. configure all rulers to use the same backend. +The ruler supports six kinds of storage (configdb, Azure, GCS, S3, Swift, local). Most kinds of storage work with the sharded ruler configuration in an obvious way, i.e. configure all rulers to use the same backend. -The local implementation reads [Prometheus recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) off of the local filesystem. This is a read only backend that does not support the creation and deletion of rules through [the API](../api/_index.md#ruler). Despite the fact that it reads the local filesystem this method can still be used in a sharded ruler configuration if the operator takes care to load the same rules to every ruler. 
For instance this could be accomplished by mounting a [Kubernetes ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) onto every ruler pod. +The local implementation reads [Prometheus recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) off the local filesystem. This is a read-only backend that does not support the creation and deletion of rules through [the API](../api/_index.md#ruler). Despite the fact that it reads the local filesystem, this method can still be used in a sharded ruler configuration if the operator takes care to load the same rules to every ruler. For instance, this could be accomplished by mounting a [Kubernetes ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) onto every ruler pod. A typical local config may look something like: ``` @@ -39,10 +39,11 @@ A typical local config may look something like: -ruler-storage.local.directory=/tmp/cortex/rules ``` -With the above configuration the ruler would expect the following layout: +With the above configuration, the ruler would expect the following layout: ``` /tmp/cortex/rules//rules1.yaml /rules2.yaml ``` Yaml files are expected to be in the [Prometheus format](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#recording-rules). + diff --git a/docs/guides/shuffle-sharding.md b/docs/guides/shuffle-sharding.md index 27b9c78c98..3d178534c3 100644 --- a/docs/guides/shuffle-sharding.md +++ b/docs/guides/shuffle-sharding.md @@ -5,37 +5,38 @@ weight: 10 slug: shuffle-sharding --- -Cortex leverages on sharding techniques to horizontally scale both single and multi-tenant clusters beyond the capacity of a single node. +Cortex leverages sharding techniques to horizontally scale both single and multi-tenant clusters beyond the capacity of a single node. 
## Background

-The **default sharding strategy** employed by Cortex distributes the workload across the entire pool of instances running a given service (eg. ingesters). For example, on the write path each tenant's series are sharded across all ingesters, regardless how many active series the tenant has or how many different tenants are in the cluster.
+The **default sharding strategy** employed by Cortex distributes the workload across the entire pool of instances running a given service (eg. ingesters). For example, on the write path, each tenant's series are sharded across all ingesters, regardless of how many active series the tenant has or how many different tenants are in the cluster.

-The default strategy allows to have a fair balance on the resources consumed by each instance (ie. CPU and memory) and to maximise these resources across the cluster.
+The default strategy allows for a fair balance of the resources consumed by each instance (ie. CPU and memory) and maximises the use of these resources across the cluster.

-However, in a **multi-tenant** cluster this approach also introduces some **downsides**:
+However, in a **multi-tenant** cluster, this approach also introduces some **downsides**:

1. An outage affects all tenants
-1. A misbehaving tenant (eg. causing out of memory) could affect all other tenants
+2. A misbehaving tenant (eg. causing out-of-memory) could affect all other tenants.

The goal of **shuffle sharding** is to provide an alternative sharding strategy to reduce the blast radius of an outage and better isolate tenants.

-## What is shuffle sharding
+## What is shuffle sharding?

-Shuffle sharding is a technique used to isolate different tenant's workloads and to give each tenant a single-tenant experience even if they're running in a shared cluster.
This technique has been publicly shared and clearly explained by AWS in their [builders' library](https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/) and a reference implementation has been shown in the [Route53 Infima library](https://github.com/awslabs/route53-infima/blob/master/src/main/java/com/amazonaws/services/route53/infima/SimpleSignatureShuffleSharder.java). +Shuffle sharding is a technique used to isolate different tenants’ workloads and to give each tenant a single-tenant experience even if they're running in a shared cluster. This technique has been publicly shared and clearly explained by AWS in their [builders' library](https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/) and a reference implementation has been shown in the [Route53 Infima library](https://github.com/awslabs/route53-infima/blob/master/src/main/java/com/amazonaws/services/route53/infima/SimpleSignatureShuffleSharder.java). -The idea is to assign each tenant a shard composed by a subset of the Cortex service instances, aiming to minimize the overlapping instances between two different tenants. Shuffle sharding brings the following **benefits** over the default sharding strategy: +The idea is to assign each tenant a shard composed of a subset of the Cortex service instances, aiming to minimize the overlapping instances between two different tenants. Shuffle sharding brings the following **benefits** over the default sharding strategy: - An outage on some Cortex cluster instances/nodes will only affect a subset of tenants. - A misbehaving tenant will affect only its shard instances. Due to the low overlap of instances between different tenants, it's statistically quite likely that any other tenant will run on different instances or only a subset of instances will match the affected ones. -Shuffle sharding requires no more resources than the default sharding strategy but instances may be less evenly balanced from time to time. 
+Shuffle sharding requires no more resources than the default sharding strategy, but instances may be less evenly balanced from time to time.

### Low overlapping instances probability

For example, given a Cortex cluster running **50 ingesters** and assigning **each tenant 4** out of 50 ingesters, shuffling instances between each tenant, we get **230K possible combinations**.

-Randomly picking two different tenants we have the:
+Randomly picking two different tenants, there is a:

- 71% chance that they will not share any instance
- 26% chance that they will share only 1 instance

@@ -73,32 +74,32 @@ The Cortex shuffle sharding implementation guarantees the following properties:

### Ingesters shuffle sharding

-By default the Cortex distributor spreads the received series across all running ingesters.
+By default, the Cortex distributor spreads the received series across all running ingesters.

-When shuffle sharding is **enabled** for the ingesters, the distributor and ruler on the **write path** spread each tenant series across `-distributor.ingestion-tenant-shard-size` number of ingesters, while on the **read path** the querier and ruler queries only the subset of ingesters holding the series for a given tenant.
+When shuffle sharding is **enabled** for the ingesters, the distributor and ruler on the **write path** spread each tenant series across `-distributor.ingestion-tenant-shard-size` number of ingesters, while on the **read path**, the querier and ruler query only the subset of ingesters holding the series for a given tenant.
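The low-overlap figures quoted above (230K possible combinations, and the 71%/26% overlap chances) can be reproduced with a short computation. This is an illustrative check, not Cortex code:

```python
from math import comb

# Illustrative check of the shuffle-sharding overlap figures:
# 50 ingesters, each tenant randomly assigned a shard of 4 of them.
INSTANCES, SHARD = 50, 4

def overlap_probability(k: int) -> float:
    # Probability that two independently chosen shards share exactly k
    # instances (a hypergeometric distribution).
    return comb(SHARD, k) * comb(INSTANCES - SHARD, SHARD - k) / comb(INSTANCES, SHARD)

print(comb(INSTANCES, SHARD))            # 230300 possible shards (~230K)
print(round(overlap_probability(0), 2))  # 0.71 -> 71% chance of no shared instance
print(round(overlap_probability(1), 2))  # 0.26 -> 26% chance of sharing exactly one
```

The same formula with a larger shard size or instance count shows how quickly the chance of full overlap between two tenants drops.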
_The shard size can be overridden on a per-tenant basis in the limits overrides configuration._ #### Ingesters write path -To enable shuffle-sharding for ingesters on the write path you need to configure the following CLI flags (or their respective YAML config options) to **distributor**, **ingester** and **ruler**: +To enable shuffle-sharding for ingesters on the write path, you need to configure the following CLI flags (or their respective YAML config options) to **distributor**, **ingester**, and **ruler**: - `-distributor.sharding-strategy=shuffle-sharding` - `-distributor.ingestion-tenant-shard-size=`
- `` set to the number of ingesters each tenant series should be sharded to. If `` is greater than the number of available ingesters in the Cortex cluster, the tenant series are sharded across all ingesters. + `` should be set to the number of ingesters each tenant series should be sharded to. If `` is greater than the number of available ingesters in the Cortex cluster, the tenant series are sharded across all ingesters. #### Ingesters read path -Assuming shuffle-sharding has been enabled for the write path, to enable shuffle-sharding for ingesters on the read path too you need to configure the following CLI flags (or their respective YAML config options) to **querier** and **ruler**: +Assuming shuffle-sharding has been enabled for the write path, to enable shuffle-sharding for ingesters on the read path too, you need to configure the following CLI flags (or their respective YAML config options) to **querier** and **ruler**: - `-distributor.sharding-strategy=shuffle-sharding` - `-distributor.ingestion-tenant-shard-size=` - `-querier.shuffle-sharding-ingesters-lookback-period=`
- Queriers and rulers fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since 'now - lookback period'. The configured lookback `` should be greater or equal than `-querier.query-store-after` and `-querier.query-ingesters-within` if set, and greater than the estimated minimum time it takes for the oldest samples stored in a block uploaded by ingester to be discovered and available for querying (3h with the default configuration).
+  Queriers and rulers fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since 'now - lookback period'. The configured lookback `` should be greater than or equal to `-querier.query-store-after` and `-querier.query-ingesters-within` if set, and greater than the estimated minimum time it takes for the oldest samples stored in a block uploaded by an ingester to be discovered and available for querying (3h with the default configuration).

#### Rollout strategy

-If you're running a Cortex cluster with shuffle-sharding disabled and you want to enable it for ingesters, the following rollout strategy should be used to avoid missing querying any time-series in the ingesters memory:
+If you're running a Cortex cluster with shuffle-sharding disabled and you want to enable it for ingesters, the following rollout strategy should be used to avoid missing any time-series in the ingesters’ memory when querying:

1. Enable ingesters shuffle-sharding on the **write path**
2. **Wait** at least `-querier.shuffle-sharding-ingesters-lookback-period` time
@@ -106,7 +107,7 @@ If you're running a Cortex cluster with shuffle-sharding disabled and you want t

#### Limitation: decreasing the tenant shard size

-The current shuffle-sharding implementation in Cortex has a limitation which prevents to safely decrease the tenant shard size if the ingesters shuffle-sharding is enabled on the read path.
+The current shuffle-sharding implementation in Cortex has a limitation which prevents safely decreasing the tenant shard size if the ingesters shuffle-sharding is enabled on the read path. The problem is that if a tenant’s subring decreases in size, there is currently no way for the queriers and rulers to know how big the tenant subring was previously, and hence they will potentially miss an ingester with data for that tenant. In other words, the lookback mechanism to select the ingesters which may have received series since 'now - lookback period' doesn't work correctly if the tenant shard size is decreased. @@ -118,19 +119,19 @@ This is deemed an infrequent operation that we considered banning, but a workaro ### Query-frontend and Query-scheduler shuffle sharding -By default all Cortex queriers can execute received queries for given tenant. +By default, all Cortex queriers can execute received queries for a given tenant. -When shuffle sharding is **enabled** by setting `-frontend.max-queriers-per-tenant` (or its respective YAML config option) to a value higher than 0 and lower than the number of available queriers, only specified number of queriers will execute queries for single tenant. +When shuffle sharding is **enabled** by setting `-frontend.max-queriers-per-tenant` (or its respective YAML config option) to a value higher than 0 and lower than the number of available queriers, only the specified number of queriers will execute queries for a single tenant. -Note that this distribution happens in query-frontend, or query-scheduler if used. When using query-scheduler, `-frontend.max-queriers-per-tenant` option must be set for query-scheduler component. When not using query-frontend (with or without scheduler), this option is not available. +Note that this distribution happens in the query-frontend, or query-scheduler if used. When using query-scheduler, the `-frontend.max-queriers-per-tenant` option must be set for the query-scheduler component. 
When not using query-frontend (with or without scheduler), this option is not available. _The maximum number of queriers can be overridden on a per-tenant basis in the limits overrides configuration._ #### The impact of "query of death" -In the event a tenant is repeatedly sending a "query of death" which leads the querier to crash or getting killed because of out-of-memory, the crashed querier will get disconnected from the query-frontend or query-scheduler and a new querier will be immediately assigned to the tenant's shard. This practically invalidates the assumption that shuffle-sharding can be used to contain the blast radius in case of a query of death. +In the event a tenant is repeatedly sending a "query of death" which leads the querier to crash or get killed because of out-of-memory, the crashed querier will get disconnected from the query-frontend or query-scheduler and a new querier will be immediately assigned to the tenant's shard. This practically invalidates the assumption that shuffle-sharding can be used to contain the blast radius in case of a query of death. -To mitigate it, Cortex allows to configure a delay between when a querier disconnects because of a crash and when the crashed querier is actually removed from the tenant's shard (and another healthy querier is added as replacement). A delay of 1 minute may be a reasonable trade-off: +To mitigate it, Cortex allows you to configure a delay between when a querier disconnects because of a crash and when the crashed querier is actually removed from the tenant's shard (and another healthy querier is added as a replacement). 
A delay of 1 minute may be a reasonable trade-off: - Query-frontend: `-query-frontend.querier-forget-delay=1m` - Query-scheduler: `-query-scheduler.querier-forget-delay=1m` @@ -139,7 +140,7 @@ To mitigate it, Cortex allows to configure a delay between when a querier discon The Cortex store-gateway -- used by the [blocks storage](../blocks-storage/_index.md) -- by default spreads each tenant's blocks across all running store-gateways. -When shuffle sharding is **enabled** via `-store-gateway.sharding-strategy=shuffle-sharding` (or its respective YAML config option), each tenant blocks will be sharded across a subset of `-store-gateway.tenant-shard-size` store-gateway instances. This configuration needs to be set to **store-gateway**, **querier** and **ruler**. +When shuffle sharding is **enabled** via `-store-gateway.sharding-strategy=shuffle-sharding` (or its respective YAML config option), each tenant’s blocks will be sharded across a subset of `-store-gateway.tenant-shard-size` store-gateway instances. This configuration needs to be set to **store-gateway**, **querier**, and **ruler**. _The shard size can be overridden on a per-tenant basis setting `store_gateway_tenant_shard_size` in the limits overrides configuration._ @@ -149,18 +150,18 @@ _Please check out the [store-gateway documentation](../blocks-storage/store-gate Cortex ruler can run in three modes: -1. **No sharding at all.** This is the most basic mode of the ruler. It is activated by using `-ruler.enable-sharding=false` (default) and works correctly only if single ruler is running. In this mode the Ruler loads all rules for all tenants. -2. **Default sharding**, activated by using `-ruler.enable-sharding=true` and `-ruler.sharding-strategy=default` (default). In this mode rulers register themselves into the ring. Each ruler will then select and evaluate only those rules that it "owns". -3. 
**Shuffle sharding**, activated by using `-ruler.enable-sharding=true` and `-ruler.sharding-strategy=shuffle-sharding`. Similarly to default sharding, rulers use the ring to distribute workload, but rule groups for each tenant can only be evaluated on limited number of rulers (`-ruler.tenant-shard-size`, can also be set per tenant as `ruler_tenant_shard_size` in overrides).
+1. **No sharding at all.** This is the most basic mode of the ruler. It is activated by using `-ruler.enable-sharding=false` (default) and works correctly only if a single ruler is running. In this mode, the Ruler loads all rules for all tenants.
+2. **Default sharding**, activated by using `-ruler.enable-sharding=true` and `-ruler.sharding-strategy=default` (default). In this mode, rulers register themselves into the ring. Each ruler will then select and evaluate only those rules that it "owns".
+3. **Shuffle sharding**, activated by using `-ruler.enable-sharding=true` and `-ruler.sharding-strategy=shuffle-sharding`. Similarly to default sharding, rulers use the ring to distribute workload, but rule groups for each tenant can only be evaluated on a limited number of rulers (`-ruler.tenant-shard-size`, can also be set per tenant as `ruler_tenant_shard_size` in overrides).

-Note that when using sharding strategy, each rule group is evaluated by single ruler only, there is no replication.
+Note that when using a sharding strategy, each rule group is evaluated by a single ruler only; there is no replication.

### Compactor shuffle sharding

Cortex compactor can run in three modes:

-1. **No sharding at all.** This is the most basic mode of the compactor. It is activated by using `-compactor.sharding-enabled=false` (default). In this mode every compactor will run every compaction.
-2. **Default sharding**, activated by using `-compactor.sharding-enabled=true` and `-compactor.sharding-strategy=default` (default). In this mode compactors register themselves into the ring.
One single tenant will belong to only 1 compactor. +1. **No sharding at all.** This is the most basic mode of the compactor. It is activated by using `-compactor.sharding-enabled=false` (default). In this mode, every compactor will run every compaction. +2. **Default sharding**, activated by using `-compactor.sharding-enabled=true` and `-compactor.sharding-strategy=default` (default). In this mode, compactors register themselves into the ring. One single tenant will belong to only 1 compactor. 3. **Shuffle sharding**, activated by using `-compactor.sharding-enabled=true` and `-compactor.sharding-strategy=shuffle-sharding`. Similarly to default sharding, but compactions for each tenant can be carried out on multiple compactors (`-compactor.tenant-shard-size`, can also be set per tenant as `compactor_tenant_shard_size` in overrides). With shuffle sharding selected as the sharding strategy, a subset of the compactors will be used to handle a user based on the shard size. @@ -172,4 +173,4 @@ The idea behind using the shuffle sharding strategy for the compactor is to furt ### Does shuffle sharding add additional overhead to the KV store? No, shuffle sharding subrings are computed client-side and are not stored in the ring. KV store sizing still depends primarily on the number of replicas (of any component that uses the ring, e.g. ingesters) and tokens per replica. -However, each tenant's subring is cached in memory on the client-side which may slightly increase the memory footprint of certain components (mostly the distributor). +However, each tenant's subring is cached in memory on the client-side, which may slightly increase the memory footprint of certain components (mostly the distributor). diff --git a/docs/guides/tls.md b/docs/guides/tls.md index 752edf6877..32f794d8fb 100644 --- a/docs/guides/tls.md +++ b/docs/guides/tls.md @@ -13,10 +13,10 @@ components. This guide describes the process of setting up TLS. 
The first step to securing inter-service communication in Cortex with TLS is generating certificates. A Certifying Authority (CA) will be used for this -purpose which should be private to the organization, as any certificates signed +purpose, which should be private to the organization, as any certificates signed by this CA will have permissions to communicate with the cluster. -We will use the following script to generate self signed certs for the cluster: +We will use the following script to generate self-signed certs for the cluster: ``` # keys @@ -36,7 +36,7 @@ openssl x509 -req -in client.csr -CA root.crt -CAkey root.key -CAcreateserial -o openssl x509 -req -in server.csr -CA root.crt -CAkey root.key -CAcreateserial -out server.crt -days 100000 -sha256 ``` -Note that the above script generates certificates that are valid for 100000 days. +Note that the above script generates certificates that are valid for 100,000 days. This can be changed by adjusting the `-days` option in the above commands. It is recommended that the certs be replaced at least once every 2 years. @@ -79,7 +79,7 @@ through the following config parameters: #### Client flags -Client flags are component specific. +Client flags are component-specific. For an HTTP client in the Alertmanager: ``` @@ -117,4 +117,4 @@ Similarly, for the GRPC Ingester Client: -ingester.client.tls-ca-path=/path/to/root.crt ``` -TLS can be configured in a similar fashion for other HTTP/GRPC clients in Cortex. \ No newline at end of file +TLS can be configured in a similar fashion for other HTTP/GRPC clients in Cortex. diff --git a/docs/guides/tracing.md b/docs/guides/tracing.md index aa695c763c..9aa5b27a72 100644 --- a/docs/guides/tracing.md +++ b/docs/guides/tracing.md @@ -13,25 +13,25 @@ Cortex in production. ### Dependencies -In order to send traces you will need to set up a Jaeger deployment. 
A -deployment includes either the jaeger all-in-one binary, or else a distributed +In order to send traces, you will need to set up a Jaeger deployment. A +deployment includes either the Jaeger all-in-one binary or else a distributed system of agents, collectors, and queriers. If running on Kubernetes, [Jaeger Kubernetes](https://github.com/jaegertracing/jaeger-kubernetes) is an excellent resource. ### Configuration -In order to configure Cortex to send traces you must do two things: +In order to configure Cortex to send traces, you must do two things: 1. Set the `JAEGER_AGENT_HOST` environment variable in all components to point to your Jaeger agent. This defaults to `localhost`. 1. Enable sampling in the appropriate components: - * The Ingester and Ruler self-initiate traces and should have sampling - explicitly enabled. - * Sampling for the Distributor and Query Frontend can be enabled in Cortex - or in an upstream service such as your frontdoor. + * The Ingester and Ruler self-initiate traces and should have sampling + explicitly enabled. + * Sampling for the Distributor and Query Frontend can be enabled in Cortex + or in an upstream service such as your front door. -To enable sampling in Cortex components you can specify either -`JAEGER_SAMPLER_MANAGER_HOST_PORT` for remote sampling, or +To enable sampling in Cortex components, you can specify either +`JAEGER_SAMPLER_MANAGER_HOST_PORT` for remote sampling or `JAEGER_SAMPLER_TYPE` and `JAEGER_SAMPLER_PARAM` to manually set sampling configuration. See the [Jaeger Client Go documentation](https://github.com/jaegertracing/jaeger-client-go#environment-variables) @@ -46,19 +46,19 @@ even if you plan to use the default values. ### Dependencies -In order to send traces you will need to set up an OpenTelemetry Collector. 
The collector will be able to send traces to
-multiple destinations such [AWS X-Ray](https://aws-otel.github.io/docs/getting-started/x-ray),
+In order to send traces, you will need to set up an OpenTelemetry Collector. The collector will be able to send traces to
+multiple destinations such as [AWS X-Ray](https://aws-otel.github.io/docs/getting-started/x-ray),
 [Google Cloud](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlecloudexporter),
 [DataDog](https://docs.datadoghq.com/tracing/trace_collection/open_standards/otel_collector_datadog_exporter/) and
-[others](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter). OpenTelemetry Collector
+[others](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter). OpenTelemetry Collector
 provides a [helm chart](https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector/examples/deployment-otlp-traces)
 to set up the environment.
 
 ### Configuration
 
-See document on the tracing section in [Configuration file](https://cortexmetrics.io/docs/configuration/configuration-file/).
+See the document on the tracing section in the [Configuration file](https://cortexmetrics.io/docs/configuration/configuration-file/).
 
 ### Current State
 
-Cortex is maintaining backward compatibility with Jaeger support, Cortex has not fully migrated from OpenTracing to OpenTelemetry and is currently using the
-[OpenTracing bridge](https://opentelemetry.io/docs/migration/opentracing/).
\ No newline at end of file
+Cortex is maintaining backward compatibility with Jaeger support. Cortex has not fully migrated from OpenTracing to OpenTelemetry and is currently using the
+[OpenTracing bridge](https://opentelemetry.io/docs/migration/opentracing/).
diff --git a/docs/guides/zone-replication.md b/docs/guides/zone-replication.md
index a2254d3e1b..90184fa52b 100644
--- a/docs/guides/zone-replication.md
+++ b/docs/guides/zone-replication.md
@@ -9,7 +9,7 @@ Cortex supports data replication for different services. By default, data is tra
 It is completely possible that all the replicas for the given data are held within the same availability zone, even if the Cortex cluster spans multiple zones. Storing multiple replicas for a given data within the same availability zone poses a risk for data loss if there is an outage affecting various nodes within a zone or a full zone outage.
 
-For this reason, Cortex optionally supports zone-aware replication. When zone-aware replication is **enabled**, replicas for the given data are guaranteed to span across different availability zones. This requires Cortex cluster to run at least in a number of zones equal to the configured replication factor.
+For this reason, Cortex optionally supports zone-aware replication. When zone-aware replication is **enabled**, replicas for the given data are guaranteed to span across different availability zones. This requires the Cortex cluster to run in at least as many zones as the configured replication factor.
 
 Reads from a zone-aware replication enabled Cortex Cluster can withstand zone failures as long as there are no more than `floor(replication factor / 2)` zones with failing instances.
 
@@ -26,11 +26,11 @@ The Cortex time-series replication is used to hold multiple (typically 3) replic
 1. Configure the availability zone for each ingester via the `-ingester.availability-zone` CLI flag (or its respective YAML config option)
 2. Rollout ingesters to apply the configured zone
-3. Enable time-series zone-aware replication via the `-distributor.zone-awareness-enabled` CLI flag (or its respective YAML config option). Please be aware this configuration option should be set to distributors, queriers and rulers.
+3.
Enable time-series zone-aware replication via the `-distributor.zone-awareness-enabled` CLI flag (or its respective YAML config option). Please be aware that this configuration option should be set on distributors, queriers, and rulers.
 
-The `-distributor.shard-by-all-labels` setting has an impact on read availability. When enabled, a metric is sharded across all ingesters and querier needs to fetch series from all ingesters while, when disabled, a metric is sharded only across `<replication factor>` ingesters.
+The `-distributor.shard-by-all-labels` setting has an impact on read availability. When enabled, a metric is sharded across all ingesters, and the querier needs to fetch series from all ingesters. When disabled, a metric is sharded only across `<replication factor>` ingesters.
 
-In the event of a large outage impacting ingesters in more than 1 zone, when `-distributor.shard-by-all-labels=true` all queries will fail, while when disabled some queries may still succeed if the ingesters holding the required metric are not impacted by the outage. To learn more about this flag, please refer to [distributor arguments](../configuration/arguments.md#distributor).
+In the event of a large outage impacting ingesters in more than 1 zone, when `-distributor.shard-by-all-labels=true`, all queries will fail, while when disabled, some queries may still succeed if the ingesters holding the required metric are not impacted by the outage. To learn more about this flag, please refer to [distributor arguments](../configuration/arguments.md#distributor).
 
 ## Store-gateways: blocks replication
 
@@ -42,14 +42,15 @@ To enable the zone-aware replication for the store-gateways, please refer to the
 For Cortex to function correctly, there must be at least the same number of availability zones as the replication factor. For example, if the replication factor is configured to 3 (default for time-series replication), the Cortex cluster should be spread at least over 3 availability zones.
-It is safe to have more zones than the replication factor, but it cannot be less. Having fewer availability zones than replication factor causes a replica write to be missed, and in some cases, the write fails if the availability zones count is too low.
+It is safe to have more zones than the replication factor, but not fewer. Having fewer availability zones than the replication factor causes a replica write to be missed, and in some cases, the write fails if the availability zones count is too low.
 
 ## Impact on unbalanced zones
 
 **Cortex requires that each zone runs the same number of instances** of a given service for which the zone-aware replication is enabled. This guarantees a fair split of the workload across zones.
 
-On the contrary, if zones are unbalanced, the zones with a lower number of instances would have an higher pressure on resources utilization (eg. CPU and memory) compared to zones with an higher number of instances.
+On the contrary, if zones are unbalanced, the zones with a lower number of instances would have higher pressure on resource utilization (e.g., CPU and memory) compared to zones with a higher number of instances.
 
 ## Impact on costs
 
-Depending on the underlying infrastructure being used, deploying Cortex across multiple availability zones may cause an increase in running costs as most cloud providers charge for inter availability zone networking. The most significant change would be for a Cortex cluster currently running in a single zone.
+Depending on the underlying infrastructure being used, deploying Cortex across multiple availability zones may cause an increase in running costs, as most cloud providers charge for inter-availability-zone networking. The most significant change would be for a Cortex cluster currently running in a single zone.
+
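Reviewer note: the zone-replication hunks above boil down to a small flag set. A minimal sketch of the rollout they describe, assuming a hypothetical three-zone deployment (the zone name below is illustrative; the two flags are the ones named in the guide):

```shell
# Hypothetical flag set for zone-aware time-series replication (zone name is an example).
# On each ingester, advertise the availability zone it runs in:
-ingester.availability-zone=us-east-1a

# On distributors, queriers, and rulers, enable zone awareness:
-distributor.zone-awareness-enabled=true
```

With these set on all listed components and the ingesters rolled out, each series' replicas are spread across distinct zones, matching the replication-factor requirement stated in the guide.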