[Bug]: jaeger-es-rollover-init tries to create already existing [jaeger-span-000001] index in Elasticsearch #6203

Closed
sergeykad opened this issue Nov 12, 2024 · 4 comments · Fixed by #6638

@sergeykad

What happened?

I am redeploying the Jaeger Helm chart in a Kubernetes namespace. During the deployment, jaeger-es-rollover-init tries to create an index that already exists and fails, which fails the whole deployment. I am not sure what caused it, since I had redeployed successfully a few times before.

The only change I can think of is that the Spark job was recently added to the deployment.

I am using the following configuration.

esRollover:
  enabled: true
  cmdlineParams:
    es.use-ilm: "true"
    es.ilm-policy-name: "30-days-default"
  initHook:
    extraEnv:
      - name: SHARDS
        value: "3"
      - name: REPLICAS
        value: "0"

Steps to reproduce

  1. Deploy Jaeger
  2. Add Spark job
  3. Redeploy

Spark configuration

spark:
  enabled: true
  schedule: "30 7 * * *"
  image:
    registry: ghcr.io
    repository: jaegertracing/spark-dependencies/spark-dependencies
  extraEnv:
    - name: ES_USE_ALIASES
      value: "true"
    - name: JAVA_OPTS
      value: "-Xmx64G"

Expected behavior

Jaeger is deployed successfully

Relevant log output

Error: failed to create index: jaeger-span-000001, request failed, status code: 400, body: {"error":{"root_cause":[{"type":"invalid_index_name_exception","reason":"Invalid index name [jaeger-span-000001], already exists as alias","index_uuid":"_na_","index":"jaeger-span-000001"}],"type":"invalid_index_name_exception","reason":"Invalid index name [jaeger-span-000001], already exists as alias","index_uuid":"_na_","index":"jaeger-span-000001"},"status":400}

Screenshot

No response

Additional context

Jaeger 3.2.0
Kubernetes 1.25.6
Elasticsearch 8.9.0

Jaeger backend version

3.2.0

SDK

OpenTelemetry Operator

Pipeline

OTEL SDK -> Jaeger Collector -> Elasticsearch

Storage backend

Elasticsearch 8.9.0

Operating system

No response

Deployment model

Kubernetes

Deployment configs

@Manik2708
Contributor

Manik2708 commented Jan 16, 2025

This is weird. Init was made idempotent, and I tested running init multiple times but never got this error. This needs investigation!

@sergeykad
Author

This happens consistently on our servers. We had to disable the init configuration to work around this.

@Manik2708
Contributor

Did this also happen before the Spark job was added?

@sergeykad
Author

I am not sure about it.

github-merge-queue bot pushed a commit that referenced this issue Feb 14, 2025
…r alias already exist (#6638)

## Which problem is this PR solving?
Fixes: #6203 

## Description of the changes
- Currently `es-rollover` detects an existing index from the error
returned by the create request. It mainly expects this error:
```
{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"]"}],"type":"resource_already_exists_exception","reason":"request [/jaeger-*] contains unrecognized parameter: [help]"},"status":400}
```
But this can lead to inconsistent results, as found in the issue, where
init failed with the error:
```
Error: failed to create index: jaeger-span-000001, request failed, status code: 400, body: {"error":{"root_cause":[{"type":"invalid_index_name_exception","reason":"Invalid index name [jaeger-span-000001], already exists as alias","index_uuid":"_na_","index":"jaeger-span-000001"}],"type":"invalid_index_name_exception","reason":"Invalid index name [jaeger-span-000001], already exists as alias","index_uuid":"_na_","index":"jaeger-span-000001"},"status":400}
``` 
Looking carefully, this error is also caused by the index name already
existing, but the reported reason is different. `es-rollover` only
handles `resource_already_exists_exception`, yet other errors (like the
one above) can be generated for the same underlying cause.
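
To illustrate the mismatch, here is a minimal Go sketch that extracts the top-level error type from the response body quoted above and runs it through a check keyed on a single type. The names `esErrorBody`, `errorType`, and `matchesExpectedError` are hypothetical and are not taken from the `es-rollover` source; the body is a shortened copy of the one from the issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// esErrorBody mirrors only the fields of the Elasticsearch error response
// that this sketch needs; encoding/json ignores all other fields.
type esErrorBody struct {
	Error struct {
		Type   string `json:"type"`
		Reason string `json:"reason"`
	} `json:"error"`
	Status int `json:"status"`
}

// errorType extracts the top-level error type from a response body.
func errorType(body []byte) (string, error) {
	var e esErrorBody
	if err := json.Unmarshal(body, &e); err != nil {
		return "", err
	}
	return e.Error.Type, nil
}

// matchesExpectedError is a hypothetical stand-in for a check that only
// recognizes one error type as "index already exists".
func matchesExpectedError(body []byte) bool {
	t, err := errorType(body)
	return err == nil && t == "resource_already_exists_exception"
}

func main() {
	// Shortened copy of the response body reported in the issue.
	body := []byte(`{"error":{"type":"invalid_index_name_exception",` +
		`"reason":"Invalid index name [jaeger-span-000001], already exists as alias",` +
		`"index":"jaeger-span-000001"},"status":400}`)

	t, err := errorType(body)
	if err != nil {
		fmt.Println("cannot parse body:", err)
		return
	}
	// The name is already taken, but the single-type check does not see it.
	fmt.Println(t, matchesExpectedError(body))
}
```

Adding more error types to such a check would cover this case, but it stays dependent on whatever error Elasticsearch chooses to return.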

The current approach of inspecting the returned error is unsafe. The
safe way is: check if the index exists -> create it if it does not.
This way the expected error (`resource_already_exists_exception`) is
fixed, and an otherwise unavoidable error like
`invalid_index_name_exception` is ignored.
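
The check-then-create flow can be sketched in Go as follows. This is not the code from the PR: `ensureIndex` and `newFakeES` are hypothetical helpers, and the fake server only imitates the two requests used here. The sketch assumes that `HEAD /<name>` answers 200 when the name resolves to an index or an alias and 404 otherwise, which is how the Elasticsearch exists API is documented.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// ensureIndex creates the index only when the name is not already taken.
// It reports whether a create request was actually issued and succeeded.
func ensureIndex(client *http.Client, baseURL, name string) (bool, error) {
	// Step 1: check existence instead of relying on the create error.
	resp, err := client.Head(baseURL + "/" + name)
	if err != nil {
		return false, err
	}
	resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		return false, nil // name exists (index or alias): nothing to do
	case http.StatusNotFound:
		// fall through to creation
	default:
		return false, fmt.Errorf("exists check for %s: status %d", name, resp.StatusCode)
	}

	// Step 2: create the index because the name is free.
	req, err := http.NewRequest(http.MethodPut, baseURL+"/"+name, nil)
	if err != nil {
		return false, err
	}
	resp, err = client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("create %s: status %d", name, resp.StatusCode)
	}
	return true, nil
}

// newFakeES returns an in-memory stand-in for Elasticsearch that knows a set
// of taken names. It exists only to make this sketch runnable.
func newFakeES(existing ...string) *httptest.Server {
	var mu sync.Mutex
	names := map[string]bool{}
	for _, n := range existing {
		names[n] = true
	}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		name := strings.TrimPrefix(r.URL.Path, "/")
		switch r.Method {
		case http.MethodHead:
			if !names[name] {
				w.WriteHeader(http.StatusNotFound)
			}
		case http.MethodPut:
			if names[name] {
				w.WriteHeader(http.StatusBadRequest)
				return
			}
			names[name] = true
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	}))
}

func main() {
	srv := newFakeES("jaeger-span-000001") // name already taken, as in the issue
	defer srv.Close()
	for _, name := range []string{"jaeger-span-000001", "jaeger-service-000001"} {
		created, err := ensureIndex(srv.Client(), srv.URL, name)
		fmt.Println(name, created, err)
	}
}
```

Note that the check and the create are two separate requests, so a concurrent writer could still take the name in between; a real implementation would likely want to tolerate that case as well.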

## How was this change tested?
- Unit and E2E Tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Manik2708 <[email protected]>