fix: broken store config in grafana loki #2830

Merged on Nov 20, 2024 (1 commit)


@takirala takirala commented Nov 20, 2024

What problem does this PR solve?:

While looking at the production cluster for an unrelated issue that @fatz raised, I found that the Loki compactor was printing the following error:

❯ k logs -nkommander grafana-loki-loki-distributed-compactor-0 -f
level=error ts=2024-11-19T23:16:56.703832344Z caller=compactor.go:523 msg="failed to run compaction" err="index store client not found for aws"
level=error ts=2024-11-19T23:26:56.703352606Z caller=compactor.go:523 msg="failed to run compaction" err="index store client not found for aws"
level=error ts=2024-11-19T23:36:56.702274812Z caller=compactor.go:523 msg="failed to run compaction" err="index store client not found for aws"
level=error ts=2024-11-19T23:46:56.704304251Z caller=compactor.go:523 msg="failed to run compaction" err="index store client not found for aws"

The error is printed every 10m, which is the default compaction interval, indicating that compaction never succeeds. According to grafana/loki#10554:

The issue is indeed caused by adding the multi-store support to the compactor. The compactor does not treat aws as an alias for s3 and tries to find an index store client for aws (which does not exist).
The object_type value in the period config needs to match the value of shared_store in the *_shipper config. In your case, either both aws or both s3.
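
The matching rule quoted above can be checked mechanically. The following is a minimal sketch, not part of this PR. It assumes the Loki config has already been loaded into a Python dict, that the period config field is named `object_store` (the quote calls it object_type), and that the shipper sections sit under `storage_config` as `boltdb_shipper` or `tsdb_shipper`. The helper name `find_store_mismatches` and the sample values are hypothetical.

```python
# Assumed mapping from a period's index `store` to its shipper section name.
SHIPPER_SECTION = {"boltdb-shipper": "boltdb_shipper", "tsdb": "tsdb_shipper"}


def find_store_mismatches(config):
    """Return (from, object_store, shared_store) for every period whose
    object_store differs from shared_store in the matching shipper section."""
    storage = config.get("storage_config", {})
    mismatches = []
    for period in config.get("schema_config", {}).get("configs", []):
        section = SHIPPER_SECTION.get(period.get("store"))
        if section is None:
            continue  # not a shipper-backed index store
        shared_store = storage.get(section, {}).get("shared_store")
        if shared_store is None:
            continue  # nothing to compare against
        if period.get("object_store") != shared_store:
            mismatches.append((period.get("from"), period.get("object_store"), shared_store))
    return mismatches


# Hypothetical config resembling the failure mode: "aws" in the period
# config, "s3" in the shipper config.
broken = {
    "schema_config": {
        "configs": [
            {"from": "2020-10-24", "store": "boltdb-shipper", "object_store": "aws"}
        ]
    },
    "storage_config": {"boltdb_shipper": {"shared_store": "s3"}},
}

# Same config with both values set to "s3".
fixed = {
    "schema_config": {
        "configs": [
            {"from": "2020-10-24", "store": "boltdb-shipper", "object_store": "s3"}
        ]
    },
    "storage_config": {"boltdb_shipper": {"shared_store": "s3"}},
}

print(find_store_mismatches(broken))
print(find_store_mismatches(fixed))
```

An empty result means every shipper-backed period agrees with its shipper section; each tuple in a non-empty result names a period that the compactor would likely fail on.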

I looked at the daily cluster and we still have this bug in 2.13, so all releases from 2.8 to 2.13 are affected. This is the index listing before the fix:

sh-4.4$ ./s5cmd --endpoint-url http://rook-ceph-rgw-dkp-object-store.kommander.svc:80 --credentials-file creds ls -H s3://dkp-loki/index/loki_index_20046/
2024/11/19 18:02:50            184.5K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732038592.gz
2024/11/19 18:17:50            180.0K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732039200.gz
2024/11/19 18:32:50             77.5K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732040100.gz
2024/11/19 18:47:50             82.5K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732041000.gz
2024/11/19 19:02:50            101.4K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732041900.gz
2024/11/19 19:17:50            114.8K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732042800.gz
2024/11/19 19:32:50            444.3K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732043700.gz
2024/11/19 19:47:50            168.1K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732044600.gz
2024/11/19 20:02:50            102.6K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732045500.gz
2024/11/19 20:17:50            101.2K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732046400.gz
2024/11/19 20:32:50             94.7K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732047300.gz
2024/11/19 20:47:50            104.6K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732048200.gz
2024/11/19 21:02:50             93.2K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732049100.gz
2024/11/19 21:17:50             97.7K  grafana-loki-loki-distributed-ingester-0-1732036730465239319-1732050000.gz
...
...
..

After applying the patch from this PR, the compactor runs successfully, and this is the listing afterwards:

./s5cmd --endpoint-url http://rook-ceph-rgw-dkp-object-store.kommander.svc:80 --credentials-file creds ls -H s3://dkp-loki/index/loki_index_20046/
2024/11/20 04:12:03              3.9M  compactor-1732075922.gz

(For those curious: in my cluster setup, the index compacted from approximately 5.2M to 3.9M.)
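
To reproduce that before/after comparison from saved listings, a rough sketch follows. It assumes each listing line has four whitespace-separated fields (date, time, human-readable size, key) as in the output above, and that the K/M suffixes are 1024-based, which is an assumption about the `-H` output. The function names are hypothetical.

```python
# Assumed 1024-based multipliers for the human-readable size suffixes.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '184.5K' or '3.9M' to bytes."""
    text = text.strip()
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def total_bytes(listing):
    """Sum the size column of a listing, skipping lines that do not have
    exactly four fields (for example the elided '...' lines)."""
    total = 0.0
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 4:  # date, time, size, key
            total += parse_size(fields[2])
    return total


def reduction_percent(before, after):
    """Percentage saved when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# The approximate figures quoted in the description: 5.2M before, 3.9M after.
print(round(reduction_percent(parse_size("5.2M"), parse_size("3.9M")), 1))
```

With the approximate figures quoted above, this works out to a reduction of about 25%.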

Not compacting does not cause any data loss, but it consumes unnecessary storage. This needs to be backported all the way back to 2.8 (and possibly 2.7, TBD).

Which issue(s) does this PR fix?:

https://jira.nutanix.com/browse/NCN-104281

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Checklist

  • If the PR adds a version bump, ensure there is no breaking change in Licensing model (or NA).
  • If a chart is changed or app configuration is significantly changed, the chart version is correctly incremented (so that apps are not automatically upgraded from a previous version of DKP).

@github-actions github-actions bot added services/grafana-loki size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 20, 2024
@mesosphere-ci mesosphere-ci added ok-to-test Signals mergebot that CI checks are ready to be kicked off update-licenses signals mergebot to update licenses.d2iq.yaml labels Nov 20, 2024
@coveralls

Pull Request Test Coverage Report for Build 11926650890

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 51.703%

Totals Coverage Status
Change from base Build 11923434883: 0.0%
Covered Lines: 167
Relevant Lines: 323

💛 - Coveralls

@msdolbey msdolbey left a comment

👍
Love this so much
not that one, this one

@takirala takirala merged commit ec097b1 into main Nov 20, 2024
142 checks passed
@takirala takirala deleted the tga/fix-grafana-loki branch November 20, 2024 16:25
mesosphere-ci pushed a commit that referenced this pull request Nov 20, 2024
mesosphere-ci pushed a commit that referenced this pull request Nov 20, 2024
mesosphere-ci pushed a commit that referenced this pull request Nov 20, 2024
@mesosphere-ci

💚 All backports created successfully

Status Branch Result
release-2.7
release-2.8
release-2.12

Questions ?

Please refer to the Backport tool documentation and see the Github Action logs for details

takirala added a commit that referenced this pull request Nov 21, 2024
(cherry picked from commit ec097b1)

Co-authored-by: Tarun Gupta Akirala <[email protected]>
takirala added a commit that referenced this pull request Nov 21, 2024
(cherry picked from commit ec097b1)

Co-authored-by: Tarun Gupta Akirala <[email protected]>
takirala added a commit that referenced this pull request Nov 21, 2024
(cherry picked from commit ec097b1)

Co-authored-by: Tarun Gupta Akirala <[email protected]>
Labels
auto-backport backport-to-release-2.7 backport-to-release-2.8 backport-to-release-2.12 ok-to-test Signals mergebot that CI checks are ready to be kicked off services/grafana-loki size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. update-licenses signals mergebot to update licenses.d2iq.yaml