Solr metrics #291
base: monitoring
There's a temporary commit here that requires you to build a local image first: `docker build -t metabrainz/mb-solr:solr-9.7.0 -f Dockerfile .`
This works nicely locally. To be honest, I have no clue which Solr data is useful for us, but in https://docs.google.com/document/d/1vQBiHdxO_qkmxUAfS10QWhoULSYcPazxZWSxsxJMS9o/edit?pli=1&tab=t.0 we wanted "Number of collection documents & Data size". Is that in there? There's so much stuff that, if so, we should document exactly where to look!
I also added a basic health check to the search service here, though the indexer service still fails with an HTTP 500 on startup.
"Searcher Documents" and "Index Size" are both available under "Core Metrics." They seem to be tracked per shard, IIUC.
Branch updated from bf73070 to f0fe505.
Thanks for the progress!
> @yvanzo A couple small points:
> 1. I realized the Solr health check I added was also useful for making sure the Solr exporter container (for Prometheus) didn't crash on startup. So I kept it, but converted it to a Perl script which checks that every collection is active. Do you see any problem with this method?
It seems useful to have a health check. I made a few comments on the implementation.
> 2. Regarding SIR, I tried switching from live-indexing-search to sir-dev locally, but this causes:
> ```
> indexer-1 | /usr/local/bin/docker-entrypoint.sh: line 15: MUSICBRAINZ_POSTGRES_SERVER: unbound variable
> ```
That’s a regression I just fixed in a separate PR about SIR development setup. You can cherry-pick the commit 5c5e6ec.
```
@@ -0,0 +1,60 @@
#!/usr/bin/perl
```
Please avoid introducing Perl in repositories that don’t already use it.
Python is also available in the Solr image.
I only chose Perl because Python wasn't available in the image for me:
```
solr@22a1a3f26448:/opt/solr-9.7.0$ python
bash: python: command not found
solr@22a1a3f26448:/opt/solr-9.7.0$ perl --version
This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux-gnu-thread-multi
[...]
```
Am I missing something? I'm fine with installing Python in the image instead. I could even use Bash + jq, but I figured a small amount of Perl would be fine for a tiny one-off script.
(Edit: Perhaps Python was available in the current Solr image, but not the image built from the solr-9.7.0 branch.)
Oops, my bad: I confused it with another image and didn’t actually check the Solr image.
Yes, Bash would be fine.
To detect whether Solr is running locally or listening to the outside, I didn’t find anything useful in the APIs. This seems to work:
```
ps auxww | grep Djetty.host=localhost
```
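If this `ps` check is used as a health-check condition (relying on grep's exit status rather than reading the output), note that `grep Djetty.host=localhost` will usually also match its own entry in the `ps` listing, so it can succeed whether or not Solr is bound to localhost. A minimal sketch of a filter that avoids this, assuming the Solr JVM is started with `-Djetty.host=localhost` as above; the function name `solr_bound_to_localhost` is hypothetical and not part of this PR:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `ps auxww` output on stdin and succeeds only if
# some process was started with -Djetty.host=localhost.
# The bracket expression [D] still matches the letter "D", but grep's own
# command line contains the literal text "[D]jetty", which this pattern does
# not match, so grep cannot match its own ps entry.
solr_bound_to_localhost() {
  grep -q -- '-[D]jetty\.host=localhost'
}
```

Usage would be along the lines of `ps auxww | solr_bound_to_localhost && echo "Solr only accepts local connections"`. `pgrep -f` is another option, since pgrep never reports its own process as a match.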
I tried using lsof as discussed on IRC, and although it easily worked for detecting whether Solr was available for outside connections, the solr-exporter container still wasn't happy until the collections were healthy:
```
solr-exporter | INFO - 2025-01-31 17:33:40.329; org.apache.solr.client.solrj.impl.CloudSolrClient; Request to collection [annotation] failed due to (510) org.apache.solr.common.SolrException: Could not find a healthy node to handle the request., retry=0 maxRetries=5 commError=false errorCode=510
```
Merging the lsof check with the collection health check script seems to work well, though.
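The lsof half of such a merged check could look like the following sketch. It assumes Solr listens on its default port 8983 and that the check only needs to know whether some listening socket is bound to a non-loopback address; `listens_externally` is a hypothetical name, not something from this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads the output of
#   lsof -nP -iTCP:8983 -sTCP:LISTEN
# on stdin and succeeds if at least one listening socket is bound to an
# address other than IPv4/IPv6 loopback (e.g. "*:8983").
# -n/-P keep addresses and ports numeric, so "localhost" never appears.
listens_externally() {
  awk '
    NR > 1 && $NF == "(LISTEN)" {
      addr = $(NF - 1)   # address:port, e.g. "127.0.0.1:8983" or "*:8983"
      if (index(addr, "127.") != 1 && index(addr, "[::1]:") != 1) found = 1
    }
    END { exit !found }  # exit 0 only if a non-loopback listener was seen
  '
}
```

In a health check this might be called as `lsof -nP -iTCP:8983 -sTCP:LISTEN | listens_externally || exit 1`. As the log above shows, passing this check alone does not mean the collections are ready, which is why it is combined with the per-collection check.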
```perl
my @collections = qw(
    area
    artist
    cdstub
    editor
    event
    instrument
    label
    place
    recording
    release
    release-group
    series
    tag
    url
    work
);
```
Is it necessary to check each collection separately? Maybe a collection-agnostic query could be used instead, for example OVERSEERSTATUS?
Otherwise, I would suggest fetching the list of collections either from the API or from the directory /usr/lib/mbsssss.
I don't see any fields related to the health or status of the collections in the OVERSEERSTATUS response locally, so I'm not sure that one would work.
Getting the list from /usr/lib/mbsssss sounds like a good idea. I avoided using the API in case a collection failed to be created and wasn't returned from the API. In Matrix you mentioned that:
> Solr isn’t accepting connections from outside the container when creating collections.
So I figured we should be sure all of the known collections are active first.
From the other discussion:
> To detect whether Solr is running locally or listening to the outside, I didn’t find anything useful in the APIs. This seems to work:
> ```
> ps auxww | grep Djetty.host=localhost
> ```
(Again from the other discussion:) I ended up merging the lsof check into the collection health check script to avoid "Could not find a healthy node" errors in the solr-exporter logs.
I did update the script to get the collection names from /usr/lib/mbsssss.
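Reading the collection names from the configuration directory might look like the sketch below. It assumes /usr/lib/mbsssss holds one subdirectory per collection (any other subdirectories there would need to be filtered out), and `list_collections` is a hypothetical name. One benefit over the hard-coded list above: that list omits `annotation`, although the solr-exporter log above shows an `annotation` collection.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints one collection name per subdirectory of the
# given config root (default /usr/lib/mbsssss, assumed to contain one
# directory per collection). Plain files are skipped, and dot-directories
# are not matched by the default glob.
list_collections() {
  local root=${1:-/usr/lib/mbsssss}
  local dir
  for dir in "$root"/*/; do
    # Without nullglob, an unmatched glob stays literal; skip it.
    [ -d "$dir" ] || continue
    dir=${dir%/}
    printf '%s\n' "${dir##*/}"
  done
}
```

Each printed name could then be passed to whatever per-collection status check the script already performs, e.g. `while read -r c; do ...; done < <(list_collections)`.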
@mwiencek: I merged SIR dev stuff into
Branch updated from f0fe505 to c3249b5.
This is based on #290 and I'm launching it in the same way:
```
admin/configure add monitoring
docker compose up -d
```
I followed https://solr.apache.org/guide/solr/latest/deployment-guide/monitoring-with-prometheus-and-grafana.html and copied the default dashboard under contrib/, only fixing the Prometheus data source name.