Wait some time for the backend to be ready for indexing on startup #144
Important: I think we should only wait for the indexing-on-startup feature. It doesn't make sense to wait for the periodic reindexing, or when indexing is explicitly triggered through the management interface.
Hmm, actually we already have such code: search.quarkus.io/src/main/java/io/quarkus/search/app/indexing/IndexingService.java, lines 96 to 117 in 6ae7d58
But it doesn't work... I'll have a look.
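For context, a startup readiness check of the kind referenced above generally amounts to polling the backend until it answers or a timeout elapses. The following is a hypothetical sketch, not the actual `IndexingService` code; the method name `waitForBackend`, the probe, and the timeout values are all illustrative:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.BooleanSupplier;

public class BackendReadinessWait {

    /**
     * Polls the given readiness probe until it reports success or the
     * timeout elapses. Returns true if the backend became ready in time.
     */
    static boolean waitForBackend(BooleanSupplier isReady, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            if (isReady.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated probe: fails twice, then succeeds on the third attempt.
        int[] attempts = { 0 };
        BooleanSupplier probe = () -> ++attempts[0] >= 3;
        boolean ready = waitForBackend(probe, Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println(ready ? "ready" : "timed out"); // prints "ready"
    }
}
```

As the next comment explains, the weak point of such a check is *what* the probe actually talks to: probing a single node says nothing about the rest of the cluster.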
Ok, so what actually happens is that the first backend node is up, so the check passes, but the other nodes are down, so the next request sent through the client targets another node (it's a round-robin of sorts) and fails. See #131 (comment). We'd need some sort of failover, to try the next node when one fails, and I thought there was one... but apparently not :/
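The failover behavior being asked for could be sketched as follows. This is a hypothetical illustration of the idea (round-robin selection that falls through to the next node on failure), not the actual Elasticsearch client implementation; the class and host names are made up:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class RoundRobinFailover {

    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinFailover(List<String> hosts) {
        this.hosts = hosts;
    }

    /**
     * Picks the next host round-robin; on failure, falls through to the
     * following hosts until one succeeds or all have been tried.
     */
    String execute(Predicate<String> request) {
        int start = next.getAndIncrement();
        for (int i = 0; i < hosts.size(); i++) {
            String host = hosts.get(Math.floorMod(start + i, hosts.size()));
            if (request.test(host)) {
                return host; // first node that answered successfully
            }
            // Node failed: try the next one instead of propagating the error.
        }
        throw new IllegalStateException("All nodes failed");
    }

    public static void main(String[] args) {
        RoundRobinFailover client =
                new RoundRobinFailover(List.of("node1:9200", "node2:9200", "node3:9200"));
        // Simulate a cluster restart where only node3 is back up so far.
        Predicate<String> onlyNode3Up = host -> host.startsWith("node3");
        System.out.println(client.execute(onlyNode3Up)); // prints "node3:9200"
        System.out.println(client.execute(onlyNode3Up)); // prints "node3:9200"
    }
}
```

Without the inner loop, the second call would start at node2, fail, and surface the error to the caller, which matches the behavior described in this comment.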
Maybe checking the cluster health would work? (It will probably be yellow locally, though...)
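Elasticsearch/OpenSearch do expose a `GET /_cluster/health` endpoint whose JSON response carries a `status` field (`green`, `yellow`, or `red`), and it supports a `wait_for_status` parameter for blocking checks. A minimal sketch of interpreting that response, assuming the caveat from this comment (a local single-node cluster never reaches green, so yellow must be accepted); the naive string parsing stands in for a real JSON library:

```java
public class ClusterHealthCheck {

    /**
     * Extracts the "status" field from a /_cluster/health JSON response
     * using naive string matching (no JSON library, for brevity).
     */
    static String parseStatus(String healthJson) {
        int key = healthJson.indexOf("\"status\"");
        if (key < 0) {
            return "unknown";
        }
        int colon = healthJson.indexOf(':', key);
        int firstQuote = healthJson.indexOf('"', colon + 1);
        int secondQuote = healthJson.indexOf('"', firstQuote + 1);
        return healthJson.substring(firstQuote + 1, secondQuote);
    }

    /**
     * Yellow must be accepted: a single-node dev cluster has unassigned
     * replica shards and will never report green.
     */
    static boolean isUsable(String status) {
        return "green".equals(status) || "yellow".equals(status);
    }

    public static void main(String[] args) {
        String sample = "{\"cluster_name\":\"dev\",\"status\":\"yellow\",\"number_of_nodes\":1}";
        String status = parseStatus(sample);
        System.out.println(status + " -> usable: " + isUsable(status)); // prints "yellow -> usable: true"
    }
}
```

Note that cluster health reflects the cluster as a whole, so unlike a per-node ping it would catch the "first node up, others down" situation described above.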
Right, I'll try that. Though if failover doesn't work, we have a bug: our tests expect it to work, see https://github.com/hibernate/hibernate-search/blob/94c571c53c35a92257bede06edfb7f4bc3dd50f3/integrationtest/backend/elasticsearch/src/test/java/org/hibernate/search/integrationtest/backend/elasticsearch/client/ElasticsearchClientFactoryImplIT.java#L599-L722
FWIW, running this locally, with OpenSearch already running on port 9200, works as expected (i.e. there is failover):
quarkus dev -Dquarkus.devservices.enabled=false -Dquarkus.hibernate-search-orm.elasticsearch.hosts=localhost:9200,bar:9200,foobar:9200 -Dindexing.on-startup.when=always
We just had an indexing failure in prod caused by the whole cluster being restarted:
Originally posted by @yrodiere in #130 (comment)