
Wait some time for the backend to be ready for indexing on startup #144

Closed
yrodiere opened this issue Jan 16, 2024 · 6 comments · Fixed by #145
@yrodiere
Member

We just had an indexing failure in prod caused by the whole cluster being restarted:

This is what happens when the app starts before the backend is ready.
Maybe we should add a step that waits for the backend to be ready before attempting to index?

Originally posted by @yrodiere in #130 (comment)

@yrodiere
Member Author

Important: I think we should only wait for the indexing-on-startup feature. It doesn't make sense to wait in the case of periodic reindexing, or when indexing is explicitly triggered through the management interface.
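
For context, a minimal sketch of how such a wait interval could be scoped to the on-startup feature with a Quarkus @ConfigMapping. The interface and property names mirror the indexingConfig.onStartup().waitInterval() accessor in the snippet further down, but the exact shape and the default value are assumptions, not the project's actual config:

    import java.time.Duration;

    import io.smallrye.config.ConfigMapping;
    import io.smallrye.config.WithDefault;

    // Hypothetical config mapping: the wait interval lives under indexing.on-startup,
    // so it only affects indexing triggered at startup, not periodic or explicit reindexing.
    @ConfigMapping(prefix = "indexing")
    public interface IndexingConfig {

        OnStartup onStartup();

        interface OnStartup {
            // How long to wait between checks of the search backend before startup indexing.
            @WithDefault("10s")
            Duration waitInterval();
        }
    }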

@yrodiere
Member Author

Hmm, actually we already do have such code:

var waitInterval = indexingConfig.onStartup().waitInterval();
// https://smallrye.io/smallrye-mutiny/2.0.0/guides/polling/#how-to-use-polling
Multi.createBy().repeating()
        .supplier(this::isSearchBackendAccessible)
        .until(backendAccessible -> backendAccessible)
        .onItem().invoke(() -> {
            Log.infof("Search backend is not reachable yet, waiting...");
        })
        .onCompletion().call(() -> Uni.createFrom()
                .item(() -> {
                    reindex();
                    return null;
                })
                .runSubscriptionOn(Infrastructure.getDefaultWorkerPool()))
        // https://smallrye.io/smallrye-mutiny/2.5.1/guides/controlling-demand/#pacing-the-demand
        .paceDemand().on(Infrastructure.getDefaultWorkerPool())
        .using(new FixedDemandPacer(1L, waitInterval))
        .subscribe().with(
                // We don't care about the items, we just want this to run.
                ignored -> {
                },
                t -> Log.errorf(t, "Reindexing on startup failed: %s", t.getMessage()));

But it doesn't work... I'll have a look
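
For reference, the isSearchBackendAccessible() check itself is not shown in this thread. A minimal sketch of what such a check could look like, assuming a Quarkus bean with an injected SearchMapping and access to the low-level REST client via ElasticsearchBackend.client(RestClient.class); the actual implementation in the project may differ:

    import java.io.IOException;

    import jakarta.inject.Inject;

    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;
    import org.hibernate.search.backend.elasticsearch.ElasticsearchBackend;
    import org.hibernate.search.mapper.orm.mapping.SearchMapping;

    public class BackendAccessibilityCheck {

        @Inject
        SearchMapping searchMapping; // Hibernate Search entry point, injected by Quarkus

        // Hypothetical readiness check: send a trivial request through the backend's
        // low-level REST client and report the backend as accessible if it answers.
        boolean isSearchBackendAccessible() {
            try {
                RestClient client = searchMapping.backend()
                        .unwrap(ElasticsearchBackend.class)
                        .client(RestClient.class);
                Response response = client.performRequest(new Request("GET", "/"));
                return response.getStatusLine().getStatusCode() == 200;
            } catch (IOException | RuntimeException e) {
                return false;
            }
        }
    }

Note that a request like this goes through the client's regular node selection, so it effectively probes a single node per call, which matches the behavior diagnosed further down in this thread.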

@yrodiere self-assigned this Jan 16, 2024
@yrodiere
Member Author

yrodiere commented Jan 16, 2024

Ok, so what actually happens is: the first backend node is up, so the check passes, but the other nodes are down, and the next request sent through the client targets one of those other nodes (it's a round-robin of sorts) and fails. See #131 (comment)

We'd need some sort of failover to try the next node when one fails, and I thought there was one... but apparently not :/

@marko-bekhta
Collaborator

marko-bekhta commented Jan 16, 2024

Maybe checking for cluster health would work?
https://opensearch.org/docs/2.11/api-reference/cluster-api/cluster-health/#example

(It will probably be yellow locally, though...)
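
A minimal sketch of that suggestion, assuming the same low-level REST client as above; wait_for_status=yellow is an assumption meant to accommodate single-node local clusters, which usually report yellow health:

    import java.io.IOException;

    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    public final class ClusterHealthCheck {

        // Hypothetical cluster-health-based check: ask the cluster to reach at least
        // yellow status, with a short server-side wait so the call returns quickly.
        static boolean isClusterHealthy(RestClient client) {
            try {
                Request request = new Request("GET", "/_cluster/health");
                // Accept yellow so single-node local clusters (no replicas) pass too.
                request.addParameter("wait_for_status", "yellow");
                request.addParameter("timeout", "5s");
                Response response = client.performRequest(request);
                return response.getStatusLine().getStatusCode() == 200;
            } catch (IOException | RuntimeException e) {
                // Connection failures, and non-2xx responses such as 408 when the wait
                // times out, surface here and are treated as "not ready yet".
                return false;
            }
        }
    }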

@yrodiere
Member Author

FWIW running this locally, with opensearch already running on port 9200, works as expected (i.e. there is failover):

quarkus dev -Dquarkus.devservices.enabled=false -Dquarkus.hibernate-search-orm.elasticsearch.hosts=localhost:9200,bar:9200,foobar:9200 -Dindexing.on-startup.when=always
