
Wait some time for the backend to be ready for indexing on startup #144

Closed
yrodiere opened this issue Jan 16, 2024 · 6 comments · Fixed by #145
@yrodiere
Member

We just had an indexing failure in prod caused by the whole cluster being restarted:

This is what happens when the app starts before the backend is ready.
Maybe we should add a step that waits for the backend to be ready before attempting to index?

Originally posted by @yrodiere in #130 (comment)

@yrodiere
Member Author

Important: I think we should only wait for the indexing-on-startup feature. It doesn't make sense to wait in the case of periodic reindexing, or when indexing is explicitly triggered through the management interface.
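
For context, a minimal sketch of how such a wait interval could be scoped to the on-startup feature with a Quarkus @ConfigMapping. The interface and property names mirror the indexingConfig.onStartup().waitInterval() accessor in the snippet further down, but the exact shape and the default value are assumptions, not the project's actual config:

    import java.time.Duration;

    import io.smallrye.config.ConfigMapping;
    import io.smallrye.config.WithDefault;

    // Hypothetical config mapping: the wait interval lives under indexing.on-startup,
    // so it only affects indexing triggered at startup, not periodic or explicit reindexing.
    @ConfigMapping(prefix = "indexing")
    public interface IndexingConfig {

        OnStartup onStartup();

        interface OnStartup {
            // How long to wait between checks of the search backend before startup indexing.
            @WithDefault("10s")
            Duration waitInterval();
        }
    }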

@yrodiere
Member Author

Hmm, actually we already do have such code:

var waitInterval = indexingConfig.onStartup().waitInterval();
// https://smallrye.io/smallrye-mutiny/2.0.0/guides/polling/#how-to-use-polling
Multi.createBy().repeating()
        .supplier(this::isSearchBackendAccessible)
        .until(backendAccessible -> backendAccessible)
        .onItem().invoke(() -> {
            Log.infof("Search backend is not reachable yet, waiting...");
        })
        .onCompletion().call(() -> Uni.createFrom()
                .item(() -> {
                    reindex();
                    return null;
                })
                .runSubscriptionOn(Infrastructure.getDefaultWorkerPool()))
        // https://smallrye.io/smallrye-mutiny/2.5.1/guides/controlling-demand/#pacing-the-demand
        .paceDemand().on(Infrastructure.getDefaultWorkerPool())
        .using(new FixedDemandPacer(1L, waitInterval))
        .subscribe().with(
                // We don't care about the items, we just want this to run.
                ignored -> {
                },
                t -> Log.errorf(t, "Reindexing on startup failed: %s", t.getMessage()));

But it doesn't work... I'll have a look
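
For reference, the isSearchBackendAccessible() check itself is not shown in this thread. A minimal sketch of what such a check could look like, assuming a Quarkus bean with an injected SearchMapping and access to the low-level REST client via ElasticsearchBackend.client(RestClient.class); the actual implementation in the project may differ:

    import java.io.IOException;

    import jakarta.inject.Inject;

    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;
    import org.hibernate.search.backend.elasticsearch.ElasticsearchBackend;
    import org.hibernate.search.mapper.orm.mapping.SearchMapping;

    public class BackendAccessibilityCheck {

        @Inject
        SearchMapping searchMapping; // Hibernate Search entry point, injected by Quarkus

        // Hypothetical readiness check: send a trivial request through the backend's
        // low-level REST client and report the backend as accessible if it answers.
        boolean isSearchBackendAccessible() {
            try {
                RestClient client = searchMapping.backend()
                        .unwrap(ElasticsearchBackend.class)
                        .client(RestClient.class);
                Response response = client.performRequest(new Request("GET", "/"));
                return response.getStatusLine().getStatusCode() == 200;
            } catch (IOException | RuntimeException e) {
                return false;
            }
        }
    }

Note that a request like this goes through the client's regular node selection, so it effectively probes a single node per call, which matches the behavior diagnosed further down in this thread.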

@yrodiere self-assigned this Jan 16, 2024
@yrodiere
Member Author

yrodiere commented Jan 16, 2024

Ok, so what actually happens is: the first backend node is up, so the check passes, but the other nodes are down, and the next request sent through the client targets one of those other nodes (it's a round-robin of sorts) and fails. See #131 (comment)

We'd need some sort of failover to try the next node when one fails, and I thought there was one... but apparently not :/

@marko-bekhta
Collaborator

marko-bekhta commented Jan 16, 2024

Maybe checking for cluster health would work?
https://opensearch.org/docs/2.11/api-reference/cluster-api/cluster-health/#example

(It will probably be yellow locally, though...)
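
A minimal sketch of that suggestion, assuming the same low-level REST client as above; wait_for_status=yellow is an assumption meant to accommodate single-node local clusters, which usually report yellow health:

    import java.io.IOException;

    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    public final class ClusterHealthCheck {

        // Hypothetical cluster-health-based check: ask the cluster to reach at least
        // yellow status, with a short server-side wait so the call returns quickly.
        static boolean isClusterHealthy(RestClient client) {
            try {
                Request request = new Request("GET", "/_cluster/health");
                // Accept yellow so single-node local clusters (no replicas) pass too.
                request.addParameter("wait_for_status", "yellow");
                request.addParameter("timeout", "5s");
                Response response = client.performRequest(request);
                return response.getStatusLine().getStatusCode() == 200;
            } catch (IOException | RuntimeException e) {
                // Connection failures, and non-2xx responses such as 408 when the wait
                // times out, surface here and are treated as "not ready yet".
                return false;
            }
        }
    }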

@yrodiere
Member Author

FWIW running this locally, with opensearch already running on port 9200, works as expected (i.e. there is failover):

quarkus dev -Dquarkus.devservices.enabled=false -Dquarkus.hibernate-search-orm.elasticsearch.hosts=localhost:9200,bar:9200,foobar:9200 -Dindexing.on-startup.when=always
