Hot reload of node.yaml configuration #5531

Open
esatterwhite opened this issue Oct 30, 2024 · 0 comments
Labels: enhancement, low-priority

@esatterwhite (Collaborator)

Is your feature request related to a problem? Please describe.
Configuration changes in the static node.yaml file aren't picked up by running instances of Quickwit. Changing anything usually means restarting the entire cluster to pick up the changes. In a busy cluster, this can be rather disruptive from both an ingestion and a search perspective. It is also quite a manual and laborious process.

Describe the solution you'd like
The node.yaml file should be watched, and changes should trigger live reconfiguration where possible, reducing the number of restarts and the potential downtime of a running cluster.
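
For illustration only, a minimal sketch of the kind of watch-and-reload loop this is asking for: poll the modification time of node.yaml and hand the path to a reload callback that applies only the settings that can safely change at runtime. The `watch_config` function and the polling approach are hypothetical (not Quickwit's API); a real implementation would more likely use an inotify-style watcher.

```rust
// Hypothetical sketch, not Quickwit code: detect node.yaml changes by polling
// its mtime and invoke a reload callback. Only hot-reloadable settings would
// be applied; everything else would still require a restart.
use std::path::Path;
use std::time::{Duration, SystemTime};

fn watch_config<F: FnMut(&Path)>(path: &Path, mut on_change: F) -> std::io::Result<()> {
    let mut last_modified: Option<SystemTime> = None;
    loop {
        let modified = std::fs::metadata(path)?.modified()?;
        if last_modified.map_or(false, |prev| modified > prev) {
            on_change(path); // re-read the file and apply hot-reloadable settings
        }
        last_modified = Some(modified);
        std::thread::sleep(Duration::from_secs(5));
    }
}

fn main() -> std::io::Result<()> {
    watch_config(Path::new("node.yaml"), |path| {
        println!("{} changed, reloading hot-reloadable settings", path.display());
    })
}
```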

Describe alternatives you've considered

In an effort to make the configuration more dynamic, we template it with environment variables. Changing a value alters the pod specification, which automatically triggers the pods to restart in Kubernetes. While this is less work, it is still rather disruptive.

Additional context
Letting Kubernetes restart our indexer pool can take upwards of an hour, and ingestion is heavily degraded while that is happening, due to a combination of a reduced pool of running indexers, shard rebalancing, backpressure, etc.

Search performance also drops: restarts cause the loss of the hot in-memory cache, which takes time to repopulate, and the reduced searcher pool size contributes to the drop as well.

Additionally, nodes dropping in and out of the cluster puts stress on the control plane and metastore and increases the size of the chitchat state, which has led to stability problems in the past.

see: #5446

@esatterwhite added the enhancement label on Oct 30, 2024