RFC: Clustering (refactor pub-sub) #2304
Replies: 10 comments
-
Yes, I've been thinking about clustering and HA for quite a while, and the two things holding us back are: how to cluster gRPC & pub-sub.
-
Clustering pub-sub should easily be possible by writing a Redis / RabbitMQ implementation of this interface. gRPC can probably be clustered simply by putting a load-balancer in front of it. The queue is an additional part which might need some refactoring for clustering, but may already be covered by having a central database.
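As a rough illustration, a Redis-backed implementation could look something like the sketch below. This is a minimal sketch only: the `Message`, `Receiver` and `Publisher` shapes here are simplified stand-ins, not the actual Woodpecker pub-sub types.

```go
package pubsub

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// Message and Receiver are simplified stand-ins for whatever the real interface uses.
type Message []byte
type Receiver func(Message)

// Publisher is a simplified stand-in for Woodpecker's pub-sub interface.
type Publisher interface {
	Publish(ctx context.Context, topic string, msg Message) error
	Subscribe(ctx context.Context, topic string, recv Receiver) error
}

// redisPubsub backs the interface with Redis channels.
type redisPubsub struct {
	client *redis.Client
}

func NewRedis(addr string) Publisher {
	return &redisPubsub{client: redis.NewClient(&redis.Options{Addr: addr})}
}

func (p *redisPubsub) Publish(ctx context.Context, topic string, msg Message) error {
	return p.client.Publish(ctx, topic, []byte(msg)).Err()
}

func (p *redisPubsub) Subscribe(ctx context.Context, topic string, recv Receiver) error {
	sub := p.client.Subscribe(ctx, topic)
	defer sub.Close()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case m := <-sub.Channel():
			recv(Message(m.Payload))
		}
	}
}
```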
-
I would like to have a Redis-based implementation of pub-sub, and also of the queue (if we still need it after refactoring ...), so that Redis can be used, but it should be opt-in! We could additionally look at NATS ...
-
Hm. I always wonder about implementing messaging. Some questions about the current state:
Other thoughts: my model of thinking is to reduce elements. In that spirit we could also do the following. Trigger side:
Runner:
Cron:
-
As you write, there are redundant structures, a legacy from the Drone codebase, that we will refactor away.
-
Maybe add a message queue and split the server into a server component (user/UI/API) and a control component. The control component can be a singleton, but the API server can run as multiple instances.
-
But this would be a single point of failure again. In addition, I would currently prefer to stick to a single server binary, as it makes deployment much simpler (think of Raspberry Pi users, etc.).
-
I'm assuming from the state of this discussion and other related issues that HA remains unavailable, which is disappointing, as this is one of the few things holding me back from going "all-in" on recommending Woodpecker to my organization. I would like to propose (and possibly implement) the following "baby step": add a new env to the server (maybe call it ...).
The idea here is to have a single server binary that can be a stateful all-in-one, a stateless HA UI-only server, or a stateful non-HA "queue" server. The UI servers would publish events to the queue server, which would be polled by the agent servers and would perform any synchronized tasks without the risk of collision. While this would still have a single point of failure, it would at least allow for an HA UI behind your load-balancer of choice that remains partially functional while the rest of the system is in a degraded state, without needing to add any additional third-party services.
From there, it would likely be much easier to implement support for an HA queuing system such as NATS, Kafka, Redis, etc., with that singleton server left over as a single stateless component to handle functionality requiring synchronization. That component would not need any HA beyond automatic restarts, as it would not need to be available to answer network requests.
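A minimal sketch of what the startup logic for such a mode switch might look like. The variable name `WOODPECKER_SERVER_MODE` and the mode names are hypothetical, made up for illustration; the proposal above leaves the actual env name open.

```go
// Hypothetical sketch: one binary picks its role at startup from an env var.
package main

import (
	"log"
	"os"
)

func main() {
	// WOODPECKER_SERVER_MODE is a made-up name for illustration only.
	switch mode := os.Getenv("WOODPECKER_SERVER_MODE"); mode {
	case "", "all-in-one": // default: today's behaviour, stateful single binary
		startUI()
		startQueue()
	case "ui": // stateless, HA-capable UI/API server that publishes events to the queue server
		startUI()
	case "queue": // singleton server owning the queue, cron and other synchronized tasks
		startQueue()
	default:
		log.Fatalf("unknown server mode %q", mode)
	}
	select {} // block forever; real code would wait on the started services
}

func startUI()    { /* serve UI/API, forward events to the queue server */ }
func startQueue() { /* run queue, cron and other tasks needing synchronization */ }
```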
-
Don't know what you mean by ...
-
I would suggest using https://nats.io as the pub-sub implementation. It is written in Go and could be compiled into woodpecker-ci (so no extra central Redis, RabbitMQ or similar is needed). I believe both the communication between server and agent, and between server and UI, could be handled with NATS (independent of the replica count). Only making sure that cron jobs are not created by multiple servers at the same time would have to be handled with logic in code.
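If that route were taken, embedding a NATS server in the woodpecker-ci process and publishing/subscribing over it might look roughly like the sketch below. The subject name `pipeline.updates` is made up for illustration.

```go
package main

import (
	"log"
	"time"

	natsserver "github.com/nats-io/nats-server/v2/server"
	"github.com/nats-io/nats.go"
)

func main() {
	// Start an embedded NATS server inside the process,
	// so no extra external broker has to be deployed.
	ns, err := natsserver.NewServer(&natsserver.Options{Port: 4222})
	if err != nil {
		log.Fatal(err)
	}
	go ns.Start()
	if !ns.ReadyForConnections(5 * time.Second) {
		log.Fatal("embedded NATS server did not start in time")
	}

	// Connect a client; agents and UI subscribers would do the same.
	nc, err := nats.Connect(ns.ClientURL())
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Subject name is hypothetical, for illustration only.
	if _, err := nc.Subscribe("pipeline.updates", func(m *nats.Msg) {
		log.Printf("got event: %s", string(m.Data))
	}); err != nil {
		log.Fatal(err)
	}
	if err := nc.Publish("pipeline.updates", []byte(`{"type":"pipeline-step"}`)); err != nil {
		log.Fatal(err)
	}
	nc.Flush()
	time.Sleep(100 * time.Millisecond) // give the async handler a moment in this demo
}
```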
-
Clustering (multiple servers)
To be able to cluster we need to change some things:
Refactor pub-sub
We currently have an "interesting" pub-sub mechanism in Woodpecker which works, but could use some love. I would like to refactor it a bit to have a more generic event system. For example, at the moment, if a build step is updated, the event contains the complete repo, build and build-step data, which seems quite heavy to me. Instead I would like some kind of
{ type: 'pipeline-step', data: ... }
structure, even if this approach would require us to send two events where we currently use one event like { repo: ..., build: ..., proc: ... }
In addition, we could think about a clean interface for the pub-sub implementation so that we could add / use an external / professional pub-sub system like RabbitMQ or Redis, which would allow scaling etc. in the future.
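A minimal sketch of what such a slimmer, generic event structure could look like. Type and field names here are illustrative, not the actual Woodpecker types.

```go
package pubsub

import "encoding/json"

type EventType string

const (
	EventTypePipeline     EventType = "pipeline"
	EventTypePipelineStep EventType = "pipeline-step"
)

// Event carries only the type and the data that actually changed,
// instead of the full repo + build + step payload used today.
type Event struct {
	Type EventType       `json:"type"`
	Data json.RawMessage `json:"data"`
}

// NewStepEvent packs just the updated step; a repo or pipeline update
// would be sent as a second, separate event.
func NewStepEvent(step any) (Event, error) {
	data, err := json.Marshal(step)
	if err != nil {
		return Event{}, err
	}
	return Event{Type: EventTypePipelineStep, Data: data}, nil
}
```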