
Runner architecture

Introduction

Throughout this paper, I’ve glossed over a very important part of Polycode: the runners. I’ve told you multiple times that I wasn’t diving deeper into the subject because there was going to be a section dedicated to it; here we are. In this section, I’m going to talk about what runners are, what constraints we need to respect, and how we can create a system that scales in a secure and efficient manner. We will finish, as always, by taking a look at some diagrams to better understand the decisions I’ve made for Polycode.

Runners are the backbone of our application. They drive the interactivity and the engagement, since they are at the core of every piece of code that is executed on the website. For the user, it looks seamless, but there is a great deal of complexity and thought that must be put into it.

What are runners

We need to define what runners are and what they aren’t. As you might have figured, runners run code. More specifically, they run the code submitted by the user when they solve an exercise. The user’s code is then fed the input of each validator that exists for the exercise, and the standard output and standard error streams are returned.

There are a few key points that need to be made clear. First off, runners are more or less agnostic of the underlying (or overlying, depending on how you see it) application. They do not know what validators or contents are; they just take pieces of code and run them against some input. They could very well be used for some online Rust playground with some very minor tweaks. Although we must be conscious of Polycode’s requirements and constraints, we are far enough from the core business logic that most of the decisions will be made from a technical and human standpoint.

Furthermore, since the runners execute user input that can’t be trusted (anybody can use our platform), we need strong security and isolation mechanisms in place to make sure that:

  • A user can’t disrupt the stability of the platform

  • A bad actor can’t escape the isolation and access the system

As of right now, we need our runners to be able to run code written in Javascript, Rust, Java and Python, with projects composed of one or multiple files. This means that our runner architecture needs to be able to compile Rust code, and our actual runtime needs to have the appropriate tools and interpreters to execute the code that is given to it.

Program input is a string that can include newlines. This is a pretty simple input system, and there is no way to influence the input based on the output of the running program. This means validators must be deterministic.
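
To make this concrete, here is a rough sketch, in TypeScript, of the shape of an execution request and its result as a runner could see them. The names and fields are illustrative only, not the actual Polycode API:

[source,typescript]
----
// Illustrative shapes only: the actual Polycode contract may differ.
interface ExecutionRequest {
  language: "javascript" | "rust" | "java" | "python";
  files: { path: string; content: string }[]; // one or more source files
  stdin: string;                               // validator input, possibly multi-line
  timeoutMs?: number;                          // optional execution cap
}

interface ExecutionResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}
----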

Running code in an isolated manner

As I’ve mentioned in the introduction, one of the key points and constraints we need to respect is to have the user code run in a totally isolated manner. It should not be able to interact with the host system. A great way to think about runners is as a serverless platform like those you can find in most public clouds nowadays, such as AWS’s Lambda. You give AWS a piece of code, and it will be executed somewhere in their cloud. This is the same principle we have with Polycode: the user gives us a piece of code and they want it executed somewhere in our system. Just like AWS and its customers, we don’t want the code to be able to escape the boundaries we have given it. In this chapter, we will look at different ways we can achieve this goal, the pros and cons of each of them and what they imply.

Containers

The most basic and easy way to isolate programs is to run them within a container. A container is nothing more than a process running on the host system that has been configured to typically run in its own virtual filesystem, with limited network connectivity, no knowledge of other running programs, limited resources and isolation from the host machine. Containers allow running multiple isolated systems on a single host, even if those systems try to access the same resources, the same port for example.

This looks like everything we need, with no major drawback! Right? Not so fast, unfortunately. The isolation is done at the operating system level, opening a large and somewhat high-level attack surface. Escaping container isolation can be done pretty easily if the container configuration is faulty. This can be especially dangerous if the container is running as the root user, as the attacker would then have full access to the host system. Just like with any software, containers can contain vulnerabilities that need to be patched. If these patches are not applied in a timely manner, the container could be at risk. This is not a huge deal in our case, since the container’s lifespan is that of the user’s program, which is pretty short. Moreover, the user could only compromise their own container, as long as they are not able to escape it.

Because containers share the host’s kernel, a containerized application that is able to compromise the kernel could potentially gain access to other containers on the same host, as well as the host itself. This could allow an attacker to gain access to sensitive data or to compromise other systems on the network. This is exactly what we want to avoid, and it looks like running unsafe code in a container will not give us sufficient isolation to bring the risk down to comfortable levels.

Moreover, managing the resources of a container can be tricky. The main and only available solution out there is to kill a container that grabs more memory or CPU resources than it should. Ideally, we would want a system that is actually able to cap the usage of a program, making sure that it can’t grab more resources than it should in the first place.

One last concern about containers in our system is that we are probably going to run them inside an already dockerized environment. This means we need to run our runtime in a Docker-in-Docker (DinD) setup. DinD can introduce additional security risks: because the Docker daemon inside the container has access to the host system’s Docker daemon, any vulnerabilities in the container could potentially be exploited to compromise the host system. This is especially concerning if the container is running with elevated privileges, such as the root user. Another drawback is that DinD has a performance overhead compared to running the Docker daemon directly on the host system. Because the Docker daemon inside the container needs to communicate with the Docker daemon on the host system, there is additional overhead in the form of network communication and process management. This results in reduced resource efficiency.

Virtual Machines

Another big tool at our disposal for creating isolated environments is virtual machines (VMs). Today, the world runs on VMs. Every cloud provider, private or public, relies heavily on VMs to divide a bare-metal server’s resources into smaller, individual and isolated machines.

VMs use modern processors’ virtualization capabilities, which means our workloads are isolated thanks to a hardware implementation, whatever the host and guest operating systems. This allows for much tighter isolation and security. There is a big notion in the previous statement: the guest runs its own operating system. This means that no collision at the kernel level can occur, since we would spin up a virtual machine for each of the user’s requests. The user can compromise the whole guest system and still be contained, thanks once again to hardware virtualization. Escaping from hardware virtualization is significantly harder, at a level that I think is comfortable.

VMs also allow us to better isolate resources, since we can configure the available resources of each VM. Unlike with containers, the guest operating system will be responsible for managing those resources, and we have a clear boundary on them: the guest can’t use more than what we gave it. However, operating systems typically have systems in place to make sure they continue functioning properly even when they need resources they don’t have. For example, if you need more RAM, your operating system usually has swap pages that it can use as if they were RAM, at a very significant performance cost.

This also means that the overall resource usage will go up, since every code execution also means spinning up an operating system. However, Linux has some great options, providing kernels that not only take a fraction of a second to become operational, but also use very few resources. Linux images can be customized and tuned to our needs, to further decrease resource usage and boot time. We can also imagine having special VMs that run special operating systems, such as Windows. This would allow executing code that targets win32, for example. This also opens up the possibility of running code compiled for different processor architectures. This would allow creating content on RISC-V assembly for example, even if those VMs would be extremely slow, since we would not be virtualizing anymore, but emulating. It also means we need to use a virtual machine manager capable of emulating such a system. But it opens up a native experience for the user.

However, there is also increased complexity with VMs: talking to them is not as easy as with processes. With containers, you can programmatically run commands easily, with solutions like Docker for example. With VMs, you either need a custom kernel that is built to run some commands at startup, and a way to extract execution results out of it, or you need a way to communicate with an agent in the VM (still meaning a custom Linux image) that will respond to commands from the host.

All of this complexity might be a blessing in disguise: it opens up a world of possibilities in how we run our runners. Having an agent system that takes in requests allows for a lot of features and optimizations: preemptive startup (starting VMs before actually needing them, leaving them waiting for requests), fine-tuning execution parameters on the fly, or even executing custom operations depending on the code being executed, which can’t be known ahead of time.
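
As an illustration of what such an agent could expose, here is a hypothetical command set, sketched in TypeScript. None of these names come from an existing agent; they only show the kind of contract the host and the guest would agree on:

[source,typescript]
----
// Hypothetical host<->guest agent contract. The transport (vsock, virtio
// serial, HTTP over a host-only network, ...) is deliberately left open.
type AgentCommand =
  | { kind: "run"; files: { path: string; content: string }[]; stdin: string }
  | { kind: "warmup"; language: string }   // pre-load toolchains before any request
  | { kind: "shutdown" };

interface AgentReply {
  stdout: string;
  stderr: string;
  exitCode: number;
  durationMs: number;
}
----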

Overall, virtual machines give us a satisfactory level of isolation, at the cost of increased resource usage and complexity. However, I would argue that the tradeoff is worth it, and it goes beyond that: we must accept every challenge to make sure we have good isolation and a secure system.

Serverless

As I’ve alluded to earlier, what we are trying to accomplish is very similar to what cloud providers have been doing for years now: taking some small piece of code, executing it and exiting. Why not build on the work these companies have already done, instead of reinventing the wheel? Let’s look at the pros and cons of using the serverless services cloud providers offer.

First, let’s define what serverless actually is. Serverless is a cloud computing execution model in which the cloud provider dynamically manages the allocation of resources and charges for the execution of code. With serverless, you can run your code without having to worry about the underlying infrastructure, such as servers or virtual machines.

In a serverless model, you write and deploy your code in the form of functions, and the cloud provider executes the code in response to triggers, such as HTTP requests or events. The cloud provider automatically allocates the necessary resources to execute the code, and you are only charged for the actual execution time and the number of requests or events processed.

This fits particularly well with what we are trying to achieve. However, there is an additional variable in all of this: cost. Indeed, until now, we didn’t worry about the cost of our solutions, since we are currently using machines that are at our disposal. With these serverless services, you have to pay for the execution time of your code.

Another thing to demystify: it’s not "send the code of the user to the serverless service and everything happens magically". We need to write a piece of code that takes a user input as a request, figures out the inputs the program should be run against, then starts a runtime environment to run that code, compiling it beforehand if applicable. We also need to properly isolate the code, and manage the permissions of our serverless application properly, to make sure that the user doesn’t have access to any other resources that are at our disposal in our cloud platform account.
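
As a rough sketch of that wrapper, here is what a minimal AWS Lambda handler could look like on the Node.js runtime, assuming a runtime image that ships python3 and handling only Python for brevity. A real handler would need per-language toolchains, compilation steps and stricter limits:

[source,typescript]
----
import { execFileSync } from "node:child_process";
import { writeFileSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Minimal sketch of a serverless wrapper around user code (Python only, for
// brevity, and assuming python3 is present in the runtime image).
export const handler = async (event: { code: string; stdin: string }) => {
  const dir = mkdtempSync(join(tmpdir(), "run-"));
  const file = join(dir, "main.py");
  writeFileSync(file, event.code);
  try {
    const stdout = execFileSync("python3", [file], {
      input: event.stdin,
      timeout: 10_000,          // kill runaway programs
      maxBuffer: 1024 * 1024,   // cap captured output at 1 MiB
    });
    return { stdout: stdout.toString(), stderr: "", exitCode: 0 };
  } catch (err: any) {
    // Non-zero exit, timeout or output overflow all end up here.
    return {
      stdout: err.stdout?.toString() ?? "",
      stderr: err.stderr?.toString() ?? String(err),
      exitCode: err.status ?? 1,
    };
  }
};
----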

Serverless platforms usually run in a pretty trimmed-down system, and you don’t have a lot of guarantees or possibilities regarding the system that you’re running on. This means you might not have the needed dependencies. Serverless was made for serving HTTP requests, opening some sockets to databases or other HTTP connections along the way, not for spawning other programs. Serverless is also more complex to design and implement than traditional architectures, because you need to consider the specific constraints and limitations of the serverless model. In addition, the extra abstraction introduced by the serverless model can make it more difficult to debug and troubleshoot issues, because you don’t have direct access to the underlying infrastructure.

If we forget the fact that we have a base infrastructure already at our disposal, and that we don’t currently have to pay for anything, cost becomes an interesting factor that might justify putting the work into making serverless work.

With serverless, you pay for execution time. This means that if you have a low volume of users using your service, you might only need to execute code for 1 hour in the day. With serverless, you would pay the rate for 1 hour of code execution. If you have a server idling, waiting to spawn VMs, this server will run for 24 hours, and you will be charged for this period. Even if you can get much more for your money with VMs, if you don’t have the usage to justify it, it is not cost-effective, and you might rather take the decision to use serverless services. Moreover, serverless scales basically infinitely, since you rely on your cloud provider’s resources. This comes with the other benefits cloud providers typically give you, such as near-always uptime; hardware, hypervisor and operating system maintenance for this kind of highly-abstracted service; flexibility; global availability; etc.

The right balance, strictly from a cost-effectiveness standpoint, is most likely somewhere in the middle. We will talk about it more later in this section.

Running code in a scalable way

One of the other challenges of our architecture, whether for the runners or for the entire system, is to create a system that can scale seamlessly, whatever the number of users on our platform. In this chapter, I want to take a look at how we can accomplish that, mostly from a system architecture standpoint, with some discussion of technological choices. We need to be confident in our runners, and in their ability to handle all the work we throw at them.

We want to be able to scale horizontally, meaning we want to be able to spawn new runners into the system when traffic surges. This also means we need a way to control this behavior, as well as an entry point to this sub-system of our architecture. Concretely, we need a service that takes user code and the input to run it against, schedules a machine to run the user’s code with that input, gets the results once the machine has finished executing, and returns them to the caller.

There are a lot of implications in this architecture; let’s try to tackle them one by one.

Controller / worker architecture

What I’ve just described looks awfully similar to a controller/worker architecture, where we have our scheduling/supervising microservice (let’s call it the controller from now on) that gives directives to worker machines about what to execute and how to execute it. However, there is a significant caveat that is usually not found in this type of architecture: we can have (and will have) multiple controllers. Controllers also need to be scalable; we can’t trust the resiliency and scalability of an architecture with a single point of failure. This means we need to coordinate somewhere, since we don’t want two of our controllers giving directives to the same worker at the same time.

This would cause, depending on the implementation:

  • Exceeding the worker machine’s resources, if we spawn one too many workloads due to improper communication between the controllers and the worker machine

  • Having a failed execution request on one of the controllers, if the worker simply refuses to run one of the requests

Either way, this is unwanted behavior. How can we resolve that? Part of the solution can probably be found in message queues, where the controllers would send their requests to a message queue, and available runners would pick them up as long as they have available resources. This would mean that it is the runner’s job to manage its resources, which totally makes sense, since it is the one that knows its available resources in real time. This creates a new problem though: how do we retrieve the execution results? Another message queue? But since the request to the controller was made synchronously, we need the correct controller to pick the answer back up, and in a timely manner.

All of this adds a lot of complexity, even if we would end up with a more robust system. For now, I would argue that the best solution is to simply have no synchronization, but make sure that a worker rejects one of the controllers’ requests if it receives two requests at the same time. The controller needs to have logic to handle this possibility, and reschedule on a new runner immediately.

At our scale, and up until a hundred concurrent users on the platform, we probably won’t even run into this scenario. I think this system would work well enough to handle thousands of concurrent users pretty easily. Let’s not forget that most workers will most likely have enough resources to run multiple machines at the same time, lowering the occurrence of this problem even more.

But is there a simpler solution? Yes, and a much more elegant one. Instead of looking at the problem as workers picking up work when they can, what about simply coordinating the controllers? We have state that must be shared across all instances of our controllers: the overall status of the runner infrastructure. Let’s use this state in a way that disallows inconsistent data. I think a great candidate for the job is Redis. It is fast; we don’t have concurrent access concerns since it is single-threaded; and we don’t care if we lose data when it shuts down, since we would have lost our infrastructure anyway and would need to reset everything back to zero. The data we need to store can easily be represented with Redis' datatypes. We do add another piece of software to our system, but I think this is justified.
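
As a minimal sketch of that coordination, assuming ioredis and an illustrative key layout, a controller could reserve an execution slot on a worker through a small Lua script, so that two controllers can never both grab the last slot:

[source,typescript]
----
import Redis from "ioredis";

const redis = new Redis();

// Atomically take one execution slot on a worker, or fail if none are left.
// Redis runs the Lua script as a single operation, so concurrent controllers
// cannot over-commit the same worker. Key naming is illustrative.
const RESERVE_SLOT = `
  local slots = tonumber(redis.call('GET', KEYS[1]) or '0')
  if slots > 0 then
    redis.call('DECR', KEYS[1])
    return 1
  end
  return 0
`;

export async function reserveSlot(workerId: string): Promise<boolean> {
  const granted = await redis.eval(RESERVE_SLOT, 1, `worker:${workerId}:slots`);
  return Number(granted) === 1;
}

export async function releaseSlot(workerId: string): Promise<void> {
  // Give the slot back once the execution has finished (or failed).
  await redis.incr(`worker:${workerId}:slots`);
}
----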

So to sum up this architecture: we have a controller microservice that is responsible for receiving execution requests. It then looks at its available pool of machines and chooses one that is available. It forwards the request, and the worker executes the code, returning its output. The controller then returns the output back to the request emitter.

Machine pool

Another implication of this architecture, which I’ve touched on just before, is the fact that we need to manage a pool of workers. These workers can spawn machines that execute the user code with some specified input. This means we need a way to control this pool, manage its state, add new workers when needed, remove others when we don’t need them, and detect faulty ones.

We want to be able to dynamically register new machines into our system, without disrupting the service. This can be done pretty easily, since we are using Redis as a shared state. All we need to do is hit a controller with a "register new worker" request, and it adds the worker to the pool. However, we want to have some security mechanism in place: we want to add only trusted workers into our system. A basic authentication challenge should be enough, such as a certificate or a key.

Another key point is detecting faulty workers. A worker can be faulty in a myriad of ways. For example, it might appear available, but a power outage, a network disruption or other external factors have made it non-operational. If we don’t have a way to detect these cases, we will end up with dead workers in the pool. There is also the possibility that the worker is available and can be contacted, but for some reason all the requests end up in errors. Something might be wrong on the machine.

I want to discuss how we can prevent and remediate these kinds of failures. In the case of the whole worker machine going down, or becoming unreachable, I think a good solution is to periodically send liveness requests to the worker. This means that, every x seconds, a simple request is sent, with the expectation of getting a response. If we get no response, the worker is not operational anymore, and should be removed from the worker pool. This is similar to the heartbeat pattern. It ensures we have a clean pool of machines to work with, at a low cost, since sending such simple requests is not resource-heavy at all. Even with multiple thousands of workers, if we send a heartbeat request every 30 seconds, spread over multiple controller replicas, it comes down to a negligible number of requests for modern systems. As for the implementation, I would make use of Redis' sorted set datatype, using the elapsed time since the last heartbeat as our scoring system. With that, controllers can quickly and periodically check for the workers that are over the threshold, and then send our heartbeat packets.
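
A minimal sketch of that mechanism, assuming ioredis and storing the timestamp of the last heartbeat as the score (from which the elapsed time is derived), could look like this:

[source,typescript]
----
import Redis from "ioredis";

const redis = new Redis();
const HEARTBEAT_TIMEOUT_MS = 30_000;

// Record a successful heartbeat: the score is the time of the last answer,
// so "elapsed time since last heartbeat" is just now minus the score.
export async function recordHeartbeat(workerId: string): Promise<void> {
  await redis.zadd("workers:last-heartbeat", Date.now(), workerId);
}

// Find workers whose last heartbeat is older than the timeout and evict them.
export async function evictDeadWorkers(): Promise<string[]> {
  const cutoff = Date.now() - HEARTBEAT_TIMEOUT_MS;
  const dead = await redis.zrangebyscore("workers:last-heartbeat", 0, cutoff);
  if (dead.length > 0) {
    await redis.zrem("workers:last-heartbeat", ...dead);
  }
  return dead; // callers can also clean up the other worker:{id}:* keys here
}
----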

This, however, does not resolve the problem of faulty workers that fail to run the workloads we throw at them. Resolving that is not as trivial, but the circuit breaker pattern might be a great way to start tackling this problem. When an abnormal number of errors is detected on one worker, we would mark this worker as faulty, effectively disabling it for x amount of time. After this time has passed, we slowly try to send requests to this worker again, and if it is still malfunctioning, we trip the circuit again, for a longer period this time. This should be synced with the other controllers, via Redis once again.
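
To illustrate the idea, here is a minimal per-worker circuit breaker, kept in memory for clarity; in practice the open/closed state would be mirrored to Redis so every controller sees the same status. The thresholds and names are illustrative:

[source,typescript]
----
// Minimal per-worker circuit breaker sketch (in-memory for clarity).
const FAILURE_THRESHOLD = 5;
const BASE_COOLDOWN_MS = 60_000;

interface BreakerState {
  failures: number;  // consecutive failures since the last success
  openUntil: number; // 0 = closed, otherwise timestamp until which it stays open
  trips: number;     // consecutive trips, used to lengthen the cooldown
}

const breakers = new Map<string, BreakerState>();

function state(workerId: string): BreakerState {
  let s = breakers.get(workerId);
  if (!s) {
    s = { failures: 0, openUntil: 0, trips: 0 };
    breakers.set(workerId, s);
  }
  return s;
}

export function isAvailable(workerId: string): boolean {
  return Date.now() >= state(workerId).openUntil;
}

export function reportSuccess(workerId: string): void {
  const s = state(workerId);
  s.failures = 0;
  s.trips = 0;
}

export function reportFailure(workerId: string): void {
  const s = state(workerId);
  s.failures += 1;
  if (s.failures >= FAILURE_THRESHOLD) {
    s.trips += 1;
    // Each consecutive trip doubles the cooldown before we retry the worker.
    s.openUntil = Date.now() + BASE_COOLDOWN_MS * 2 ** (s.trips - 1);
    s.failures = 0;
    // A monitoring alert should fire here so an operator can investigate.
  }
}
----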

There should be a strong monitoring and alerting system in place for when a circuit breaker trips, so that system administrators are notified that something’s wrong with one of the workers, and can investigate whether it is a false positive or an actual problem.

Circuit breakers are a great tool, but they might take a lot of work to implement. I don’t think it is a good idea to implement them right now in Polycode. However, this option should be kept in mind if we encounter this kind of problem as the application grows.

Scaling infinitely

One major problem with our infrastructure as it is, is that we are limited by the number of workers that are available. If the traffic were to exceed our workers' capacity, our users would not be able to run code quickly, or even at all. To circumvent this problem, we need a system in place that automatically finds new ways and resources to run users' code.

Thankfully, cloud providers give us a way to get access to machines quickly. We can use this strength to our advantage. I see two main ways of using cloud providers:

  • We have a system in place that creates and destroys workers on the cloud depending on the traffic

  • We use the serverless offerings of those cloud providers

Both approaches have pros and cons, but let’s first talk about what they have in common. As we have discussed before, renting machines or execution time in the cloud is costly. We need to prioritize our own infrastructure as much as possible. We have bare-metal servers available to us, and we need to use their resources to the fullest before turning to cloud providers. This way, we always make the most of our hardware investment, which is already paid for, before adding operating expenses in the form of service rental in the cloud. However, to handle peak traffic, it might not make much sense to invest in more on-premise servers, as they would be idling most of the time. This is where the usage of the cloud really shines in Polycode’s case.

Let’s talk about renting a whole machine in the cloud. The biggest issue we have here is that we can’t use VM-based isolation, since nested virtualization is not feasible on most cloud providers. We would need to rent bare-metal machines, which is definitely a doable and valid solution (c5.metal x86_64 machines go for ~4$/hour on AWS, which is pretty expensive, but they can run a lot of workloads concurrently). We could also go the Docker route. However, as discussed previously, a dockerized environment is perhaps not safe enough for running untrusted code.

The other option is to run code via the serverless services of cloud providers. As we have discussed previously, running code via serverless functions can be somewhat tricky. Running the code itself is not very complicated; it’s isolating it that is harder. But as mentioned previously, do we really need isolation in a serverless environment? As long as the serverless instance doesn’t have any access to our AWS account (in the case of AWS), we should be safe enough. I think exploring the serverless solution is the better of the two.

So to sum up, we would use our on-premise infrastructure to its fullest. If traffic surges above our capacity, we offload some of the work to serverless services in the cloud.

Caching

Another technique commonly used to improve response time and reduce system load is caching. It is most effective when accessing a rarely mutating resource with high volumes of reads, where the data resides in a foreign data store or involves heavy computation. At first glance, it is not really applicable to our use case. However, I would argue we can try to find some way to cache requests, since the cost of running the user’s code is very high, and also time consuming. I don’t expect to catch a lot of similar requests between users; however, there are edge cases where we will significantly reduce the load on workers, such as users trying to run the base code that is given to them, or rerunning their code out of frustration, for example. For simple exercises, having a cache system makes even more sense, since we can expect some users to come up with the exact same code. Our goal is to cache code execution results for a specific combination of user input, validator and content.

Since we already have a Redis cluster at our disposal, why not use it as the data store for our caching? Redis is often used for caching as it is fast, and provides an efficient way to store simple data, which is exactly what we need. Our cache would be shared between all our controllers, which is what we want.

I propose the following as a first draft of how to cache data:

We take the user input, stripping it of all newlines and spaces. We then hash it, using a fast algorithm such as MD5. This will be used in our key in Redis. I suggest a key in this format: {contentId}:{validatorId}:{language}:{hashedInput}.

This way, we have a simple and deterministic way to look up our cache. If there is a cache miss, we send the execution request to the runner, get back the outputs, and store them under the key we defined above.

There are different caveats that we need to think about and avoid. Firstly, the results might vary from execution to execution, even if it is the same code. For example, if the user uses randomness or time in their code, every execution will yield different results. However, since this kind of behavior is not something validators can predict, no validator will rely on this type of code. I think it is reasonable to ignore this edge case, since the user will never be pushed towards using those variable functions, and if they do, it will have a minor impact on their experience.

Another caveat to consider is when validators are updated. Indeed, as of right now, we cache via the validator’s ID, and not its input. This means that someone can update a validator, changing its input, but since the ID stays the same, we don’t see the change and our cache becomes invalid. We would then continue to return old values. I see three main ways to solve that:

  • Setting a low Time-To-Live for our cache entries. This is the easiest, but it means our cache becomes weaker, and we still have the problem, just not for too long.

  • Notifying the controller, so that it invalidates the cache. This has immediate effect and we can also clean up the old cache entries that we don’t need anymore, but we need to add additional routes and communication within our content context.

  • Caching using the validator’s input in the key instead of its ID. This has immediate effect and requires no additional routes or code. But we aren’t notified when a validator has changed, meaning that we don’t have a way to clean up the cache.

I would argue that the last option, caching using inputs in the key, is our best solution here. It requires little effort and is very effective. We do need a way to clean up data though, but I don’t have enough knowledge and experience to estimate how fast we would fill up the Redis cluster. I would test it in a production environment, to get a better understanding of how it behaves over time, and make a decision based on that. Our key format would now be {contentId}:{hashedValidatorInput}:{language}:{hashedInput}. This option also has the added benefit of totally decoupling the notion of validators from the runner. We just take some input as a parameter; we don’t have to know which validator it came from. This is probably what we want, and I would argue that the job of getting the validator’s input should be done before reaching the runner system.
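
A minimal sketch of this caching scheme, assuming ioredis, Node’s built-in crypto module and an illustrative TTL, could look like this:

[source,typescript]
----
import { createHash } from "node:crypto";
import Redis from "ioredis";

const redis = new Redis();
const CACHE_TTL_SECONDS = 24 * 60 * 60; // illustrative TTL

const md5 = (s: string) => createHash("md5").update(s).digest("hex");

// Key format: {contentId}:{hashedValidatorInput}:{language}:{hashedInput},
// with the user code normalized (whitespace stripped) before hashing.
export function cacheKey(
  contentId: string,
  validatorInput: string,
  language: string,
  userCode: string,
): string {
  const normalized = userCode.replace(/\s+/g, "");
  return `${contentId}:${md5(validatorInput)}:${language}:${md5(normalized)}`;
}

export async function getOrRun(
  key: string,
  run: () => Promise<{ stdout: string; stderr: string }>,
): Promise<{ stdout: string; stderr: string }> {
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);   // cache hit: skip the runner entirely
  const result = await run();        // cache miss: execute on a worker
  await redis.set(key, JSON.stringify(result), "EX", CACHE_TTL_SECONDS);
  return result;
}
----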

Caching is something that should not be prioritized too much. As of right now, having a functional runner system is the priority. We don’t have enough users to justify putting this much effort into a caching system.

Heating up VMs

The last point I would like to touch on in this chapter is the process of heating up VMs. What I mean by heating up VMs is starting execution VMs before they are needed, in order to reduce the response time by moving the VM startup time before a request comes in. This improves the user experience by reducing the processing time significantly, but also improves our capacity, since some of the work is done during off-time. This means we are able to better handle surges in traffic.

Workers could be configured to heat up some of their VMs. When they are ready, they would notify the controllers. When controllers receive runner requests, they would prioritize any already heated-up VM for the execution. This is also an implicit way to prioritize workers, since we can configure some workers, those with limited resources or that we want to use only when absolutely needed, to not heat up VMs.
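
The selection order could then look like the following sketch, where all the types and helpers are hypothetical: prefer a worker with a heated-up VM, then any worker with free capacity, and only then fall back to the serverless path:

[source,typescript]
----
// Illustrative scheduling preference for a controller (hypothetical shapes).
interface WorkerInfo {
  id: string;
  warmVms: number;   // VMs already booted and waiting for work
  freeSlots: number; // remaining execution capacity
}

export function pickTarget(workers: WorkerInfo[]): WorkerInfo | "serverless" {
  const warm = workers.find((w) => w.warmVms > 0 && w.freeSlots > 0);
  if (warm) return warm;
  const cold = workers.find((w) => w.freeSlots > 0);
  if (cold) return cold;
  return "serverless"; // on-premise capacity exhausted: offload to the cloud
}
----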

Once again, this is not an absolute priority. This feature should be added incrementally, whenever we have time and resources to spare, just like caching requests.

The complete picture

For the runner architecture, I suggest using a controller/worker model, where workers effectively form a pool of machines that controllers can choose from to run workloads. This model allows us to scale efficiently, with a system where we prioritize workers that have heated-up VMs, then workers that have none, and finally a system that sends the code to the cloud, to be executed in a serverless environment. To have a better-optimized and sustainable system, we would cache every request and its output, and then read the cache on subsequent requests so that we don’t execute code that has already been executed.

Polycode

Now that we have defined the constraints for our runner architecture, let’s take a step back and look at the broader picture, to completely integrate this architecture into Polycode. In this chapter, I will draw some diagrams showing how the architecture operates on a broader scale.

Architecture diagram

To start off, let’s take a look at our complete architecture:

60 Architecture Diagram
Figure 1. Runner architecture diagram

This is the complete diagram, with all the suggested features from above. I’ve chosen AWS as our cloud provider, but this is not a definitive choice. It is here for example purposes; we could choose any cloud provider that offers serverless services. I’ve made the decision that fetching the validators' input should be done within the submission microservice. In fact, with this diagram, the question of "Should the runner have been its own bounded context?" arises.

Looking at the diagram, it also becomes apparent that the runner microservice is clearly a controller (or a supervisor), talking to its workers to execute the workloads. We can see that the worker spawns VMs, but also that the runner microservice communicates with Lambda if the on-premise workers become overloaded.

Registering workers, sequence diagram

One thing that doesn’t show in the architecture diagram is that workers can be added to and removed from the system dynamically. To better illustrate that, I came up with a sequence diagram of a new worker registering into the system:

60 Sequence Diagram
Figure 2. Registering a new worker, sequence diagram

There are a few key points to extract from this diagram; let’s talk about them one by one.

Firstly, we can see that we have some kind of cryptography at play when making a register request. As mentioned previously, we don’t want anybody who finds out the IP address of our controller to be able to register their own worker, as this would open security risks, and it would mean that a bad actor could forge false responses to the code being run. To protect against that, I suggest using simple asymmetric keys. Workers would need the private key to sign their request, and the controller would use the public key to verify the signature. This way, unless the key leaks, we can be sure that the worker is coming from an authorized party. The underlying protocol should be encrypted as well, with something that prevents replay attacks: if the request is sent in clear-text, an attacker could grab the request and replay it, and the same goes if the underlying protocol doesn’t prevent replay attacks. We can also imagine the controller sending a nonce during the configuration negotiation, and the worker sending it back, signing the response, to prevent such attacks.
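
As a sketch of that challenge, using Node’s built-in crypto module and Ed25519 keys as one possible choice, the signing and verification could look like this:

[source,typescript]
----
import { sign, verify, randomBytes, createPrivateKey, createPublicKey } from "node:crypto";

// Illustrative challenge flow: the controller issues a nonce, the worker signs
// it (plus its configuration) with its private key, and the controller checks
// the signature with the public key it trusts.

// Controller side: issue a fresh nonce for this registration attempt.
export function issueNonce(): Buffer {
  return randomBytes(32);
}

// Worker side: sign nonce + serialized configuration.
export function signChallenge(nonce: Buffer, configJson: string, privateKeyPem: string): Buffer {
  const payload = Buffer.concat([nonce, Buffer.from(configJson)]);
  return sign(null, payload, createPrivateKey(privateKeyPem)); // null digest = Ed25519
}

// Controller side: accept the worker only if the signature checks out.
export function verifyChallenge(
  nonce: Buffer,
  configJson: string,
  signature: Buffer,
  publicKeyPem: string,
): boolean {
  const payload = Buffer.concat([nonce, Buffer.from(configJson)]);
  return verify(null, payload, createPublicKey(publicKeyPem), signature);
}
----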

Secondly, it is the controller that requests the configuration from the worker, not the worker sending it to the controller when trying to register. This is done to make sure that the controller can get all the information it needs, and reject workers that do not comply with either the configuration structure or the configuration itself. This is also where the controller should check the nonce and signature that we just talked about.

Once the negotiation is finished, the controller adds the corresponding information to Redis, effectively marking the worker as ready. If everything’s good, the registration is finished and accepted.

Conclusion

The runner architecture, as you just saw, is a topic of its own. It needs thinking to come up with solutions that are secure, both in the way code is run and in the way the system interacts, as well as able to scale properly, whatever the traffic is. I suggest using a mix of a controller/worker architecture, on-premise workers and a public cloud’s serverless features to scale properly. Our on-premise workers should create VMs to ensure the execution is isolated to the best of our ability. To manage state, I would use Redis as a data store for the controllers. This Redis cluster can also be used for caching, to deduplicate requests and make sure that we keep the number of costly executions as low as possible, hand-in-hand with prioritizing on-premise workers before using serverless.