Switch to kubernetes for the scheduling #5

Open
shankari opened this issue Feb 14, 2020 · 7 comments
@shankari
Contributor

No description provided.

@shankari
Contributor Author

@njriasan will work on this

@njriasan
Contributor

I've started investigating the switch to kubernetes, and I think this may be a fundamental problem: https://stackoverflow.com/questions/54821044/how-to-stop-pause-a-pod-in-kubernetes. My understanding is that the ability to suspend containers is essential, because we want to pause the many containers that are not actively being used but still hold a user's private key. @shankari do you believe there is a reasonable alternative that I am overlooking, or does this seem to indicate that kubernetes cannot support our use case?
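For reference, the closest built-in analogue I've found is scaling the workload down to zero replicas, which stops (rather than freezes) the container, so in-memory state such as a loaded key is lost. The deployment name below is hypothetical:

```sh
# Closest kubernetes analogue to "docker pause": scale the per-user
# deployment to zero replicas. This STOPS the container, it does not
# freeze it, so any in-memory state (e.g. a decrypted key) is lost.
kubectl scale deployment upc-user-1234 --replicas=0

# Resuming later starts a fresh pod rather than unfreezing the old one.
kubectl scale deployment upc-user-1234 --replicas=1
```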

@njriasan
Contributor

To clarify, this seems to suggest that it is probably feasible to route users to specific pods (i.e. their UPCs), but it would not be possible to suspend the containers. So a pod would need to be actively running to hold the key, or the user would have to provide the key on each restart.

@njriasan
Contributor

I've been giving this some more thought and now I have some other questions. The first is: even if we can't pause containers, should we proceed anyway? It seems worthwhile to measure a machine's capacity under Kubernetes and to compare the performance of a real scheduler against manually selected scheduling, to see how performance scales (i.e., what is the capacity when we never stop containers?). I could run this experiment with just the docker containers, but it may prove interesting to investigate what a real smart scheduler would opt to do, compared with the additional flexibility that could be gained by stopping or pausing containers. It may also be worth investigating how performance is affected if stopping a container requires the user to come back online, and what impact the need to restart containers produces (how often users need to restart under varying workloads could also dictate the need for, or benefit of, pausing containers).

@njriasan
Contributor

Ok, so after speaking with David today I'm going to resume the port to kubernetes. One important point he made that I hadn't considered: if this were ever to reach a mainstream deployment, containers would go down all the time. Relying on pausing containers, or counting it as an advantage, probably overstates the impact it would have on a real deployment.

@njriasan
Contributor

@shankari I think I need a second opinion on whether what I am attempting is reasonable or too hacky. I'll start by summarizing what I believe our goals are for kubernetes, because I'm not sure these are consistent with everyone's goals.

Kubernetes offers privileged containers. This is available in docker BUT NOT in docker swarm, which is why we couldn't use docker swarm.
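For reference, in kubernetes this is just a `securityContext` setting on the pod spec (the names below are placeholders, not our actual config):

```yaml
# Minimal sketch of a privileged container in a kubernetes pod spec.
apiVersion: v1
kind: Pod
metadata:
  name: upc-example
spec:
  containers:
    - name: upc
      image: upc-server:latest   # hypothetical image name
      securityContext:
        privileged: true         # supported here, but not in docker swarm services
```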

The ability to address node/container crashes. Both kubernetes and docker swarm should handle this automatically.
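In kubernetes this falls out of running each UPC under a Deployment; the controller notices a dead pod and recreates it. A sketch (names hypothetical):

```yaml
# A single-replica Deployment: if the pod's container or node dies,
# the Deployment controller recreates the pod automatically.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: upc-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: upc-example
  template:
    metadata:
      labels:
        app: upc-example
    spec:
      containers:
        - name: upc
          image: upc-server:latest   # hypothetical image name
```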

Having volumes replicated across the cluster. I haven't fully investigated how this overlaps with ecryptfs, but my understanding is that once you get past configuring your cluster, the volume can be replicated across machines, so that the data follows your container if it gets moved elsewhere on the cluster.
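Roughly what I mean, assuming the cluster is configured with a networked storage class (the class name below is cluster-dependent, and how this composes with ecryptfs is still an open question):

```yaml
# A PersistentVolumeClaim; with a networked storage class, the same
# volume can be reattached wherever the pod is rescheduled.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: upc-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # assumption about cluster setup
  resources:
    requests:
      storage: 1Gi
```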

Allocation of containers to machines. Kubernetes probably makes much smarter decisions about where to assign containers in your cluster, although I'm sure if you push Kubernetes to its limits no good allocation exists.

However, my understanding is that we DO NOT want the true microservice nature of kubernetes, where all UPC servers are treated as identical (after all, this is the private aspect). If we did want that, we would need to change how we store data, because putting all nodes behind a common DNS name routes traffic in a way that cannot isolate a particular user's cloud.

So I believe we want to construct a unique service for each individual, ideally all based on the same configuration file. This is where the problem comes in. A service is defined by its name, and the port on which the node listens needs to be unique to each service; both of these are defined in the configuration file(s). The same problem exists in docker, but I got around it there because docker lets you rename your container (so I would dynamically generate a random name at run time), and while you can't dynamically specify ports with docker-compose, the configuration file does allow bash environment variables to be resolved at container launch, so I was able to pass in port specifications as environment variables (see the compose sketch below). Neither of these appears to be possible in kubernetes, which is a bit of an impasse.
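For comparison, the docker-compose workaround looks roughly like this (simplified; the image name is hypothetical):

```yaml
# docker-compose resolves ${UPC_PORT} from the environment at launch
# time; kubernetes manifests have no equivalent native substitution.
version: "3"
services:
  upc:
    image: upc-server:latest   # hypothetical image name
    ports:
      - "${UPC_PORT}:8080"
```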

Now, one way around this would of course be to change the file. From simple experimentation I found that I could launch multiple services with identical configuration except for the exposed port and the name (but not if I kept either of those the same). So I'm proposing that when launching a kubernetes service I take the following steps:

Step 0: Have a bunch of base configuration files generated before any of the services start.
Step 1: Read in the contents of the service configuration file, and any other configuration files that need changing.
Step 2: Write out the contents of the files to named temporary files (https://docs.python.org/3/library/tempfile.html#tempfile.NamedTemporaryFile), substituting a randomly generated name and port for the service (and, if I read in multiple files, making sure references to those names are updated as well).
Step 3: Launch a new Kubernetes service with a single replica from the named temporary service file, and repeat for each container that is dynamically spawned.
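A minimal Python sketch of steps 1 through 3, assuming a service template with literal `NAME` and `PORT` placeholders (the placeholder scheme and template layout are assumptions, not our actual files), with `kubectl` on the PATH:

```python
import random
import subprocess
import tempfile

def launch_service(template_path):
    # Step 1: read in the base service configuration.
    with open(template_path) as f:
        template = f.read()

    # Step 2: substitute a randomly generated name and port.
    name = "upc-%08x" % random.getrandbits(32)
    port = random.randint(30000, 32767)  # kubernetes default NodePort range
    config = template.replace("NAME", name).replace("PORT", str(port))

    # Write the concrete config out to a named temporary file...
    with tempfile.NamedTemporaryFile(mode="w", suffix=".yml",
                                     delete=False) as tmp:
        tmp.write(config)
        tmp_path = tmp.name

    # Step 3: ...and launch a single-replica service from it.
    subprocess.run(["kubectl", "apply", "-f", tmp_path], check=True)
    return name, port
```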

This all seems reasonable to me but I wanted a second opinion.

@njriasan
Contributor

To expand on what we talked about yesterday: it appears that docker and kubernetes both support external storage mechanisms that provide distributed storage. There appears to be no advantage to kubernetes here any more, except that it may support more external storage options (though I haven't fully explored how far you can push docker). I also learned why distributed storage was listed as an advantage in the first place: until docker 1.7, docker didn't support volumes that weren't local.
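Concretely, on the docker side the external-storage hook is the volume driver (the driver and image names below are hypothetical; any installed volume plugin works):

```sh
# Create a volume backed by an external/distributed storage plugin,
# then mount it into a container like any local volume.
docker volume create --driver some-distributed-driver upc-data
docker run -v upc-data:/data upc-server:latest
```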
