Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a controller in the cluster keeping in sync routes and hosts file #1793

Merged
merged 7 commits into from
Apr 20, 2021

Conversation

guillaumerose
Copy link
Contributor

@guillaumerose guillaumerose commented Dec 22, 2020

  • The daemon has an API available in the VM: curl http://gateway/hosts/add or remove.
  • A controller is running the cluster and calls the daemon API.

For Windows, in the future, the daemon will forward request to the adminhelper daemon. Currently, a UAC prompt is shown.

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: guillaumerose

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Makefile Outdated
@@ -88,12 +88,19 @@ $(BUILD_DIR)/macos-amd64/crc: $(SOURCES)
$(BUILD_DIR)/linux-amd64/crc: $(SOURCES)
GOOS=linux GOARCH=amd64 go build -ldflags="$(LDFLAGS)" -o $(BUILD_DIR)/linux-amd64/crc ./cmd/crc

$(BUILD_DIR)/linux-amd64/routes-controller: $(SOURCES)
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="$(LDFLAGS)" -o $(BUILD_DIR)/linux-amd64/routes-controller ./cmd/routes-controller
Copy link
Contributor

@gbraad gbraad Dec 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

./cmd/routes-controller this source code likely has to become a repo on it's own as this is part of a container.

@gbraad
Copy link
Contributor

gbraad commented Dec 23, 2020

This 'routes-controller' is a separate binary? What is it's function?
Please describe this more clearly in the description of the PR

The problem I see is that this becomes another binary we need to maintain the lifecycle of. How will this work? Who maintains this? Why can't the/a (slightly more generic) daemon handle this? It seems this is part of a container run with podman


OK, so the routes-controller ends up in a container hosted in the VM... please make this part of the description. Technical details are important for a review.

The container build defintion need to be separated into their own repo. Which would also mean the routes-controller needs to be extracted.

Makefile Outdated
$(BUILD_DIR)/windows-amd64/crc.exe: $(SOURCES)
GOARCH=amd64 GOOS=windows go build -ldflags="$(LDFLAGS)" -o $(BUILD_DIR)/windows-amd64/crc.exe ./cmd/crc

$(HOST_BUILD_DIR)/crc-embedder: $(SOURCES)
go build --tags="build" -ldflags="$(LDFLAGS)" -o $(HOST_BUILD_DIR)/crc-embedder ./cmd/crc-embedder

.PHONY: routes-controller
routes-controller: $(BUILD_DIR)/linux-amd64/routes-controller
podman build -t quay.io/gurose/routes-controller:v1 -f cmd/routes-controller/Dockerfile $(BUILD_DIR)/linux-amd64/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quay.io/crcont/ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmd/routes-controller/Dockerfile containers need to live in their own repos as part of the security review.

@cfergeau
Copy link
Contributor

cfergeau commented Jan 6, 2021

Could the controller be running on the host since we have the crc daemon which is running in the background anyway? It seems it would be overall easier, no new binary/container, no new VM<->host API, ...

@@ -159,6 +159,10 @@ func adminHelperMux() *http.ServeMux {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if runtime.GOOS == "windows" { // Disable it on Windows, it avois tons of UAC prompts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'avoids' (twice)

@guillaumerose
Copy link
Contributor Author

I have 4 choices in fact:

  • Use this controller + API
  • Make crc start run in the daemon so that I can know when the VM is started and kubeconfig is here. Then I can launch the controller
  • Add events to the network stack so that, when a new VM connects, I try to run the controller
  • Poll the kubeconfig file. When it appears, run the controller.

I guess the good solution is to make crc start the client of the daemon.

@cfergeau
Copy link
Contributor

I have 4 choices in fact:

* Use this controller + API

* Make crc start run in the daemon so that I can know when the VM is started and kubeconfig is here. Then I can launch the controller

* Add events to the network stack so that, when a new VM connects, I try to run the controller

* Poll the kubeconfig file. When it appears, run the controller.

I guess the good solution is to make crc start the client of the daemon.

What I had in mind was to rework machine.Start() a bit:

        // Post-VM start
        vmState, err := host.Driver.GetState()
        if err != nil {                                                                                                                   
                return nil, errors.Wrap(err, "Error getting the state")                                                                  
        }
        if vmState != state.Running {
                return nil, errors.Wrap(err, "CodeReady Containers VM is not running")
        }
                
        instanceIP, err := getIP(host, client.useVSock())
        if err != nil {
                return nil, errors.Wrap(err, "Error getting the IP")
        }
        sshRunner, err := crcssh.CreateRunner(instanceIP, getSSHPort(client.useVSock()), crcBundleMetadata.GetSSHKeyPath(), constants.GetPrivateKeyPath(), constants.GetRsaPrivateKeyPath())
        if err != nil {
                return nil, errors.Wrap(err, "Error creating the ssh client")
        }
        defer sshRunner.Close()
        
        logging.Debug("Waiting until ssh is available")
        if err := cluster.WaitForSSH(sshRunner); err != nil {
                return nil, errors.Wrap(err, "Failed to connect to the CRC VM with SSH -- host might be unreachable")
        }
        logging.Info("CodeReady Containers VM is running")

The kubeconfig file could be copied to ~/.crc/machines/crc around that time, rather than being done later as being done now. And the host DNS configuration (addOpenShiftHosts / runPostStartForOS) could be also be done after checking the VM is up, but before waiting for SSH. In the vsock case, addOpenShiftHosts would talk to the daemon, starting it if necessary.

@guillaumerose
Copy link
Contributor Author

After a long discussion, it seems that embedding this controller in the daemon is really complicated as machine.Start can be called from the crc CLI and from the daemon itself.
I propose we start with this controller in a pod and then, when crc CLI will always be a client of the daemon, move the controller to the daemon.

@guillaumerose
Copy link
Contributor Author

I extracted the controller code to https://github.com/code-ready/routes-controller and modified everything to get it working on Windows.

UAC prompts only appear two times at crc start time and then each time a route is created/updated/deleted.

cc @gbraad

@gbraad
Copy link
Contributor

gbraad commented Jan 25, 2021

So a UAC prompt is given twice on start, and each time on a route? That sounds like a lot (too much).

@guillaumerose
Copy link
Contributor Author

twice on start, and each time on a route

Yes twice (clean + add the default ones) and then, when you add/update/delete a route, you get a prompt.
Compared to now, this is just one eventual prompt if you run oc new-app.

It will be 0 when the adminhelper will be a daemon.

For now:

  • I can disable this for Windows and let the user do it by hand,
  • or keep as it is.

@gbraad
Copy link
Contributor

gbraad commented Jan 25, 2021

It will be 0 when the adminhelper will be a daemon.

That is what I am looking for. Can you outline the needed steps for this in a new issue or on the admin-helper issue?
Some estimation or timeline is appreciated. Especially to conclude if this should be merged or wait for to coincide with another possible follow-up.

@guillaumerose
Copy link
Contributor Author

This is already good to take. It makes the cluster work fully. The debate for the daemon will be more intense for sure. Let's start with this.

Copy link
Contributor

@cfergeau cfergeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, though probably best to land 'admin-helper as a daemon' first.

@@ -133,6 +146,44 @@ func run(configuration *types.Configuration, endpoints []string) error {
return <-errCh
}

// This API is only exposed in the virtual network. Any process can reach it by connecting to gateway.crc.testing.
func internalAPIMux() *http.ServeMux {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal is very non descriptive. hostsAPIMux() since this is handling requests under the /hosts path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed to gatewayAPIMux to indicate it's only gateway.crc.testing.

http.Error(w, err.Error(), http.StatusBadRequest)
return
}
if err := adminhelper.RemoveFromHostsFile(req...); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to avoid the code c&p.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this. What do you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code handling /hosts/add and /hosts/remove is the same except for the line calling adminhelper.AddToHostsFile / adminhelper.RemoveFromHostsFile, dunno if it's possible to factor this somehow without making the code too complicated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point, done

Containers: []v1.Container{
{
Name: "routes-controller",
Image: "quay.io/crcont/routes-controller:6effc2f0304d3a11c1e22c278b10630e834d0220",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more container image we need to remember to maintain :)

@@ -6,7 +6,12 @@ import (
"github.com/code-ready/crc/pkg/os/windows/powershell"
)

func execute(args ...string) error {
func executePrivileged(args ...string) error {
_, _, err := powershell.ExecuteAsAdmin("modifying hosts file", strings.Join(append([]string{goodhostPath}, args...), " "))
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to land first the admin-helper daemon work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

admin-helper daemon still require some work: a preflight to install it, define the protocol, etc.

With this PR, we solve it for macOS and Linux. For Windows, the experience is not too bad. There is just one UAC prompt per new route.

This network stack could be GA on macOS/Linux after this PR (+ some minor enhancements, better errors, etc.).
I think we have the feature parity with the traditional network stack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This network stack could be GA on macOS/Linux after this PR (+ some minor enhancements, better errors, etc.).
I think we have the feature parity with the traditional network stack.

daemon autostart is missing on linux (and on macos setups without the tray running). And if I'm not mistaken this also does not handle routes using non-standard ports? I guess autostart falls under the 'minor enhancements category')? I don't mind the non-standard thing for now, I agree we can push for wider testing after this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not handle routes using non-standard ports

I tried to find examples or code around that but I couldn't find it. The only route I see like this would be the one for Kubernetes API.
Router container only exposes port 80 and 443, not others. I don't really understand how an other port can be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought our own doc had such examples ( https://code-ready.github.io/crc/#deploying-sample-application-with-odo_gsg ) but after being deployed, the app is exposed on port 80/443, not on 8080 as I expected.
https://docs.okd.io/latest/networking/configuring_ingress_cluster_traffic/configuring-ingress-cluster-traffic-load-balancer.html may be related to this, but is not working with the default networking anyway.

@guillaumerose guillaumerose force-pushed the routescontroller branch 2 times, most recently from c4dc2f6 to aa4047c Compare April 14, 2021 19:21
@guillaumerose
Copy link
Contributor Author

/retest

3 similar comments
@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

@guillaumerose
Copy link
Contributor Author

/retest

Copy link
Contributor

@cfergeau cfergeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a few minor comments, but overall looks good to me.

return execute(append([]string{"add", instanceIP}, hostnames...)...)
var filtered []string
for _, hostname := range hostnames {
stdout, err := execute("contains", instanceIP, hostname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using github.com/code-ready/admin-helper/pkg/hosts directly? contains implementation is:

func contains(args []string) error {
        hosts, err := hosts.New()
        if err != nil {
                return err
        }
        if hosts.Contains(args[0], args[1]) {
                fmt.Println("true")
        } else {
                fmt.Println("false")
        }
        return nil
}

which is the same amount of code, and would be more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes done. I wanted to avoid to vendor admin-helper in crc. We just need to remind to bump it in 2 places.

if err != nil {
return err
}
if err := sshRunner.CopyData(bin, "/tmp/routes-controller.json", 0444); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guillaumerose this is only pod definition so if somehow this pod crash then this route stuff will break, should we have deployment as Kind instead pod so that pod recreation happen if pod crashes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deployment are useful version management. For crash, there is the RestartPolicy on the pod directly. It is "Always" by default so we should be good.

I tried to kill the process by hand. It is restarted.
I tried to start/stop/start the VM. The process is here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guillaumerose For me it is not happening.

13:07 $ oc get pods -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7476bbd498-g9r7k   1/1     Running   1          13d
routes-controller                 1/1     Running   0          30m

13:08 $ oc delete pod routes-controller -n openshift-ingress
pod "routes-controller" deleted

13:11 $ oc get pods -n openshift-ingress
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7476bbd498-g9r7k   1/1     Running   1          13d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure, I mean kill with sudo kill -9 pid.
If the user does this, why bother him and recreate the pod ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is part of openshift-ingress and if a user just delete all the pods under this namespace, all pods except this one come up without any issue, Also can we make it as static pod if we really want it to be running always?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to add in static pod directory but it doesn't work:

Apr 19 12:19:17 crc-xl2km-master-0 hyperkube[2154]: E0419 12:19:17.473167    2154 kubelet.go:1671] Failed creating a mirror pod for "routes-controller-crc-xl2km-master-0_openshift-ingress(fd89789fbe33ba7075234cb104ec8e39)": pods "routes-controller-crc-xl2km-master-0" is forbidden: a mirror pod may not reference service accounts

if a user just delete all the pods under this namespace

It should not happen or he really wants to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not blocker for this PR so let's have some follow up issue in case we hit it.

A controller running in the VM will be able to add and remote hosts file
entries.

The API is reachable within the virtual network only at
http://gateway.crc.testing.
It will keep hosts file in sync with routes. It talks to the http server
running at gateway.crc.testing:80.
These routes are the default routes. They are in every OpenShift
cluster.
Hosts file on Windows also contains crc entries. crc cleanup should
remove them.
Each time the routes controller reconciles, a UAC prompt appear because
admin-helper need to check if the entry is present or not.
With this new unprivileged check, the entry is added only if needed.
This saves couple of UAC prompts on Windows.
vendor/github.com/dimchansky/utfbom/.travis.yml is breaking this script.
git is replacing automatically CRLF by LF and this script cacthes the
difference.
@guillaumerose
Copy link
Contributor Author

retest this please

@praveenkumar
Copy link
Member

/lgtm

@guillaumerose guillaumerose merged commit 088d645 into crc-org:master Apr 20, 2021
guillaumerose added a commit to guillaumerose/snc that referenced this pull request Apr 21, 2021
The feature was added in crc by crc-org/crc#1793
The image digest needs to be sync between both projects.
guillaumerose added a commit to guillaumerose/snc that referenced this pull request Apr 27, 2021
The feature was added in crc by crc-org/crc#1793
The image digest needs to be sync between both projects.

https://github.com/code-ready/routes-controller/releases/tag/v0.0.1
guillaumerose added a commit to guillaumerose/snc that referenced this pull request Apr 29, 2021
The feature was added in crc by crc-org/crc#1793
The image digest needs to be sync between both projects.

https://github.com/code-ready/routes-controller/releases/tag/v0.0.1
praveenkumar pushed a commit to crc-org/snc that referenced this pull request Apr 29, 2021
The feature was added in crc by crc-org/crc#1793
The image digest needs to be sync between both projects.

https://github.com/code-ready/routes-controller/releases/tag/v0.0.1
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/snc that referenced this pull request Apr 29, 2021
The feature was added in crc by crc-org/crc#1793
The image digest needs to be sync between both projects.

https://github.com/code-ready/routes-controller/releases/tag/v0.0.1
praveenkumar pushed a commit to crc-org/snc that referenced this pull request Apr 29, 2021
The feature was added in crc by crc-org/crc#1793
The image digest needs to be sync between both projects.

https://github.com/code-ready/routes-controller/releases/tag/v0.0.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants