This repository has been archived by the owner on Sep 12, 2024. It is now read-only.

Helios Testing broken with Docker for Mac (beta) #916

Closed
daenney opened this issue May 11, 2016 · 16 comments

@daenney

daenney commented May 11, 2016

The Docker for Mac beta ships a packaged Docker setup that runs well on top of xhyve. Though the docker CLI is smart enough to find it (it knows where the socket is), other tools that look for the Docker daemon need the DOCKER_HOST environment variable to be set. In this case, it should point at unix:///var/tmp/docker.sock.

Unfortunately, setting DOCKER_HOST seems to break TemporaryJobs.builder(), as it contains a fair amount of glue to try and figure out where the heck Docker went. This results in the following stack trace:

[main] WARN com.spotify.helios.client.Endpoints - Unable to resolve hostname null into IP address
java.net.UnknownHostException: null: unknown error
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
    at com.spotify.helios.client.Endpoints.of(Endpoints.java:92)
    at com.spotify.helios.client.Endpoints.of(Endpoints.java:79)
    at com.spotify.helios.client.HeliosClient$Builder.setEndpoints(HeliosClient.java:532)
    at com.spotify.helios.client.HeliosClient$Builder.setEndpointStrings(HeliosClient.java:548)
    at com.spotify.helios.testing.TemporaryJobs$Builder.endpointStrings(TemporaryJobs.java:663)
    at com.spotify.helios.testing.TemporaryJobs$Builder.endpoints(TemporaryJobs.java:658)
    at com.spotify.helios.testing.TemporaryJobs$Builder.configureWithEnv(TemporaryJobs.java:602)
    at com.spotify.helios.testing.TemporaryJobs$Builder.<init>(TemporaryJobs.java:562)
    at com.spotify.helios.testing.TemporaryJobs.builder(TemporaryJobs.java:507)
    at com.spotify.helios.testing.TemporaryJobs.builder(TemporaryJobs.java:502)
    at com.spotify.helios.testing.TemporaryJobs.builder(TemporaryJobs.java:494)
    at com.spotify.helios.testing.TemporaryJobs.builder(TemporaryJobs.java:490)
    at com.spotify.podservice.client.HttpPodserviceClientIT.<clinit>(HttpPodserviceClientIT.java:23)
    at sun.misc.Unsafe.ensureClassInitialized(Native Method)
    at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43)
    at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:142)
    at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1088)
    at java.lang.reflect.Field.getFieldAccessor(Field.java:1069)
    at java.lang.reflect.Field.get(Field.java:393)
    at org.junit.runners.model.FrameworkField.get(FrameworkField.java:73)
    at org.junit.runners.model.TestClass.getAnnotatedFieldValues(TestClass.java:230)
    at org.junit.runners.ParentRunner.classRules(ParentRunner.java:255)
    at org.junit.runners.ParentRunner.withClassRules(ParentRunner.java:244)
    at org.junit.runners.ParentRunner.classBlock(ParentRunner.java:194)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:362)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:344)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:269)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:240)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:184)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:286)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:240)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:121)

I believe that this is caused by the following code path: https://github.com/spotify/helios/blob/83bb315662db0b1eede2385a178e2afe48d633e2/helios-testing/src/main/java/com/spotify/helios/testing/TemporaryJobs.java#L565-607

We end up hitting L596: because DOCKER_HOST is set, the builder tries to parse unix:///var/tmp/docker.sock as a URI and calls getHost() on it, which returns null. Because the endpoint is built by string concatenation without an explicit null check, we end up adding an endpoint of http://null:5801.

This in turn causes the eventual java.net.UnknownHostException: null: unknown error when we try to do the DNS lookup for the host part of the endpoint.
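The failure mode can be reproduced in isolation. The sketch below assumes only what is described above: DOCKER_HOST is parsed with java.net.URI and the result of getHost() is concatenated into an http:// endpoint on port 5801. The class and method names are hypothetical; this is not the actual TemporaryJobs code.

```java
import java.net.URI;

// Hypothetical reproduction of the endpoint-building bug; not Helios code.
public class DockerHostEndpointRepro {

  // Mimics the described logic: take the host part of DOCKER_HOST and
  // build a Helios master endpoint on port 5801 by string concatenation.
  static String naiveEndpoint(String dockerHost) {
    String host = URI.create(dockerHost).getHost();
    // For "unix:///var/tmp/docker.sock" the authority is empty, so host is null,
    // and Java string concatenation turns null into the literal text "null".
    return "http://" + host + ":5801";
  }

  public static void main(String[] args) {
    System.out.println(naiveEndpoint("unix:///var/tmp/docker.sock")); // http://null:5801
    System.out.println(naiveEndpoint("tcp://192.168.99.100:2376"));   // http://192.168.99.100:5801
  }
}
```

The second call uses an example docker-machine style address to show that the same logic works as intended whenever DOCKER_HOST actually carries a host.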

@danielnorberg
Contributor

Java and Unix sockets are not the best of friends, although I believe the docker client that helios-testing uses does support unix sockets.

Is it possible to simply configure the Docker for Mac beta to listen on a TCP port?

@danielnorberg
Contributor

Also, do you have any docker beta invites to hand out? ;)

@daenney
Author

daenney commented May 13, 2016

I can use an IP, but in this case that would be localhost/127.0.0.1, which also seems to confuse Helios: https://github.com/spotify/helios/blob/9b45e0ded17147d9f6081874c80b5bf94a1dd4da/docs/integration_tests_with_helios_solo.md#issues-if-docker_host-refers-to-localhost-or-127001 😞

Unfortunately, I don't get additional beta keys to hand out, but after signing up I got my key within 3 days, so there's a good chance you can get one fairly quickly.

@mattnworb
Member

Thanks for tracing this down to the problematic code path. It seems like a bunch of tools make assumptions that docker-for-mac breaks.

I'm curious - does docker-for-mac set DOCKER_HOST for you by default, or tell you to run some sort of command similar to docker-machine env .. to set the proper variables? Or is it leaving it up to the user to figure out they have to set DOCKER_HOST to something?

@daenney
Author

daenney commented May 16, 2016

It doesn't set DOCKER_HOST for you, as the CLI binary that comes with it (and the other bundled components) knows where to find the socket in this case and just uses it. Similarly, it doesn't require the eval $(docker-machine env default) step.

The part I haven't figured out is why it goes cuckoo when you set DOCKER_HOST to tcp://127.0.0.1:2375: then even the Docker for Mac CLI breaks, even though something is clearly bound on that port, whereas setting it to the Unix socket always works as expected. I'm guessing that has to do with how they're leveraging xhyve.

@mattnworb
Member

@daenney ah that is what I figured, thanks. Hopefully my beta invite comes through soon so I can test this out locally as well 😄

@mattnworb
Member

@daenney just for completeness, can you paste what your TemporaryJobs setup code looks like?

@daenney
Author

daenney commented May 18, 2016

Sure thing, here you go.

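    import com.spotify.helios.testing.HeliosDeploymentResource;
    import com.spotify.helios.testing.HeliosSoloDeployment;
    import com.spotify.helios.testing.TemporaryJob;
    import com.spotify.helios.testing.TemporaryJobs;
    import java.io.IOException;
    import org.junit.BeforeClass;
    import org.junit.ClassRule;
    import org.junit.rules.RuleChain;

    public class HttpPodserviceClientIT {

        // helios-solo deployment, configured from the environment (e.g. DOCKER_HOST)
        private static final HeliosDeploymentResource soloResource =
                new HeliosDeploymentResource(HeliosSoloDeployment.fromEnv().build());

        // TemporaryJobs that deploys to the helios-solo master above
        private static final TemporaryJobs temporaryJobs = TemporaryJobs.builder()
                .client(soloResource.client())
                .build();

        // Bring up helios-solo first, then manage the temporary jobs inside it
        @ClassRule
        public static final RuleChain chain = RuleChain
                .outerRule(soloResource)
                .around(temporaryJobs);

        @BeforeClass
        public static void setupContainers() throws IOException {
            TemporaryJob podserviceJob = temporaryJobs.job()
                    .image("<the-registry>/spotify/podservice-integrationtest:latest")
                    .port("podservice", 12000)
                    .deploy();
        }
    }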

@mattnworb
Member

@danielnorberg I also got my key just 2 days after signing up; there doesn't appear to be an actual wait.

@mattnworb
Member

#920 fixes a part of this, where TemporaryJobs.Builder tries to use the DOCKER_HOST value as a http:// URI.
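For comparison, here is a null-safe way to derive an endpoint from DOCKER_HOST. It is a sketch that assumes the goal is to produce an http:// endpoint only when DOCKER_HOST names a network host; the helper name endpointFromDockerHost is hypothetical, and this illustrates the idea rather than what #920 actually does.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper; not the code from #920.
public class DockerHostEndpoints {

  // Returns an http:// endpoint on the given port only when DOCKER_HOST
  // contains a network host. Unix sockets, empty and malformed values yield empty.
  static Optional<String> endpointFromDockerHost(String dockerHost, int port) {
    if (dockerHost == null || dockerHost.isEmpty()) {
      return Optional.empty();
    }
    final URI uri;
    try {
      uri = URI.create(dockerHost);
    } catch (IllegalArgumentException e) {
      return Optional.empty();
    }
    // A unix:// URI has no host part, so there is nothing to build an HTTP endpoint from.
    if ("unix".equals(uri.getScheme()) || uri.getHost() == null) {
      return Optional.empty();
    }
    return Optional.of("http://" + uri.getHost() + ":" + port);
  }

  public static void main(String[] args) {
    System.out.println(endpointFromDockerHost("unix:///var/tmp/docker.sock", 5801)); // Optional.empty
    System.out.println(endpointFromDockerHost("tcp://192.168.99.100:2376", 5801));
  }
}
```

When the result is empty, the caller needs some other way to locate the Helios master (for example an explicitly configured client, as in the setup code above); which fallback is appropriate is not settled in this thread.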

The next issue I am seeing is that the container that is launched for my TemporaryJobs locally (with Docker for Mac) has an IP address like 172.17.0.1, which TemporaryJob uses when probing the container to see if the service in it is up or not. For some reason this is not routable from the test running on the host. This might have more to do with Docker for Mac than anything in Helios though, I need to look further.

@mattnworb
Member

mattnworb commented May 18, 2016

Another incompatibility: exec healthchecks are rejected if you run helios-solo locally on Docker for Mac, because these checks fail (docker.info().executionDriver() returns "" for some reason).

@mattnworb
Member

I created a topic on the Docker for Mac forums about the IP address issues I am having now: https://forums.docker.com/t/ip-address-given-to-container-is-not-routable-in-bridge-mode/12725. This might be the last issue with helios-testing and Docker for Mac.

@mattnworb
Member

So, helios assumes that the IP address of the container (i.e. docker inspect -f {{.NetworkSettings.IPAddress}} container) is the address to use to reach the exposed ports of the service in that container. When TemporaryJob deploys a job with a port mapping, it probes the port at container_ip:port to check if the service is running before continuing with the test.

Docker for Mac gives a container an IP on a bridge, like 172.17.0.x, and then seems to map the external port of the container on the host (OS X). So if you expose port 80 on a container and it has IP address 172.17.0.2, you would connect to localhost:80 from the OS X host, not 172.17.0.2:80 (which you would use if using regular docker or docker-machine).

Routing to the container's IP does not yet seem to be supported in Docker for Mac.
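The two addressing models described above can be summarized in a small sketch. Everything here is hypothetical and is not helios-testing's actual logic: the caller is assumed to already know whether the container's bridge IP is routable from the test process, and the port numbers are example values (31234 echoes the mapping example used later in this thread).

```java
import java.net.InetSocketAddress;

// Hypothetical helper illustrating the two addressing models; not helios-testing code.
public class ProbeAddress {

  /**
   * Picks the address a test would connect to in order to reach a service in a container.
   *
   * @param containerIpRoutable whether the test process can reach the container's
   *     bridge IP directly (assumed true for plain Docker on the same Linux host,
   *     false for Docker for Mac as described in this thread)
   */
  static InetSocketAddress probeAddress(boolean containerIpRoutable, String containerIp,
                                        int containerPort, int hostPort) {
    if (containerIpRoutable) {
      // Reach the service directly on its internal port at the container's IP.
      return InetSocketAddress.createUnresolved(containerIp, containerPort);
    }
    // Docker for Mac: the bridge IP is not routable from OS X, but the
    // published port is forwarded to localhost on the host.
    return InetSocketAddress.createUnresolved("localhost", hostPort);
  }

  public static void main(String[] args) {
    System.out.println(probeAddress(true, "172.17.0.2", 80, 31234));
    System.out.println(probeAddress(false, "172.17.0.2", 80, 31234));
  }
}
```

createUnresolved is used so that building the address does not trigger a DNS lookup; the actual connection attempt would resolve it.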

@daenney
Author

daenney commented May 19, 2016

Interesting rabbit hole you've gone down so far 😄

@mattnworb
Member

#927 addresses some of my last comment. TemporaryJobs tries to figure out what address to use to communicate with any ports mapped from the container to the host, and ends up using a bad value for Docker for Mac.

I've found another new issue, though: it seems like the port mapped by Docker for Mac accepts connections as soon as the container is started (i.e. if port 8080 on the container is mapped to localhost:31234, attempts to connect to localhost:31234 are accepted immediately), even before the process inside the container is listening on the port. This breaks assumptions made by DefaultProber, which is responsible for determining when the service inside the container is "up" and when the actual test can resume.

If the TemporaryJob has a healthcheck, then there isn't an issue, but in cases where the "probing" of the port is assumed to be enough, then the test will end up running as soon as the container is started, which might be well before the container is actually accepting connections on the internal port.
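To illustrate why the forwarder defeats connect-based probing, here is a minimal, generic TCP connect probe built on java.net.Socket. It is a sketch, not DefaultProber's implementation; the class name, method name and 1000 ms timeout are assumptions. A successful connect only proves that something completed the TCP handshake. With Docker for Mac, that something is the port forwarder on the host, so a probe like this reports success before the process in the container is listening. The ServerSocket in main stands in for the forwarder: the kernel completes the handshake from the listen backlog even though accept() is never called.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical connect-only probe; not Helios's DefaultProber.
public class ConnectProbe {

  // Returns true if a TCP connection to host:port completes within timeoutMillis.
  static boolean probe(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the port forwarder: it listens, but no service ever
    // reads from or answers on the accepted connections.
    try (ServerSocket forwarder = new ServerSocket(0)) {
      System.out.println(probe("127.0.0.1", forwarder.getLocalPort(), 1000)); // true
    }
  }
}
```

An application-level check, such as the TemporaryJob healthcheck mentioned above, avoids relying on the handshake alone.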

mattnworb added a commit to spotify/dockerfile-maven that referenced this issue Jun 1, 2017
need to add a healthcheck to the TemporaryJob to avoid helios-testing
from thinking that the job is immediately available after the deploy
finishes.

see:
#13 (comment)
spotify/helios#916 (comment)
dflemstr pushed a commit to spotify/dockerfile-maven that referenced this issue Jun 1, 2017
@mattnworb
Member

Closing this: there is one known issue, outlined in the previous comment, but otherwise helios-testing works with Docker for Mac (as far as we know).

vbhavsar pushed a commit that referenced this issue Aug 28, 2018
execStart should use the 'noTimeoutClient'.