Support for linux/arm64 (ARMv8, aarch64) #466

Closed
1 task done
vielmetti opened this issue Apr 4, 2017 · 28 comments

Comments

@vielmetti

vielmetti commented Apr 4, 2017

  • Feature Request or Change Proposal

OS/Container environment:

ARMv8 server is a Packet 2A (Cavium ThunderX, 96 cores at 2 GHz)

Steps or code to reproduce the issue:

Expected result:

linux-arm64 supported release

Actual result:

No files found.

As of 2017-04-04, the build works fine, tests fail until timeouts are extended, and we've identified a performance issue in Go 1.8's crypto/tls on ARMv8. Further work is pending Go performance improvements on ARMv8.

Feature Requests

Use Case:

Two use cases: one for ARMv8 single-board computers (e.g. Raspberry Pi 3, Odroid C2, Pine64); another for ARMv8 in the data center (e.g. Cavium ThunderX).

Proposed Change:

Build and test for arm64, validate that it works, add as supported release.

Who Benefits From The Change(s)?

Users of arm64 (ARMv8) platforms as listed above.

Alternative Approaches

Planning to build from source and see how that goes; I'll use this issue to identify anything that comes up.

@vielmetti
Author

Tests fail; looking into this:

root@docker-build-test:~/src/nats-io/gnatsd# go build
root@docker-build-test:~/src/nats-io/gnatsd# go test ./...
?       nats-io/gnatsd  [no test files]
?       nats-io/gnatsd/auth     [no test files]
ok      nats-io/gnatsd/conf     0.018s
ok      nats-io/gnatsd/logger   0.618s
ok      nats-io/gnatsd/server   24.935s
ok      nats-io/gnatsd/server/pse       0.104s
--- FAIL: TestServerRestartReSliceIssue (10.01s)
panic: Unable to start NATS Server in Go Routine [recovered]
        panic: Unable to start NATS Server in Go Routine

goroutine 44 [running]:
panic(0x8154a0, 0x482000b7e0)
        /usr/lib/go-1.6/src/runtime/panic.go:481 +0x384
testing.tRunner.func1(0x4820250870)
        /usr/lib/go-1.6/src/testing/testing.go:467 +0x168
panic(0x8154a0, 0x482000b7e0)
        /usr/lib/go-1.6/src/runtime/panic.go:443 +0x4b4
nats-io/gnatsd/test.RunServerWithAuth(0x482027c3c0, 0x0, 0x0, 0xffff9c66e110)
        /root/src/nats-io/gnatsd/test/test.go:102 +0x180
nats-io/gnatsd/test.RunServerWithConfig(0x9c72f0, 0x14, 0x0, 0x482027c3c0)
        /root/src/nats-io/gnatsd/test/test.go:79 +0x2a4
nats-io/gnatsd/test.runServers(0x4820250870, 0x0, 0x0, 0x0, 0x0)
        /root/src/nats-io/gnatsd/test/cluster_test.go:66 +0x4c
nats-io/gnatsd/test.TestServerRestartReSliceIssue(0x4820250870)
        /root/src/nats-io/gnatsd/test/client_cluster_test.go:17 +0x3c
testing.tRunner(0x4820250870, 0xbff288)
        /usr/lib/go-1.6/src/testing/testing.go:473 +0xbc
created by testing.RunTests
        /usr/lib/go-1.6/src/testing/testing.go:582 +0x65c
FAIL    nats-io/gnatsd/test     11.477s
?       nats-io/gnatsd/util     [no test files]
?       nats-io/gnatsd/vendor/github.com/nats-io/nuid   [no test files]
?       nats-io/gnatsd/vendor/golang.org/x/crypto/bcrypt        [no test files]
?       nats-io/gnatsd/vendor/golang.org/x/crypto/blowfish      [no test files]
?       nats-io/gnatsd/vendor/golang.org/x/sys/windows  [no test files]
?       nats-io/gnatsd/vendor/golang.org/x/sys/windows/registry [no test files]

@vielmetti
Author

Run from command line works just fine - at least the server comes up.

Is there a particularly good client you'd recommend to exercise the server, @kozlovic? Happy to bash on it to see if I can trigger whatever this issue is.

@kozlovic
Member

kozlovic commented Apr 4, 2017

Just realized that it worked for the server package.
Could you make sure that there is no gnatsd running in the background and then do this just to check:

go test -race -v -p=1 ./...

@kozlovic
Member

kozlovic commented Apr 4, 2017

The -p=1 will ensure that each package is run after the other. I am just wondering if there could be port conflicts between the tests in different packages. We normally try to use different ports, and it works fine on Travis, but it could be just luck.
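One generic way to avoid hard-coded port collisions between concurrently running test packages is to let the kernel pick a free port. A minimal sketch, assuming a test helper is free to choose its own port (gnatsd's tests use fixed ports from their configs; `freePort` is a hypothetical name):

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the OS for an unused loopback TCP port by listening on
// port 0, then releases it so a server under test can bind it.
// Hypothetical helper, not part of gnatsd's test package.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		panic(err)
	}
	fmt.Printf("server can listen on 127.0.0.1:%d\n", port)
}
```

Another process could in principle grab the port between `Close` and reuse, so this narrows rather than removes the race; running packages serially with `-p=1` avoids cross-package overlap entirely.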

@vielmetti
Author

go test -race is not available in Go 1.6.x on arm64 on Ubuntu.

With -p=1 I get a lot more tests to pass, but a few still fail, all related to TLS:

root@docker-build-test:~# grep FAIL gnats-test.out
--- FAIL: TestTLSConnz (1.12s)
--- FAIL: TestPingSentToTLSConnection (0.71s)
--- FAIL: TestTLSConnection (1.19s)
--- FAIL: TestTLSBadAuthError (1.11s)
FAIL
FAIL    nats-io/gnatsd/test     73.991s
root@docker-build-test:~# go version
go version go1.6.3 linux/arm64

@vielmetti
Author

Looking in a little more detail, here are all of the error messages:

root@docker-build-test:~# grep "version 4552" gnats-test.out
        monitor_test.go:337: Got an error on Connect with Secure Options: tls: received record with version 4552 when expecting version 303
        test.go:128: Error writing command to conn: tls: received record with version 4552 when expecting version 303
        tls_test.go:44: Got an error on Connect with Secure Options: tls: received record with version 4552 when expecting version 303
        tls_test.go:252: Excpected and auth violation, got tls: received record with version 4552 when expecting version 303
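A hedged reading of these messages (an interpretation, not stated in the thread): crypto/tls prints record versions in hex, so `303` is TLS 1.2 (0x0303), while `4552` is the two bytes 0x45 0x52, ASCII `ER`. That is what a TLS client would report if it received a plaintext line starting with `-ERR` where it expected a record header, i.e. the server gave up and sent a protocol error, which is consistent with the timeout diagnosis later in the thread. The sketch decodes such a line as a 5-byte TLS record header; the error text is illustrative, not taken from gnatsd:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// recordHeader splits the first five bytes of a TLS record into its
// content type (1 byte), protocol version (2 bytes) and length (2 bytes).
func recordHeader(b []byte) (typ uint8, vers, length uint16) {
	return b[0], binary.BigEndian.Uint16(b[1:3]), binary.BigEndian.Uint16(b[3:5])
}

func main() {
	// Illustrative plaintext protocol error; the exact wording is an assumption.
	line := []byte("-ERR 'Authentication Timeout'\r\n")
	typ, vers, _ := recordHeader(line)
	// Versions printed with %x, matching the format of the log lines above.
	fmt.Printf("type=%#x version=%x (TLS 1.2 is %x)\n", typ, vers, 0x0303)
	// Output: type=0x2d version=4552 (TLS 1.2 is 303)
}
```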

@kozlovic
Member

kozlovic commented Apr 4, 2017

You may want to try with a newer version of Go, just to make sure.

@kozlovic
Member

kozlovic commented Apr 4, 2017

Oh, that's because the timeouts are too small.

@kozlovic
Member

kozlovic commented Apr 4, 2017

Let me find where you would have to increase this timeout, to make sure that's the only problem.

@kozlovic
Member

kozlovic commented Apr 4, 2017

Two things you could try:

  • Run those failed tests individually to see if they still fail: go test -v -run=TestTLSConnz ./test
  • Increase the auth and tls timeout values
    Since there may be several places in the tests where that value is set, I would just override it in the server code, again just to make sure that this is simply a timeout issue.
    That would be in server/server.go:591:
// ttl := secondsToDuration(s.opts.TLSTimeout)
ttl := 10 * time.Second // temporary override: fixed 10s TLS handshake timeout

And server/client.go:1244:

// c.atmr = time.AfterFunc(d, func() { c.authTimeout() })
c.atmr = time.AfterFunc(10*time.Second, func() { c.authTimeout() }) // temporary override: fixed 10s auth timeout

Could you please try and report back?

@vielmetti
Author

Single test still fails:

root@docker-build-test:~/src/github.com/nats-io/gnatsd# go test -v -run=TestTLSConnz ./test
=== RUN   TestTLSConnz
--- FAIL: TestTLSConnz (1.11s)
        monitor_test.go:337: Got an error on Connect with Secure Options: tls: received record with version 4552 when expecting version 303
FAIL
exit status 1
FAIL    github.com/nats-io/gnatsd/test  1.134s

My version of Go is 1.6.3, which is older than the one you recommend; I'll report back separately after testing under Go 1.8.

root@docker-build-test:~/src/github.com/nats-io/gnatsd# go version
go version go1.6.3 linux/arm64

@kozlovic
Member

kozlovic commented Apr 4, 2017

When you ran the test, did you override the timeouts? For that test specifically, if you do not want to tweak the code, you can modify the config file used in this test:

test/configs/tls.conf

Change both timeout values in this file to 10 instead of 2 and 1.

@vielmetti
Author

With longer timeouts (the 10-second values patched above into client.go and server.go), the test passes:

=== RUN   TestTLSConnz
--- PASS: TestTLSConnz (2.25s)
PASS
ok      github.com/nats-io/gnatsd/test  2.269s

minio has some accelerated crypto routines that should speed up TLS, if the timeout is due to slow performance.

@kozlovic
Member

kozlovic commented Apr 4, 2017

OK, the problem with re-running the whole test suite with the override is that some tests may then fail, because they expect the timeout to occur within, say, 2 seconds. But we should be able to tell whether that's the case from the test name.

@vielmetti
Author

vielmetti commented Apr 4, 2017

All the TLS tests now pass, but there's one test that fails:

=== RUN   TestAuthClientNoConnect
--- FAIL: TestAuthClientNoConnect (3.03s)
        test.go:128: Error reading from conn: read tcp 127.0.0.1:43868->127.0.0.1:10422: i/o timeout

                2 - /root/src/github.com/nats-io/gnatsd/test/auth_test.go:80
                3 - /usr/lib/go-1.6/src/testing/testing.go:473
                4 - /usr/lib/go-1.6/src/runtime/asm_arm64.s:975

The code in auth_test.go:80 reads

        // This is timing dependent..
        time.Sleep(server.AUTH_TIMEOUT)

@kozlovic
Member

kozlovic commented Apr 4, 2017

Yes, like I said. So it means that the only failures you got were due to timeouts. What surprises me is that you got the failures in the first place. Even with the current values (sometimes as low as 0.5 in some config files), the suite passes on Travis, which is sometimes much slower than our personal laptops. So it is a bit surprising, considering the specs of your machine.

@vielmetti
Author

The timeouts are very surprising given the spec of the machine. I'm going to rebuild with Go 1.8 next, because I know I've seen speed improvements overall with that, and maybe that is enough to help.

With one failed test, I get this as an overall test time:

FAIL    github.com/nats-io/gnatsd/test  80.181s

and it looks like the last Travis run completed the same tests in

ok  	github.com/nats-io/gnatsd/test	68.013s

@vielmetti
Author

With Go 1.8 it fails a little faster:

FAIL    github.com/nats-io/gnatsd/test  78.930s

still failing in

--- FAIL: TestAuthClientNoConnect (3.03s)

I'm sure that's because Go is using software crypto on arm64, rather than the hardware instructions on the chip. The minio code is at https://github.com/minio/sha256-simd which might help.

@vielmetti
Author

Nope, gnatsd doesn't use the sha256 code, but I was able to benchmark Go's crypto/tls and found it wanting on arm64. The open issue is

golang/go#19840

I'll chase this upstream; for the moment, let's mark this issue as "on hold" while I work on getting a performance improvement.
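To check whether a machine is slow enough to trip the 1 to 2 second timeouts in the test configs, one can time complete crypto/tls handshakes in-process. A minimal sketch, assuming a throwaway ECDSA P-256 certificate (the suite's own test certificates may use a different key type, which changes the cost considerably); it is an independent sketch, not the measurement reported in golang/go#19840:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// selfSignedCert creates a throwaway ECDSA P-256 certificate (assumption:
// the real test certificates may use another key type).
func selfSignedCert() (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "localhost"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// handshakeTime runs one TLS handshake over loopback TCP and returns the
// wall-clock time from dial until both sides report completion.
func handshakeTime(cert tls.Certificate) (time.Duration, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	errc := make(chan error, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			errc <- err
			return
		}
		defer conn.Close()
		srv := tls.Server(conn, &tls.Config{Certificates: []tls.Certificate{cert}})
		errc <- srv.Handshake()
	}()

	start := time.Now()
	// InsecureSkipVerify is only acceptable because the certificate is throwaway.
	client, err := tls.Dial("tcp", ln.Addr().String(), &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return 0, err
	}
	defer client.Close()
	if err := <-errc; err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	cert, err := selfSignedCert()
	if err != nil {
		panic(err)
	}
	for i := 0; i < 5; i++ {
		d, err := handshakeTime(cert)
		if err != nil {
			panic(err)
		}
		fmt.Printf("handshake %d: %v\n", i, d)
	}
}
```

A loopback TCP connection is used rather than `net.Pipe`, so that kernel buffering keeps either side from blocking on data the other has not read yet.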

@vielmetti
Author

Go 1.9 beta 1 is out and has a binary build for ARM64 (yay).

According to the referenced issue golang/go#19840 the opportunity for this particular performance issue to be resolved in Go for ARM will come in the Go 1.10 timeframe. However there may be other performance improvements in Go 1.9 so that's worth a quick test.

@ghost

ghost commented Aug 19, 2017

Could you summarise the state of the aarch64 server build? We are very interested in using it on our embedded aarch64 platform as a control plane enabler.

@vielmetti
Author

@salerio - what are the specs for your aarch64 platform? The concern expressed above was that some of the crypto instructions in Go on aarch64 are not hardware accelerated, and that the software versions of the algorithms perform poorly on one system (Cavium ThunderX).
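Whether a particular Go release uses accelerated crypto on a given arm64 chip can be checked empirically by measuring throughput. A minimal sketch that times AES-128-GCM sealing with `testing.Benchmark`, which can be called outside `go test` after `testing.Init` (Go 1.13+); the cipher, the all-zero key, and the 16 KiB buffer are assumptions chosen to resemble bulk TLS traffic:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"testing"
)

// newGCM returns an AES-128-GCM AEAD with an all-zero key (benchmark only).
func newGCM() (cipher.AEAD, error) {
	block, err := aes.NewCipher(make([]byte, 16))
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

func main() {
	testing.Init() // registers testing flags so Benchmark works outside go test

	aead, err := newGCM()
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, aead.NonceSize())
	buf := make([]byte, 16*1024) // roughly one maximum-size TLS record
	out := make([]byte, 0, len(buf)+aead.Overhead())

	res := testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(len(buf)))
		for i := 0; i < b.N; i++ {
			// Reusing a nonce is insecure; acceptable here only for timing.
			out = aead.Seal(out[:0], nonce, buf, nil)
		}
	})
	fmt.Println("AES-128-GCM seal:", res)
}
```

Comparing the reported MB/s for the same machine across Go versions, or against an amd64 box, shows whether a release picked up faster code for that primitive.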

@ghost
Copy link

ghost commented Aug 22, 2017

It's a Xilinx UltraScale+ MPSoC, which has a 4 x Cortex-A53 CPU complex. Although there are crypto accelerators in the SoC, I doubt any standard software will make use of them yet, as the part is very new.

See https://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html

@vielmetti
Author

@ghost does this MPSoC from Xilinx have an FPGA in it?

@vielmetti
Author

Go 1.11beta1 is out, I would like to test performance with it.

@derekcollison
Member

We would be interested in what you find, keep us posted.

@vielmetti
Author

Thanks @derekcollison I have opened up #695 to address the question of "how do you test performance".

@derekcollison
Member

Thanks.


4 participants