From 67905ed73e0e744abe95f7eb032db8b0e8f7b210 Mon Sep 17 00:00:00 2001
From: Vlad Krasnov
Date: Sat, 14 Apr 2018 04:01:02 +0000
Subject: [PATCH] crypto/aes: implement AES-GCM AEAD for arm64
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Use the dedicated AES* and PMULL* instructions to accelerate AES-GCM.

name             old time/op    new time/op    delta
AESGCMSeal1K-46    12.1µs ± 0%     0.9µs ± 0%  -92.66%  (p=0.000 n=9+10)
AESGCMOpen1K-46    12.1µs ± 0%     0.9µs ± 0%  -92.43%  (p=0.000 n=10+10)
AESGCMSign8K-46    58.6µs ± 0%     2.1µs ± 0%  -96.41%  (p=0.000 n=9+8)
AESGCMSeal8K-46    92.8µs ± 0%     5.7µs ± 0%  -93.86%  (p=0.000 n=9+9)
AESGCMOpen8K-46    92.9µs ± 0%     5.7µs ± 0%  -93.84%  (p=0.000 n=8+9)

name             old speed      new speed        delta
AESGCMSeal1K-46  84.7MB/s ± 0%  1153.4MB/s ± 0%  +1262.21%  (p=0.000 n=9+10)
AESGCMOpen1K-46  84.4MB/s ± 0%  1115.2MB/s ± 0%  +1220.53%  (p=0.000 n=10+10)
AESGCMSign8K-46   140MB/s ± 0%    3894MB/s ± 0%  +2687.50%  (p=0.000 n=9+10)
AESGCMSeal8K-46  88.2MB/s ± 0%  1437.5MB/s ± 0%  +1529.30%  (p=0.000 n=9+9)
AESGCMOpen8K-46  88.2MB/s ± 0%  1430.5MB/s ± 0%  +1522.01%  (p=0.000 n=8+9)

This change mirrors the current amd64 implementation, and provides
optimal performance on a range of arm64 processors, including the
Centriq 2400 and Apple A12. By and large it is implicitly tested by
the robustness of the already existing amd64 implementation.

The implementation interleaves GHASH with CTR mode to achieve the
highest possible throughput; it also aggregates GHASH with a factor
of 8 to decrease the cost of the reduction step.

Even though there is a significant amount of assembly, the code
reuses the Go code of the amd64 implementation, so there is little
additional Go code.

Since AES-GCM is critical for the performance of all web servers,
this change is required to level the playing field for arm64 CPUs,
where amd64 currently enjoys an unfair advantage.

Ideally both the amd64 and arm64 codepaths could be replaced by
hypothetical AES and CLMUL intrinsics, with a few additional vector
instructions.

Fixes #18498
Fixes #19840

Change-Id: Icc57b868cd1f67ac695c1ac163a8e215f74c7910
Reviewed-on: https://go-review.googlesource.com/107298
Run-TryBot: Vlad Krasnov
TryBot-Result: Gobot Gobot
Reviewed-by: Brad Fitzpatrick
---
 common.go | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/common.go b/common.go
index 7c8f0de6e82faa..729bce6d50c66e 100644
--- a/common.go
+++ b/common.go
@@ -925,12 +925,7 @@ func initDefaultCipherSuites() {
 	// Worst case, these variables will just all be false
 	hasGCMAsmAMD64 := cpu.X86.HasAES && cpu.X86.HasPCLMULQDQ
-	// TODO: enable the arm64 HasAES && HasPMULL feature check after the
-	// optimized AES-GCM implementation for arm64 is merged (CL 107298).
-	// This is explicitly set to false for now to prevent misprioritization
-	// of AES-GCM based cipher suites, which will be slower than chacha20-poly1305
-	hasGCMAsmARM64 := false
-	// hasGCMAsmARM64 := cpu.ARM64.HasAES && cpu.ARM64.HasPMULL
+	hasGCMAsmARM64 := cpu.ARM64.HasAES && cpu.ARM64.HasPMULL
 
 	// Keep in sync with crypto/aes/cipher_s390x.go.
 	hasGCMAsmS390X := cpu.S390X.HasAES && cpu.S390X.HasAESCBC && cpu.S390X.HasAESCTR &&
 		(cpu.S390X.HasGHASH || cpu.S390X.HasAESGCM)
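
For readers outside the Go tree, the capability test the hunk above enables can be sketched against the exported golang.org/x/sys/cpu package; this is a minimal, illustrative program (not part of the CL), assuming that package's HasAES/HasPMULL/HasPCLMULQDQ flags mirror the internal/cpu flags used by initDefaultCipherSuites.

```go
// Illustrative sketch: report whether this machine has the CPU features that
// make the assembly AES-GCM implementations worth preferring over
// ChaCha20-Poly1305, using the same feature combinations as the patch.
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

func main() {
	// amd64 needs the AES-NI and carry-less multiply (PCLMULQDQ) instructions;
	// arm64 needs the AES and polynomial multiply (PMULL) instructions,
	// the latter being what accelerates GHASH.
	hasGCMAsmAMD64 := cpu.X86.HasAES && cpu.X86.HasPCLMULQDQ
	hasGCMAsmARM64 := cpu.ARM64.HasAES && cpu.ARM64.HasPMULL

	fmt.Println("amd64 AES-GCM assembly usable:", hasGCMAsmAMD64)
	fmt.Println("arm64 AES-GCM assembly usable:", hasGCMAsmARM64)
}
```

On architectures other than the one being probed the corresponding flags simply report false, which is the same "worst case, these variables will just all be false" behavior noted in the patched code.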