aarch64 build failure, Hash of java.rmi (...) differs to expected hash #1804
Comments
Previously seen as an intermittent #1450 (comment) |
That machine has far more than 29 cores so that number shouldn't be a problem (unless there is a problem in the makefile that this happens to exacerbate). Do we know if this is only showing up on one machine? |
I suspect this might be exposing a jdk11 makefile issue. It's really hard to tell and work out; it's just that 29 jobs is a large number. |
I wrote a small program that calculates the hash value of a jmod file for reproducing the problem. |
Similar error with Hotspot on amd64: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944738 |
I could reproduce the problem on cent7-aarch64-1. I was also able to reproduce by manually launching |
I started to have a think about this one, as it's causing regular build breaks... and I was wondering what reasons there could be for the "recorded" java.base hash being different from the dependent jmod hash. I can think of the following:
|
Running a build with some manual debug added: https://ci.adoptopenjdk.net/view/work%20in%20progress/job/andrew-jdk11u-linux-aarch64-openj9-linuxXL/ |
I think this is not a problem with GNU make or makefile because I can reproduce the hash mismatch error with simply running |
@andrew-m-leonard What is the version of OpenSSL on the build server? |
Surprisingly, I saw that sometimes the output of
|
That is interesting. I've just created a job doing a sha256sum loop, and in one instance out of 10000 it got a different hash. However, having run the job numerous times since, it hasn't failed again... weird! |
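For anyone wanting to try the same kind of loop outside the CI job, here is a minimal sketch in Python. It is not the job that was actually run: the function names `sha256_of_file` and `count_digests` are made up for illustration, and it assumes the check is simply "hash the same unchanged file many times and see whether more than one digest ever shows up".

```python
import hashlib
from collections import Counter


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_digests(path, iterations=10000):
    """Hash the same file `iterations` times and count each distinct digest.

    Hypothetical helper: on a healthy machine the result has exactly one key.
    """
    return Counter(sha256_of_file(path) for _ in range(iterations))
```

Calling `count_digests` on a jmod file and checking `len(result) > 1` would flag a run like the one-in-10000 case above. Note this goes through Python's `hashlib` rather than the `sha256sum` binary, so it exercises a different code path from the original job.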
build-packet-centos74-armv8-1, openssl version : |
1.0.2k as supplied by RedHat |
@sxa can we upgrade openssl? It's looking like a bug in openssl. |
or a file system issue, returning different file data...? |
Well we could build our own one but we SHOULD be building with the one supplied with the OS so arguably RedHat should fix it. I'm tempted to suggest it might be file system related. I guess https://ci.adoptopenjdk.net/job/andrew-aarch-hash-debug/10/console is the run where you had one failure out of ... a lot? |
Correct job 10 |
Yes, that is another possibility. |
If this is a file system issue, the error can also happen with other checksum commands that do not use OpenSSL, such as |
This is not a file system issue - I've been able to replicate it when testing against a file on a ramdisk on the machine ( |
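The reasoning here (take the disk out of the picture and see whether the mismatch still happens) can be pushed one step further by hashing a buffer that is already in memory. The sketch below is only an illustration of that idea, not what was run on the machine; `count_in_memory_digests` is a hypothetical name. If more than one digest ever appears for the same bytes, file reads cannot be the cause.

```python
import hashlib
from collections import Counter


def count_in_memory_digests(data, iterations=10000):
    """Hash the same in-memory bytes repeatedly and count distinct digests.

    No file I/O happens inside the loop, so any variation in the result
    points at the hashing code or the hardware, not the file system.
    """
    return Counter(hashlib.sha256(data).hexdigest() for _ in range(iterations))
```

One caveat: whether `hashlib.sha256` is backed by OpenSSL depends on how the Python interpreter was built, so this may or may not exercise the same OpenSSL code as the failing build.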
Is this possibly related to #1214? That issue occurs on x86-64 and could be the same as Debian #944738, where it cropped up ~8 months ago. I haven't looked at it in any depth, it seems like it could be an issue with a common toolchain component. The reporters of the Debian issue note that it doesn't occur with Oracle's builds. |
@tmancill @sxa #1214 may be related, but FYI I have been able to replicate this using a basic loop of sha256sum alone. It fails very occasionally on this particular machine, but always works on another machine... |
@andrew-m-leonard Ah, in that case these sound like distinct issues. Thanks! |
Similar issues are happening within docker images on the test-packet-ubuntu1604-armv8-1 host. Attempting to upgrade to a later ubuntu may cause issues due to:
on an apt-get upgrade .. Hopefully this isn't a problem ... Current plan (since machine is unusable for most practical purposes just now) is to attempt to do a release upgrade on it. |
Sample errors seen on some of the test jobs on the same systems:
or
|
OK ... For now I have enabled the Separately I have enabled multiple docker containers under build-packet-ubuntu1804-armv8l-1 (Five Ubuntu, four Fedora, all limited to use 8 of the 64-cores each) which are to be used for testing - this is also not a ThunderX system and has so far not exhibited the crypto issues. I'll enable further distributions if this looks stable |
Ref OpenJ9 issue: eclipse-openj9/openj9#9046 |
Closing, and will pursue any subsequent remediation work we can identify under adoptium/infrastructure#1897 |
Platform:
aarch64
https://ci.adoptopenjdk.net/view/Failing%20Builds/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-linux-aarch64-openj9-linuxXL/223/consoleFull
I suspect this may be an openjdk build concurrency issue.
A workaround may be to reduce the aarch64 concurrency; it is currently using 29 jobs: