JTReg jdk11-m2 Failure: PatchModuleImgTest_PlatformMod_0 #9048

Closed
M-Davies opened this issue Mar 31, 2020 · 26 comments

Comments

@M-Davies

M-Davies commented Mar 31, 2020

Failure link

  • test category, sanity.system
  • OS/architecture, aarch_64
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+9)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.20.0-m2, JRE 11 Linux aarch64-64-Bit 20200330_133 (JIT enabled, AOT enabled)
OpenJ9   - c93e4dabc
OMR      - 1b6abd044
JCL      - 644e9abfc0 based on jdk-11.0.7+9)

https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.system_aarch64_linux_xl/73

Optional info

Failure output (captured from console output)

STF 17:19:26.344 - +------ Step 6 - Create runtime image for PatchModule test
STF 17:19:26.344 - | Run jlink to create runtime image
STF 17:19:26.344 - |   ImageName: XpJVM
STF 17:19:26.344 - |
STF 17:19:26.345 - Running command: /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_xl/openjdkbinary/j2sdk-image/jdk-11.0.7+9/bin/jlink --module-path /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_xl/openjdkbinary/j2sdk-image/jdk-11.0.7+9/jmods:/home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_xl/openjdk-tests/TKG/test_output_15855859707042/PatchModuleImgTest_PlatformMod_0/20200330-171915-PatchModuleImageTest/modules --add-modules com.hello,com.hola,com.test,com.helper,com.discreet --output /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_xl/openjdk-tests/TKG/test_output_15855859707042/PatchModuleImgTest_PlatformMod_0/20200330-171915-PatchModuleImageTest/tmp/6.XpJVM
STF 17:19:26.345 - Redirecting stderr to /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_xl/openjdk-tests/TKG/test_output_15855859707042/PatchModuleImgTest_PlatformMod_0/20200330-171915-PatchModuleImageTest/results/6.JLNK.stderr
STF 17:19:26.345 - Redirecting stdout to /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_xl/openjdk-tests/TKG/test_output_15855859707042/PatchModuleImgTest_PlatformMod_0/20200330-171915-PatchModuleImageTest/results/6.JLNK.stdout
STF 17:19:26.348 - Monitoring processes: JLNK
JLNK Error: Hash of java.xml (42716d30f0306920381b018875710d549bd395ce9a1201befaddc4526d092f9a) differs to expected hash (5be50b80d428f41e93deae8078ee7004b60be72f99b96d74e23a9f4966ebd8c6) recorded in java.base
STF 17:19:28.110 - **FAILED** Process JLNK ended with exit code (1) and not the expected exit code/s (0)
STF 17:19:28.110 - Monitoring Report Summary:
STF 17:19:28.111 -   o Process JLNK ended with exit code (1) and not the expected exit code/s (0)
STF 17:19:28.111 - Killing processes: JLNK
STF 17:19:28.112 -   o Process JLNK is not running
**FAILED** at step 6 (Run jlink). Expected return value=0 Actual=1 at /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_aarch64_linux_xl/openjdk-tests/TKG/../TKG/test_output_15855859707042/PatchModuleImgTest_PlatformMod_0/20200330-171915-PatchModuleImageTest/execute.pl line 273.
STF 17:19:28.147 - **FAILED** execute script failed. Expected return value=0 Actual=1
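
When triaging many runs, it can help to pull the module name and the two hashes out of the jlink error line so they can be compared across machines. The following is only a sketch: the class `JlinkHashError` and its `parse` method are hypothetical names, and the pattern is derived from the single error line shown above, so other jlink versions may word the message differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper; not part of STF or the test suite.
class JlinkHashError {
    // Pattern taken from the error line in the console output above (an assumption about its stability).
    private static final Pattern HASH_ERROR = Pattern.compile(
        "Hash of (\\S+) \\(([0-9a-f]+)\\) differs to expected hash \\(([0-9a-f]+)\\) recorded in (\\S+)");

    // Returns {module, computedHash, expectedHash, recordedIn}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = HASH_ERROR.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] fields = parse(String.join(" ", args));
        if (fields == null) {
            System.out.println("no hash mismatch found in input");
        } else {
            System.out.println("module=" + fields[0] + " computed=" + fields[1]
                + " expected=" + fields[2] + " recordedIn=" + fields[3]);
        }
    }
}
```

The expected hash is the one recorded in java.base at build time, so it is the value to compare against an independent checksum of the jmod file.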
@pshipton
Member

@knn-k

@knn-k
Contributor

knn-k commented Mar 31, 2020

Never seen this before.

@knn-k
Contributor

knn-k commented Apr 1, 2020

I could not recreate the failure in 20+ runs.

@pshipton
Member

pshipton commented Apr 1, 2020

@knn-k where did you try?

@smlambert can @knn-k get access to test-packet-armv8-ubuntu-16-04?

@knn-k
Contributor

knn-k commented Apr 1, 2020

I tried it in my local environment on a quad-core machine.
I don't have access to test-packet-armv8-ubuntu-16-04.

@pshipton
Member

pshipton commented Apr 1, 2020

Could also try it on cent7-aarch64-1.

@smlambert
Contributor

Alternatively, to gain direct access to an AdoptOpenJDK machine, you'd need to request it by opening an openjdk-infrastructure issue.

@knn-k
Contributor

knn-k commented Apr 1, 2020

I ran this test 100 times manually on cent7-aarch64-1 (50 times each with the large heap build and the compressed refs build), and I saw no failures.
I used the following large heap build.

$ jdk-11.0.7+9/bin/java -version
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+9-202003302325)
Eclipse OpenJ9 VM AdoptOpenJDK (build master-917892994, JRE 11 Linux aarch64-64-Bit 20200330_134 (JIT enabled, AOT enabled)
OpenJ9   - 917892994
OMR      - 0ecff81ea
JCL      - 34c3dd7d55 based on jdk-11.0.7+9)

@M-Davies
Author

M-Davies commented Apr 2, 2020

https://ci.adoptopenjdk.net/job/Grinder/2722/tapResults/

92/100 failures on test-packet-armv8-ubuntu-16-04

EDIT: Running a new one at https://ci.adoptopenjdk.net/job/Grinder/2744, this time on a different machine (test-aws-rhel76-armv8-4), since the failure might be infra-specific.

@knn-k
Contributor

knn-k commented Apr 3, 2020

The run on test-aws-rhel76-armv8-4 above was OK.
I have no idea what is making the difference, but could it be the OS (Ubuntu vs. RHEL/CentOS)?

@M-Davies
Author

M-Davies commented Apr 3, 2020

The Grinder run on a RHEL machine passed: https://ci.adoptopenjdk.net/job/Grinder/2744/. I'm not sure what differs between RHEL/CentOS and Ubuntu with respect to this hash, however.

@M-Davies
Author

M-Davies commented Apr 3, 2020

That's interesting. When running on an Ubuntu-based machine again (https://ci.adoptopenjdk.net/job/Grinder/2746/), the test failed only 1/100 times. The only difference between this and the Grinder run in #9048 (comment) is that the nightly build is newer.

@knn-k
Contributor

knn-k commented Apr 8, 2020

I ran this test on test-packet-ubuntu1604-armv8-1 more than 50 times, and saw no failures.
I used the following build:

OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+9-202004052328)
Eclipse OpenJ9 VM AdoptOpenJDK (build master-958d2f9b5, JRE 11 Linux aarch64-64-Bit Compressed References 20200405_262 (JIT enabled, AOT enabled)
OpenJ9   - 958d2f9b5
OMR      - 9d422b0b0
JCL      - 34c3dd7d55 based on jdk-11.0.7+9)

I guess (and I hope) something in recent fixes/changes stabilized this.

@knn-k
Contributor

knn-k commented Jun 17, 2020

Possibly related: adoptium/temurin-build#1804

@knn-k
Contributor

knn-k commented Jun 24, 2020

This test uses the native OpenSSL when running jlink. I got the following output in Step 6 of the test when I set export IBM_JAVA_OPTIONS=-Djdk.nativeCryptoTrace=true:

STF 07:43:35.303 - Monitoring processes: JLNK
JLNK stderr MessageDigest load - using Native crypto library.

Is the hash value calculated by the native MessageDigest?
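
One way to start answering this from the Java side is to ask the JDK which security provider serves SHA-256. The sketch below uses only the standard `MessageDigest` and `Provider` APIs; the class name `DigestProviderCheck` is made up for illustration. The provider name alone may not reveal whether the runtime delegates to native OpenSSL underneath, so treat the output as a first hint rather than proof.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Provider;

// Hypothetical diagnostic helper; not part of the test suite.
class DigestProviderCheck {
    // Describes the provider that the JDK selects for the given digest algorithm.
    static String describe(String algorithm) {
        try {
            Provider p = MessageDigest.getInstance(algorithm).getProvider();
            return p.getName() + " " + p.getVersionStr() + ": " + p.getInfo();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("SHA-256"));
    }
}
```

Running it once with and once without the native crypto trace option shown above would show whether the selected provider is the same in both cases.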

@knn-k
Contributor

knn-k commented Jun 30, 2020

This happened in https://ci.adoptopenjdk.net/job/Test_openjdk11_j9_sanity.system_aarch64_linux/186/tapResults/.

STF 01:58:11.979 - Monitoring processes: JLNK
JLNK Error: Hash of java.xml (5f00ea2861e07c8d1312700e47b6644c35c83853f9acf72a4012926d174376ec) differs to expected hash (48945f3a95128186c242f8ce18c8329f43ee7acaebce172d5566b656b210380f) recorded in java.base
STF 01:58:13.447 - **FAILED** Process JLNK ended with exit code (1) and not the expected exit code/s (0)

sha256sum gives the following hash value, which is the same as that recorded in java.base:

$ sha256sum jdk-11.0.8+8/jmods/java.xml.jmod
48945f3a95128186c242f8ce18c8329f43ee7acaebce172d5566b656b210380f  jdk-11.0.8+8/jmods/java.xml.jmod
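
The same comparison can be made from inside the JVM under test, which exercises whatever digest implementation the runtime selects. This is a minimal sketch, assuming the recorded hash is a plain SHA-256 of the jmod file (as the matching sha256sum output suggests); `JmodHashCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of jlink or the test suite.
class JmodHashCheck {
    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // args[0]: path to the .jmod file, args[1]: expected hex digest
        String actual = sha256Hex(Paths.get(args[0]));
        System.out.println((actual.equalsIgnoreCase(args[1]) ? "match " : "MISMATCH ") + actual);
    }
}
```

If this prints a mismatch on a machine where sha256sum agrees with java.base, the Java digest path is suspect; if both disagree with java.base, the problem lies below the JVM.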

@DanHeidinga
Member

Moving this forward as we've completed the milestone 2 builds for 0.21.0 and it's too late to put this in.

@knn-k
Contributor

knn-k commented Jul 2, 2020

@Akira1Saitoh found a case where the native sha256sum returns a wrong hash value on cent7-aarch64-1, as reported in adoptium/temurin-build#1804.
It may mean that this problem is caused by the native OpenSSL instead of by the OpenJ9 runtime.

The version of OpenSSL on cent7-aarch64-1 is 1.0.2k.
Does CentOS support a newer version, such as 1.0.2u, on the machine?

What is the version of OpenSSL on other test machines?
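
Since the failure rate varied so much between runs, a repeated-hash check may be a cheaper reproducer than the full test. The sketch below hashes the same bytes many times and tallies the distinct digests; on a healthy machine there should be exactly one. `DigestStabilityCheck` and `tally` are hypothetical names, and the run count of 100 is arbitrary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reproducer; not part of the test suite.
class DigestStabilityCheck {
    // Hashes the same data `runs` times and counts how often each distinct digest appears.
    static Map<String, Integer> tally(byte[] data, int runs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < runs; i++) {
            byte[] digest;
            try {
                // A fresh instance per run, so each iteration goes through provider lookup again.
                digest = MessageDigest.getInstance("SHA-256").digest(data);
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            counts.merge(hex.toString(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: file to hash repeatedly, for example java.xml.jmod
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        Map<String, Integer> counts = tally(data, 100);
        counts.forEach((hash, n) -> System.out.println(n + " x " + hash));
        System.out.println(counts.size() == 1 ? "stable" : "UNSTABLE: more than one distinct digest");
    }
}
```

More than one distinct digest for identical input would point at the digest implementation or the machine rather than at jlink.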

@pshipton
Member

pshipton commented Jul 7, 2020

@jdekonin @AdamBrousseau is there a newer version of openssl available for cent7-aarch64-1?

@jdekonin
Contributor

jdekonin commented Jul 7, 2020

It doesn't appear so with the default repos; the system is 100% up-to-date. We could add a newer version to the machine and make it the default. It depends on the lowest supported operating system we are targeting, I suppose.

@DanHeidinga
Member

@knn-k will this complete for the 0.22 release? Should it be moved forward to 0.23?

@knn-k
Contributor

knn-k commented Aug 25, 2020

@DanHeidinga If this is caused by OpenSSL on test servers as the native sha256sum command suggests, the only thing we can do is to stop using the native crypto library from OpenJ9.

@pshipton
Member

Do we need this in the milestone plan? I'll remove it.

@0xdaryl
Contributor

0xdaryl commented Dec 14, 2020

Evidence so far suggests this is a duplicate of #9046 and is related to the installed version of OpenSSL on the test machines. @Akira1Saitoh was able to demonstrate an incorrect hash value with an independent test outside of OpenJ9 and Java which strongly suggests this is not an OpenJ9 issue. I'm closing this as a dup of #9046.
