Out Of Memory Issue On test-osuosl-aix72-ppc64-5 #3513

Open
steelhead31 opened this issue Apr 9, 2024 · 17 comments

Comments

@steelhead31
Contributor

ppc64_aix: StreamingBody and ThreadStartTest fail with out-of-memory errors on test-osuosl-aix72-ppc64-5

java/net/httpclient/StreamingBody.java
...
iteration: 76
03:27:10  [47.812s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (11=EAGAIN) for attributes: stacksize: 2112k, guardsize: 0k, detached.
03:27:10  [47.812s][warning][os,thread] Failed to start the native thread for java.lang.Thread "HttpClient-377-SelectorManager"
03:27:10  test StreamingBody.test("https://127.0.0.1:58465/https2/streamingbody/z"): failure
03:27:10  java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
03:27:10  	at java.base/java.lang.Thread.start0(Native Method)
03:27:10  	at java.base/java.lang.Thread.start(Thread.java:809)
03:27:10  	at java.net.http/jdk.internal.net.http.HttpClientImpl.start(HttpClientImpl.java:337)
03:27:10  	at java.net.http/jdk.internal.net.http.HttpClientImpl.create(HttpClientImpl.java:271)
03:27:10  	at java.net.http/jdk.internal.net.http.HttpClientBuilderImpl.build(HttpClientBuilderImpl.java:135)
03:27:10  	at StreamingBody.test(StreamingBody.java:102)


javax/management/mxbean/ThreadStartTest.java
05:44:38  STDOUT:
05:44:38  [0.312s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (11=EAGAIN) for attributes: stacksize: 2112k, guardsize: 0k, detached.
05:44:38  [0.312s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Thread-911"
05:44:38  STDERR:
05:44:38  java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
05:44:38  	at java.base/java.lang.Thread.start0(Native Method)
05:44:38  	at java.base/java.lang.Thread.start(Thread.java:809)
05:44:38  	at ThreadStartTest.main(ThreadStartTest.java:53)
05:44:38  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
05:44:38  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
05:44:38  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
05:44:38  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
05:44:38  	at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
05:44:38  	at java.base/java.lang.Thread.run(Thread.java:840)

...

Example of new test added via the upstream PR running in this Jenkins job:
Grinder/9156. Scratch that: this will need a PR to aqa-tests to support this case (new tests coming in from an upstream PR, instead of being pulled from merged material in the mirror repository).

Originally posted by @smlambert in adoptium/aqa-tests#5137 (comment)

@sxa
Member

sxa commented Nov 1, 2024

@smlambert Is this one still a concern, i.e. is it a valid issue with a current test in one of the suites that we need to address (in which case a Grinder link would be great), or did the problematic test only exist as part of that PR?
Since it sounds like it was a test introduced as part of the PR that was under test, is it possible the test is problematic on AIX generally?

@sxa
Member

sxa commented Nov 5, 2024

From Shelley - seems that it may be specific to some machines, so we should re-run on multiple machines to understand which ones have the problem and identify the differences.

@sxa sxa added this to the 2024-11 (November) milestone Nov 5, 2024
@steelhead31
Contributor Author

Rerunning tests on the problem machine: https://ci.adoptium.net/job/Grinder/11402/

@steelhead31 steelhead31 moved this from Todo to In Progress in 2024 4Q Adoptium Plan Nov 18, 2024
@steelhead31
Contributor Author

java/net/httpclient/StreamingBody.java

appears to pass on -5 (and on other machines) when run independently: https://ci.adoptium.net/job/Grinder/11406/

@steelhead31
Contributor Author

javax/management/mxbean/ThreadStartTest.java

doesn't appear to work on any AIX test machine.

@jiekang

jiekang commented Nov 18, 2024

The ThreadStartTest is very simple, it creates and starts 1000 threads. It seems thread management on AIX is a bit... odd.

See
https://bugs.openjdk.org/browse/JDK-8311921

And its PR:
openjdk/jdk#14845

The conversation there is interesting, and the code may be worth examining for next steps towards resolving this. The test fails fairly consistently at around the 910-thread mark.

Maybe setting MaxExpectedDataSegmentSize to a higher value will help.
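To make it easier to replicate this on the command line, here is a minimal sketch of the same idea: start N threads, keep them all alive, and report how many started before `Thread.start()` threw. This is not the actual ThreadStartTest source; the class name `ThreadStartSketch` and the method `startThreads` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical reproducer, not the real javax/management/mxbean/ThreadStartTest.
public class ThreadStartSketch {

    /** Tries to start n threads that all stay alive; returns how many started. */
    static int startThreads(int n) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> started = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                Thread t = new Thread(() -> {
                    try {
                        release.await(); // park until every start attempt is done
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.setDaemon(true);
                t.start(); // throws OutOfMemoryError if the native thread cannot be created
                started.add(t);
            }
        } catch (OutOfMemoryError e) {
            System.err.println("Failed after " + started.size() + " threads: " + e.getMessage());
        } finally {
            release.countDown(); // let all parked threads finish
            for (Thread t : started) {
                t.join();
            }
        }
        return started.size();
    }

    public static void main(String[] args) throws InterruptedException {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        System.out.println("started " + startThreads(n) + " of " + n);
    }
}
```

Running it with and without a larger MaxExpectedDataSegmentSize (presumably passed as `-XX:MaxExpectedDataSegmentSize=16G` on an AIX JDK, which I have not verified here) should show whether the count at which it fails moves.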

@sxa
Member

sxa commented Nov 18, 2024

We have an LDR_CNTRL variable that affects memory segments on AIX. It is defined on some of the Jenkins agent definitions (and also overridden in the build scripts, I believe). It may be worth seeing whether changing that value affects things (this might be easier if we can replicate the failure on the command line).

@steelhead31
Contributor Author

steelhead31 commented Nov 18, 2024

On the machine in question:

core file size              (blocks, -c) unlimited  
data seg size               (kbytes, -d) unlimited  
file size                   (blocks, -f) unlimited  
max memory size             (kbytes, -m) unlimited  
open files                          (-n) 2000  
pipe size                (512 bytes, -p) 64  
stack size                  (kbytes, -s) 4194304  
cpu time                   (seconds, -t) unlimited  
max user processes                  (-u) unlimited  
virtual memory              (kbytes, -v) unlimited

@jiekang

jiekang commented Nov 18, 2024

Test ThreadCountLimit has AIX specific flags, and sets MaxExpectedDataSegmentSize to 16G instead of the default 8G.

https://github.com/openjdk/jdk/blob/c59adf68d9ac49b41fb778041e3949a8057e8d7f/test/hotspot/jtreg/runtime/Thread/ThreadCountLimit.java#L39

You could try setting MaxExpectedDataSegmentSize to 16G as well.

@jiekang

jiekang commented Nov 18, 2024

The JDK bug that introduced the AIX specific flags for ThreadCountLimit has relevant conversation as well: https://bugs.openjdk.org/browse/JDK-8323964

It seems the MaxExpectedDataSegmentSize setting (maximum expected size of the data segment, an AIX-specific flag) is in some cases too low for the test.
We experienced this a few times in scenarios where a lot of threads are created (and even warn about this, see os_aix.cpp).
So we should better run with a higher than default value of MaxExpectedDataSegmentSize for this test.
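For a rough sense of scale, the stack size reported in the failure log (2112k per thread) can be multiplied by the approximate thread count at which ThreadStartTest fails (~910). The sketch below only does that arithmetic. It assumes the "k" in the log means KiB, and it says nothing about which AIX limit is actually being hit; `StackBudgetSketch` and `reservedKiB` are illustrative names.

```java
// Back-of-the-envelope arithmetic only; the inputs come from the logs in this issue.
public class StackBudgetSketch {

    /** Total stack reservation in KiB for the given thread count and per-thread stack size. */
    static long reservedKiB(int threads, long stackKiB) {
        return threads * stackKiB;
    }

    public static void main(String[] args) {
        long kib = reservedKiB(910, 2112); // ~910 threads, stacksize: 2112k
        double gib = kib / (1024.0 * 1024.0);
        System.out.printf("%d KiB reserved for stacks (about %.2f GiB)%n", kib, gib);
    }
}
```

That works out to 1,921,920 KiB, or roughly 1.8 GiB of stack reservations at the point of failure. Whether that figure lines up with a particular data segment limit on these machines is something we would still need to confirm.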

@steelhead31
Contributor Author

I've tried a simple Grinder run with MaxExpectedDataSegmentSize set, without success:

https://ci.adoptium.net/job/Grinder/11483/console

@jiekang

jiekang commented Nov 18, 2024

Mm... re EAGAIN: apart from "Insufficient resources to create another thread", the Linux pthread_create(3) man page also notes:

EAGAIN A system-imposed limit on the number of threads was
              encountered.  There are a number of limits that may
              trigger this error: the RLIMIT_NPROC soft resource limit
              (set via [setrlimit(2)](https://man7.org/linux/man-pages/man2/setrlimit.2.html)), which limits the number of
              processes and threads for a real user ID, was reached; the
              kernel's system-wide limit on the number of processes and
              threads, /proc/sys/kernel/threads-max, was reached (see
              [proc(5)](https://man7.org/linux/man-pages/man5/proc.5.html)); or the maximum number of PIDs,
              /proc/sys/kernel/pid_max, was reached (see [proc(5)](https://man7.org/linux/man-pages/man5/proc.5.html)).

Dunno if anything there is relevant somehow...

@steelhead31
Contributor Author

I've tried increasing the per-user process limit:

/usr/sbin/chdev -l sys0 -a maxuproc=1200
sys0 changed
root@adopt07:[/etc/security]/usr/sbin/lsattr -E -l sys0 | grep maxuproc
maxuproc        1200                                 Maximum number of PROCESSES allowed per user        True

Still no luck. Perhaps there is something at the hypervisor level, or a different kernel parameter specific to AIX.

@steelhead31 steelhead31 moved this from In Progress to Paused/Blocked in 2024 4Q Adoptium Plan Nov 19, 2024
@jiekang

jiekang commented Nov 19, 2024

There is a lot of interesting content when you Google 'AIX thread limit' or 'AIX Java thread limit', including things we could consider. But at this point, this issue is probably one where we can request assistance from others who have an interest in AIX.

@jiekang

jiekang commented Nov 19, 2024

I even found this fun thread that includes @andrew-m-leonard eclipse-openj9/openj9#7503 hahah...

@sxa
Member

sxa commented Nov 22, 2024

Note that another test, java/net/httpclient/SpecialHeadersTest.java, is mentioned alongside the StreamingBody one in #3523. (I'd be tempted to close this as a dup, but there is a chunk of useful stuff in the comments above, so I'm happy to keep both open for now.)

Projects
Status: Paused/Blocked

3 participants