G1 crash in mark_in_next_bitmap #867
Comments
https://bugs.openjdk.org/browse/JDK-8210557 is a possible culprit (and is only fixed in 12+). Nothing in the 11.0.20 release notes is related, so upgrading to that point release is unlikely to fix it. Are you able to run with 17.0.8? |
I think that https://bugs.openjdk.org/browse/JDK-8210557 cannot be the culprit, because that change only removes an assert() (https://hg.openjdk.org/jdk/jdk/rev/b177af763b82), and asserts are not compiled into the official releases from Adoptium. As my software is deployed to many end-user systems rather than running on a set of servers I own, I cannot easily change the Java version en masse. |
Here is a similar crash in 11.0.20 - however its stack is different:
|
@Adam- Can you try running with -XX:+VerifyBeforeGC and -XX:+VerifyAfterGC and send in the results after a crash? |
The performance degradation that is caused by these options is not something that we can widely deploy. I will try to get some individual users who are experiencing this issue to run with that. |
We are marking this issue as stale because it has not been updated for a while. This is just a way to keep the support issues queue manageable. |
@Adam- Any luck debugging this? It might also be worth trying 11.0.21 and seeing if it was fixed some other way |
I have not made any further progress debugging this. However we have linked AMD CPB to some of the lower-volume crashes we receive (not this crash specifically - I only filed a bug for this one issue because it is observed on many machines on various processors, including many Intel). I still see this crash commonly, with 84 crashes in the last 2 weeks. I only just began deploying 11.0.21 yesterday and so only about 3% of my VMs have it, but I have not observed this crash on it yet. |
I still see this crash, here is one from a macos machine: g1-11.0.21.txt. The stack is slightly different but it still crashes in
|
@Adam- 11.0.22 has been released if you want to give that a go. |
I've seen 40 crashes in the last week from various hosts on 11.0.22:
Here they are, along with their stacktraces: Only about 14% of my total VMs are 11.0.22 |
@Adam- Unfortunately we'll need to get some reports with |
I don't think I am able to get those. It is fine to just close this if you want. |
Please provide a brief summary of the bug
We observe rare G1 crashes in G1ConcurrentMark::mark_in_next_bitmap in at least AdoptOpenJDK/Temurin 11.0.4, 11.0.8, 11.0.16, 11.0.16.1, 11.0.18, and 11.0.19. We don't have a way to reproduce the issue, and it seemingly happens at random based on the reports sent to us by users. We have observed this specific crash 200 times on 144 different machines in the last 3 weeks.
I have included the full crash report of one of these crashes here; they are all nearly identical and have an identical native frame stack.
They look like this:
Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (11.0.19+7, mixed mode, tiered, compressed oops, g1 gc, windows-amd64)
Mapping the DLL offsets to symbols via the PDB files Adoptium provides yields this stack:
For reference, the code surrounding the crash is:
I have disassembled jvm.dll to determine what is happening.
The compiled code is somewhat dense because the compiler inlines the calls to heap_region_containing, addr_to_region, get_by_address, shift_by and biased_base, as well as the overloaded call to mark_in_next_bitmap. Inlining them in source form would look something like this:
Note that the CMP instruction accesses qword ptr [R10 + 0x160], and the crash log shows EXCEPTION_ACCESS_VIOLATION (0xc0000005), reading address 0x0000000000000160. As far as I can tell, this means the value loaded from the _biased_base array is 0x0, which means hr is null, and the crash occurs on the access to _next_top_at_mark_start because of a null pointer dereference.
I have almost no understanding of the G1 GC or most of the JDK, so I don't know where to go from here.
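The reasoning above can be modeled with a small standalone sketch. It is not HotSpot code: FakeHeapRegion, FakeRegionTable and field_address are hypothetical names, the padding is chosen only so that the field sits at offset 0x160 (the displacement seen in the CMP instruction), and the table subtracts the heap start explicitly rather than using a biased base pointer. The point is only that a null entry in the region table turns a field read into a read at address 0x160.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for HeapRegion. The real layout is not reproduced;
// the padding only places the field at offset 0x160.
struct FakeHeapRegion {
    unsigned char padding[0x160];
    std::uintptr_t next_top_at_mark_start;
};

static_assert(offsetof(FakeHeapRegion, next_top_at_mark_start) == 0x160,
              "field must sit at the offset seen in the crash");

// Simplified region table: one slot per region, indexed by the address
// shifted right by the region size in bits. (Assumption: the real table
// uses a biased base so no subtraction is needed; this model subtracts
// heap_start explicitly for clarity.)
struct FakeRegionTable {
    std::uintptr_t heap_start;
    unsigned shift;
    std::vector<FakeHeapRegion*> slots;

    FakeHeapRegion* get_by_address(std::uintptr_t addr) const {
        std::size_t index = (addr - heap_start) >> shift;
        assert(index < slots.size());
        return slots[index];  // may be nullptr for an unpopulated slot
    }
};

// Address the CPU would read for hr->next_top_at_mark_start, computed with
// integer arithmetic so that nothing is actually dereferenced.
std::uintptr_t field_address(const FakeHeapRegion* hr) {
    return reinterpret_cast<std::uintptr_t>(hr) +
           offsetof(FakeHeapRegion, next_top_at_mark_start);
}
```

With a table whose second slot is null, field_address(table.get_by_address(addr)) for an address in that second region evaluates to 0x160 on mainstream platforms, which matches the faulting address in the crash log. It does not explain why the slot would be null in the first place.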
Please provide steps to reproduce where possible
No response
Expected Results
No crash
Actual Results
Crash
What Java Version are you using?
Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (11.0.19+7, mixed mode, tiered, compressed oops, g1 gc, windows-amd64)
What is your operating system and platform?
No response
How did you install Java?
No response
Did it work before?
No response
Did you test with the latest update version?
No response
Did you test with other Java versions?
No response
Relevant log output
No response