JBTM-3966 fix crash tests #26
base: main
Started testing this pull request with LRA profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/92/
Started testing this pull request with JACOCO profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/92/
JACOCO profile tests passed - Job complete https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/92/
LRA profile tests failed (https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/92/): LRA Test failed with failures in arq profile
@@ -269,28 +231,6 @@ private void doWait(long millis) {
}
}

private int recover() {
Recovery is triggered by the WildFly startup.
What if we test on a different container?
That's a good point, @mmusgrov! Would you rather keep the recover method for testing on different containers?
I've just restored the recover method.
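To keep the crash test container-neutral, the explicit recovery step can talk to the coordinator over HTTP instead of relying on WildFly running recovery at startup. The following is a minimal sketch using `java.net.http`, assuming the coordinator exposes an HTTP endpoint that runs a recovery pass; the `ExplicitRecovery` class and the recovery URI are hypothetical placeholders, not confirmed Narayana names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical, container-neutral recovery trigger for the crash tests.
// The recovery URI is supplied by the caller and is an assumption,
// not a confirmed Narayana LRA coordinator endpoint.
final class ExplicitRecovery {
    private final URI recoveryUri;

    ExplicitRecovery(URI recoveryUri) {
        this.recoveryUri = recoveryUri;
    }

    // Builds the GET request that asks the coordinator to run a recovery pass.
    HttpRequest request() {
        return HttpRequest.newBuilder(recoveryUri)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Sends the request and returns the HTTP status code, so the test can assert
    // on it whether or not the container already ran recovery at startup.
    int recover() throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpResponse<String> response = client.send(request(), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

Calling `recover()` unconditionally after the restart keeps the test independent of which container hosts the coordinator.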
Started testing this pull request with LRA profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/94/
Started testing this pull request with JACOCO profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/94/
JACOCO profile tests passed - Job complete https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/94/
LRA profile tests passed - Job complete https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/94/
Started testing this pull request with JACOCO profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/95/
Started testing this pull request with LRA profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/95/
JACOCO profile tests passed - Job complete https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/95/
LRA profile tests passed - Job complete https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/95/
assertNotNull("A new LRA should have been added to the object store before the JVM was halted.", shortLRA);
Now that we aren't reading the LRA from the filesystem (lraId = getFirstLRAFromFS();), this assert is no longer required.
Good point, I'll remove it.
recover();
}
}
doWait(SHORT_TIMEOUT + 5000L);
Previously the test waited for SHORT_TIMEOUT + 1 second. Do you know why we now need to wait SHORT_TIMEOUT + 5 seconds? The reason I mention it is that too many waits make the test suite run a lot slower, and since other tests also sleep, the cumulative effect becomes noticeable when running the full test suite.
Good point. The reason was to keep the test more stable, but I can reset the wait to SHORT_TIMEOUT + 1 second; hopefully it is equally stable on CI.
If we see failures, then we can add such a workaround and raise the priority of replacing the sleeps with an appropriate Byteman rule.
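Until a Byteman rule is in place, one way to avoid paying the full fixed delay on every run is to poll for the expected state with a deadline and return as soon as it holds. This is a plain polling loop, a different technique from a Byteman rule; `WaitUtil` and `awaitCondition` are hypothetical names, not existing test helpers.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical polling alternative to a fixed sleep such as doWait(SHORT_TIMEOUT + 1000L).
final class WaitUtil {
    private WaitUtil() {
    }

    // Checks the condition every pollMillis until it returns true or timeoutMillis elapses.
    // Returns true as soon as the condition holds, false on timeout or interruption.
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Sleep for the poll interval, but never past the deadline.
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

The timeout keeps the same upper bound as the fixed sleep, so a stable run finishes early while a slow CI run still gets the full budget.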
// Verifies that the resource was notified that the LRA finished
String listenerStatus = getStatusFromListener(lraListenerURI);
Assert.assertTrue(String.format("LRA %s should have cancelled", shortLRA.toString()),
status == null || status == LRAStatus.Cancelled || status == LRAStatus.Cancelling);
Do you know why the LRA may still be in the Cancelling state even though you have extended the wait by 4 seconds (doWait(SHORT_TIMEOUT + 5000L);)?
Started testing this pull request with JACOCO profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/96/
Started testing this pull request with LRA profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/96/
Started testing this pull request with JACOCO profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/97/
Started testing this pull request with LRA profile: https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/97/
I've put back the Hold label because, when the LRA timeout is shorter (1 sec), the LRA status is 'CANCELLING' instead of 'CANCELLED' after recovery, so I need to investigate further.
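While this is investigated, the status assertion could be made less timing-sensitive by polling until the LRA reaches a final state (or has been removed) instead of reading the status once after a fixed wait. A minimal sketch; the `LRAStatus` enum below is a local stand-in that mirrors the constant names of MicroProfile LRA's `org.eclipse.microprofile.lra.annotation.LRAStatus`, and `StatusPoller` is a hypothetical helper.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Local stand-in mirroring the constant names of MicroProfile LRA's LRAStatus.
enum LRAStatus { Active, Cancelling, Cancelled, FailedToCancel, Closing, Closed, FailedToClose }

// Hypothetical helper: waits for an LRA to leave transient states such as Cancelling.
final class StatusPoller {
    // States from which the LRA is not expected to move on its own.
    private static final Set<LRAStatus> FINAL = EnumSet.of(
            LRAStatus.Cancelled, LRAStatus.FailedToCancel, LRAStatus.Closed, LRAStatus.FailedToClose);

    private StatusPoller() {
    }

    // Polls statusSource until it reports a final state or null (treated as
    // "the LRA has already been removed"), or until timeoutMillis elapses.
    // Returns the last observed status, which may still be Cancelling on timeout.
    static LRAStatus awaitFinalStatus(Supplier<LRAStatus> statusSource, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        LRAStatus status = statusSource.get();
        while (status != null && !FINAL.contains(status) && System.nanoTime() < deadline) {
            try {
                TimeUnit.MILLISECONDS.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                break;
            }
            status = statusSource.get();
        }
        return status;
    }
}
```

An assertion on the returned value then fails only if the LRA is still Cancelling after the whole timeout, rather than whenever recovery has not finished at one fixed instant.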
JACOCO profile tests failed (https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/96/): LRA Test failed with failures in arq profile
LRA profile tests failed (https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/96/): LRA Test failed with failures in arq profile
JACOCO profile tests failed (https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=JACOCO,jdk=openJDK17,label=jnlp-agent/97/): LRA Test failed with failures in arq profile
LRA profile tests failed (https://ci-jenkins-csb-narayana.apps.ocp-c1.prod.psi.redhat.com/job/btny-pulls-lra/PROFILE=LRA,jdk=openJDK17,label=jnlp-agent/97/): LRA Test failed with failures in arq profile
https://issues.redhat.com/browse/JBTM-3966
Recovery is now triggered during the recovery service startup.
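Triggering recovery at service startup can be illustrated with a scheduler whose initial delay is zero, so the first scan runs as soon as the service starts and later scans follow periodically. This is an illustrative sketch, not Narayana's recovery manager; `StartupRecoveryService` and its parameters are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical recovery service: the first scan runs immediately on start(),
// then repeats with a fixed delay. Not Narayana's implementation.
final class StartupRecoveryService implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Runnable recoveryScan;
    private final long periodMillis;

    StartupRecoveryService(Runnable recoveryScan, long periodMillis) {
        this.recoveryScan = recoveryScan;
        this.periodMillis = periodMillis;
    }

    // An initial delay of 0 means the first scan starts as soon as the service starts.
    void start() {
        scheduler.scheduleWithFixedDelay(recoveryScan, 0L, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stops further scans and releases the scheduler thread.
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this shape, a test that restarts the service does not need to wait a full recovery period before the first scan; it only needs to wait for that initial scan to complete.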