TemplateTest$Tuples fails sporadically #391

Open
k-wall opened this issue Sep 16, 2024 · 5 comments

Comments

@k-wall
Contributor

k-wall commented Sep 16, 2024

From a CI run, logs attached below.
It is not immediately obvious to me how the failure is occurring.

Error:  Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.952 s <<< FAILURE! -- in io.kroxylicious.testing.kafka.junit5ext.TemplateTest$Tuples
Error:  io.kroxylicious.testing.kafka.junit5ext.TemplateTest$Tuples -- Time elapsed: 7.952 s <<< FAILURE!
org.opentest4j.AssertionFailedError: 

expected: [[1, 1], [3, 1], [3, -1]]
 but was: [[0, -1], [1, 1], [3, 1]]
	at io.kroxylicious.testing.kafka.junit5ext.TemplateTest$Tuples.afterAll(TemplateTest.java:163)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1092)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

logs_28425902804.zip

@k-wall
Contributor Author

k-wall commented Nov 19, 2024

Dup of #294.

@k-wall
Contributor Author

k-wall commented Nov 19, 2024

This just failed again.

Error:  Failures: 
Error:    TemplateTest$Tuples.afterAll:163 
expected: [[3, 1], [1, 1], [3, -1]]
 but was: [[0, -1], [1, 1], [3, 1]]
[INFO] 

admin.describeCluster().nodes().get().size() is returning zero which seems weird.

Looking at org.apache.kafka.image.publisher.ControllerRegistrationsPublisher#describeClusterControllers, I notice that it consults the controllers map, which will be null if an appropriate metadata update has not arrived yet. Could this be causing the race condition?
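
To make the suspected race concrete, here is a toy model of the hypothesis. This is not Kafka's code: the class name `ToyRegistrationsPublisher`, the map's type, and the choice to return an empty list while the map is unset are all assumptions made purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for a publisher whose state is filled in asynchronously.
final class ToyRegistrationsPublisher {
    // Stays null until the first metadata update is published (assumption).
    private volatile Map<Integer, String> controllers;

    // Called when a metadata update arrives; replaces the whole snapshot.
    void onMetadataUpdate(Map<Integer, String> latest) {
        controllers = Map.copyOf(latest);
    }

    // Returns the known controller ids in ascending order,
    // or an empty list if no update has been seen yet (assumption).
    List<Integer> describeControllers() {
        Map<Integer, String> snapshot = controllers;
        if (snapshot == null) {
            return List.of();
        }
        return List.copyOf(new TreeSet<>(snapshot.keySet()));
    }
}
```

A caller that runs `describeControllers()` before the first `onMetadataUpdate(...)` sees an empty result, and the same call made later sees the full set. If something similar happens on the broker side, it might explain a describe call that sporadically reports zero.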

@showuon (low priority) does this look like a Kafka defect to you?

@showuon
Member

showuon commented Nov 20, 2024

Questions:

  1. Is it possible to get logs inside controller/broker nodes?
  2. Does the admin client connect to the broker or controller?
  3. I'd like to know if we re-describe cluster, is the response still the same? I'd guess this is just a temporary state while the nodes are catching up with the metadata logs.

@k-wall
Contributor Author

k-wall commented Nov 20, 2024

Questions:

  1. Is it possible to get logs inside controller/broker nodes?

I'll see if I can get a reproduction with logs.

  2. Does the admin client connect to the broker or controller?

broker.

  3. I'd like to know if we re-describe cluster, is the response still the same? I'd guess this is just a temporary state while the nodes are catching up with the metadata logs.

I expect so. I can add a retry loop to show whether that's the case.
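
The retry loop could look roughly like the following. This is a minimal sketch, not code from the repository: the class `DescribeRetry`, the method `pollNodeCount`, and the attempt count and pause are all hypothetical. It takes the node count as a `Callable` so it can be exercised without a running cluster.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper: re-runs a "describe" call and logs what each attempt saw.
final class DescribeRetry {
    private DescribeRetry() {
    }

    /**
     * Calls nodeCount up to maxAttempts times, pausing between attempts,
     * and stops early once the expected count is observed.
     * Returns the last observed count, or -1 if maxAttempts is not positive.
     */
    static int pollNodeCount(Callable<Integer> nodeCount,
                             int expected,
                             int maxAttempts,
                             Duration pause) throws Exception {
        int observed = -1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            observed = nodeCount.call();
            System.out.printf("attempt %d: describe reported %d node(s)%n", attempt, observed);
            if (observed == expected) {
                return observed;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pause.toMillis());
            }
        }
        return observed;
    }
}
```

In the test it might be called as `DescribeRetry.pollNodeCount(() -> admin.describeCluster().nodes().get().size(), 3, 20, Duration.ofMillis(250))`, where the expected count of 3 and the timings are assumptions. If the logged count moves from 0 to the expected value on a later attempt, that would support the "temporary state" explanation.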

This problem is longstanding - so it is not a regression in a newer release.

@k-wall
Contributor Author

k-wall commented Nov 22, 2024

I've been trying to get a reproduction with separate broker logs. The only time I can actually get it to fail is when the 3 Brokers are co-located with the same. Even then it is really sporadic.
