Excessive Logging after Group Coordination changes #28

Open
kosta opened this issue Nov 21, 2019 · 0 comments

kosta commented Nov 21, 2019

Hi!

We run kage to monitor our Kafka brokers and ran into the following issue:

After a consumer group coordinator changes, the following line is logged a few thousand times each minute:

kage-host kage-kafka-host[9999]: t=2019-11-21T06:25:27+0000 lvl=eror msg="monitor: cannot get group topic offsets 3: kafka server: Request was for a consumer group that is not coordinated by this broker."

I believe this is due to this line:
https://github.com/msales/kage/blob/master/kafka/monitor.go#L295

		for group := range groups.Groups {
			if containsString(m.ignoreGroups, group) {
				continue
			}

			coordinator, err := m.client.Coordinator(group)

According to sarama v1.19.0, which is used here:

https://github.com/Shopify/sarama/blob/v1.19.0/client.go#L68

	// Coordinator returns the coordinating broker for a consumer group. It will
	// return a locally cached value if it's available. You can call
	// RefreshCoordinator to update the cached value. This function only works on
	// Kafka 0.8.2 and higher.
	Coordinator(consumerGroup string) (*Broker, error)

	// RefreshCoordinator retrieves the coordinator for a consumer group and stores it
	// in local cache. This function only works on Kafka 0.8.2 and higher.
	RefreshCoordinator(consumerGroup string) error

So once the error above occurs, RefreshCoordinator needs to be called at least once for that group. Otherwise Coordinator keeps returning the stale cached broker and every subsequent request fails the same way.

The quick and dirty fix would be to set a flag the first time this error occurs and, if the flag is set, call RefreshCoordinator once on all groups.

Would you accept a PR that does that?

Additionally, I think kage should detect when a log message repeats and, instead of writing the same message thousands of times, write something like "last message repeated 3123 times".

Would you accept a PR that does that?

Thanks in advance!

Cheers,
Kosta
