Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"nil pointer dereference" exception in exporter's Collect func resp.Body.Close() when http connection failed with connection refused #5

Open
lhoss opened this issue Oct 23, 2017 · 2 comments

Comments

@lhoss
Copy link

lhoss commented Oct 23, 2017

bug detected in journalnode-exporter (shares exact same logic) from Datatamer fork.
Before the stacktrace below, got a useful error log that shows the export got a connection refused, just before the NIL pointer exception:

time="2017-10-23T08:33:52Z" level=error msg="Get http://localhost:8480/jmx: dial tcp 127.0.0.1:8480: getsockopt: connection refused" file="journalnode_exporter.go" line=97

exception full stack trace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x402c8b]

goroutine 335365 [running]:
panic(0x816880, 0xc820010080)
/usr/lib/go-1.6/src/runtime/panic.go:464 +0x3e6
main.(*Exporter).Collect(0xc8200b8120, 0xc8202fe1e0)
/opt/prometheus/downloads/git/hadoop-exporter-52d6c8331c03775ae8d8b92e834024e1cd8959d8/journalnode/journalnode_exporter.go:99 +0x135b
vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc820258190, 0xc8202fe1e0, 0x7f8c9296baa0, 0xc8200b8120)
/opt/prometheus/downloads/git/hadoop-exporter-52d6c8331c03775ae8d8b92e834024e1cd8959d8/src/vendor/github.com/prometheus/client_golang/prometheus/registry.go:433 +0x58
created by vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/opt/prometheus/downloads/git/hadoop-exporter-52d6c8331c03775ae8d8b92e834024e1cd8959d8/src/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x360
@miono
Copy link
Contributor

miono commented Jan 8, 2019

I suggest the following solution:
Add an up-gauge that get's set to 1 or 0 in the Collect-function. If the http-request returns an error, set this to 0, Collect just that metric and then return from the function.

This way we won't kill the exporter just because the namenode/resourcemanger get's killed, and we will be able to create alerts where we consume the exporters metrics.

I can add this code if you think it's a good idea @wyukawa

@miono
Copy link
Contributor

miono commented Jan 17, 2019

@wyukawa Any thoughts on this? I have the fix pretty much ready in my head. And a PR could be submitted within a day. What's your thoughts on the approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants