Pebble ConnectionError on logging relation joined #390

Open
gruyaume opened this issue Apr 26, 2024 · 3 comments

@gruyaume

Bug Description

We are getting recurrent issues in our integration tests from the loki_push_api v1 library.

CI failure example:

To Reproduce

  1. Deploy charm that uses the loki_push_api lib
  2. Deploy grafana agent
  3. Integrate the two charms

Environment

  • lib version: 1.9
  • juju version: 3.4.2

Relevant log output

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/./src/charm.py", line 639, in <module>
    main(UDROperatorCharm)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/lib/charms/loki_k8s/v1/loki_push_api.py", line 2550, in _update_logging
    self._update_endpoints(container, loki_endpoints)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/lib/charms/loki_k8s/v1/loki_push_api.py", line 2562, in _update_endpoints
    _PebbleLogClient.disable_inactive_endpoints(
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/lib/charms/loki_k8s/v1/loki_push_api.py", line 2463, in disable_inactive_endpoints
    pebble_layer = container.get_plan().to_dict().get("log-targets", None)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/model.py", line 2186, in get_plan
    return self._pebble.get_plan()
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/pebble.py", line 2113, in get_plan
    resp = self._request('GET', '/v1/plan', {'format': 'yaml'})
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/pebble.py", line 1778, in _request
    response = self._request_raw(method, path, query, headers, data)
  File "/var/lib/juju/agents/unit-sdcore-udr-k8s-0/charm/venv/ops/pebble.py", line 1827, in _request_raw
    raise ConnectionError(
ops.pebble.ConnectionError: Could not connect to Pebble: socket not found at '/charm/containers/udr/pebble.socket' (container restarted?)
unit-sdcore-udr-k8s-0: 14:28:09 ERROR juju.worker.uniter.operation hook "logging-relation-joined" (via hook dispatching script: dispatch) failed: exit status 1

Additional context

No response

@Abuelodelanada
Contributor

Hi @gruyaume

I was not able to reproduce it with the following bundle:

bundle: kubernetes
name: gui

applications:
  grafana-agent-k8s:
    charm: grafana-agent-k8s
    channel: latest/edge
    # revision: 77
    # resources:
    #   agent-image: 42
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
  sdcore-udr-k8s:
    charm: sdcore-udr-k8s
    channel: 1.5/edge
    # revision: 153
    # resources:
    #   udr-image: 27
    scale: 1
    constraints: arch=amd64
    storage:
      certs: kubernetes,1,1M
      config: kubernetes,1,1M
relations:
- - grafana-agent-k8s:logging-provider
  - sdcore-udr-k8s:logging

It seems a can_connect guard is missing in this line
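
A minimal sketch of what such a guard could look like, not the library's actual fix. `FakeContainer` and `FakePlan` are hypothetical stand-ins for the ops container and plan objects so the snippet runs without a Juju unit; only `can_connect()` and `get_plan().to_dict().get("log-targets", None)` mirror calls that appear in the traceback above. `get_log_targets` is an illustrative name.

```python
from typing import Any, Dict, Optional


class FakePlan:
    """Hypothetical stand-in for the plan object returned by get_plan()."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def to_dict(self) -> Dict[str, Any]:
        return dict(self._data)


class FakeContainer:
    """Hypothetical stand-in for an ops container (not the real class)."""

    def __init__(self, reachable: bool, plan: Optional[Dict[str, Any]] = None) -> None:
        self._reachable = reachable
        self._plan = plan or {}

    def can_connect(self) -> bool:
        return self._reachable

    def get_plan(self) -> FakePlan:
        if not self._reachable:
            # The real ops library raises ops.pebble.ConnectionError here;
            # the builtin ConnectionError is used only to keep the sketch stdlib-only.
            raise ConnectionError("socket not found (container restarted?)")
        return FakePlan(self._plan)


def get_log_targets(container: Any) -> Optional[Dict[str, Any]]:
    """Return the plan's log targets, or None if Pebble is unreachable."""
    if not container.can_connect():
        # Skip quietly; a later event can retry once Pebble is back.
        return None
    return container.get_plan().to_dict().get("log-targets", None)
```

Note that a guard like this only narrows the window: the container can still restart between the `can_connect()` check and the `get_plan()` call, so handling the connection error is likely still worthwhile.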

@cbartz

cbartz commented Jun 13, 2024

fwiw, I also see Pebble connection errors (loki rev 132).

juju debug-log
unit-loki-0: 09:38:04 DEBUG unit.loki/0.juju-log ops 2.12.0 up and running.
unit-loki-0: 09:38:05 DEBUG unit.loki/0.loki-pebble-ready Clearing symlinks in /etc/ssl/certs...
unit-loki-0: 09:38:06 DEBUG unit.loki/0.loki-pebble-ready done.
unit-loki-0: 09:38:06 DEBUG unit.loki/0.loki-pebble-ready Updating certificates in /etc/ssl/certs...
unit-loki-0: 09:38:07 WARNING unit.loki/0.loki-pebble-ready rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
unit-loki-0: 09:38:07 DEBUG unit.loki/0.loki-pebble-ready 137 added, 0 removed; done.
unit-loki-0: 09:38:07 DEBUG unit.loki/0.loki-pebble-ready Running hooks in /etc/ca-certificates/update.d...
unit-loki-0: 09:38:07 DEBUG unit.loki/0.loki-pebble-ready done.
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log This unit's ingress URL: foo
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log This unit's ingress URL: foo
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log This unit's ingress URL: foo
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log no relation on 'tracing': tracing not ready
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log <class '__main__.LokiOperatorCharm'>.<property object at 0x7f41d8133f90> returned None; quietly disabling charm_tracing for the run.
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log Emitting Juju event loki_pebble_ready.
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log Saved alert rules to disk
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log Checking loki alert rules via http://loki-0.loki-endpoints.stg-cos-k8s-ps6-is-charms.svc.cluster.local:3100/loki/api/v1/rules.
unit-loki-0: 09:38:07 ERROR unit.loki/0.juju-log Checking alert rules: [Errno 111] Connection refused
unit-loki-0: 09:38:07 INFO unit.loki/0.juju-log reqs=ResourceRequirements(claims=None, limits={}, requests={'cpu': '0.25', 'memory': '200Mi'}), templated=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'}), actual=ResourceRequirements(claims=None, limits=None, requests={'cpu': '250m', 'memory': '200Mi'})
unit-loki-0: 09:38:07 DEBUG unit.loki/0.juju-log This unit's ingress URL:foo
unit-loki-0: 09:38:09 DEBUG unit.loki/0.juju-log Checking loki alert rules via foo
unit-loki-0: 09:38:09 ERROR unit.loki/0.juju-log Checking alert rules: [Errno 111] Connection refused
unit-loki-0: 09:38:10 ERROR unit.loki/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 658, in <module>
    main(LokiOperatorCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 352, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 851, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 941, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 547, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "./src/charm.py", line 248, in _on_loki_pebble_ready
    version = self._loki_version
  File "./src/charm.py", line 636, in _loki_version
    version_output, _ = self._container.exec(["/usr/bin/loki", "-version"]).wait_output()
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 1559, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 2 executing ['/usr/bin/loki', '-version'], stdout='', stderr='unexpected fault address 0x737ab4fa\nfatal error: fault\n[signal SIGSEGV: segmentation violation code=0x1 addr=0x737ab4fa pc=0x16061a9]\n\ngoroutine 1 [running, locked to thread]:\nruntime.throw({0x28b9ee6?, 0x0?})\n\t/snap/go/current/src/runtime/panic.go:1047 +0x5d fp=0xc000778d18 sp=0xc000778ce8 pc=0x43911d\nruntime.sigpanic()\n\t/snap/go/current/src/runtime/signal_unix.go:855 +0x28a fp=0xc000778d78 sp=0xc000778d18 pc=0x45048a\ngo.opentelemetry.io/otel/attribute.Key.String(...)\n\t/root/parts/loki/build/vendor/go.opentelemetry.io/otel/attribute/key.go:116\ngo.opentelemetry.io/otel/semconv/v1%2e17%2e0.init()\n\t/root/parts/loki/build/vendor/go.opentelemetry.io/otel/semconv/v1.17.0/trace.go:3104 +0xf009 fp=0xc00077f010 sp=0xc000778d78 pc=0x16061a9\nruntime.doInit(0x431b760)\n\t/snap/go/current/src/runtime/proc.go:6527 +0x126 fp=0xc00077f140 sp=0xc00077f010 pc=0x449186\nruntime.doInit(0x43288c0)\n\t/snap/go/current/src/runtime/proc.go:6504 +0x71 fp=0xc00077f270 sp=0xc00077f140 pc=0x4490d1\nruntime.doInit(0x4334f00)\n\t/snap/go/current' [truncated]
unit-loki-0: 09:38:10 ERROR juju.worker.uniter.operation hook "loki-pebble-ready" (via hook dispatching script: dispatch) failed: exit status 1
unit-loki-0: 09:38:10 ERROR juju.worker.uniter pebble poll failed for container "loki": failed to send pebble-ready event: hook failed
unit-loki-0: 09:38:10 DEBUG unit.loki/0.juju-log ops 2.12.0 up and running.
unit-loki-0: 09:38:14 ERROR unit.loki/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 658, in <module>
    main(LokiOperatorCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 540, in main
    manager = _Manager(
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 424, in __init__
    self.charm = self._make_charm(self.framework, self.dispatcher)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 427, in _make_charm
    charm = self._charm_class(framework)
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 287, in wrap_init
    original_init(self, framework, *args, **kwargs)
  File "./src/charm.py", line 152, in __init__
    self._update_cert()
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 547, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "./src/charm.py", line 479, in _update_cert
    self._container.exec(["update-ca-certificates", "--fresh"]).wait()
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 2718, in exec
    return self._pebble.exec(
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2641, in exec
    change = self.wait_change(ChangeID(change_id))
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2016, in wait_change
    return self._wait_change_using_wait(change_id, timeout)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2037, in _wait_change_using_wait
    return self._wait_change(change_id, this_timeout)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2053, in _wait_change
    resp = self._request('GET', f'/v1/changes/{change_id}/wait', query)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 1778, in _request
    response = self._request_raw(method, path, query, headers, data)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 1827, in _request_raw
    raise ConnectionError(
ops.pebble.ConnectionError: Could not connect to Pebble: socket not found at '/charm/containers/loki/pebble.socket' (container restarted?)
unit-loki-0: 09:38:14 ERROR juju.worker.uniter.operation hook "loki-pebble-ready" (via hook dispatching script: dispatch) failed: exit status 1
unit-loki-0: 09:38:14 ERROR juju.worker.uniter pebble poll failed for container "loki": failed to send pebble-ready event: hook failed


kubectl logs loki-0

2024-06-13T09:38:14.637Z [container-agent] 2024-06-13 09:38:14 ERROR juju-log Uncaught exception while in charm code:
2024-06-13T09:38:14.637Z [container-agent] Traceback (most recent call last):
2024-06-13T09:38:14.637Z [container-agent]   File "./src/charm.py", line 658, in <module>
2024-06-13T09:38:14.637Z [container-agent]     main(LokiOperatorCharm, use_juju_for_storage=True)
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 540, in main
2024-06-13T09:38:14.637Z [container-agent]     manager = _Manager(
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 424, in __init__
2024-06-13T09:38:14.637Z [container-agent]     self.charm = self._make_charm(self.framework, self.dispatcher)
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 427, in _make_charm
2024-06-13T09:38:14.637Z [container-agent]     charm = self._charm_class(framework)
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 287, in wrap_init
2024-06-13T09:38:14.637Z [container-agent]     original_init(self, framework, *args, **kwargs)
2024-06-13T09:38:14.637Z [container-agent]   File "./src/charm.py", line 152, in __init__
2024-06-13T09:38:14.637Z [container-agent]     self._update_cert()
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 547, in wrapped_function
2024-06-13T09:38:14.637Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-06-13T09:38:14.637Z [container-agent]   File "./src/charm.py", line 479, in _update_cert
2024-06-13T09:38:14.637Z [container-agent]     self._container.exec(["update-ca-certificates", "--fresh"]).wait()
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 2718, in exec
2024-06-13T09:38:14.637Z [container-agent]     return self._pebble.exec(
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2641, in exec
2024-06-13T09:38:14.637Z [container-agent]     change = self.wait_change(ChangeID(change_id))
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2016, in wait_change
2024-06-13T09:38:14.637Z [container-agent]     return self._wait_change_using_wait(change_id, timeout)
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2037, in _wait_change_using_wait
2024-06-13T09:38:14.637Z [container-agent]     return self._wait_change(change_id, this_timeout)
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2053, in _wait_change
2024-06-13T09:38:14.637Z [container-agent]     resp = self._request('GET', f'/v1/changes/{change_id}/wait', query)
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 1778, in _request
2024-06-13T09:38:14.637Z [container-agent]     response = self._request_raw(method, path, query, headers, data)
2024-06-13T09:38:14.637Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 1827, in _request_raw
2024-06-13T09:38:14.637Z [container-agent]     raise ConnectionError(
2024-06-13T09:38:14.637Z [container-agent] ops.pebble.ConnectionError: Could not connect to Pebble: socket not found at '/charm/containers/loki/pebble.socket' (container restarted?)
2024-06-13T09:38:14.906Z [container-agent] 2024-06-13 09:38:14 ERROR juju.worker.uniter.operation runhook.go:180 hook "loki-pebble-ready" (via hook dispatching script: dispatch) failed: exit status 1
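
For illustration only: the second traceback shows the `exec(["update-ca-certificates", "--fresh"]).wait()` call failing while the Pebble socket is missing. One possible way to avoid the hook error, sketched under assumptions, is to catch the connection error and retry on a later event. `PebbleConnectionError`, `FakeProcess`, `FlakyContainer`, and `update_cert` are hypothetical names for this sketch; in a real charm the exception would be `ops.pebble.ConnectionError`, as seen in the log.

```python
import logging
from typing import Any, List

logger = logging.getLogger(__name__)


class PebbleConnectionError(Exception):
    """Hypothetical stand-in for ops.pebble.ConnectionError."""


class FakeProcess:
    """Hypothetical stand-in for the process handle returned by exec()."""

    def wait(self) -> None:
        return None


class FlakyContainer:
    """Hypothetical container whose Pebble socket may be missing."""

    def __init__(self, socket_present: bool) -> None:
        self.socket_present = socket_present

    def exec(self, command: List[str]) -> FakeProcess:
        if not self.socket_present:
            raise PebbleConnectionError("socket not found (container restarted?)")
        return FakeProcess()


def update_cert(container: Any) -> bool:
    """Refresh CA certificates; return False instead of crashing if Pebble is down."""
    try:
        container.exec(["update-ca-certificates", "--fresh"]).wait()
    except PebbleConnectionError:
        logger.warning("Pebble not reachable; certificates will be refreshed on a later event")
        return False
    return True
```

Whether skipping is acceptable depends on the charm: this assumes a later event (for example the next pebble-ready) will run the same step again.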

@Abuelodelanada
Contributor

Hello @gruyaume @cbartz

Could you please update the loki_push_api lib in your charm and let us know whether anything has changed?

PR: #433
