[DEFECT] trousseau pod fails encryption healthcheck with 403 #158

Open
jficz opened this issue Aug 16, 2022 · 5 comments

jficz commented Aug 16, 2022

Trousseau pods die a few minutes after they're started

Detailed Description

Even though the Trousseau pod works and secrets are in fact encrypted via the Vault transit key,
after about 20 minutes the healthcheck apparently fails with a 403 error when it tries to perform
a Vault operation, which results in the pod being terminated.

This doesn't make sense to me, as the pod uses a token which is valid and can, in fact, perform
the actual encryption/decryption while the pod is alive. I verified this by creating a secret
with kubectl and reading it back with etcdctl get /registry/secrets/....
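
One way to double-check the claim that the token is valid is to ask Vault about it directly. Vault's token API exposes GET /v1/auth/token/lookup-self, which, when called with the token in the X-Vault-Token header, returns the remaining TTL, whether the token is renewable, and its policies. The following is a minimal sketch that only decodes such a response. The type and function names are made up for illustration, the sample body is invented, and none of this is taken from Trousseau.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// tokenInfo holds the few fields of a token lookup that matter here.
// The type itself is hypothetical; only the JSON field names come from Vault.
type tokenInfo struct {
	TTL       time.Duration
	Renewable bool
	Policies  []string
}

// parseLookupSelf decodes the JSON body returned by
// GET /v1/auth/token/lookup-self (sent with the X-Vault-Token header).
func parseLookupSelf(body []byte) (tokenInfo, error) {
	var resp struct {
		Data struct {
			TTL       int64    `json:"ttl"` // seconds remaining
			Renewable bool     `json:"renewable"`
			Policies  []string `json:"policies"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return tokenInfo{}, fmt.Errorf("decode lookup-self response: %w", err)
	}
	return tokenInfo{
		TTL:       time.Duration(resp.Data.TTL) * time.Second,
		Renewable: resp.Data.Renewable,
		Policies:  resp.Data.Policies,
	}, nil
}

func main() {
	// Made-up sample body, not output from the cluster in this report.
	sample := []byte(`{"data":{"ttl":3600,"renewable":false,"policies":["example-policy"]}}`)
	info, err := parseLookupSelf(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ttl=%s renewable=%t policies=%v\n", info.TTL, info.Renewable, info.Policies)
}
```

Running the lookup shortly after the pod starts and again just before the 20 minute mark would show whether the token's remaining lifetime or policies change in between.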

Using Trousseau v1.1.3.

Logs:

{
  "level": "Level(-3)",
  "timestamp": "2022-08-16T10:46:37.626434563Z",
  "caller": "server/health.go:33",
  "msg": "Initialize health check\n"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:34.104487451Z",
  "caller": "encrypt/vault.go:202",
  "msg": "Failed to send request",
  "code": 403,
  "error": "Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/encrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:34.104850154Z",
  "caller": "encrypt/vault.go:285",
  "msg": "Failed to encrypt locked",
  "key": "kube-ktest-kms",
  "error": "forbidden error Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/encrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:34.111772569Z",
  "caller": "encrypt/vault.go:202",
  "msg": "Failed to send request",
  "code": 403,
  "error": "Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/decrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:34.11254637Z",
  "caller": "encrypt/vault.go:305",
  "msg": "Failed to decrypt locked",
  "error": "forbidden error Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/decrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "error",
  "timestamp": "2022-08-16T11:06:34.113574164Z",
  "caller": "server/health.go:85",
  "msg": "Encryption failed",
  "original": "healthcheck",
  "decrypted": "",
  "error": "failed to properly decrypt encrypted data",
  "stacktrace": "github.com/ondat/trousseau/internal/server.(*HealthZ).ServeHTTP\n\t/work/internal/server/health.go:85\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2084\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2462\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2916\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1966"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:44.102098227Z",
  "caller": "encrypt/vault.go:202",
  "msg": "Failed to send request",
  "code": 403,
  "error": "Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/encrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:44.102356133Z",
  "caller": "encrypt/vault.go:285",
  "msg": "Failed to encrypt locked",
  "key": "kube-ktest-kms",
  "error": "forbidden error Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/encrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:44.110282553Z",
  "caller": "encrypt/vault.go:202",
  "msg": "Failed to send request",
  "code": 403,
  "error": "Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/decrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:44.110369672Z",
  "caller": "encrypt/vault.go:305",
  "msg": "Failed to decrypt locked",
  "error": "forbidden error Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/decrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "error",
  "timestamp": "2022-08-16T11:06:44.110413036Z",
  "caller": "server/health.go:85",
  "msg": "Encryption failed",
  "original": "healthcheck",
  "decrypted": "",
  "error": "failed to properly decrypt encrypted data",
  "stacktrace": "github.com/ondat/trousseau/internal/server.(*HealthZ).ServeHTTP\n\t/work/internal/server/health.go:85\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2084\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2462\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2916\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1966"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:54.110033047Z",
  "caller": "encrypt/vault.go:202",
  "msg": "Failed to send request",
  "code": 403,
  "error": "Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/encrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:54.110140296Z",
  "caller": "encrypt/vault.go:285",
  "msg": "Failed to encrypt locked",
  "key": "kube-ktest-kms",
  "error": "forbidden error Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/encrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:54.117075111Z",
  "caller": "encrypt/vault.go:202",
  "msg": "Failed to send request",
  "code": 403,
  "error": "Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/decrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:54.117137565Z",
  "caller": "encrypt/vault.go:305",
  "msg": "Failed to decrypt locked",
  "error": "forbidden error Error making API request.\n\nURL: POST https://vault.internal:8200/v1/transit/decrypt/kube-ktest-kms\nCode: 403. Errors:\n\n* permission denied"
}
{
  "level": "error",
  "timestamp": "2022-08-16T11:06:54.117172855Z",
  "caller": "server/health.go:85",
  "msg": "Encryption failed",
  "original": "healthcheck",
  "decrypted": "",
  "error": "failed to properly decrypt encrypted data",
  "stacktrace": "github.com/ondat/trousseau/internal/server.(*HealthZ).ServeHTTP\n\t/work/internal/server/health.go:85\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2084\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2462\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2916\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1966"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:54.142295621Z",
  "caller": "kubernetes-kms-vault/main.go:156",
  "msg": "Received shutdown signal\n"
}
{
  "level": "info",
  "timestamp": "2022-08-16T11:06:54.142362222Z",
  "caller": "kubernetes-kms-vault/main.go:139",
  "msg": "Terminating the server\n"
}

Expected Behavior

The healthcheck should pass and the pod shouldn't crash.

Current Behavior

The pod crashes after 20m apparently due to a failed encryption healthcheck.
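
The "20m" figure can be read straight off the log timestamps above: the time between the "Initialize health check" entry and the first 403. A small sketch of that arithmetic, using the two timestamps from the logs (the helper name is only illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// elapsed returns the time between two RFC 3339 timestamps
// such as the "timestamp" fields in the log entries.
func elapsed(start, end string) (time.Duration, error) {
	s, err := time.Parse(time.RFC3339Nano, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(time.RFC3339Nano, end)
	if err != nil {
		return 0, err
	}
	return e.Sub(s), nil
}

func main() {
	// Health check initialized vs. first "Failed to send request" entry.
	d, err := elapsed("2022-08-16T10:46:37.626434563Z", "2022-08-16T11:06:34.104487451Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // 19m56.478052888s
}
```

The three failed checks that follow are about 10 seconds apart (11:06:34, 11:06:44, 11:06:54), and the shutdown signal arrives right after the third one.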

Steps to Reproduce

  1. Deploy trousseau
  2. Watch logs
  3. Wait 20m
  4. Maybe successfully encrypt and decrypt some secrets in the meantime
  5. The pod crashes with the logs above

Context (Environment)

Purpose-built cluster (kubespray) to test Trousseau (deployed with Terraform, as is the Trousseau-related Vault config)

Possible Solution/Implementation

It seems like Trousseau uses the wrong token for healthchecks (if any).

@jficz jficz added the bug Something isn't working label Aug 16, 2022

jficz commented Sep 28, 2022

Is there anything else I can provide or do to help with debugging this?

mhmxs commented Sep 28, 2022

@jficz Sorry for the late answer. I tried to reproduce the issue without success. On the other hand, I don't understand how this could happen, because Trousseau's health check calls Trousseau itself via its own Unix socket.

It calls gRPC here: https://sourcegraph.com/github.com/ondat/[email protected]/-/blob/internal/server/health.go?L70
And the call lands at exactly the same endpoint as any other request, here: https://sourcegraph.com/github.com/ondat/trousseau@cfcf36b09810cbc01a5041c4f3d78f71b2d32c5e/-/blob/internal/server/grpc.go?L79
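
Going by the logs in the report (probe value "healthcheck", error "failed to properly decrypt encrypted data"), the check is an encrypt-then-decrypt round trip through that shared endpoint. A rough sketch of such a round trip is below. It is not the code in health.go: the interface, both toy backends, and all names are hypothetical, and unlike the logged behaviour it stops at the first failing call.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// cipherService is a hypothetical stand-in for the KMS endpoint that the
// health check and regular requests share.
type cipherService interface {
	Encrypt(plain []byte) ([]byte, error)
	Decrypt(cipher []byte) ([]byte, error)
}

// checkRoundTrip encrypts a probe value, decrypts the result, and reports an
// error unless the original value comes back unchanged.
func checkRoundTrip(svc cipherService, probe string) error {
	enc, err := svc.Encrypt([]byte(probe))
	if err != nil {
		return fmt.Errorf("encrypt probe: %w", err)
	}
	dec, err := svc.Decrypt(enc)
	if err != nil {
		return fmt.Errorf("decrypt probe: %w", err)
	}
	if string(dec) != probe {
		return errors.New("failed to properly decrypt encrypted data")
	}
	return nil
}

// base64Service is a toy backend that "encrypts" with base64, for illustration only.
type base64Service struct{}

func (base64Service) Encrypt(p []byte) ([]byte, error) {
	return []byte(base64.StdEncoding.EncodeToString(p)), nil
}

func (base64Service) Decrypt(c []byte) ([]byte, error) {
	return base64.StdEncoding.DecodeString(string(c))
}

// deniedService mimics a backend that rejects every call.
type deniedService struct{}

var errDenied = errors.New("permission denied")

func (deniedService) Encrypt([]byte) ([]byte, error) { return nil, errDenied }
func (deniedService) Decrypt([]byte) ([]byte, error) { return nil, errDenied }

func main() {
	fmt.Println(checkRoundTrip(base64Service{}, "healthcheck"))
	fmt.Println(checkRoundTrip(deniedService{}, "healthcheck"))
}
```

Because the probe and real requests go through the same interface here, a backend that rejects the probe would be expected to reject regular requests too, which is why the reported behaviour is surprising.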

Are you sure only the health check times out?

jficz commented Sep 28, 2022

I've read the code and don't understand this either. As far as I can tell, it really is only the healthchecks that time out, and that is because the requests to Vault are (again, as far as I can tell) unauthenticated. Until the container fails I can access KMS-encrypted secrets as expected. Beats me, too. I'll run a new test on a new cluster now that the new Kubespray is out.
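
For comparison, this is roughly what an authenticated request to the transit endpoint from the logs looks like at the HTTP level: a POST with a base64-encoded plaintext in the JSON body and the token in the X-Vault-Token header. The sketch below only builds the request and does not send it. The function name and the placeholder token are made up, and Trousseau itself may construct the request differently.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
)

// newTransitEncryptRequest builds (but does not send) a Vault transit
// encrypt request for the given key. The helper name is hypothetical.
func newTransitEncryptRequest(addr, key, token string, plaintext []byte) (*http.Request, error) {
	// The transit API expects the plaintext base64-encoded.
	body, err := json.Marshal(map[string]string{
		"plaintext": base64.StdEncoding.EncodeToString(plaintext),
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, addr+"/v1/transit/encrypt/"+key, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Vault-Token", token) // authenticates the call
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Address and key name are the ones shown in the logs; the token is a placeholder.
	req, err := newTransitEncryptRequest("https://vault.internal:8200", "kube-ktest-kms", "placeholder-token", []byte("healthcheck"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println("token header set:", req.Header.Get("X-Vault-Token") != "")
}
```

Logging whether the X-Vault-Token header is present on the failing calls (without logging its value) would confirm or rule out the unauthenticated-request theory.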

mhmxs commented Sep 28, 2022

@jficz could you please test it with https://github.com/ondat/trousseau/releases/tag/v2.0.0-alpha.1? If any fix comes, it will land in 2.0. 🙏

jficz commented Sep 28, 2022

will do, thanks
