Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nats: stream not found #131

Open
bbigras opened this issue Aug 15, 2022 · 19 comments
Open

nats: stream not found #131

bbigras opened this issue Aug 15, 2022 · 19 comments

Comments

@bbigras
Copy link
Contributor

bbigras commented Aug 15, 2022

I'm getting this. k3s/kine/nats have been running for about 20 minutes.

{"level":"warn","ts":"2022-08-15T18:06:31.382Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}

docker-compose.yaml

version: '3.6'
services:
  nats:
    # image: nats
    image: nats:2.8.2
    command: >-
      -js
      -DV
  kine:
    image: rancher/kine:v0.9.3-amd64
    restart: on-failure
    depends_on:
      - nats
    ports:
      - 2379:2379
    command: >-
      --endpoint "nats://nats"
      --debug

run k3s

docker run \
  -it --rm \
  --tmpfs /run \
  --tmpfs /var/run \
  --net=host \
  --privileged rancher/k3s \
  server \
  --cluster-init \
  --disable-agent \
  --token=SECRET \
  --datastore-endpoint=http://127.0.0.1:2379

log

[...]
{"level":"warn","ts":"2022-08-15T18:07:15.201Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:07:27.259776      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:07:27.752178      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
{"level":"warn","ts":"2022-08-15T18:07:49.495Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:07:57.279814      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:07:57.763788      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
{"level":"warn","ts":"2022-08-15T18:08:23.986Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000b161c0/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}
E0815 18:08:27.298772      39 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 18:08:27.775153      39 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]

Do I need to create a nats stream manually?

@brandond
Copy link
Member

brandond commented Aug 15, 2022

No, I don't believe you should have to create the stream. Do you perhaps need to specify the port in addition to the container hostname in the nats:// URI? What are the logs from the kine container? Do you get the same error when testing with nats:2.7.4?

@bbigras
Copy link
Contributor Author

bbigras commented Aug 15, 2022

Using nats://nats:4222 seems to do the same.

Same with nats:2.7.4.

kine.log

@brandond
Copy link
Member

brandond commented Aug 15, 2022

That all looks fine - it seems that kine is functioning properly. Its possible that the "stream not found" error is an improperly wrapped return code from nats? I'd have to defer to the nats team, as they contributed the code.

The unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request error is most frequently seen when the metrics-server pod has not yet started. You might investigate why metrics-server isn't working, or start k3s with --disable=metrics-server

@bbigras
Copy link
Contributor Author

bbigras commented Aug 15, 2022

Weird, I still get this, with --disable=metrics-server.

W0815 19:33:12.047943      38 handler_proxy.go:102] no RequestInfo found in the context
E0815 19:33:12.047976      38 controller.go:113] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService
I0815 19:33:12.047986      38 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W0815 19:33:12.048964      38 handler_proxy.go:102] no RequestInfo found in the context
E0815 19:33:12.049015      38 controller.go:116] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0815 19:33:12.049037      38 controller.go:129] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0815 19:33:13.437910      38 resource_quota_controller.go:413] unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
W0815 19:33:13.850875      38 garbagecollector.go:747] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]

@brandond
Copy link
Member

It looks like it's still there. Make sure you're wiping the nats datastore, otherwise you'll have stuff left over from the previous cluster instance.

@bbigras
Copy link
Contributor Author

bbigras commented Aug 15, 2022

done.

It looks like I only have {"level":"warn","ts":"2022-08-15T19:46:25.050Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000871a40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"} now.

@brandond
Copy link
Member

Oh, I see you're starting k3s with both --cluster-init and --datastore-endpoint. The cluster-init flag is used to enable the embedded etcd, and should not be used alongside the datastore-endpoint option. I'm honestly surprised this worked at all.

@bbigras
Copy link
Contributor Author

bbigras commented Aug 15, 2022

If I remove --cluster-init, I still get stream not found but the k3s deploy are ready, even the metrics-server!

❯ kubectl get deploy -A
NAMESPACE     NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   coredns                  1/1     1            1           3m44s
kube-system   local-path-provisioner   1/1     1            1           3m44s
kube-system   metrics-server           1/1     1            1           83s
kube-system   traefik                  1/1     1            1           3m5s

@brandond
Copy link
Member

brandond commented Aug 15, 2022

Well you're also starting k3s with --disable-agent, which would prevent you from running any pods unless you are also adding an agent node. Did you remove that flag at the same time you removed --cluster-init, or did you add an agent node?

@bbigras
Copy link
Contributor Author

bbigras commented Aug 15, 2022

Yeah, it seems I removed it. I didn't intend to, but that must explain why the pods are running now.

@bbigras
Copy link
Contributor Author

bbigras commented Aug 16, 2022

So just to be clear, it seems to work except there's still a nats: stream not found warning.

@brandond
Copy link
Member

I can't reproduce that, either using nats directly as the k3s datastore endpoint, or using a standalone kine instance. Do you perhaps have some stuff left over from a previous configuration?

@bbigras
Copy link
Contributor Author

bbigras commented Aug 16, 2022

I don't think so.

Note that I also tested running k3s+kine+nats on fly.io so I would guess there's no left over there, but I still get the message.

Maybe I should try to reproduce it with a docker-compose file with all 3 service, with the image tags pinned.

@nilathedragon
Copy link

I am able to see the same error message using the latest k0s (v1.24.4+k0s) on a clean install.

Steps to reproduce:

  1. Start NATS docker container with docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
  2. Install k0s utility $ curl -sSLf https://get.k0s.sh | sudo sh as stated on https://docs.k0sproject.io/v0.12.1/install/
  3. Create K0s configuration mkdir -p /etc/k0s and k0s config create > /etc/k0s/k0s.yaml
  4. Set storage.type to kine and set storage.kine.datasource to nats://<serverip>:4222
  5. Install single node cluster with k0s install controller --single -c /etc/k0s/k0s.yaml
  6. Start k0s with k0s start

The cluster comes up successfully for me, but the logs keep repeating:
level=info msg="{\"level\":\"warn\",\"ts\":\"2022-09-21T21:33:44.148Z\",\"logger\":\"etcd-client\",\"caller\":\"v3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"etcd-endpoints://0xc00002fc00//run/k0s/kine/kine.sock:2379\",\"attempt\":0,\"error\":\"rpc error: code = Unknown desc = nats: stream not found\"}" component=kube-apiserver every 30-ish seconds. Sometimes a bit more sometimes a bit less.

I am testing on a Hetzner Cloud Machine, Rocky Linux 8.6 using Docker-CE to run NATS.

@brandond
Copy link
Member

I'm not sure how to help with k0s. Can you reproduce on k3s or a standalone instance of kine ?

@nilathedragon
Copy link

nilathedragon commented Sep 22, 2022

K0s embeds kine, so I assumed that wouldn't matter 😅
I tested it with k3s, kine and NATS again though:

  1. Start NATS docker container with docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
  2. Start Kine docker container with docker run -d -p2379:2379 --name=kine rancher/kine:v0.9.4-amd64 --endpoint "nats://<serverip>:4222" --debug
  3. Start k3s docker container with docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host --privileged rancher/k3s server --token=SECRET --datastore-endpoint=http://<serverip>:2379

Wait for the cluster to settle and the same error as with k0s will start appearing in the logs.
{"level":"warn","ts":"2022-09-22T10:33:23.705Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000a64700/<ip>:2379","attempt":0,"error":"rpc error: code = Unknown desc = nats: stream not found"}

Test was conducted on the same host as k0s (After a full reset)

Complete Bash History
[root@nats-kube-test ~]# history
  1  clear
  2  dnf update
  3  dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
  4  dnf install -y docker-ce
  5  systemctl enable --now docker
  6  docker run -d -p4222:4222 --name=nats nats:2.8.2 -js -DV
  7  docker run -d -p2379:2379 --name=kine rancher/kine:v0.9.4-amd64 --endpoint "nats://<serverip>:4222" --debug
  8  docker run -it --rm --tmpfs /run --tmpfs /var/run --net=host --privileged rancher/k3s server --token=SECRET --datastore-endpoint=http://<serverip>:2379
  9  history

@brandond
Copy link
Member

brandond commented Sep 22, 2022

@caleblloyd @boxboatmatt @jnmoyne can you assist on this? The error appears to be coming from the nats server. I suspect it is being configured incorrectly somehow?

@caleblloyd
Copy link
Contributor

I wonder if this is from the DbSize call - I noticed that call was incorrect and bundled the fix into the version upgrade:

https://github.com/k3s-io/kine/pull/137/files#diff-fefbd7dc68770cb2275bb6c283ace6fe949dc7bf28aeb56f9481a98598bd491aL896

I believe that DbSize currently does return that error

@brandond
Copy link
Member

Interesting. In that case, it should be fine to ignore the error for the time being?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants