Consul connect services do not reconnect after booting up the cluster #21935

Open
suikast42 opened this issue Oct 21, 2024 · 1 comment
Labels
type/bug Feature does not function as expected

@suikast42
Operating system and Environment details

Nomad 1.6..0 CNI 1.6.0
Consul 1.20.0 CNI 1.6.0

Job file

job "countdash_app_mesh" {
  datacenters = ["nomadder1"]
  group "api" {
    count = 1
#    constraint {
#      distinct_hosts = true
#    }
#         constraint {
#           attribute    = "${attr.unique.hostname}"
#           set_contains = "worker-02"
#         }
    network {
      mode = "bridge"
      port "api" {
        to = 9001
#        host_network = "public"
      }
    }

    service {
      name = "count-api"
      port = "api"
      address_mode = "alloc"
      connect {
        sidecar_service {}
      }

      check {
        name     = "api_health"
        type     = "http"
        path     = "/health"
        port     = "api"
        interval = "10s"
        timeout  = "2s"
        address_mode = "alloc"
      }

    }

    task "count-api" {
      driver = "docker"

      config {
        image = "hashicorpnomad/counter-api:v3"
        ports = ["api"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }

  group "dashboard" {
    count = 1
        # constraint {
        #   attribute    = "${attr.unique.hostname}"
        #   set_contains = "worker-01"
        # }
    network {
      mode = "bridge"

      port "http" {
        to = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"
      tags = [
        "traefik.enable=true",
        "traefik.consulcatalog.connect=true",
        "traefik.http.routers.count-dashboard.tls=true",
        "traefik.http.routers.count-dashboard.rule=Host(`count.cloud.private`)"
      ]

      connect {
        sidecar_service {
          proxy {
            #            config {
            #              protocol = "http"
            #            }
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        CONSUL_TLS_SERVER_NAME = "localhost"
        COUNTING_SERVICE_URL   = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v3"
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}

If I deploy this job, everything works until I reboot my VMs.

After the VMs restart (1 worker and 1 master), the Connect services do not come up again.

Log of connect-dashboard:


[2024-10-21 11:06:15.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:173] dns resolution without records for tempo-zipkin.service.consul
[2024-10-21 11:06:15.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:308] dns resolution for tempo-zipkin.service.consul completed with status 0
[2024-10-21 11:06:15.193][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:201] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms
[2024-10-21 11:06:20.188][1][debug][main] [source/server/server.cc:237] flushing stats
[2024-10-21 11:06:20.193][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:391] dns resolution for tempo-zipkin.service.consul started
[2024-10-21 11:06:20.197][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:173] dns resolution without records for tempo-zipkin.service.consul
[2024-10-21 11:06:20.197][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:308] dns resolution for tempo-zipkin.service.consul completed with status 0
[2024-10-21 11:06:20.197][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:201] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms
[2024-10-21 11:06:20.511][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:264] [Tags: "ConnectionId":"25"] new tcp proxy session
[2024-10-21 11:06:20.511][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:459] [Tags: "ConnectionId":"25"] Creating connection to cluster local_app
[2024-10-21 11:06:20.511][15][debug][misc] [source/common/upstream/cluster_manager_impl.cc:2329] Allocating TCP conn pool
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"26"] connecting to 127.0.0.1:9002
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:1036] [Tags: "ConnectionId":"26"] connection in progress
[2024-10-21 11:06:20.511][15][debug][conn_handler] [source/common/listener_manager/active_tcp_listener.cc:160] [Tags: "ConnectionId":"25"] new connection from 172.21.2.20:34960
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"25"] closing socket: 0
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:669] cancelling pending stream
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:150] [Tags: "ConnectionId":"26"] closing data_to_write=0 type=1
[2024-10-21 11:06:20.511][15][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"26"] closing socket: 1
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"26"] client disconnected, failure reason: 
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:06:20.511][15][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 0 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:06:20.511][15][debug][conn_handler] [source/common/listener_manager/active_stream_listener_base.cc:136] [Tags: "ConnectionId":"25"] adding to cleanup list

Logs of the same instance after restarting Consul:

[2024-10-21 11:18:16.223][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"64"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.224][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"64"] disconnect. resetting 1 pending requests
[2024-10-21 11:18:16.224][1][debug][client] [source/common/http/codec_client.cc:159] [Tags: "ConnectionId":"64"] request reset
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:215] [Tags: "ConnectionId":"64"] destroying stream: 0 remaining
[2024-10-21 11:18:16.224][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"1431416887679520993"] upstream reset: reset reason: connection termination, transport failure reason: 
[2024-10-21 11:18:16.224][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:188] DeltaAggregatedResources gRPC config stream to local_agent closed: 13, 
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.224][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"64"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.224][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:16.425][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:16.425][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:16.425][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:16.425][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:16.425][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:16.425][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:16.425][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"109"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:16.426][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"109"] current connecting state: true
[2024-10-21 11:18:16.426][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"109"] connecting
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"109"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"109"] connected
[2024-10-21 11:18:16.426][1][debug][misc] [source/common/network/io_socket_error_impl.cc:64] Unknown error code 32 details Broken pipe
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"109"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"109"] closing socket: 0
[2024-10-21 11:18:16.426][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"109"] disconnect. resetting 0 pending requests
[2024-10-21 11:18:16.426][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"109"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"1952948337656148254"] upstream reset: reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][http] [source/common/http/async_client_impl.cc:182] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end'

[2024-10-21 11:18:16.426][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:195] DeltaAggregatedResources gRPC config stream to local_agent closed: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.426][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.426][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:16.782][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:16.782][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:16.782][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:16.782][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:16.782][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:16.782][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:16.782][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"110"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:16.782][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"110"] current connecting state: true
[2024-10-21 11:18:16.782][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"110"] connecting
[2024-10-21 11:18:16.782][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"110"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"110"] connected
[2024-10-21 11:18:16.783][1][debug][misc] [source/common/network/io_socket_error_impl.cc:64] Unknown error code 32 details Broken pipe
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/tls/ssl_socket.cc:246] [Tags: "ConnectionId":"110"] remote address:alloc/tmp/consul_grpc.sock,TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][connection] [source/common/network/connection_impl.cc:276] [Tags: "ConnectionId":"110"] closing socket: 0
[2024-10-21 11:18:16.783][1][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"110"] disconnect. resetting 0 pending requests
[2024-10-21 11:18:16.783][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"110"] client disconnected, failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][router] [source/common/router/router.cc:1384] [Tags: "ConnectionId":"0","StreamId":"13244006445772974244"] upstream reset: reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][http] [source/common/http/async_client_impl.cc:182] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end'

[2024-10-21 11:18:16.783][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:232] DeltaAggregatedResources gRPC config stream to local_agent closed: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: TLS_error:|33554464:system library:OPENSSL_internal:Broken pipe:TLS_error_end
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment failed
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2024-10-21 11:18:16.783][1][debug][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:125] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2024-10-21 11:18:16.783][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:463] invoking 1 idle callback(s) - is_draining_for_deletion_=false
[2024-10-21 11:18:18.409][1][debug][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:63] Establishing new gRPC bidi stream to local_agent for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2024-10-21 11:18:18.409][1][debug][router] [source/common/router/router.cc:527] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2024-10-21 11:18:18.409][1][debug][router] [source/common/router/router.cc:756] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.100'

[2024-10-21 11:18:18.409][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2024-10-21 11:18:18.410][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
[2024-10-21 11:18:18.410][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2024-10-21 11:18:18.410][1][debug][http2] [source/common/http/http2/codec_impl.cc:1695] [Tags: "ConnectionId":"111"] updating connection-level initial window size to 268435456
[2024-10-21 11:18:18.410][1][debug][connection] [./source/common/network/connection_impl.h:98] [Tags: "ConnectionId":"111"] current connecting state: true
[2024-10-21 11:18:18.410][1][debug][client] [source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"111"] connecting
[2024-10-21 11:18:18.410][1][debug][connection] [source/common/network/connection_impl.cc:1017] [Tags: "ConnectionId":"111"] connecting to alloc/tmp/consul_grpc.sock
[2024-10-21 11:18:18.410][1][debug][connection] [source/common/network/connection_impl.cc:746] [Tags: "ConnectionId":"111"] connected
[2024-10-21 11:18:18.415][1][debug][client] [source/common/http/codec_client.cc:88] [Tags: "ConnectionId":"111"] connected
[2024-10-21 11:18:18.415][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:328] [Tags: "ConnectionId":"111"] attaching to next stream
[2024-10-21 11:18:18.415][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:182] [Tags: "ConnectionId":"111"] creating stream
[2024-10-21 11:18:18.415][1][debug][router] [source/common/router/upstream_request.cc:593] [Tags: "ConnectionId":"0","StreamId":"11959217123621313443"] pool ready
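
Logs like the ones above are easier to read once the xDS stream closures are tallied by gRPC status code (13 is INTERNAL, 14 is UNAVAILABLE). The following is a minimal sketch of such a tally, assuming the Envoy debug log has been saved as plain text. The helper name `summarize_stream_closures` and the regular expression are illustrative assumptions, not part of Envoy, Consul, or Nomad tooling.

```python
import re
from collections import Counter

# Matches Envoy lines of the form seen in this issue, for example:
#   [ts][1][debug][config] [...] DeltaAggregatedResources gRPC config stream
#   to local_agent closed: 14, upstream connect error ...
# The pattern is an assumption based only on the log excerpts in this report.
CLOSED_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*gRPC config stream to (?P<cluster>\S+) "
    r"closed: (?P<code>\d+),"
)


def summarize_stream_closures(log_text):
    """Return a Counter mapping gRPC status code -> number of stream closures."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CLOSED_RE.match(line)
        if match:
            counts[int(match.group("code"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py envoy.log  (the file name is hypothetical)
    with open(sys.argv[1], encoding="utf-8") as handle:
        for code, count in sorted(summarize_stream_closures(handle.read()).items()):
            print(f"grpc-status {code}: {count} closure(s)")
```

A run of code 14 closures that stops once the stream reaches "pool ready" would suggest the sidecar eventually reconnected to the local agent, which may help narrow down whether the remaining failure is in the xDS connection or elsewhere.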
@suikast42 suikast42 added the type/bug Feature does not function as expected label Oct 21, 2024
@tgross
Member

tgross commented Nov 8, 2024

From these logs, it looks like the tasks are coming up and the Envoy proxy is getting its bootstrap configuration. I'm going to move this issue to the Consul repo, as we're now firmly in Consul territory.

@tgross tgross transferred this issue from hashicorp/nomad Nov 8, 2024