
Service Mesh is not balancing the workload evenly across multiple instances in Nomad cluster. #21778

Open
ruslan-y opened this issue Sep 20, 2024 · 0 comments


Hi!

I have 10 gateway instances (Ingress Gateway) and 10 "Proxy A" (Envoy) instances that receive traffic from the gateways.
Then there are 2 "Proxy B" instances (also Envoy) that accept traffic from "Proxy A".

Load balancing from "Proxy A" to "Proxy B" works incorrectly.

From the network load on the hosts, I can see that traffic is only going to one "Proxy B" instance.

1st instance of "Proxy B": [screenshot: network load]

2nd instance of "Proxy B": [screenshot: network load]

If I redeploy the Nomad job "Proxy A", workload balancing works correctly.

1st instance of "Proxy B": [screenshot: network load]

2nd instance of "Proxy B": [screenshot: network load]

But when I redeploy the Nomad job "Proxy B", the balancing "breaks down" again.
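One way to confirm that both "Proxy B" instances are still registered and passing their health checks after a redeploy is Consul's health API; a sketch, assuming a local agent on the default HTTP port and "proxy-b" as an illustrative service name:

```shell
# List instances of the service that are currently passing health checks
# (agent address/port and the service name "proxy-b" are assumptions)
curl -s http://127.0.0.1:8500/v1/health/service/proxy-b?passing
```

If both instances appear here but traffic still goes to only one of them, the imbalance is on the proxy side rather than in service registration.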

I tried writing the following service-resolver configuration to Consul for "Proxy A":

Kind         = "service-resolver"
Name         = "Proxy A"
LoadBalancer = {
  Policy = "round_robin"
}

and for "Proxy B":

Kind         = "service-resolver"
Name         = "Proxy B"
LoadBalancer = {
  Policy = "round_robin"
}
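These entries can be applied and verified with the Consul CLI; a sketch, assuming the HCL above is saved to illustratively named files:

```shell
# Apply the resolver entries (file names are illustrative)
consul config write proxy-a-resolver.hcl
consul config write proxy-b-resolver.hcl

# Read one back to confirm the round_robin policy was stored
consul config read -kind service-resolver -name "Proxy B"
```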
Envoy configuration file of "Proxy A":
node:
  cluster: test
  id: proxy_a

admin:
  access_log:
  - name: admin_access
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: {{ env "NOMAD_ALLOC_DIR" }}/logs/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0 
      port_value: 19901

dynamic_resources:
  ads_config:
    api_type: DELTA_GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_grpc

static_resources:
  listeners:
    - name: proxy_a
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8162
      filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: proxy_a
              http2_protocol_options:
                allow_connect: true
              upgrade_configs:
              - upgrade_type: websocket
              rds:
                route_config_name: proxy_a
                config_source:
                  resource_api_version: V3
                  ads: {}
              http_filters:
              - name: envoy.filters.http.grpc_web
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              require_client_certificate: true
              common_tls_context:
                validation_context:
                  trusted_ca:
                    filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/gateway-ca.pem
                tls_certificates:
                - certificate_chain:
                    filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/downstream.pem
                  private_key:
                    filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/downstream-key.pem
                alpn_protocols: [ "h2,http/1.1" ]

  clusters:
    - name: proxy_b
      connect_timeout: 0.25s
      type: STATIC
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: proxy_b
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: {{ env "NOMAD_UPSTREAM_IP_proxy_b" }}
                      port_value: {{ env "NOMAD_UPSTREAM_PORT_proxy_b" }}
      circuit_breakers:
        thresholds:
          - priority: "DEFAULT"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
          - priority: "HIGH"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
      typed_extension_protocol_options:
          envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
            "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
            explicit_http_config:
              http2_protocol_options: {}

    - name: xds_grpc
      load_assignment:
        cluster_name: xds_grpc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: {{ env "NOMAD_UPSTREAM_IP_[[ .my.xds_upstream ]]" }}
                      port_value: {{ env "NOMAD_UPSTREAM_PORT_[[ .my.xds_upstream ]]" }}
      circuit_breakers:
        thresholds:
          - priority: "DEFAULT"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
          - priority: "HIGH"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
      typed_extension_protocol_options:
          envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
            "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
            upstream_http_protocol_options:
              auto_sni: true
            common_http_protocol_options:
              idle_timeout: 1s
            explicit_http_config:
              http2_protocol_options:
                max_concurrent_streams: 100
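To see which endpoints the "Proxy A" Envoy actually holds for the proxy_b cluster (and its per-endpoint connection/request counters), the admin interface configured above on port 19901 can be queried; a sketch, assuming it is reachable locally:

```shell
# Dump cluster membership and per-endpoint stats from the Envoy admin endpoint
curl -s http://127.0.0.1:19901/clusters | grep '^proxy_b'
```

This shows whether Envoy sees one endpoint or two for proxy_b, which helps tell a stale-endpoint problem apart from a load-balancing-policy problem.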

Server nomad version

Nomad v1.8.3
BuildDate 2024-08-13T07:37:30Z
Revision 63b636e5cbaca312cf6ea63e040f445f05f00478

Server consul version

Consul v1.19.1
Revision 9f62fb41
Build Date 2024-07-11T14:47:27Z

Client nomad version

Nomad v1.5.6
BuildDate 2023-05-19T18:26:13Z
Revision 8af70885c02ab921dedbdf6bc406a1e886866f80

Client consul version

Consul v1.14.7
Revision d97acc0a
Build Date 2023-05-16T01:36:41Z
@blake changed the title from "Service Mash is not balancing the workload evenly across multiple instances in Nomad cluster." to "Service Mesh is not balancing the workload evenly across multiple instances in Nomad cluster." on Oct 3, 2024