
Service Mesh is not balancing the workload evenly across multiple instances in Nomad cluster. #21778

Open
ruslan-y opened this issue Sep 20, 2024 · 0 comments


Hi!

I have 10 gateway instances (Ingress Gateway) and 10 "Proxy A" (Envoy) instances that receive traffic from the gateways.
Then there are 2 "Proxy B" instances (also Envoy) that accept traffic from "Proxy A".

Load balancing from "Proxy A" to "Proxy B" works incorrectly.

From the network load on the hosts, I can see that traffic is only going to one "Proxy B" instance.

1st instance of "Proxy B": [screenshot: network load]

2nd instance of "Proxy B": [screenshot: network load]

If I redeploy the Nomad job "Proxy A", workload balancing works correctly.

1st instance of "Proxy B": [screenshot: network load]

2nd instance of "Proxy B": [screenshot: network load]

But when I redeploy the Nomad job "Proxy B", the balancing "breaks down" again.
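One way to confirm that both "Proxy B" instances are still registered and passing their health checks after a redeploy is Consul's health API; a sketch, assuming a local agent on the default HTTP port and "proxy-b" as an illustrative service name:

```shell
# List instances of the service that are currently passing health checks
# (agent address/port and the service name "proxy-b" are assumptions)
curl -s http://127.0.0.1:8500/v1/health/service/proxy-b?passing
```

If both instances appear here but traffic still goes to only one of them, the imbalance is on the proxy side rather than in service registration.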

I tried writing the following service-resolver configuration to Consul for "Proxy A":

Kind         = "service-resolver"
Name         = "Proxy A"
LoadBalancer = {
  Policy = "round_robin"
}

and for "Proxy B":

Kind         = "service-resolver"
Name         = "Proxy B"
LoadBalancer = {
  Policy = "round_robin"
}
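These entries can be applied and verified with the Consul CLI; a sketch, assuming the HCL above is saved to illustratively named files:

```shell
# Apply the resolver entries (file names are illustrative)
consul config write proxy-a-resolver.hcl
consul config write proxy-b-resolver.hcl

# Read one back to confirm the round_robin policy was stored
consul config read -kind service-resolver -name "Proxy B"
```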
Envoy configuration file of "Proxy A":
node:
  cluster: test
  id: proxy_a

admin:
  access_log:
  - name: admin_access
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
      path: {{ env "NOMAD_ALLOC_DIR" }}/logs/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0 
      port_value: 19901

dynamic_resources:
  ads_config:
    api_type: DELTA_GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_grpc

static_resources:
  listeners:
    - name: proxy_a
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8162
      filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: proxy_a
              http2_protocol_options:
                allow_connect: true
              upgrade_configs:
              - upgrade_type: websocket
              rds:
                route_config_name: proxy_a
                config_source:
                  resource_api_version: V3
                  ads: {}
              http_filters:
              - name: envoy.filters.http.grpc_web
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              require_client_certificate: true
              common_tls_context:
                validation_context:
                  trusted_ca:
                    filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/gateway-ca.pem
                tls_certificates:
                - certificate_chain:
                    filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/downstream.pem
                  private_key:
                    filename: {{ env "NOMAD_SECRETS_DIR" }}/gateway/tls/downstream-key.pem
                alpn_protocols: [ "h2,http/1.1" ]

  clusters:
    - name: proxy_b
      connect_timeout: 0.25s
      type: STATIC
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: proxy_b
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: {{ env "NOMAD_UPSTREAM_IP_proxy_b" }}
                      port_value: {{ env "NOMAD_UPSTREAM_PORT_proxy_b" }}
      circuit_breakers:
        thresholds:
          - priority: "DEFAULT"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
          - priority: "HIGH"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
      typed_extension_protocol_options:
          envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
            "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
            explicit_http_config:
              http2_protocol_options: {}

    - name: xds_grpc
      load_assignment:
        cluster_name: xds_grpc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: {{ env "NOMAD_UPSTREAM_IP_[[ .my.xds_upstream ]]" }}
                      port_value: {{ env "NOMAD_UPSTREAM_PORT_[[ .my.xds_upstream ]]" }}
      circuit_breakers:
        thresholds:
          - priority: "DEFAULT"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
          - priority: "HIGH"
            max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
            retry_budget:
              budget_percent:
                value: 25.0
            track_remaining: true
      typed_extension_protocol_options:
          envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
            "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
            upstream_http_protocol_options:
              auto_sni: true
            common_http_protocol_options:
              idle_timeout: 1s
            explicit_http_config:
              http2_protocol_options:
                max_concurrent_streams: 100
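To see which endpoints the "Proxy A" Envoy actually holds for the proxy_b cluster (and its per-endpoint connection/request counters), the admin interface configured above on port 19901 can be queried; a sketch, assuming it is reachable locally:

```shell
# Dump cluster membership and per-endpoint stats from the Envoy admin endpoint
curl -s http://127.0.0.1:19901/clusters | grep '^proxy_b'
```

This shows whether Envoy sees one endpoint or two for proxy_b, which helps tell a stale-endpoint problem apart from a load-balancing-policy problem.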

Server nomad version

Nomad v1.8.3
BuildDate 2024-08-13T07:37:30Z
Revision 63b636e5cbaca312cf6ea63e040f445f05f00478

Server consul version

Consul v1.19.1
Revision 9f62fb41
Build Date 2024-07-11T14:47:27Z

Client nomad version

Nomad v1.5.6
BuildDate 2023-05-19T18:26:13Z
Revision 8af70885c02ab921dedbdf6bc406a1e886866f80

Client consul version

Consul v1.14.7
Revision d97acc0a
Build Date 2023-05-16T01:36:41Z
@blake changed the title from "Service Mash is not balancing the workload evenly across multiple instances in Nomad cluster." to "Service Mesh is not balancing the workload evenly across multiple instances in Nomad cluster." on Oct 3, 2024