Talaria rehasher is dropping all the WS connections when a new talaria instance is added in consul. #54
Replies: 8 comments 17 replies
-
Do you have any metrics from the affected talarias? It would be helpful to see |
-
There are docker files in each of our repos that you can use to verify where the problem might be. The talaria Dockerfile will run a talaria that you can configure locally. You will probably have to tweak your configuration a bit, as your consul setup might not work exactly the same. |
-
I don't know if your description of your configuration is accurate, or if you have actually mis-configured Kratos. If your client is configured to connect directly to a Talaria instance (instead of a Petasos instance), then that client doesn't know how to connect to other instances. It is stuck to that one Talaria instance. In addition, if your cluster configuration changes from 1 Talaria node to 4 Talaria nodes, you should expect a great deal of flapping. The cluster makeup as reflected in Consul isn't atomic. The change as seen by a single Talaria may be "1 -> 3 -> 4", or "1 -> 2 -> 4", or some other combination. The algorithm allows for eventual consistency, but it does not prescribe the path to that consistency unless the delta is exactly 1.
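To make the flapping described above concrete, here is a minimal sketch of a generic consistent-hash ring. It is not Talaria's actual rehasher; the `Ring` class, the virtual-node count, the device ID format, and the instance names `talaria0`..`talaria3` are all assumptions chosen for illustration. It compares a direct "1 -> 4" membership change with a stepped "1 -> 2 -> 4" change and counts how often each device's owner changes.

```python
import bisect
import hashlib


def stable_hash(key):
    # Python's built-in hash() is salted per process, so use a digest instead.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=200):
        self.points = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    def owner(self, device_id):
        # First ring point clockwise from the device's hash, wrapping around.
        i = bisect.bisect(self.hashes, stable_hash(device_id)) % len(self.points)
        return self.points[i][1]


def owner_changes(devices, views):
    """Count, per device, how many times its owner changes across the views."""
    rings = [Ring(view) for view in views]
    changes = dict.fromkeys(devices, 0)
    for before, after in zip(rings, rings[1:]):
        for device in devices:
            if before.owner(device) != after.owner(device):
                changes[device] += 1
    return changes


# Hypothetical device IDs and instance names.
devices = [f"mac:{i:012x}" for i in range(3000)]
all_four = ["talaria0", "talaria1", "talaria2", "talaria3"]

direct = owner_changes(devices, [["talaria0"], all_four])
stepped = owner_changes(devices, [["talaria0"], ["talaria0", "talaria1"], all_four])

print("moved (direct):", sum(1 for n in direct.values() if n))
print("moved twice (stepped):", sum(1 for n in stepped.values() if n == 2))
```

In this toy model the same set of devices ends up moved either way, but the stepped path moves some of them twice. That is one plausible reading of why a non-atomic view of the cluster produces extra disconnects, under the assumptions stated above.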
-
Hi @JC000 @johnabass |
-
Hi @JC000 and @johnabass Here are the logs for 6 simulators and Kratos. |
-
I just noticed your consul section may not be identifying each talaria appropriately:
Are you configuring each talaria to register with a unique service id? If not, then Since our deployments have each talaria using a different FQDN, we use the FQDN as our service id. But it doesn't matter what you use, as long as it's unique for each talaria instance. Also, make sure services are registered with different addresses. More than one service with the same address is allowed by |
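A quick way to check the uniqueness requirement described above is to scan the registrations for repeated IDs or addresses. The sketch below assumes the registrations have already been collected into plain dictionaries; the field names `id` and `address`, the hostnames, and the `find_duplicates` helper are hypothetical and do not represent a Consul schema or API.

```python
from collections import Counter


def find_duplicates(registrations):
    """Return (duplicate service IDs, duplicate addresses), each sorted."""
    ids = Counter(r["id"] for r in registrations)
    addresses = Counter(r["address"] for r in registrations)
    return (
        sorted(k for k, n in ids.items() if n > 1),
        sorted(k for k, n in addresses.items() if n > 1),
    )


# Hypothetical registrations: two instances share one service ID and address.
registrations = [
    {"id": "talaria0.example.com", "address": "talaria0.example.com"},
    {"id": "talaria1.example.com", "address": "talaria1.example.com"},
    {"id": "talaria1.example.com", "address": "talaria1.example.com"},
]

duplicate_ids, duplicate_addresses = find_duplicates(registrations)
print("duplicate ids:", duplicate_ids)
print("duplicate addresses:", duplicate_addresses)
```

Both lists should be empty for a correctly configured cluster; any entry points at instances that need a distinct service ID or address.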
-
@Sachin4403 : After looking through the log, the issue that @johnabass mentioned is your problem. He pointed out the
Note that the incorrect use of a URL caused the instance to have |
-
Hi @JC000 @johnabass, thanks for your input. It was a configuration issue, which I have now corrected; since then the rehasher is working as expected. cc: @schmidtw |
-
Hi Team,
I am using Kratos to create WS connections to a talaria instance called talaria0. In the end, talaria0 has 3000 WS connections. I then add 3 new talaria instances in consul. Ideally, the rehasher should rebalance the connections to roughly 750 WS per talaria, but whenever we receive an event from consul about the new talaria instances, it drops all of the connections.
We are using consul version 31.1 with server enabled (https://github.com/hashicorp/consul-helm/blob/6a1a2d3cd5d69b3a3f8109dce3d4663089559d0b/values.yaml#L301) and UI enabled (https://github.com/hashicorp/consul-helm/blob/6a1a2d3cd5d69b3a3f8109dce3d4663089559d0b/values.yaml#L1022).
Talaria version: v0.5.11
Talaria Configuration
Please let me know if you need anything else from my side.
cc: @schmidtw