Redid handling of link down to ensure tear down of old path #583
base: master
Conversation
@Ktmi I read in the description that you're still working on it and that there are TODOs in the code. Overall, it's headed in the right direction. First and foremost, we definitely need a local way of reproducing this, either deterministically or with high likelihood, so we can assess with concrete data how it performs in terms of correctness and failover convergence (control plane and data plane switchover/flows performance). For e2e, we may not need an exactly matching test, but we do need one that closely simulates a double failure, plus other cases such as failing both current_path and failover_path when they're not fully disjoint.
Other than that, let me know when you need a final review. I did a partial review covering the points that were fairly obvious; for the other parts I won't assume their status or what happened, since I don't know whether you'll still address them based on what we discussed.
# traditional way
if (
with ExitStack() as exit_stack:
    exit_stack.enter_context(self._lock)
handle_link_down will always be executed single threaded, and self._lock was meant only for the consistency-check execute loop, so I wouldn't recommend mixing it in here; until we get to the consistency part I wouldn't touch it at all. Each evc.lock, though, yes: for those we need to make sure that while this single-threaded execution holds them, no other concurrent threads are performing other side effects. The ExitStack() usage is great; if it's going to be used, it also needs to ensure that deletions are executed atomically. So, please review this.
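For illustration, here's a minimal sketch of what holding several per-EVC locks through ExitStack could look like. The `Evc` class and the `handle_link_down` shape below are simplified stand-ins for this sketch, not the napp's actual code:

```python
from contextlib import ExitStack
from threading import Lock


class Evc:
    """Illustrative stand-in for an EVC with a per-instance lock."""

    def __init__(self, name):
        self.name = name
        self.lock = Lock()


def handle_link_down(affected_evcs):
    with ExitStack() as exit_stack:
        # Acquire every affected EVC's lock up front; ExitStack releases them
        # all (in reverse order) even if an exception is raised mid-way.
        for evc in affected_evcs:
            exit_stack.enter_context(evc.lock)
        # ... build flows, send bulk removals/installations, update storage ...
        print("holding locks for", [evc.name for evc in affected_evcs])


handle_link_down([Evc("evc_a"), Evc("evc_b")])
```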
In summary: all affected EVCs need to build/map their flows and perform the flow removals and installations with the bulk request we discussed in the issue.
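As a rough illustration of the bulk idea (not the actual flow_manager payload or API), the flows from all affected EVCs could be grouped by switch so that each switch receives a single bulk removal and a single bulk installation request:

```python
from collections import defaultdict


def group_flows_by_switch(flow_mods):
    """flow_mods: iterable of (dpid, flow_dict) pairs collected from all EVCs.

    The flow dict shape here is a placeholder, not the real flow format.
    """
    by_switch = defaultdict(list)
    for dpid, flow in flow_mods:
        by_switch[dpid].append(flow)
    return by_switch


removals = [
    ("00:00:00:00:00:00:00:01", {"match": {"in_port": 1}}),
    ("00:00:00:00:00:00:00:01", {"match": {"in_port": 2}}),
    ("00:00:00:00:00:00:00:02", {"match": {"in_port": 3}}),
]
for dpid, flows in group_flows_by_switch(removals).items():
    # One bulk request per switch instead of one request per EVC/flow.
    print(f"bulk delete on {dpid}: {len(flows)} flows")
```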
Alright, I'll change it to a separate lock. A lock is necessary here if we were to have multiple threads that wanted to hold multiple EVC locks, so I'll add one that is named explicitly for that purpose.
):
    evcs_normal.append(evc)
)):
    redeploy.append(evc)
If evc.is_affected_by_link(link) is True and evc.is_failover_path_affected_by_link(link) is True, then failover_path needs to be cleaned up too. Otherwise this can end up keeping a failover_path that isn't UP. Can you review this conditional and test it?
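A minimal sketch of the classification this comment is asking for: when both the current path and the failover path use the failed link, the EVC must be redeployed and its now-broken failover_path must be torn down as well. The two method names match the ones quoted above; the stub class and the bucket names are purely illustrative:

```python
from dataclasses import dataclass


@dataclass
class StubEvc:
    """Illustrative stand-in exposing the two checks quoted above."""
    current_hit: bool
    failover_hit: bool

    def is_affected_by_link(self, link):
        return self.current_hit

    def is_failover_path_affected_by_link(self, link):
        return self.failover_hit


def classify_evcs(evcs, link):
    swap_to_failover, redeploy, clean_failover = [], [], []
    for evc in evcs:
        current_hit = evc.is_affected_by_link(link)
        failover_hit = evc.is_failover_path_affected_by_link(link)
        if current_hit and failover_hit:
            # Double hit: failover can't be used and must also be cleaned up.
            redeploy.append(evc)
            clean_failover.append(evc)
        elif current_hit:
            swap_to_failover.append(evc)
        elif failover_hit:
            # Only the failover path is broken; rebuild it, current stays up.
            clean_failover.append(evc)
    return swap_to_failover, redeploy, clean_failover


print(classify_evcs([StubEvc(True, True), StubEvc(True, False)], link=None))
```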
Well, that's what the old logic said to do here. But I do see your point. Looking at the original code, it would never tear down the failover path: it would just tear down and then set up the current path, and try to deploy a failover path if one didn't already exist.
Closes #517
Summary
Replaces the procedure for handling link down events to ensure that the old_path is torn down before changes are fully committed to the database.
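As a very rough sketch of the ordering this aims for (all function names below are placeholders for this sketch, not mef_eline's real methods):

```python
def remove_path_flows(path):
    print(f"removing flows for {path}")


def deploy_path(evc, path):
    print(f"deploying {path} for {evc['name']}")


def sync_to_db(evc):
    print(f"persisting {evc['name']} with current_path={evc['current_path']}")


def redeploy_after_link_down(evc, new_path):
    old_path = evc["current_path"]
    remove_path_flows(old_path)   # tear the old path down in the data plane first
    deploy_path(evc, new_path)    # then install the new current path
    evc["current_path"] = new_path
    sync_to_db(evc)               # commit to the database only after teardown


redeploy_after_link_down({"name": "evc_a", "current_path": "path_1"}, "path_2")
```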
Local Tests
This still needs some tweaking, but the general shape of the solution is here. A test harness that can reliably reproduce the old issue still needs to be developed.
E2E Tests
As mentioned under Local Tests, this issue wasn't caught by the E2E tests, so a test harness needs to be developed there as well.