Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

networkmanager: add support for "routing-policy" (LP: #2086544) #526

Merged
merged 3 commits into from
Nov 14, 2024

Conversation

drafnel
Copy link
Contributor

@drafnel drafnel commented Nov 2, 2024

Description

NetworkManager supports ip rules via the "routing-ruleX" field. Let's teach the networkmanager backend to emit "routing-ruleX" entries for each "routing-policy" entry in the netplan yaml.

For example, the following netplan configuration snippet:

    eth0:
      addresses:
      - "10.1.2.3/24"
      routing-policy:
        - to: 8.7.6.5
          table: 254
          priority: 10
        - to: 1.2.3.4
          from: 8.7.6.5
          table: 253
          mark: 18
          type-of-service: 4
          priority: 11

..should generate something like the following in the NetworkManager configuration:

    [ipv4]
    method=manual
    address1=10.1.2.3/24
    routing-rule1=to 8.7.6.5 table 254 priority 10
    routing-rule2=from 8.7.6.5 to 1.2.3.4 tos 4 fwmark 18 table 253 priority 11

Checklist

  • Runs make check successfully.
  • Retains code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad.

@drafnel drafnel force-pushed the nm-support-routing-policy branch 3 times, most recently from f7e1f21 to cbc9908 Compare November 3, 2024 17:19
@drafnel
Copy link
Contributor Author

drafnel commented Nov 4, 2024

Are the two checks that are failing broken? Guidance here would be appreciated.

@slyon slyon self-requested a review November 6, 2024 14:27
@slyon slyon changed the title networkmanager: add support for "routing-policy" networkmanager: add support for "routing-policy" (LP: #2086544) Nov 6, 2024
@slyon
Copy link
Collaborator

slyon commented Nov 6, 2024

Are the two checks that are failing broken? Guidance here would be appreciated.

Yes, those failures seem to be unrelated.

Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your contribution to Netplan! This is a valuable change and much appreciated.

I especially like the fact that you adopted the corresponding integration tests, so we can be sure it works on real systems!

For future work, may I suggest implementing the other way around, too? E.g. update src/parse-nm.c (similar to the existing parse_routes() function), to read a NM keyfile, which should then be picked up automatically by src/netplan.c to emit corresponding YAML. But this could be done in a follow-up PR, does not need to be part of this PR.

src/nm.c Outdated Show resolved Hide resolved
src/nm.c Show resolved Hide resolved
src/nm.c Outdated Show resolved Hide resolved
tests/integration/routing.py Show resolved Hide resolved
src/parse.c Outdated Show resolved Hide resolved
@drafnel
Copy link
Contributor Author

drafnel commented Nov 6, 2024

For future work, may I suggest implementing the other way around, too? E.g. update src/parse-nm.c (similar to the existing parse_routes() function), to read a NM keyfile, which should then be picked up automatically by src/netplan.c to emit corresponding YAML.

I've been looking at that. Do you have any thoughts about how best to handle NetworkManager routing-rules that contain elements not supported by netplan? We can obviously skip them and they will be imported as pass-through NetworkManager options. The problem is when there are multiple routing-ruleX entries and some can be represented natively by the netplan yaml and some cannot. In that case, I suspect we cannot just naively convert some of the rules to native netplan and allow the others to be recorded as pass-through entries because the NetworkManager rules are numbered, like "routing-rule2". If we record some of the rules in native netplan and some as pass-through rules and then generate the NetworkManager configuration, then new NetworkManager rules will be emitted starting with "routing-rule1", "routing-rule2", potentially clobbering pass-through rules with the same numbering.

For example, consider a NetworkManager keyfile containing:

routing-rule1=from 1.2.3.4 to 5.6.7.8 tos 4 fwmark 0xca9e iif eth0 oif eth1 pref uidrange 1000-1010 ipproto 6 sport 1024-2048 dport 4096-8192 tun_id 13 priority 99 table 254
routing-rule2=priority 99 to 1.2.3.4 table 254

routing-rule2 would be imported and recorded natively as a netplan routing-policy entry while routing-rule1 would be recorded as a pass-through rule since it contains fields not supported by netplan. When the NetworkManager config is then generated from the netplan yaml, it would emit a "routing-rule1" from its own "routing-policy" and then pass through the original "routing-rule1" from the pass-through section.

The easiest solution may be to just have an all-or-none policy when importing the NetworkManager routing-ruleXs. Otherwise it seems like we'd need to be aware of any routing-ruleXs in the pass-through section and take them into account when generating the NetworkManager config. Thoughts?

@slyon
Copy link
Collaborator

slyon commented Nov 12, 2024

For example, consider a NetworkManager keyfile containing:

routing-rule1=from 1.2.3.4 to 5.6.7.8 tos 4 fwmark 0xca9e iif eth0 oif eth1 pref uidrange 1000-1010 ipproto 6 sport 1024-2048 dport 4096-8192 tun_id 13 priority 99 table 254
routing-rule2=priority 99 to 1.2.3.4 table 254

routing-rule2 would be imported and recorded natively as a netplan routing-policy entry while routing-rule1 would be recorded as a pass-through rule since it contains fields not supported by netplan. When the NetworkManager config is then generated from the netplan yaml, it would emit a "routing-rule1" from its own "routing-policy" and then pass through the original "routing-rule1" from the pass-through section.

Yes, this is a very valid concern!

The easiest solution may be to just have an all-or-none policy when importing the NetworkManager routing-ruleXs. Otherwise it seems like we'd need to be aware of any routing-ruleXs in the pass-through section and take them into account when generating the NetworkManager config. Thoughts?

I think one relatively easy approach could be parse all the routing-ruleXs into native Netplan routing-policy, logging a warning when we cannot map any of them. But then do not clear any of them from the keyfile/passthrough (don't call _kf_clear_key(...) for any routing-ruleX). This way we will be able to save a Netplan native mapping, but in the worst case everything would be overwritten by the passthrough later on.

Then, in src/validation.c, we can check if the amount of native routing-policy rules matches the routing-ruleXs in NM.passthrough, and clear all the passthrough values. WDYT about this approach?

PS: As mentioned above, please consider this additional work as a non-blocking request, which can easily be delivered in a follow-up PR in the future.

@drafnel drafnel force-pushed the nm-support-routing-policy branch from cbc9908 to cc5d82d Compare November 13, 2024 04:20
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for addressing my concerns. This is now ready to be merged!

@drafnel Do you want to add anything on top of it (from the "future work" discussion), or shall we get this merged, first?

@drafnel
Copy link
Contributor Author

drafnel commented Nov 13, 2024 via email

@drafnel
Copy link
Contributor Author

drafnel commented Nov 14, 2024

The easiest solution may be to just have an all-or-none policy when importing the NetworkManager routing-ruleXs. Otherwise it seems like we'd need to be aware of any routing-ruleXs in the pass-through section and take them into account when generating the NetworkManager config. Thoughts?

I think one relatively easy approach could be parse all the routing-ruleXs into native Netplan routing-policy, logging a warning when we cannot map any of them. But then do not clear any of them from the keyfile/passthrough (don't call _kf_clear_key(...) for any routing-ruleX). This way we will be able to save a Netplan native mapping, but in the worst case everything would be overwritten by the passthrough later on.

Then, in src/validation.c, we can check if the amount of native routing-policy rules matches the routing-ruleXs in NM.passthrough, and clear all the passthrough values. WDYT about this approach?

Were you thinking of doing this check in parse_nm.c:read_passthrough()? It doesn't look like the Netplan object is available within that function to be able to count the number of native routing-policy rules.

It seems like it would be fine to do this in the same function that parses the rules (i.e. parse_ip_rules()).

But, how concerned are you about having two versions of the routing policy configuration in the same file and the possibility of someone editing the netplan yaml by hand? It sounds like that's what you were suggesting if we parse each routing-ruleX, if possible, into a native routing-policy config and keep all of the routing-ruleX entries as pass-through configs when we fail to parse any of the routing-ruleXs into routing-policy configs. So we could end up with something like:

    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule1: priority 99 to 1.2.3.4 table 254
        ipv4.routing-rule2: priority 98 iif eth1 dport 1024-2048 table 253
        ipv4.routing-rule3: priority 97 to 5.6.7.8 table 252

I had thought that a reasonable approach, based on your original suggestion, might be to clear each passthrough rule as long as we've been able to successfully convert the routing-ruleX into a routing-policy rule, but once we encounter a rule that cannot be converted, stop clearing the routing-ruleXs. We still convert as many routing-ruleXs as possible into routing-policy rules, but the ones that are converted after the one we failed to parse will be overwritten by the pass-through rules. For example, the above snippet would become something like:

    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule2: priority 98 iif eth1 dport 1024-2048 table 253
        ipv4.routing-rule3: priority 97 to 5.6.7.8 table 252

i.e. since routing-rule1 was successfully parsed, it was cleared. The routing-rule2 would fail to convert since we don't support 'iif' etc. so then we stop clearing the routing-ruleXs. routing-rule3 is successfully converted to a routing-policy, but we still keep the routing-rule3. When this is converted back to NetworkManager key file, then the first routing-policy would become routing-rule1, routing-rule2 would override the second routing-policy (which was derived from routing-rule3), and routing-rule3 would pass through.

But that seems a little fragile and confusing to me to mix the two configuration forms like this. Is there actually any value in writing the routing-policy rules if they'll be overridden or supplemented by the routing-ruleX rules? I don't have enough experience with netplan to feel strongly about this either way. Or should we just take an all-or-none approach and refrain from emitting any routing-policy entries if we encounter a routing-ruleX that we cannot convert?

NetworkManager supports ip rules via the "routing-ruleX" field.  Let's
teach the networkmanager backend to emit "routing-ruleX" entries for
each "routing-policy" entry in the netplan yaml.

Update the tests for the new behavior and add additional tests to
exercise the new code paths.
Since the check of whether r->table == nd->vrf_table has already failed
in the initial 'if' conditional, r->table is necessarily !=
nd->vrf_table.  It is not necessary to explicitly check for this state
in the 'else if' conditional.  Let's not.
The iproute2 guidance says "Each rule should have an explicitly set
unique priority value", and further warns:

    For historical reasons ip rule add does not require any priority
    value and allows the priority value to be non-unique. If the user
    had not supplied a priority value then one was assigned by the
    kernel. If the user requested creating a rule with a priority value
    which already existed then the kernel did not reject the request and
    added the new rule before all old rules of the same priority. This
    is a mistake in the current design, nothing more. It should be fixed
    by the time you read this so please do not rely on this feature. You
    should always use explicit priorities when creating rules.

So let's inspect each routing-policy entry and emit a warning if a
priority has not been specified.

ref. http://www.policyrouting.org/iproute2.doc.html#ss9.6.1
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask how you wanted to do that. I've implemented your suggestion to emit a warning when priority is not set for a rule, which I think can safely be added on top. I'll push this out later today.

ACK, thanks! LGTM.
I took the freedom to rebase your branch on top of origin/main, added curly braces for the new policy validation warning (as that's what we want to do everywhere these days), and prefixed two of the commit messages with ATTN: so that we can consider the stricter schema parsing/validation logic when backporting those changes in the distro.

All CI is now green as well (cf. #528). So let's get this merged!

The other change, to parse the NetworkManager config, will require more consideration.

Let's move this work to an independent PR. More on this in a follow-up comment.

@slyon slyon force-pushed the nm-support-routing-policy branch from bb0f8a7 to e0c8801 Compare November 14, 2024 11:00
@slyon slyon merged commit ae09e2c into canonical:main Nov 14, 2024
17 checks passed
@@ -523,10 +523,13 @@ adopt_and_validate_vrf_routes(__unused const NetplanParser *npp, GHashTable *net
if (nd->ip_rules) {
for (size_t i = 0; i < nd->ip_rules->len; i++) {
NetplanIPRule* r = g_array_index(nd->ip_rules, NetplanIPRule*, i);
if (r->priority == NETPLAN_IP_RULE_PRIO_UNSPEC) {
g_warning("%s: No priority specified for routing-policy %zu", nd->id, i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note (non-blocking): I just see (after merging) that this only triggers for NETPLAN_DEF_TYPE_VRF NetDefs. We might want to have it for all interface types, eventually.

@slyon
Copy link
Collaborator

slyon commented Nov 14, 2024

The easiest solution may be to just have an all-or-none policy when importing the NetworkManager routing-ruleXs. Otherwise it seems like we'd need to be aware of any routing-ruleXs in the pass-through section and take them into account when generating the NetworkManager config. Thoughts?

I think one relatively easy approach could be parse all the routing-ruleXs into native Netplan routing-policy, logging a warning when we cannot map any of them. But then do not clear any of them from the keyfile/passthrough (don't call _kf_clear_key(...) for any routing-ruleX). This way we will be able to save a Netplan native mapping, but in the worst case everything would be overwritten by the passthrough later on.
Then, in src/validation.c, we can check if the amount of native routing-policy rules matches the routing-ruleXs in NM.passthrough, and clear all the passthrough values. WDYT about this approach?

Were you thinking of doing this check in parse_nm.c:read_passthrough()? It doesn't look like the Netplan object is available within that function to be able to count the number of native routing-policy rules.

It seems like it would be fine to do this in the same function that parses the rules (i.e. parse_ip_rules()).

My initial idea was to handle it in src/validation.c along the lines of validation.c:adopt_and_validate_vrf_routes(). But looking into it more closely, I agree that it could probably be done in a parse_ip_rules() function directly, as we always have the full NetworkManager connection-profile/keyfile available and there are no cross-references/dependencies that influece the routing-policy rules of a keyfile.

So yes, I agree it should be done in a new parse-nm.c:parse_ip_rules() function; the more isolated and self-contained, the better!

note (non-blocking): Looking into parse-nm.c:parse_routes(), I realize we have the very same issue of potentially conflicting routeX[_options] from networkmanager.passthrough and Netplan native routes there as well. Once we come up with a good solution for routing-policy, we maybe should apply that to routes, too.

But, how concerned are you about having two versions of the routing policy configuration in the same file and the possibility of someone editing the netplan yaml by hand? It sounds like that's what you were suggesting if we parse each routing-ruleX, if possible, into a native routing-policy config and keep all of the routing-ruleX entries as pass-through configs when we fail to parse any of the routing-ruleXs into routing-policy configs. So we could end up with something like:

    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule1: priority 99 to 1.2.3.4 table 254
        ipv4.routing-rule2: priority 98 iif eth1 dport 1024-2048 table 253
        ipv4.routing-rule3: priority 97 to 5.6.7.8 table 252

Right. In the end, we want to have all information (as much as possible) available in the defined native Netplan schema, so that external consumers that try to query the configuration (e.g. using netplan get network.DEF_TYPE.IFACE.routing-policy) can receive all of it. The problem here is that even though having the supported rules in native routing-policy, the passthrough would still be authoritative and overwrite the native routing-policy, resulting in data received via netplan get could potentially be wrong. That's bad.

I had thought that a reasonable approach, based on your original suggestion, might be to clear each passthrough rule as long as we've been able to successfully convert the routing-ruleX into a routing-policy rule, but once we encounter a rule that cannot be converted, stop clearing the routing-ruleXs. We still convert as many routing-ruleXs as possible into routing-policy rules, but the ones that are converted after the one we failed to parse will be overwritten by the pass-through rules. For example, the above snippet would become something like:

    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule2: priority 98 iif eth1 dport 1024-2048 table 253
        ipv4.routing-rule3: priority 97 to 5.6.7.8 table 252

i.e. since routing-rule1 was successfully parsed, it was cleared. The routing-rule2 would fail to convert since we don't support 'iif' etc. so then we stop clearing the routing-ruleXs. routing-rule3 is successfully converted to a routing-policy, but we still keep the routing-rule3. When this is converted back to NetworkManager key file, then the first routing-policy would become routing-rule1, routing-rule2 would override the second routing-policy (which was derived from routing-rule3), and routing-rule3 would pass through.

thought: What happens when the routing-ruleXs are non-ordered in the keyfile? E.g. ipv4.routing-rule3, then ipv4.routing-rule2, then routing-rule1. Actually, I think that isn't a problem, because we'll always walk through them in an ordered way (for loop, starting at 1 and increasing the "X").

But that seems a little fragile and confusing to me to mix the two configuration forms like this. Is there actually any value in writing the routing-policy rules if they'll be overridden or supplemented by the routing-ruleX rules? I don't have enough experience with netplan to feel strongly about this either way. Or should we just take an all-or-none approach and refrain from emitting any routing-policy entries if we encounter a routing-ruleX that we cannot convert?

I agree, it's confusing. As stated above, the value would be for consumers to query the Netplan config state in a programmatic way (e.g. via netplan get); but that's only the case if that really reflects the actual configuration (i.e. when it doesn't get overwritten by passthrough settings). The "passthrough" section is supposed to only be a stop-gap solution, covering the unsupported cases. Eventually, we don't want to have anything left-over in there.

So, how can we solve this? What about this approach, all implemented inside a single, new parse-nm.c:parse_ip_rules() function :

  • We parse any routing-ruleX that we can (i.e. currently supported by Netplan schema) into native routing-policy, and clear them from the keyfile
  • We skip over unsupported routing-ruleXs, and keep the in the keyfile
  • We count the native routing-policy rules that we were able to parse; let's call that "COUNT"
  • After the initial loop, we run a 2nd loop over the routing-ruleX remaining in the keyfile and re-write them as routing-ruleCOUNT+1, routing-ruleCOUNT+2, ...

This should solve the duplicated entries in Netplan YAML issue and the issue of "routing-policy" entries not being authorative.
Re-using your example from above, that should leave us with something like this:

    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule3: priority 98 iif eth1 dport 1024-2048 table 253

We still have one problem, though: What happens if somebody manually modifies the Netplan YAML config, adding a 3rd routing-policy rule? – It would be in conflict with "routing-rule3" from passthrough. It's an edge case, but we should still handle it nicely and make users aware of it.

To mitigate that, we could modify nm.c:write_fallback_key_value() to log a warning (or even bail out??) when an existing routing-ruleX keyfile setting already exists. I think we're doing something along those lines already, only printing a g_debug message about "NetworkManager: fallback override: ..." – maybe that should be warning, especially for "routing-ruleXs".

@drafnel WDYT?

@drafnel
Copy link
Contributor Author

drafnel commented Nov 18, 2024

note (non-blocking): Looking into parse-nm.c:parse_routes(), I realize we have the very same issue of potentially conflicting routeX[_options] from networkmanager.passthrough and Netplan native routes there as well. Once we come up with a good solution for routing-policy, we maybe should apply that to routes, too.

Yeah, probably.

So, how can we solve this? What about this approach, all implemented inside a single, new parse-nm.c:parse_ip_rules() function :

* We parse any `routing-ruleX` that we can (i.e. currently supported by Netplan schema) into native `routing-policy`, and clear them from the keyfile

* We skip over unsupported `routing-ruleXs`, and keep the in the keyfile

* We count the native routing-policy rules that we were able to parse; let's call that "COUNT"

* After the initial loop, we run a 2nd loop over the `routing-ruleX` remaining in the keyfile and re-write them as `routing-ruleCOUNT+1`, `routing-ruleCOUNT+2`, ...

Yeah, I think we have to bite the bullet and go ahead and rewrite the rules.

We still have one problem, though: What happens if somebody manually modifies the Netplan YAML config, adding a 3rd routing-policy rule?

Or if a rule is deleted, leaving a gap between the numbering of the routing-ruleXs that will be generated in the NetworkManager config from the routing-policy rules and the numbering of the passthrough rules.

– It would be in conflict with "routing-rule3" from passthrough. It's an edge case, but we should still handle it nicely and make users aware of it.

To mitigate that, we could modify nm.c:write_fallback_key_value() to log a warning (or even bail out??) when an existing routing-ruleX keyfile setting already exists. I think we're doing something along those lines already, only printing a g_debug message about "NetworkManager: fallback override: ..." – maybe that should be warning, especially for "routing-ruleXs".

A warning, at a minimum, seems appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants