Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset ring role to disable in case MRP is destroyed #1

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

systec-awe
Copy link

The Kernel callback port_mrp_del_ring_role is only called if ring_role is set to BR_MRP_RING_ROLE_DISABLED. Role is never said by userspace application so it might by the best place here.

The Kernel callback `port_mrp_del_ring_role` is only called if ring_role
is set to `BR_MRP_RING_ROLE_DISABLED`. Role is never said by userspace
application so it might by the best place here.

Signed-off-by: Andre Werner <[email protected]>
@HoratiuVultur
Copy link
Collaborator

Hi,

Thanks for the PR. I have one question about this one. I don't have a setup to actually test the DSA part of the kernel, so I have tried your change with a switchdev driver. But I have seen that with this change the ring role will be set twice:

  1. One time when you call mrp_netlink_set_ring_role(mrp, BR_MRP_RING_ROLE_DISABLED);
  2. One time when the function mrp_netlink_del(mrp). This is the stack trace for this call.
    [ 18.722294] WARNING: CPU: 0 PID: 173 at drivers/net/ethernet/microchip/sparx5/sparx5_mrp.c:331 sparx5_handle_mrp_ring_role_del+0x88/0x100
    [ 18.734619] Modules linked in:
    [ 18.737660] CPU: 0 PID: 173 Comm: mrp_server Tainted: G W 6.6.51-00291-g438b23b478e1-dirty #1342
    [ 18.747813] Hardware name: lan969x ev23x71a (pcb8398) (DT)
    [ 18.753282] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [ 18.760225] pc : sparx5_handle_mrp_ring_role_del+0x88/0x100
    [ 18.765780] lr : sparx5_handle_mrp_ring_role_del+0x88/0x100
    [ 18.771335] sp : ffff80008184b2a0
    [ 18.774634] x29: ffff80008184b2a0 x28: ffff000096033900 x27: 0000000000000000
    [ 18.781751] x26: ffff80008184b458 x25: ffff80008184b907 x24: ffff800080d21488
    [ 18.788869] x23: ffff80008184b458 x22: ffff800080b14748 x21: ffff000063b95000
    [ 18.795986] x20: ffff000063876800 x19: ffff000063876000 x18: 0000000000000000
    [ 18.803104] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    [ 18.810222] x14: 0000b4466fdf828a x13: 02d120539d67838a x12: 00000000fa83b2da
    [ 18.817339] x11: 00000000000001bb x10: 00000000000009d0 x9 : ffff80008184b050
    [ 18.824457] x8 : ffff000096034330 x7 : 00000000000003f8 x6 : 000000000000b83e
    [ 18.831574] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
    [ 18.838692] x2 : 0000000000000000 x1 : ffff000096033900 x0 : 0000000000000026
    [ 18.845810] Call trace:
    [ 18.848241] sparx5_handle_mrp_ring_role_del+0x88/0x100
    [ 18.853448] sparx5_switchdev_blocking_event+0x378/0x9bc
    [ 18.858743] blocking_notifier_call_chain+0x6c/0xa0
    [ 18.863604] switchdev_port_obj_notify+0x48/0xa8
    [ 18.868204] switchdev_port_obj_del+0x3c/0xb8
    [ 18.872544] br_mrp_switchdev_set_ring_role+0xcc/0x104 <----- this is one will set the port role to disable
    [ 18.877665] br_mrp_del_impl+0x74/0x214
    [ 18.881484] br_mrp_del+0x34/0x4c
    [ 18.884783] br_mrp_instance_parse+0xd4/0x174
    [ 18.889123] br_mrp_parse+0xb0/0x424
    [ 18.892682] br_afspec+0x28c/0x300
    [ 18.896067] br_dellink+0x78/0xf8
    [ 18.899365] rtnl_bridge_dellink+0xdc/0x1fc
    [ 18.903532] rtnetlink_rcv_msg+0x11c/0x354
    [ 18.907611] netlink_rcv_skb+0x58/0x124
    [ 18.911431] rtnetlink_rcv+0x18/0x24
    [ 18.914990] netlink_unicast+0x1c0/0x29c
    [ 18.918895] netlink_sendmsg+0x190/0x3f8
    [ 18.922801] ____sys_sendmsg+0x21c/0x250
    [ 18.926707] ___sys_sendmsg+0xac/0xf4
    [ 18.930353] __sys_sendmsg+0x78/0xc4
    [ 18.933912] __arm64_sys_sendmsg+0x24/0x30
    [ 18.937991] invoke_syscall.constprop.0+0x50/0xec
    [ 18.942679] do_el0_svc+0x40/0xc8
    [ 18.945977] el0_svc+0x48/0x148
    [ 18.949102] el0t_64_sync_handler+0x120/0x12c
    [ 18.953442] el0t_64_sync+0x190/0x194
    [ 18.957088] ---[ end trace 0000000000000000 ]---

Here is the link to the source code which should change the role to disable when the MRP instance is deleted:
https://elixir.bootlin.com/linux/v6.6.21/source/net/bridge/br_mrp.c#L442

BTW: I have few patches for sparx5 regarding MRP that are not yet in the mainline kernel.

Or are you using some configuration without switchdev support?

Thanks,
/Horatiu

@systec-awe
Copy link
Author

systec-awe commented Jan 24, 2025 via email

@HoratiuVultur
Copy link
Collaborator

Hi André,

Can you please share the userspace command that you run when "When I detach the MRP server from a used port"
It would be also nice if you share the patch with me.

As mention before, I have not managed to test the MRP on DSA driver as I didn't have a setup for testing this (Also now I don't have a setup to try it). So it might that you have found an issue.

Thanks,
/Horatiu

@systec-awe
Copy link
Author

systec-awe commented Jan 27, 2025 via email

@HoratiuVultur
Copy link
Collaborator

Hi André,

Thanks for the commands and the patch. I don't fully understand why you don't get those callbacks if you don't explicitly set the role to BR_MRP_RING_ROLE_DISABLED, even if the delete will do that. I am looking here:
https://elixir.bootlin.com/linux/v5.15.177/source/net/bridge/br_mrp.c#L442
When you delete the instance does the callback reach here?

Just few more comments about your patches:

  1. I think your function 'ksz9477_port_mrp_add/del' looks good.
  2. In the function 'ksz9477_port_mrp_add_ring_role' you will need to check also the flag 'mrp->sw_backup' this means that if the role is MRM and sw_backup is false, then it means that the HW will generate the test frames and it would figure out when it stop receiving the test frames. So my suggestion is to return -EOPNOTSUPP because you will be call back again with mrp->sw_backup set to false and in this case the SW will generate the test frame and figure out when it stop receiving the test frames.
  3. Also you need to make sure that you trap to the CPU and allow the CPU to forward control frames
  4. If you want to be MRC with offload support then you need to know which is the partner port on which you want to forward the frame, because it is possible to have multiple ports under the bridge and you don't want to forward these MRP test frames to all the other ports.
  5. I suggest to have a look at this file: https://elixir.bootlin.com/linux/v5.15.177/source/drivers/net/ethernet/mscc/ocelot_mrp.c as I think this will match what you need to do for ksz switch.

Thanks,
/Horatiu

@systec-awe
Copy link
Author

systec-awe commented Jan 30, 2025 via email

@systec-awe
Copy link
Author

Hi André,

Thanks for the commands and the patch. I don't fully understand why you don't get those callbacks if you don't explicitly set the role to BR_MRP_RING_ROLE_DISABLED, even if the delete will do that. I am looking here: https://elixir.bootlin.com/linux/v5.15.177/source/net/bridge/br_mrp.c#L442 When you delete the instance does the callback reach here?

Just few more comments about your patches:

  1. I think your function 'ksz9477_port_mrp_add/del' looks good.
  2. In the function 'ksz9477_port_mrp_add_ring_role' you will need to check also the flag 'mrp->sw_backup' this means that if the role is MRM and sw_backup is false, then it means that the HW will generate the test frames and it would figure out when it stop receiving the test frames. So my suggestion is to return -EOPNOTSUPP because you will be call back again with mrp->sw_backup set to false and in this case the SW will generate the test frame and figure out when it stop receiving the test frames.

This is clear to me and I added this as you suggested.

  1. Also you need to make sure that you trap to the CPU and allow the CPU to forward control frames

This is something I do not fully understand as the bridge runs on the Linux CPU and the applications as well. So all necessary packages arrive there automatically, I guess?

  1. If you want to be MRC with offload support then you need to know which is the partner port on which you want to forward the frame, because it is possible to have multiple ports under the bridge and you don't want to forward these MRP test frames to all the other ports.

I thought this was done by adding static lookup table entries for the MRP frame MAC addresses to the switch. So if a package arrives on any of these configured ports, they are forwarded to the ports mapped via static lookup table entry. Or do you mean I may have several rings on different ports of one switch? So I need to link the same MAC address of topics to dedicated port pairs for each ring? This is out of the scope of our use case but then you are right, I guess.

  1. I suggest to have a look at this file: https://elixir.bootlin.com/linux/v5.15.177/source/drivers/net/ethernet/mscc/ocelot_mrp.c as I think this will match what you need to do for ksz switch.

Thanks. I try to understand your vcap feature. This is what is necessary for my answer above, right?

Thanks, /Horatiu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants