Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing Batman-adv on ethernet #703

Closed
wants to merge 12 commits into from

Conversation

ilario
Copy link
Member

@ilario ilario commented Apr 28, 2020

I did not test this PR yet, please test or wait before testing.

In dd07aef, the ethernet interfaces were excluded from the bat0 interface so that they're not used by Batman-adv. Please check out the commit comments for the reasons. Actually, the WAN ethernet port always is included in bat0 as is configured via interface-specific configuration by lime-hwd-openwrt-wan package.

Addresses #56
This PR reverts dd07aef and offers an alternative solution to #189 removing the previous solution implemented in ae0672b (based on the creation of dummy0 interface, that was working most but not all the times. Due to a race condition in which the dummy0 interface could be not the first to be included in bat0).
Instead of adding the dummy0 interface, the MAC addresses of each of the interfaces included in bat0 is changed randomly (based on the interface name), so that they have all different MAC addresses.

An alternative solution would be to include br-lan inside bat0 instead of bat0 inside br-lan, similarly to #631.

@spiccinini
Copy link
Contributor

Hi! Great that you are investigating this. I really don't know how to test it, which things to test exactly. Maybe we can compile a list of tests? Some ideas:

  • Test that there is no loop when connecting two nodes with wifi and LAN.
  • Test that DHCP for clients is working as expected (that only one of the nodes is responding?)

@ilario
Copy link
Member Author

ilario commented Apr 30, 2020

Maybe we can compile a list of tests? Some ideas:

Yep, that would be useful!
Some hints are in @altergui comment here dd07aef

  • "fragmentation (since mtu is not enough) which impacts performance, in some cases severely (i.e. when the ethernet link is wirelessly bridged)": I suppose that the fragmentation decreasing performances will still be present, maybe could be worth to decrease the MTU of AP interfaces also to 1496? For being honest, I have no idea of these things.
  • "instability (racy conditions, it seems) with BLA2": @altergui could you detail this?
  • "spurious warnings about 'received packet with own address' on br-lan": I think I solved this randomizing the MAC addresses, do you think this solution could have drawbacks?
  • "complicates things when two different clouds are accidentally connected by an plain ethernet cable between them": no idea why, they are on different VLANs anyway...
* Test that there is no loop when connecting two nodes with wifi and LAN.

Yess

* Test that DHCP for clients is working as expected (that only one of the nodes is responding?)

All nodes connected to br-lan are already answering, this is already broken and this PR does not solve it (but this change should be taken into consideration when we will try to fix that), see #658.

@ilario
Copy link
Member Author

ilario commented Jun 4, 2020

Daniel Golle proposed a solution here: #56 (comment)

If we decide to go that way, I can remove the eth mesh-related commit and leave this PR only about the removal of dummy0 interface.

@dangowrt
Copy link
Member

dangowrt commented Jun 7, 2020

Testing the current version of the patch (rebased so it would apply) and looks good till now.

  • Would be nice to try increasing MTU on batman hardifs to 1532, so full 1500 bytes could be carried. This will most likely not work with all Ethernet switches, but works well with wifi interfaces of any kind (and that's where batman-adv is actually carrying traffic). So we should raise the MTU only for wifi devices and not mess with wired interfaces.
  • BLA seems to do the job, traffic goes through the plain bridge rather than batman-adv.
  • no more received packet with own address on br-lan or the like

local uci = config.get_uci_cursor()
uci:set("network", owrtDeviceName, "mtu", mtu)
uci:set("network", owrtDeviceName, "macaddr", table.concat(randomMac, ":"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to break linux 802.1{ad,q} at least when bridges and vlans operates on same hard interface, if the mac of the vlan is different from the one on the hard interface (what you are doing here) the packets get sucked by the bridge and never get to the vlan interface. I haven't tested this recently but this happened at least until last libremesh stable release.

@G10h4ck
Copy link
Member

G10h4ck commented Jun 8, 2020

* Would be nice to try increasing MTU on batman hardifs to 1532, so full 1500 bytes could be carried. This will most likely not work with all Ethernet switches, but works well with wifi interfaces of any kind (and that's where batman-adv is actually carrying traffic). So we should raise the MTU only for wifi devices and not mess with wired interfaces.

AFAIU we are already increasing the MTU on wireless hard interfaces for the sake of not fragmenting batman-adv

https://github.com/libremesh/lime-packages/blob/master/packages/lime-proto-batadv/files/usr/lib/lua/lime/proto/batadv.lua#L87

Also it would be really nice to have a way to detect ethernet interface which would support increased MTU (check if they are gigabit?)

@dangowrt
Copy link
Member

dangowrt commented Jun 9, 2020 via email

@G10h4ck
Copy link
Member

G10h4ck commented Jun 9, 2020

even that wouldn't not necessarily be true, because some devices got an external switch chip which doesn't support jumbo frames (eg. lantiq devices with external admtek/infineon/lantiq switch chip). i guess this will only be solved properly by using DSA instead of swconfig, as then the call to set the MTU to anything higher than supported would just return -EFAIL.

Does OpenWrt plan to move from swconfig to DSA any time soon?

@dangowrt
Copy link
Member

dangowrt commented Jun 9, 2020 via email

@G10h4ck
Copy link
Member

G10h4ck commented Jun 9, 2020

Does OpenWrt plan to move from swconfig to DSA any time soon?
Yes, but for some outdated targets it will most likely never happen. It has already happened for ramips/mt7621, lantiq/xrx200. People are working on AR9331, some initial work was done for Ralink-ESW as found on MT7628. Once this is more complete, we can try supporting all the older Ralink/MediaTek stuff as well, because the switch IP in those SoCs is very similar. For ath79 it will certainly not be ready this year, because there are too many boards and many got external switch chips (which then also need DSA drivers, of course).

First of all thanks for the report!

So we could eventually test if DSA is supported (this is as simple as see if some command is in PATH?), if so then test if MTU can be increased, and if one of those two tests fails act as if MTU is 1500?

When we designed LibreRouter I remember to insist on having gigabit ethernet mainly because I thought that would guarantee increased MTU support, do you remember if the chip we used in LR plus it's drivers does supports that well?

Cheers!

--! Randomize MAC address for each of the interfaces included in Batman-adv.
local id = utils.get_id(ifname)
local randomMac = network.primary_mac();
randomMac[1] = id[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some bits of first byte of the mac have special meanings, so it cannot be totally random. Also at first glance I think we can just change the firs two bytes with something fixed making sure multicast bit is 0 and locally administered bit is 1 https://en.wikipedia.org/wiki/MAC_address#/media/File:MAC-48_Address.svg

@G10h4ck
Copy link
Member

G10h4ck commented Jun 12, 2020

I have reworked this a bit in this PR ilario#1 and started testing it, at first glance it seems I could not reproduce the old problematic behavior which was that when the vlan interface had it's mac address changed, unicast packets never got there (probably sucked in by the bridge), while multicast packets arrived without problem so routing protocols kept discovering the paths but then unicast traffic was lost. Did something big changed in 802.1q or bridge code in the kernel in this years?

I'll keep testing this in this a bit more and if there are no problems we can finally merge this with the extra fixup i made.

Hopefully loops when both WiFi and ethernet link are available will disappear with this.

@ilario
Copy link
Member Author

ilario commented Jun 14, 2020

The commits here got messy, sorry for that, continuing on #724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants