Let's remove unneeded VLANs from protocol interfaces #581

Open
spiccinini opened this issue Sep 8, 2019 · 17 comments

Comments

@spiccinini
Contributor

The scenarios we use in our networks don't need VLANs for the routing protocols. Let's remove them to reduce complexity and bugs (like #580).

@ilario
Member

ilario commented Sep 10, 2019

+1
(I added the release-blocker label because this is a non-backward-compatible change, so it has to be sorted out before the next release.)
See also the discussion at https://lists.libremesh.org/pipermail/lime-dev/2019-August/001144.html and the unconfirmed report at https://forum.openwrt.org/t/mediatek-and-vlan-802-1ad-on-ethernet/42346

@G10h4ck
Member

G10h4ck commented Sep 10, 2019

I agree with removing the VLANs that are not needed!

@G10h4ck
Member

G10h4ck commented Sep 10, 2019

It would also solve the problem with MediaTek hardware switches.

@ilario
Member

ilario commented Sep 17, 2019

So, the VLANs that are not needed are the Babeld, BMX* and OLSR* ones, right?
But we would keep the BATMAN-adv one (if I remember correctly, @G10h4ck said that Batman needs to be on an interface of its own, and we use VLAN=%N1 to break the Batman domains into different areas of the network), right?

@spiccinini
Contributor Author

spiccinini commented Sep 17, 2019

> So, the VLANs that are not needed are the Babeld, BMX* and OLSR* ones, right?

Yes. We can start with the Babel protocol and see how it goes.

> But we would keep the BATMAN-adv one (if I remember correctly, @G10h4ck said that Batman needs to be on an interface of its own, and we use VLAN=%N1 to break the Batman domains into different areas of the network), right?

Yes, that's right.

@spiccinini
Contributor Author

vlan=0 can be used to disable the VLAN. I added some fixes for this use case in #593.
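
To make the vlan=0 convention concrete, here is a minimal Python sketch that parses a protocols entry of the form `name:vlan[:vlanproto]` (the form that appears later in this thread as 'babeld:17:8021q') and treats VLAN ID 0 as "no VLAN". The `ProtoSpec` type, the `parse_protocol_entry` helper and the 802.1ad default are assumptions made for illustration only. This is not LibreMesh's actual code, and it only handles entries that carry an explicit numeric VLAN field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtoSpec:
    name: str        # routing protocol name, e.g. "babeld"
    vlan_id: int     # 0 is taken to mean "no VLAN" (assumption from this thread)
    vlan_proto: str  # "8021ad" or "8021q"

    @property
    def tagged(self) -> bool:
        # With vlan_id == 0 the protocol would run on the untagged interface.
        return self.vlan_id != 0


def parse_protocol_entry(entry: str, default_proto: str = "8021ad") -> ProtoSpec:
    """Parse 'name:vlan[:vlanproto]' (hypothetical helper, not LibreMesh code)."""
    parts = entry.split(":")
    if len(parts) not in (2, 3) or not parts[0]:
        raise ValueError(f"malformed protocol entry: {entry!r}")
    vlan_id = int(parts[1])  # raises ValueError if the field is not numeric
    if not 0 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    vlan_proto = parts[2] if len(parts) == 3 else default_proto
    if vlan_proto not in ("8021ad", "8021q"):
        raise ValueError(f"unknown VLAN protocol: {vlan_proto!r}")
    return ProtoSpec(parts[0], vlan_id, vlan_proto)


print(parse_protocol_entry("babeld:17:8021q"))
print(parse_protocol_entry("babeld:0").tagged)
```

Under these assumptions, an entry such as 'babeld:0' would leave Babeld on the plain interface, while 'babeld:17' would keep a tagged sub-interface.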

@ilario
Member

ilario commented Jan 11, 2020

I just noticed that Batman-adv has long implemented a VLAN mechanism on top of the bat0 interface, see here and here.
Does anyone know whether Batman-adv's VLANs can replace the classical VLANs we use now?
If so, the hardware switches would never recognize these packets as tagged, correct?

@ilario
Member

ilario commented Mar 8, 2020

From here https://www.open-mesh.org/projects/batman-adv/wiki/Tweaking#VLAN-handling :

> The batX mesh interface created by batman-adv also supports VLANs which enables the administrator to configure virtual networks with independent settings on top of a single mesh cloud.

This sounds like broadcast packets from a client would be limited to a single VLAN zone, while the Batman-adv hello packets would go everywhere. Does anyone know if this is correct?

@nicopace nicopace removed their assignment Mar 8, 2020
@G10h4ck
Member

G10h4ck commented Jun 9, 2020

> From here https://www.open-mesh.org/projects/batman-adv/wiki/Tweaking#VLAN-handling :
>
> > The batX mesh interface created by batman-adv also supports VLANs which enables the administrator to configure virtual networks with independent settings on top of a single mesh cloud.
>
> This sounds like broadcast packets from a client would be limited to a single VLAN zone, while the Batman-adv hello packets would go everywhere. Does anyone know if this is correct?

Yes, but a single VLAN zone built on top of batman-adv can potentially span the whole batman-adv mesh. In short, VLANs on top of batman-adv don't help with this issue.

@ilario
Member

ilario commented Jun 9, 2020

> but a single VLAN zone built on top of batman-adv can potentially span the whole batman-adv mesh

Why should we use a single one? What I had in mind was to use Batman-adv's VLANs in much the same way as we use VLANs now: the ID would depend on the network SSID.

@G10h4ck
Member

G10h4ck commented Jun 9, 2020

> > but a single VLAN zone built on top of batman-adv can potentially span the whole batman-adv mesh
>
> Why should we use a single one? What I had in mind was to use Batman-adv's VLANs in much the same way as we use VLANs now: the ID would depend on the network SSID.

That might split the broadcast domain, but not the topology of the L2 network, so it would still have scalability problems.

Also, we don't use VLANs for batman-adv just to improve scalability. We also use them because batman-adv (or maybe just the way the batman-adv configuration interface is implemented on OpenWrt) "monopolizes" (becomes the master of) the interfaces it uses to send OGMs etc., and an interface can have only one master, so you cannot put the same interface inside a bridge and also use it for batman-adv OGMs. A VLAN behaves like another interface, so we put the raw interface inside the bridge and give the VLAN to batman-adv.
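
The one-master constraint described above can be pictured with a toy Python model. It is only a sketch: the `Iface` class, its method names and the VLAN ID 29 are made up for illustration, and it does not call into the kernel or any OpenWrt tooling. It models just the rule that a device accepts a single master, and that a VLAN sub-interface counts as a separate device.

```python
class Iface:
    """Toy stand-in for a network device with at most one master."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # lower device, set for VLAN sub-interfaces
        self.master = None    # e.g. a bridge or a batman-adv mesh interface

    def vlan(self, vid):
        # A VLAN sub-interface is a distinct device with its own master slot.
        return Iface(f"{self.name}.{vid}", parent=self)

    def enslave_to(self, master):
        if self.master is not None:
            raise RuntimeError(f"{self.name} already has master {self.master}")
        self.master = master


eth0 = Iface("eth0")
eth0.enslave_to("br-lan")       # the raw interface goes into the bridge

try:
    eth0.enslave_to("bat0")     # a second master on the same device is refused
    second_master_ok = True
except RuntimeError:
    second_master_ok = False

eth0_mesh = eth0.vlan(29)       # hypothetical VLAN ID
eth0_mesh.enslave_to("bat0")    # the VLAN device is free to join batman-adv

print(second_master_ok, eth0.master, eth0_mesh.name, eth0_mesh.master)
```

In this picture, dropping the VLAN for batman-adv would mean finding another way to have the same port serve both the bridge and the mesh, which is the difficulty raised in the comment.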

@gmarcos87 gmarcos87 removed their assignment Jun 18, 2020
@dangowrt
Member

I agree with @G10h4ck about using VLANs to allow interoperability with the Linux bridge on the same physical interface. I know this is a problem with a few switches which refuse to operate with both tagged and untagged frames on the same port (which is why LiMe uses 802.1ad...?).
Almost all switches do allow setting a PVID for untagged frames (which will then arrive tagged on the CPU port).
Writing auto-configuration logic that detects the swconfig features/restrictions of interfaces is beyond the current capabilities, though. Things will get much easier once DSA is more common (i.e. used for qca,ar9331-switch for the SoC-built-in switch on ath79 https://github.com/torvalds/linux/blob/master/drivers/net/dsa/qca/ar9331.c, qca8k for the QCA gigE switches, MT7530 on all Ralink/MediaTek with gigE, https://github.com/stroese/linux/blob/gardena-v5.5/drivers/net/dsa/mt7628-esw.c on all Ralink/MediaTek with FE; Lantiq is transitioning as well, see openwrt/openwrt#3085; almost all common external Marvell, Broadcom and RealTek switch ICs are already supported in vanilla Linux). It doesn't look like it's going to happen for the 20.x release for most targets, but hopefully 20.x will be the last swconfig-based release.

@ilario
Member

ilario commented Jun 26, 2020

Part of the reason why I thought running Batman-adv without a VLAN was very important was the bug on MediaTek-based YouHua WR1200JS devices that I reported here: https://forum.openwrt.org/t/mediatek-and-vlan-802-1ad-on-ethernet/42346

But after testing #726, this does not seem to be a terrible problem: the Batman-adv hello packets get mangled, with zeros and random data added, but Batman-adv does not care and accepts them anyway. The actual data then gets routed through other interfaces (e.g. the Babeld one, if 802.1q is used instead of 802.1ad, i.e. by adding a suffix in the LibreMesh configuration line, as in list protocols 'babeld:17:8021q').

So, to have these devices also working over Ethernet, we just need to remove the VLAN usage from Babeld, for example with #631.
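
For readers unfamiliar with the two VLAN flavours mentioned here, the following Python sketch builds the 4-byte tag each one inserts into an Ethernet frame. The TPID values (0x8100 for 802.1Q, 0x88A8 for the 802.1ad service tag) come from the IEEE standards, while the `vlan_tag` helper is a name invented for this example. The rest of the tag is identical, so the TPID is the only thing a switch can use to tell the two apart, which may be why some switch hardware treats them differently.

```python
import struct

TPID_8021Q = 0x8100   # customer tag (IEEE 802.1Q)
TPID_8021AD = 0x88A8  # service tag (IEEE 802.1ad)


def vlan_tag(vid: int, tpid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Return the 4-byte VLAN tag: 16-bit TPID followed by the 16-bit TCI."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP is 3 bits and DEI is 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid  # priority, drop-eligible bit, VLAN ID
    return struct.pack("!HH", tpid, tci)   # network byte order


print(vlan_tag(17, TPID_8021Q).hex())   # tag as used with the ':8021q' suffix
print(vlan_tag(17, TPID_8021AD).hex())  # same VLAN ID with an 802.1ad tag
```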

@dangowrt
Member

dangowrt commented Jun 26, 2020 via email

@ilario
Member

ilario commented Jun 26, 2020

Thanks! I'm going to test with snapshot code from OpenWrt downloads website :)

@ilario
Member

ilario commented Sep 21, 2020

Update on the MediaTek 802.1ad bug: I confirmed it on an OpenWrt snapshot with DSA, and it seems that the origin is a small bug in the kernel:
https://forum.openwrt.org/t/mediatek-and-vlan-802-1ad-on-ethernet/42346/9

The pre-DSA bug was different but could have a very similar origin. This means that, currently, 802.1ad VLANs probably do not work on any device with a MediaTek switch.

@ilario
Member

ilario commented Nov 11, 2024

> Update on the MediaTek 802.1ad bug: I confirmed it on an OpenWrt snapshot with DSA, and it seems that the origin is a small bug in the kernel: https://forum.openwrt.org/t/mediatek-and-vlan-802-1ad-on-ethernet/42346/9

Small update: the DSA + 802.1ad bug with MediaTek was fixed a long time ago, see torvalds/linux@9200f51

Still, the reasons for avoiding unneeded VLANs remain solid. #631 is outdated, but if there is interest I can update it.
