Firewalld-0.9.11-1 update drastically slowed ipset importing on AlmaLinux 8 #186

Open
ashleycawley opened this issue Dec 8, 2023 · 11 comments

Comments

@ashleycawley

ashleycawley commented Dec 8, 2023

OS: AlmaLinux 8

In late Sept 2023 an update for firewalld was released (firewalld-0.9.11-1), and we believe that since that update we have seen a massive decrease in performance when reloading the firewall with large ipsets in use (i.e. an ipset that contains 12,000 IP addresses).

I'd be intrigued to see if anyone else has also been impacted by this bug or if anyone can help to confirm if they see the same?

Our use case: we have hundreds of AlmaLinux machines with a custom ipset containing 12,000 IP addresses which periodically rotate; the ipset always contains that quantity (12k). On the previous version, firewalld-0.9.3-13, reloading the firewall with firewall-cmd --reload would complete in around 6 seconds, at which point firewalld would pass the ipset to the backend firewall.

Our firewalld setup is using nftables behind the scenes not iptables.

After upgrading from firewalld-0.9.3-13 to firewalld-0.9.11-1, the same reload takes minutes: anywhere from 2 to 5 minutes, and it causes noticeable disruption to network connections.

If we downgrade the firewalld package, the reload goes back to taking a mere 6 seconds. Bear in mind that during that time firewalld is ingesting our .xml ipset, which contains 12k IP addresses, passing it to nftables and then reloading.

If the firewall backend is swapped out (switching from nftables to iptables in firewalld.conf), the problem goes away. So it appears to be a bug in the latest version of firewalld + nftables when ingesting large ipsets: reloading has slowed down drastically and can cause disruption.
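As a rough sketch of the backend swap described above: firewalld reads the FirewallBackend key from firewalld.conf, so the change comes down to rewriting that one line. The set_backend helper below is hypothetical (it is not part of firewalld), and the example runs against a scratch file rather than the live config.

```shell
#!/bin/sh
# set_backend FILE BACKEND: hypothetical helper that rewrites the
# FirewallBackend= line of a firewalld.conf-style file.
set_backend() {
    conf_file=$1
    backend=$2
    tmp_file=$(mktemp)
    # Write the edited copy to a temp file, then copy the contents back so
    # the original file keeps its ownership and permissions.
    sed "s/^FirewallBackend=.*/FirewallBackend=${backend}/" "$conf_file" > "$tmp_file" &&
        cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}

# Demonstrate on a scratch file that mimics the relevant part of the config.
demo_conf=$(mktemp)
printf 'DefaultZone=public\nFirewallBackend=nftables\n' > "$demo_conf"
set_backend "$demo_conf" iptables
grep '^FirewallBackend=' "$demo_conf"
```

On a real host the target would presumably be /etc/firewalld/firewalld.conf, followed by systemctl restart firewalld so the new backend is picked up.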

From our testing it is taking anywhere from 25-55x longer to reload, so quite the performance decrease. This has been tested on a variety of AlmaLinux boxes with varying specifications.

Is there anyone else out there handling ipsets which are 10k+ in combination with firewalld + nftables? If so, do you notice firewalld taking significantly longer to reload nowadays?

If it helps anyone to validate whether they get the same results as us, I will share the ipset XML file below, which contains 12k IP addresses:

I'm hoping someone may be able to corroborate that when reloading firewalld with this ipset in place with the package firewalld-0.9.11-1 (post Sept 2023) it'll take ages, yet the pre Sept package firewalld-0.9.3-13 will reload in seconds.

The XML contents below could be placed in a file at: /etc/firewalld/ipsets/beacon.xml

It wouldn't let me attach a .xml so I've published it here:
https://drive.google.com/file/d/1JGrTS60hRTpCffnBi-dAGu04HLgXR8g0/view?usp=sharing

Then:


firewall-cmd --permanent --zone=drop --add-source=ipset:beacon

Then:


firewall-cmd --reload

If anyone is curious what these IP addresses are - they are "bad" IPs which we have witnessed trying to brute-force their way into our systems.
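For anyone who cannot fetch the linked file, a synthetic stand-in of the same size could be generated along these lines. This is only a sketch: make_ipset is a hypothetical helper, the hash:ip type is an assumption about the real beacon.xml, and sequential 10.x.x.x addresses may not stress firewalld or nftables in the same way as the real list of scattered addresses.

```shell
#!/bin/sh
# make_ipset FILE COUNT: hypothetical helper that writes COUNT synthetic
# addresses from 10.0.0.0/8 into a firewalld ipset XML file.
make_ipset() {
    out_file=$1
    count=$2
    {
        printf '<?xml version="1.0" encoding="utf-8"?>\n'
        # Assumption: the real set is of type hash:ip.
        printf '<ipset type="hash:ip">\n'
        printf '  <short>beacon</short>\n'
        awk -v n="$count" 'BEGIN {
            for (i = 0; i < n; i++) {
                # Map the counter onto the last three octets: 10.a.b.c
                printf "  <entry>10.%d.%d.%d</entry>\n", int(i / 65536) % 256, int(i / 256) % 256, i % 256
            }
        }'
        printf '</ipset>\n'
    } > "$out_file"
}

# Write to a scratch file and confirm the number of entries.
demo_xml=$(mktemp)
make_ipset "$demo_xml" 12000
grep -c '<entry>' "$demo_xml"
```

To test with it, the generated file would be copied to /etc/firewalld/ipsets/beacon.xml before running the commands above.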

@codyro
Member

codyro commented Dec 12, 2023

This looks like it's a known issue--you can find some upstream references here:

While I was able to replicate the high resource utilization of firewalld when using a large ipset, I was not able to replicate the huge performance change between versions:

[root@el8 ~]# rpm -q firewalld
firewalld-0.9.3-13.el8.noarch
[root@el8 ~]# time firewall-cmd --reload
success

real    1m17.009s
user    0m0.372s
sys     0m0.057s

[root@el8 ~]# rpm -q firewalld
firewalld-0.9.11-1.el8_8.noarch
[root@el8 ~]# time firewall-cmd --reload
success

real    2m53.163s
user    0m0.361s
sys     0m0.041s

I was curious if the latest version from upstream fixed this, as it looks like they're on version 0.9.11-4, however it looks like it experiences the same issue (and based off the above linked issues, it may partially be an issue with the version of nftables being used):

[root@el8 ~]# rpm -q firewalld
firewalld-0.9.11-4.el8.noarch

[root@el8 ~]# time firewall-cmd --reload
success

real    3m1.829s
user    0m0.337s
sys     0m0.059s

When not using the /etc/firewalld/ipsets/beacon.xml, everything returns to normal. It is worth noting that you do not need to run the firewall-cmd --permanent --zone=drop --add-source=ipset:beacon command to replicate this issue, just place the beacon.xml in /etc/firewalld/ipsets/beacon.xml and run firewall-cmd --reload.

[root@el8 ~]# mv /etc/firewalld/ipsets/beacon.xml{,.bak}
[root@el8 ~]# time firewall-cmd --reload
success

real    0m1.264s
user    0m0.349s
sys     0m0.049s

One comment in the bugzilla above suggests the cause could be auditd being a tad too verbose. You can check whether squelching that helps with auditctl -A exclude,never -F msgtype=NETFILTER_CFG.

What are the specs of the machine you're running this on? Are you also seeing firewalld eat 100% CPU on a single core when using the large ipset?

@ashleycawley
Author

Hi Cody,

Thank you for your observations and for taking the time to test.

I should mention that we long ago encountered the separate memory issues caused by auditd being too verbose with its logging, and we remedied that using the filter you mentioned ( auditctl -A exclude,never -F msgtype=NETFILTER_CFG ). We were aware of that one and had overcome it; we believe this to be a new bug, independent of those previous observations.

Can I just check whether there were any typos or mis-phrasing in your opening statements? Whilst you said "I was not able to replicate the huge performance change between versions:", your results seem to demonstrate it slowing significantly (taking 1 minute and 36 seconds longer to complete than before), which seems to reinforce what we are also seeing (a substantial performance decrease going from firewalld-0.9.3-13 > firewalld-0.9.11-1).

I have performed fresh tests this morning on an AlmaLinux 8.9 VM with 2 CPU cores & 4GB of RAM running firewalld-0.9.11-1.el8_8.noarch. During the reload I saw 1 of the 2 CPU cores pinned, around 20MB of RAM consumed and plenty of free memory, and it took 1 minute 35 seconds to complete. On firewalld-0.9.3-13 I believe this would complete in just a few seconds.

By chance I just stumbled across this whilst looking for AlmaLinux RPMs; the date of 3 months ago and the last three filenames caught my eye:

0008-v1.1.0-fix-ipset-reduce-cost-of-entry-overlap-detect.patch
0009-v1.1.0-test-ipset-huge-set-of-entries-benchmark.patch
0010-v1.1.0-fix-ipset-further-reduce-cost-of-entry-overla.patch

Which has led me to wonder if some developers have also witnessed and have been testing a performance issue/decrease with ipsets over the past few months?

I couldn't replicate the side-by-side effects as easily this time around: the downgrade process I was using previously no longer works because the old package has been pulled from the repo, so I cannot yum downgrade. I will have to do some further testing, but this problem appeared to be clear as anything when a colleague and I looked at it the other week. We could flip between two firewalld versions using the same ipset file; one would take numerous minutes to reload, the other just a few seconds.

In recent testing I have found that the hardware my VM runs on can change the timings significantly. I moved my test VM between physical nodes: on one node it would take around 1.5 minutes to reload, on another just 17 seconds. Even so, I believe a significant performance decrease has been introduced in the code, which is breaking network connections during the reload and introducing disruption.

Thanks for your time and help all.

@andrewlukoshko
Member

andrewlukoshko commented Dec 15, 2023

BTW currently CentOS Stream 8 is on 0.9.11-4 version and changelog includes ipset fixes:

* Fri Nov 03 2023 Eric Garver <[email protected]> - 0.9.11-4
- fix(nftables): always flush main table on start

* Fri Nov 03 2023 Eric Garver <[email protected]> - 0.9.11-3
- fix(ipset): fix configuring IP range for ipsets with nftables

* Fri Nov 03 2023 Eric Garver <[email protected]> - 0.9.11-2
- fix(ipset): exception on overlap checking empty set

@ashleycawley Could you check it by installing packages from CS8 to your system?
You can take them from this build: https://kojihub.stream.centos.org/koji/buildinfo?buildID=40764

@ashleycawley
Author

Hi Andrew, thank you for that. I have tried downgrading my test VM to the RPMs on that page; however, I believe I'm still seeing the same performance decrease.

I'm using yum install and specifying the URL to each RPM, which appears to be downgrading or upgrading them appropriately during testing. But we're not seeing the same clear-cut results when flicking between versions that we saw when testing previously on the 21st November; back then we were using the yum downgrade command.

Checking yum history we have confirmed it was only downgrading packages:

  • firewalld
  • firewalld-filesystem
  • python3-firewall

Those are the ones I've been focusing my testing on. But it feels like perhaps something else has changed (another package?) since the 21st of Nov; either that, or my testing / downgrading methods are flawed. I have been trying different / clean VMs too.

If you have any advice on how I should be swapping out these packages and anything else to try in between that would be greatly appreciated, thank you.

@ashleycawley
Author

ashleycawley commented Dec 15, 2023

Thank you for your time and patience all, we've done a lot more testing today and I wanted to update you on my findings so far: we think we have narrowed it down possibly to a previous nftables update that only becomes apparent after a restart of firewalld.

In this example below I purposely start with a clean but outdated AlmaLinux8 image which contains old packages which have not yet been updated, take a look at this:

[root@almalinux8 ~]# hostnamectl
   Static hostname: almalinux8.localdomain
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 57d39f375b524e4dadcefcf6cb93c120
           Boot ID: 6d16323d10bf41f3a3d27f87e761ec76
    Virtualization: oracle
  Operating System: AlmaLinux 8.8 (Sapphire Caracal)
       CPE OS Name: cpe:/o:almalinux:almalinux:8::baseos
            Kernel: Linux 4.18.0-477.10.1.el8_8.x86_64
      Architecture: x86-64
[root@almalinux8 ~]# 
[root@almalinux8 ~]# rpm -q firewalld nftables # NOTE these will be outdated packages to demonstrate:
firewalld-0.9.3-13.el8.noarch
nftables-0.9.3-26.el8.x86_64
[root@almalinux8 ~]# # Checking reload speed of standard firewalld with no ipsets on this sys as baseline:
[root@almalinux8 ~]# time firewall-cmd --reload
success

real	0m1.098s
user	0m0.123s
sys	0m0.028s
[root@almalinux8 ~]# cp /root/our-12k-ipset/beacon.xml /etc/firewalld/ipsets/
[root@almalinux8 ~]# firewall-cmd --permanent --zone=drop --add-source=ipset:beacon
success
[root@almalinux8 ~]# time firewall-cmd --reload
success

real	0m2.721s
user	0m0.125s
sys	0m0.020s
[root@almalinux8 ~]# # Took just 2.7s to import ipset containing 12k ip addresses! :-)
[root@almalinux8 ~]# # OK lets keep firewalld on the same (outdated) version but try updating just nftables:
[root@almalinux8 ~]# yum update nftables -y
AlmaLinux 8 - BaseOS                                                                  3.1 MB/s | 4.0 MB     00:01    
AlmaLinux 8 - AppStream                                                               4.1 MB/s |  11 MB     00:02    
AlmaLinux 8 - Extras                                                                   43 kB/s |  20 kB     00:00    
Dependencies resolved.
======================================================================================================================
 Package                          Architecture           Version                         Repository              Size
======================================================================================================================
Upgrading:
 libnftnl                         x86_64                 1.2.2-3.el8                     baseos                  86 k
 nftables                         x86_64                 1:1.0.4-3.el8_9                 baseos                 379 k
 python3-nftables                 x86_64                 1:1.0.4-3.el8_9                 baseos                  30 k

Transaction Summary
======================================================================================================================
Upgrade  3 Packages

Total download size: 495 k
Downloading Packages:
(1/3): python3-nftables-1.0.4-3.el8_9.x86_64.rpm                                      269 kB/s |  30 kB     00:00    
(2/3): libnftnl-1.2.2-3.el8.x86_64.rpm                                                612 kB/s |  86 kB     00:00    
(3/3): nftables-1.0.4-3.el8_9.x86_64.rpm                                              1.6 MB/s | 379 kB     00:00    
----------------------------------------------------------------------------------------------------------------------
Total                                                                                 860 kB/s | 495 kB     00:00     
AlmaLinux 8 - BaseOS                                                                  3.3 MB/s | 3.4 kB     00:00    
Importing GPG key 0xC21AD6EA:
 Userid     : "AlmaLinux <[email protected]>"
 Fingerprint: E53C F5EF 91CE B0AD 1812 ECB8 51D6 647E C21A D6EA
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-AlmaLinux
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                              1/1 
  Running scriptlet: libnftnl-1.2.2-3.el8.x86_64                                                                  1/1 
  Upgrading        : libnftnl-1.2.2-3.el8.x86_64                                                                  1/6 
  Running scriptlet: libnftnl-1.2.2-3.el8.x86_64                                                                  1/6 
  Upgrading        : nftables-1:1.0.4-3.el8_9.x86_64                                                              2/6 
  Running scriptlet: nftables-1:1.0.4-3.el8_9.x86_64                                                              2/6 
  Upgrading        : python3-nftables-1:1.0.4-3.el8_9.x86_64                                                      3/6 
  Cleanup          : python3-nftables-1:0.9.3-26.el8.x86_64                                                       4/6 
  Running scriptlet: nftables-1:0.9.3-26.el8.x86_64                                                               5/6 
  Cleanup          : nftables-1:0.9.3-26.el8.x86_64                                                               5/6 
  Running scriptlet: nftables-1:0.9.3-26.el8.x86_64                                                               5/6 
  Cleanup          : libnftnl-1.1.5-5.el8.x86_64                                                                  6/6 
  Running scriptlet: libnftnl-1.1.5-5.el8.x86_64                                                                  6/6 
  Verifying        : libnftnl-1.2.2-3.el8.x86_64                                                                  1/6 
  Verifying        : libnftnl-1.1.5-5.el8.x86_64                                                                  2/6 
  Verifying        : nftables-1:1.0.4-3.el8_9.x86_64                                                              3/6 
  Verifying        : nftables-1:0.9.3-26.el8.x86_64                                                               4/6 
  Verifying        : python3-nftables-1:1.0.4-3.el8_9.x86_64                                                      5/6 
  Verifying        : python3-nftables-1:0.9.3-26.el8.x86_64                                                       6/6 

Upgraded:
  libnftnl-1.2.2-3.el8.x86_64      nftables-1:1.0.4-3.el8_9.x86_64      python3-nftables-1:1.0.4-3.el8_9.x86_64     

Complete!
[root@almalinux8 ~]# systemctl restart firewalld
[root@almalinux8 ~]# time firewall-cmd --reload
ERROR:dbus.proxies:Introspect error on :1.22:/org/fedoraproject/FirewallD1: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
success

real	2m2.651s
user	0m0.136s
sys	0m0.020s

The restart of firewalld after the upgrade of nftables is important: if firewalld is not restarted, you don't witness the issue.
So prior to the nftables upgrade it took 2.7s to handle a 12k ipset; after the nftables update it took 2 minutes 3 seconds, a remarkable difference.
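To put a number on that difference, the two "real" values from the transcript above can be turned into a slowdown factor. The sketch below is only illustrative; to_seconds and slowdown are hypothetical helper names, and the input format assumed is the XmY.ZZZs form printed by the shell's time keyword.

```shell
#!/bin/sh
# to_seconds VALUE: convert a `time` value such as 2m2.651s into seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'm' '{
        sub(/s$/, "", $2)              # drop the trailing "s"
        printf "%.3f\n", $1 * 60 + $2  # minutes * 60 + seconds
    }'
}

# slowdown BEFORE AFTER: how many times longer AFTER is than BEFORE.
slowdown() {
    before=$(to_seconds "$1")
    after=$(to_seconds "$2")
    awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f\n", a / b }'
}

# Reload times before and after the nftables update, from the transcript.
slowdown 0m2.721s 2m2.651s
```

With those two values the script prints 45.1, i.e. the reload took roughly 45 times longer after the nftables update and firewalld restart.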

I'm wondering if an earlier nftables update (July?) introduced a possible issue / conflict with firewalld (performance wise) which didn't immediately become apparent, because the update alone wasn't enough to expose it: a restart of firewalld and the presence of a large ipset were also required.

To clarify the timeline: we started to see this issue across a fleet of hundreds of systems around 21st-22nd September. I wonder if nftables was updated months prior to that, and it took another package being updated (firewalld or other) to restart the firewalld service, at which point the issue became apparent.

Any thoughts or advice appreciated as always 🙏🏻

@codyro
Member

codyro commented Dec 15, 2023

> Can I just check if there were any typos or mis-phrasing in your opening statements? Because whilst you said "I was not able to replicate the huge performance change between versions:" your results seem to demonstrate it slowing significantly (it taking 1 minute and 36 seconds longer to complete than before), which seems to reinforce what we are also seeing (a substantial performance decrease going from firewalld-0.9.3-13 > firewalld-0.9.11-1

I was wondering if I should rephrase it. I ran the tests about 5 times each, with wildly inconsistent numbers across versions: sometimes the older version was faster than the newer one, sometimes slower. I don't think it's the regression in performance you're seeing. Sorry for the confusion.
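Since single runs vary this much, one way to compare versions more fairly is to time several reloads and look at the spread rather than one number. The following is a sketch only: time_reloads and summarize are hypothetical helpers, and date +%s.%N assumes GNU date.

```shell
#!/bin/sh
# time_reloads N: run `firewall-cmd --reload` N times and print one
# elapsed time in seconds per line. Defined here but not called below.
time_reloads() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s.%N)   # assumes GNU date for nanoseconds
        firewall-cmd --reload > /dev/null
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
        i=$((i + 1))
    done
}

# summarize: read one duration per line and print min, median and max.
summarize() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            mid = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "runs=%d min=%s median=%s max=%s\n", NR, v[1], mid, v[NR]
        }'
}

# Demonstration with made-up durations instead of real reloads.
printf '5\n1\n3\n' | summarize
```

On a test box this could be run as time_reloads 5 | summarize once per package version, which makes it easier to tell a real regression from run-to-run noise.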

> By chance I just stumbled across this whilst looking for AlmaLinux rpm's the date of 3 months ago and last three filenames caught my eye:

That's why I tested the latest build from upstream--to test those patches :). They didn't fix anything, unfortunately.

> I was curious if the latest version from upstream fixed this, as it looks like they're on version 0.9.11-4, however it looks like it experiences the same issue (and based off the above linked issues, it may partially be an issue with the version of nftables being used)

> @ashleycawley Could you check it by installing packages from CS8 to your system?
> You can take them from this build: https://kojihub.stream.centos.org/koji/buildinfo?buildID=40764

This was already tested above.

> The restart of firewalld after the upgrade of nftables is important, if firewalld is not restarted then you don't witness the issue.
> So prior to the nftables upgrade it took 2.7s to handle a 12k ipset, after the nftables update it took 2 minutes 3 seconds, a remarkable difference.

This is interesting. I'll see if I can get those results and see if I can find anything useful :)

@ashleycawley
Author

Yeah I just did a fresh test on another system, downloaded & ran a copy of AlmaLinux-8.8-x86_64-minimal.iso and then out of the box (no updates):

[root@nftablestesting ~]# cd /etc/firewalld/ipsets/
[root@nftablestesting ipsets]# touch beacon.xml
[root@nftablestesting ipsets]# vi beacon.xml 
[root@nftablestesting ipsets]# # Pasted a 12k ipset into that file ^^
[root@nftablestesting ipsets]# firewall-cmd --permanent --zone=drop --add-source=ipset:beacon
success
[root@nftablestesting ipsets]# time firewall-cmd --reload
success

real	0m4.489s
user	0m0.296s
sys	0m0.043s
[root@nftablestesting ipsets]# yum update nftables -y &>/dev/null
[root@nftablestesting ipsets]# systemctl restart firewalld
[root@nftablestesting ipsets]# time firewall-cmd --reload
ERROR:dbus.proxies:Introspect error on :1.19:/org/fedoraproject/FirewallD1: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

success

real	5m51.622s
user	0m0.300s
sys	0m0.035s

On a low spec VM (2 CPU, 2GB RAM) it went from 4.5s before nftables update to almost 6 minutes after.

@andrewlukoshko
Member

Could you please try to test the same in AlmaLinux 8.10 beta?

@ashleycawley
Author

Thanks for the update @andrewlukoshko I will be happy to setup the same test scenario with 8.10 beta and will report back within the next few hours if I can.

@ashleycawley
Author

OK, tested on the same hardware and spec as the earlier post, with AlmaLinux 8.10 beta. Out of the box with the 12k ipset in place (same one as before), a firewall-cmd --reload this time took 1min 14secs.

In my previous post this process took a mere 4 seconds and then got worse (almost 6 minutes) after an update to nftables. This time there was no update available for nftables on this release, so no update was applied to that package during this round of testing.

So arguably it has improved from the last state of play, dropping from 6min to 1min. However, it's still nowhere near as fast or as efficient as it originally was for quite some time, when it would take just 4 seconds.

Thank you so much for your time and help on tracking this issue down.

If I can help with any further testing then please let me know.

@SirStephanikus

I know this is AlmaLinux, but I have the same issue with Rocky-8 (latest updates): here it takes up to 19 minutes to reload.

Since Alma and Rocky are almost identical, I thought I may give my feedback here too.
