Replies: 3 comments 1 reply
-
Still looking for more feedback!
-
I suggest adding an option to schedule reboots via the GUI: list the pending changes and present a window to schedule them. Also, is there a preferred order for reboots? Manager first, or last? Should we wait for the search/manager to be back up before updating the XXX node?
-
I rebooted all my nodes 5 days ago. Yesterday afternoon I rebooted them all again because the GUI showed that they needed it, and all showed OK. This morning, only 16 hours later, every node on the grid shows that it needs to be rebooted again. How often is the system updating the kernel? This seems a bit frantic. At this rate, should I expect the grid's 'normal' state to always show that everything needs to be rebooted? The Grid section already gathers a lot of real-time info about each node; can't we get a bit more detail about these reboot requests?
-
We have had lots of feedback from our customers and our community concerning reboots, so let's talk about the history of rebooting in past versions. This only applies to non-airgap deployments:
16.04 - All kernel, OS, and Security Onion updates were handled through soup. This meant your packages did not auto-update.
2.3 - We decoupled the OS and kernel updates from the Security Onion updates, which allowed the kernel and OS to auto-update. We also added a message to the MOTD on every node in the grid, so that if you connected via SSH you would see a list of machines that needed a reboot due to kernel updates.
2.4 - With the introduction of the web interface, the need to log in via SSH is much less frequent. We needed a way to communicate that certain systems in your grid needed to be rebooted, so we added it to the grid screen. At first we used the blue warning on the menu side, but we removed that because it made it look like there was a problem with grid health, which is not the case. If you SSH in, you will still see the MOTD listing the hosts that need to be rebooted.
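As a side note, if you ever want to verify on a single node why a reboot is being requested, the usual signal is that the running kernel is older than the newest installed kernel. Here is a rough sketch of that check, not an official SO tool, assuming a RHEL-family base where the kernel package is named kernel (UEK-based installs may use kernel-uek instead):

```bash
# Rough check: compare the running kernel with the newest installed kernel.
# If they differ, the node still needs a reboot to pick up the update.
# Adjust "kernel" to "kernel-uek" on UEK-based installs.
running=$(uname -r)
latest=$(rpm -q kernel --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}\n' | sort -V | tail -n 1)

if [ "$running" = "$latest" ]; then
    echo "Running kernel ($running) is the newest installed; no kernel reboot pending."
else
    echo "Running $running but $latest is installed; a reboot is pending."
fi
```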
We have had several users from our customer base and community ask for the ability to do auto reboots, so I think it is important to discuss the implications. If you have a large amount of Elastic data, auto reboots are a really bad idea. For a home user this might not be a big deal, as it doesn't take long to load the indices back after the reboot, but if you have 50TB of Elastic data you could be down for almost an hour while the shards initialize. In that case you would want to stagger your reboots to keep your cluster healthy and avoid downtime. Another potential scenario is that you are in the middle of a live incident and your grid starts rebooting underneath you.
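If you do decide to stagger reboots yourself, a rough sketch of what that could look like is below. This is not an official procedure; the "*_searchnode" target, the Elasticsearch URL, and the credentials file are assumptions you would adjust for your own grid:

```bash
#!/bin/bash
# Sketch: reboot matching minions one at a time, waiting for Elasticsearch
# cluster health to return to green between reboots. Run from the manager.
# Assumptions: "*_searchnode" matches your search nodes, Elasticsearch is
# reachable at https://localhost:9200, and ~/escreds holds user:pass.

for minion in $(salt --out=txt "*_searchnode" test.ping | cut -d: -f1); do
    echo "Rebooting $minion ..."
    salt "$minion" system.reboot

    # Give the node time to go down, then poll until the cluster is green again.
    sleep 120
    until curl -s -k -u "$(cat ~/escreds)" \
        "https://localhost:9200/_cluster/health?wait_for_status=green&timeout=60s" \
        | grep -q '"status":"green"'; do
        echo "Waiting for cluster to return to green before the next reboot ..."
        sleep 30
    done
done
```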
The reason we did not lock the kernel is that we wanted to give users the freedom to keep their systems up to date without requiring us to release something first. We do hold the Salt and Docker versions to ensure stability in the product. An example of this freedom was the recent SSH vulnerability, which was automatically patched if you had auto updates turned on; users did not have to wait for SO version 2.4.X for that fix. Keep in mind that not all kernel updates are critical, and everyone has different change windows in which to do maintenance.
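If you would rather hold the kernel yourself and only take kernel updates during your own change window, something along these lines may work on a RHEL-family base. Security Onion does not manage this for you, so treat it purely as a sketch to validate in your environment:

```bash
# Sketch: hold the currently installed kernel packages so automatic updates
# skip them, assuming the dnf versionlock plugin is available.
sudo dnf install -y python3-dnf-plugin-versionlock
sudo dnf versionlock add 'kernel*'

# Later, during your maintenance window, release the lock and update:
sudo dnf versionlock delete 'kernel*'
sudo dnf update 'kernel*'
```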
I created the poll below to get your input and see if there is something we can do that makes sense.
The only nodes I would feel comfortable rebooting en masse are sensors. You can do that with the following command:
salt "*_sensor" -b 5 system.reboot
This will reboot all of your sensors in batches of 5.
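If you want to confirm which minions that target matches before actually rebooting anything, you can do a quick dry run first. The --batch-wait option shown below adds a pause between batches; check that your Salt version supports it before relying on it:

```bash
# Preview which minions match the "*_sensor" target without rebooting anything.
salt "*_sensor" test.ping

# Reboot in batches of 5, pausing 60 seconds between batches (if supported).
salt "*_sensor" -b 5 --batch-wait 60 system.reboot
```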
Thanks for your input!
11 votes