[BUG] 3005 broke syndication #62577
Comments
Can you please include your configs and some logs from the master and syndics? |
Servers:
Upgraded from 3004.2 to 3005 (Debian 11 package from Salt's repositories). The MoM runs master and minion, and the sharing Syndic runs master, minion and syndic. The sharing syndic connects to the MoM, but also to the local master, and floods it with start events like
|
on the MoM event bus, on syndic startup, I have
Nothing else |
I am not sure if it's related to the syndic setup. However after upgrading the "syndic" server stops working correctly after a couple of minutes with the errors seen below in the log. What might be the reason for this problem? I am running Debian 10 with packages from the official Salt repository.
|
Since Debian 10 has an old version of python3-zmq this might be similar to #62550 and #62550 (comment) might be a workaround. |
@d--j Tested pyzmq for hang issue on Debian 10 and saw no problem with OS provided python3-zmq, see #62550 (comment) |
At least @tkaehn is running Debian 10, not Debian 11. Debian 10 has python3-zmq version 17.1.2. See https://packages.debian.org/buster/python3-zmq |
Downgrading the Salt packages to 3004.2+ds-1 on the "syndic" server solved the issue. The "mom" system is still running on 3005+ds-1. Both systems on Debian 10. |
After upgrading both servers from Debian 10 to 11 and both to Salt 3005 everything is running stable again. So the problem is solved for me. However it seems that 3005 introduced an incompatibility with Debian 10 that should be fixed. |
I have a problem with Syndic on Salt 3005 and Rocky Linux 8.6. It went into an infinite loop of "minion start" events. This causes the Salt Master to stop behaving correctly within a few seconds of starting… The event bus gets spammed with these weird events, and the Master stops responding right after Syndic startup, once the event bus has been spammed with all of this:
|
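To put a number on a start-event flood like the one described above, here is a minimal sketch that counts "minion start" tags per minion. It assumes you have already captured event tags as plain strings (for example from the output of `salt-run state.event pretty=True` on the master). The helper names, the threshold and the sample tags are hypothetical and are not taken from the reporter's logs.

```python
import re
from collections import Counter

# Minion start events are published under the tag salt/minion/<id>/start.
START_TAG = re.compile(r"^salt/minion/(?P<minion_id>[^/]+)/start$")


def count_start_events(tags):
    """Count 'minion start' events per minion ID from an iterable of event tags."""
    counts = Counter()
    for tag in tags:
        match = START_TAG.match(tag)
        if match:
            counts[match.group("minion_id")] += 1
    return counts


def flooding_minions(tags, threshold=3):
    """Return the minion IDs whose start event appears at least `threshold` times."""
    counts = count_start_events(tags)
    return sorted(minion for minion, seen in counts.items() if seen >= threshold)


# Hypothetical tags, standing in for what was seen on the event bus.
sample_tags = [
    "salt/minion/syndic01/start",
    "salt/minion/syndic01/start",
    "salt/minion/syndic01/start",
    "salt/minion/web01/start",
    "salt/job/20220901000000000000/ret/web01",
]

print(flooding_minions(sample_tags))
```

A minion that starts once per service restart should stay below any sensible threshold, so an ID that shows up here repeatedly within a short capture is a reasonable candidate for the looping syndic-side minion.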
@ebusto Would it be possible to upgrade pyzmq to a version >= 20.0.0 on the Debian 10 system which is exhibiting the problems with syndic? The following example command should work:
Be sure to restart salt-master and salt-minion after installing the updated pyzmq, just to ensure the update is picked up. On the Debian 10 system where I tested the command, pyzmq v23.2.1 was installed. |
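To confirm which pyzmq a given interpreter actually picks up after such an upgrade, a small check along these lines may help. This is only a sketch: `parse_version` and `meets_minimum` are hypothetical helpers, the 20.0.0 floor is the one suggested above, and the script has to be run with the same Python interpreter that Salt uses, which is an assumption you need to verify for your install type.

```python
def parse_version(text):
    """Turn '23.2.1' into (23, 2, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version, minimum="20.0.0"):
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import zmq
    except ImportError:
        print("pyzmq is not importable from this interpreter")
    else:
        installed = zmq.pyzmq_version()
        status = "ok" if meets_minimum(installed) else "older than 20.0.0"
        print(f"pyzmq {installed} (libzmq {zmq.zmq_version()}): {status}")
```

With the versions mentioned in this thread, 17.1.2 (Debian 10's python3-zmq) falls below the floor and 23.2.1 passes it.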
bump ^ Can anyone confirm the issue is related to the version of pyzmq, as @dmurphy18 outlined? |
Just checking in again. Can anyone confirm the pyzmq upgrade @dmurphy18 outlined solves this issue? |
This did not help me as I'm on |
@satellite-no can you post your entire versions report? I'm guessing this is a dependency issue since it's working on an OS that has different dependency versions. |
Please see the version reports below. As I've said before everything is working again (on the surface).
When targeting a lot of minions (~100) at the same time it happens occasionally that some of them do not return. This happened before 3005 as well. Is it possible that increasing worker_threads (currently 32) could solve this problem?
|
Sure thing! Master Versions Report
Syndic Versions Report
This is the error I see on the master
|
Thanks. The only thing that stands out is there are different pyzmq versions on your syndic and master. @satellite-no would you mind upgrading your pyzmq version on your syndic, restart the services and see if that makes a difference? |
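Spotting a mismatch like this by eye is easy to get wrong, so here is a rough sketch that diffs two versions reports saved as text. It assumes the usual `name: version` line layout of the report; the function names and both sample reports are hypothetical and do not reproduce the reports posted in this thread.

```python
def parse_versions_report(text):
    """Parse 'name: version' lines from a versions report into a dict."""
    versions = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        # Section headers such as 'Dependency Versions:' carry no value, so skip them.
        if sep and value.strip():
            versions[name.strip()] = value.strip()
    return versions


def diff_reports(left, right):
    """Return {name: (left_version, right_version)} for every entry that differs."""
    a = parse_versions_report(left)
    b = parse_versions_report(right)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical excerpts, not the reports from this issue.
master_report = """
Salt Version:
          Salt: 3005

Dependency Versions:
        PyYAML: 5.4.1
         pyzmq: 22.3.0
"""

syndic_report = """
Salt Version:
          Salt: 3005

Dependency Versions:
        PyYAML: 5.4.1
         pyzmq: 23.2.1
"""

print(diff_reports(master_report, syndic_report))
```

An entry that exists on only one side shows up with `None` for the other, which also makes missing dependencies visible.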
@Ch3LL So, silly question: I'm not sure how to do that LOL.. According to my versions-report Salt is using This is a yum install of Salt, so if you know where Salt installs its Python version let me know (I attempted a find, but to no avail). |
Did you install the onedir packages or classic packages? |
I used the onedir package. |
okay so your master is classic packages and syndic is onedir? If you want to upgrade a package on onedir you can use the |
This is not good. Why is syndic still broken in 3005.1? Is there no movement on fixing this bug? |
I've tried to upgrade to the onedir packages to see if this makes any difference for me.
And also like this:
I am very disappointed that a new release breaks so much and that there seems to be no quick solution. What is your advice? Downgrading to 3004? From the documentation:
How can this work if targeting around 100 minions out of a few hundred results in occasional timeouts? When running again, all minions do respond. This happened with 3004 as well. |
I'm unsure if this is the same issue we're facing, but after upgrading to 3005, our system's load average goes haywire if the master's local minion is connecting to itself, and the syndic is connecting to a different master. If the minion connects to the same master as the syndic, there are no issues... Salt Version: Dependency Versions: System Versions: |
We are experiencing all of the issues reported here, and on a hunch from @Jookadin's comment we moved the minion on the syndic to be connected to the MoM and the problems have gone away. I don't know if this helps, but on our syndic when salt-minion is connected to the master on the syndic it goes into a loop with the following error:
During this period the salt-syndic process is throwing the following error constantly:
Worth noting we are also seeing the behavior reported above with the logs littered with messages about an extra return from minions:
|
For those interested/able, in
to
It's probably not the right fix, but at least in my limited testing it appears to work, at least if the problem is just the syndic returning to the MoM. You can verify that's your problem by checking on one of your minions off the syndic master:
Assuming there's nothing there, from your MoM, given a minion of bob:
Then back on bob, If you do try my above change, let me know how it works! |
Seeing exactly this problem on Ubuntu 22.04 running 3005.1 installed from repo.saltproject.io, with master, syndic and minion running on a single device. Worked fine on 3004.2 on Ubuntu 20.04. Salt version report
|
#63257 should sort this out 👍 |
Fixed by #63382 |
Description
3005 broke syndication.
Setup
Upgrade an existing 3004.2 environment to 3005. Watch syndicated masters no longer connect to the syndication master.
Expected behavior
Syndication to be actually tested before releases.