RHEL8 minion does not run states after installation of minion #63223

Open
2 tasks done
infantvin opened this issue Dec 6, 2022 · 10 comments
Labels
Bug broken, incorrect, or confusing behavior needs-triage Salt-Cloud Transport

Comments

@infantvin
infantvin commented Dec 6, 2022

Description
Creating a new RHEL8 salt-minion using salt-cloud. This automatically installs the latest salt-minion version for 3005.x

After Salt is configured, the minion does not execute any state sent from the master. The minion log shows:

2022-12-06 16:21:18,435 [salt.utils.event :821 ][DEBUG ][4602] Sending event: tag = _salt_error; data = {'message': 'The minion function caused an exception', 'args': ('The minion function caused an exception',)

I am attaching a debug based output of the minion log to this report so that all the details are available.
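
For anyone reading the attached debug log, a small filter can pull out the `_salt_error` events. This is a minimal sketch in Python, assuming every log line follows the layout of the single line quoted above; `LOG_LINE` and `salt_errors` are hypothetical names for illustration and are not part of Salt.

```python
import re

# Assumed layout, inferred from the one log line quoted in this report:
# "<date> <time>,<ms> [<logger> :<lineno> ][<LEVEL> ][<pid>] <message>"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<logger>[^\]:]+?)\s*:(?P<lineno>\d+)\s*\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<pid>\d+)\] "
    r"(?P<message>.*)$"
)


def salt_errors(lines):
    """Yield the parsed fields of every line that reports a _salt_error event."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        # Lines that do not fit the assumed layout are skipped silently.
        if match and "tag = _salt_error" in match["message"]:
            yield match.groupdict()
```

Passing an open file handle for salt-minion-log.txt to `salt_errors` would yield one dictionary per matching line, with the timestamp, logger, level, PID, and message as separate fields.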

Setup

Fresh install of salt-master 3005.1-2 on RHEL9. Trying to use salt-cloud to create a new RHEL8 minion (RHEL 8.0)

The platform is VMware and the salt-master and minion are both VMware virtual machines.

There is no firewall running on either master or minion.

Both are on the same network, so no VLAN or similar configuration is involved.

The bootstrap arguments request onedir 3005. The same thing happens with any other minion install method (git, stable, etc.).

Please be as specific as possible and give set-up details.

  • on-prem machine
  • [ X] VM (VMware 7.x/Vsphere 7)
  • [ X] classic packaging
  • [ X] onedir packaging
  • used bootstrap to install - this is running via salt-cloud command arguments.

Steps to Reproduce the behavior
Attaching the logs to this report.

Steps:

  • Fresh install of salt-master with salt-cloud version 3005.1-2 on RHEL9
  • create a configuration for a new RHEL8 system and call the salt-cloud command to create the VM
  • After the minion spins up and Salt is installed on it, no state works on the minion. The error shown above appears in the minion log.
  • Basic module commands from the master, such as pinging the minion, work. However, any state execution via an sls file hangs and eventually returns with a timeout, and the errors are written to the minion log.

Expected behavior

The state should execute successfully. It never does.

Screenshots
If applicable, add screenshots to help explain your problem.

Versions Report

$ salt-master --versions-report

Salt Version:
Salt: 3005.1

Dependency Versions:
cffi: 1.14.5
cherrypy: Not Installed
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 2.11.3
libgit2: 1.3.0
M2Crypto: 0.38.0
Mako: Not Installed
msgpack: 1.0.3
msgpack-pure: Not Installed
mysql-python: Not Installed
pycparser: 2.20
pycrypto: 3.16.0
pycryptodome: 3.14.0
pygit2: 1.7.1
Python: 3.9.14 (main, Nov 7 2022, 00:00:00)
python-gnupg: Not Installed
PyYAML: 5.4.1
PyZMQ: 22.3.0
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.4

System Versions:
dist: rhel 9.0 Plow
locale: utf-8
machine: x86_64
release: 5.14.0-70.13.1.el9_0.x86_64
system: Linux
version: Red Hat Enterprise Linux 9.0 Plow

salt-minion --versions-report

Salt Version:
Salt: 3005.1

Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.0
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: Not Installed
Python: 3.9.15 (main, Nov 8 2022, 03:47:03)
python-gnupg: 0.4.8
PyYAML: 5.4.1
PyZMQ: 23.2.0
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4

System Versions:
dist: rhel 8.0 Ootpa
locale: utf-8
machine: x86_64
release: 4.18.0-80.el8.x86_64
system: Linux
version: Red Hat Enterprise Linux 8.0 Ootpa

(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.) Both master and minion are at the same Salt version, 3005.1.
Any simple sls file can be used to reproduce this, for example a small sls that installs gcc. In our case, we have a startup state that renames the host based on the VM name in VMware.
[salt-master-exception.txt](https://github.com/saltstack/salt/files/10169518/salt-master-exception.txt)
[salt-minion-log.txt](https://github.com/saltstack/salt/files/10169521/salt-minion-log.txt)

Additional context
We have a master running Salt 3003 and everything works fine there, so this seems to be specific to 3005. We need to move our master to 3005, but this problem is blocking the upgrade.

Please let me know if you need any other information from me.

@infantvin infantvin added Bug broken, incorrect, or confusing behavior needs-triage labels Dec 6, 2022
@OrangeDog
Contributor

I am attaching a debug based output of the minion log to this report so that all the details are available.

Where is it? In particular, the actual exception should be logged somewhere.

@OrangeDog OrangeDog added the info-needed waiting for more info label Dec 7, 2022
@infantvin
Author

Hi
I was positive I added those text files from the master and minion.

I am adding these again. Please confirm when you see them.

salt-master-exception.txt
salt-minion-log.txt

@infantvin
Author

I am adding the exception seen on the master here. It's also present in the txt file attached earlier, if needed.

The minion function caused an exception: Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/salt/minion.py", line 1935, in _thread_return
    return_data = minion_instance._execute_job_function(
  File "/usr/lib/python3.9/site-packages/salt/minion.py", line 1894, in _execute_job_function
    return_data = self.executors[fname](opts, data, func, args, kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 1228, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 1243, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/executors/direct_call.py", line 10, in execute
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 1228, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/loader/lazy.py", line 1243, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/modules/state.py", line 793, in apply_
    return sls(mods, **kwargs)
  File "/usr/lib/python3.9/site-packages/salt/modules/state.py", line 1394, in sls
    high_, errors = st_.render_highstate({opts["saltenv"]: mods})
  File "/usr/lib/python3.9/site-packages/salt/state.py", line 4461, in render_highstate
    statefiles = fnmatch.filter(self.avail[saltenv], sls_match)
  File "/usr/lib/python3.9/site-packages/salt/state.py", line 3562, in __getitem__
    self._avail[saltenv] = self._hs.client.list_states(saltenv)
  File "/usr/lib/python3.9/site-packages/salt/fileclient.py", line 379, in list_states
    for path in self.file_list(saltenv):
  File "/usr/lib/python3.9/site-packages/salt/fileclient.py", line 1363, in file_list
    return self.channel.send(load)
  File "/usr/lib/python3.9/site-packages/salt/utils/asynchronous.py", line 125, in wrap
    raise exc_info[1].with_traceback(exc_info[2])
  File "/usr/lib/python3.9/site-packages/salt/utils/asynchronous.py", line 131, in _target
    result = io_loop.run_sync(lambda: getattr(self.obj, key)(*args, **kwargs))
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync
    return future_cell[0].result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/channel/client.py", line 295, in send
    ret = yield self._crypted_transfer(load, timeout=timeout, raw=raw)
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/channel/client.py", line 252, in _crypted_transfer
    ret = yield _do_transfer()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/channel/client.py", line 233, in _do_transfer
    data = yield self.transport.send(
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/transport/zeromq.py", line 914, in send
    ret = yield self.message_client.send(load, timeout=timeout)
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python3.9/site-packages/salt/transport/zeromq.py", line 624, in send
    recv = yield future
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
  File "/usr/lib/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out
ERROR: Minions returned with non-zero exit code

@infantvin
Author

infantvin commented Dec 7, 2022

Update:

I created a RHEL 8.0 VM minion again, this time with an older minion version (3003.4), and it has the same problem. After a few minutes of waiting on a state.apply, the salt master returns the same error as above.

@OrangeDog
Contributor

This might be another case of #62881. There are a lot of timeout errors in that log.

Are you sure the minion can contact the master on both ports? You might need to adjust firewall rules.

@OrangeDog OrangeDog added Transport and removed info-needed waiting for more info labels Dec 7, 2022
@infantvin
Author

infantvin commented Dec 8, 2022 via email

@OrangeDog
Contributor

Ah, two masters? #62577, #62318

@infantvin
Author

infantvin commented Dec 8, 2022 via email

@infantvin
Author

infantvin commented Dec 8, 2022

I just verified that both ports on the master are reachable from the minion:

telnet salt-master-3005 4506
Trying xxx...
Connected to salt-master-3005.
Escape character is '^]'.
^]

telnet salt-master-3005 4505
Trying xxxx...
Connected to salt-master-3005.
Escape character is '^]'.
^]

The netstat output also shows an established connection from the minion side. However, no communication is taking place.

ps -ef|grep salt
root 926 1 0 14:38 ? 00:00:00 /usr/libexec/platform-python /usr/bin/salt-minion
root 1423 926 0 14:38 ? 00:00:00 /usr/libexec/platform-python /usr/bin/salt-minion
root 1427 1423 0 14:38 ? 00:00:00 /usr/libexec/platform-python /usr/bin/salt-minion
root 1880 1423 0 14:38 ? 00:00:00 /usr/libexec/platform-python /usr/bin/salt-minion

netstat -anp|grep pyt
tcp 0 0 10.246.66.86:47148 10.246.67.21:4505 ESTABLISHED 1423/platform-pytho

And whenever we try something like test.ping, a connection is established on port 4506 of the master as well.

So something is going wrong in the communication between master and minion when a state is run. The question is: what?
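
The telnet checks above can be repeated per address family, which separates IPv4 reachability from IPv6 reachability. The following is a minimal Python sketch, assuming the default Salt ports 4505 and 4506 used in this thread; `can_connect` and `check_master` are hypothetical helper names. A successful TCP connect only shows that the port accepts connections, not that Salt traffic flows over it.

```python
import socket

# Ports used in this thread: 4505 (publish) and 4506 (request/return).
SALT_PORTS = (4505, 4506)
FAMILIES = {"IPv4": socket.AF_INET, "IPv6": socket.AF_INET6}


def can_connect(host, port, family, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds over the given family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the host has no address of this family
    for fam, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # refused, timed out, or family unsupported; try the next address
    return False


def check_master(host, ports=SALT_PORTS):
    """Map each (family name, port) pair to its reachability for one master host."""
    return {
        (name, port): can_connect(host, port, family)
        for name, family in FAMILIES.items()
        for port in ports
    }
```

Run from the minion, `check_master("salt-master-3005")` would return a dictionary with one True or False entry per family and port.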

@infantvin
Author

infantvin commented Dec 13, 2022

Hi

After a long weekend of continuous tests, this looks like an IPv6-related problem.
If IPv6 is enabled on the master, this request timeout error is seen whenever a minion is created using salt-cloud. This happens even when master.conf has IPv6 set to off.

After disabling IPv6 completely in the network configuration of the Ethernet interface on the master node, the minion works normally.

I think I have to investigate whether the IPv6 network (which is actively used in our environment) has any problems.
If it has none, I will open a new issue.
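
One thing that may be worth checking as part of that investigation is which address families the master's hostname resolves to from the minion, and in what order. This is a rough Python sketch using only the standard library; `resolved_addresses` is a hypothetical helper name, and the result depends on the local resolver configuration rather than on Salt.

```python
import socket


def resolved_addresses(host, port=4506):
    """Return (family name, address) pairs in the order the resolver yields them."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        entry = (names.get(family, str(family)), sockaddr[0])
        if entry not in results:  # drop duplicates, keep resolver order
            results.append(entry)
    return results
```

Calling `resolved_addresses("salt-master-3005")` on the minion would list the addresses a client might try. If an IPv6 address appears there, that path is a candidate to test separately.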

Thanks

2 participants