This ARM describes the goals and motivation for managing system reboots, and proposes a type and provider for doing so. The proposal is intended to support multiple operating systems, but is focused on a Windows-specific implementation as this is something every Windows sysadmin has to manage.
- Provide a DSL for managing reboots.
- Allow reboot behavior to be associated with any resource.
- Provide a Windows-specific implementation.
- Reboot the system cleanly, leaving no lock files behind.
- Support environments where reboots are only handled via an external orchestration agent, e.g. mcollective.
- Puppet should never infer that a reboot may be necessary and reboot the system automatically. System reboots should always be explicitly modeled by the manifest author.
- Fully automate the provisioning of Puppet Labs' Windows Jenkins slaves.
Users have attempted to work around the reboot problem on Windows using a refreshonly exec resource. In the example below, the shutdown command is only triggered after the NetFx3 (.NET Framework 3) feature is installed via Dism:
dism { 'NetFx3':
  ensure => present,
}

exec { 'c:\windows\system32\shutdown.exe /r /t 30':
  subscribe   => Dism['NetFx3'],
  refreshonly => true,
}
Using an exec resource makes for a poor user experience. Also, the installation of NetFx3 may fail if a reboot is already pending. In other words, puppet may need to reboot the system before it can apply the NetFx3 resource.
We would like to model two aspects of reboots:
- Install a package and, if necessary, reboot the system to complete installation.
- Prior to installing a package, check if the system is in the reboot-pending state, and if so, reboot, and then install the package.
In the first case, puppet is only rebooting the system due to an event generated by another resource. In the second case, puppet is actively managing the reboot-pending state of the system.
The first aspect of reboots described above could be modeled as:
package { 'somepackage':
  ensure => installed,
}

reboot { 'Puppet needs to reboot the system to complete installation':
  subscribe => Package['somepackage'],
}
Puppet would only reboot the system if puppet installs somepackage and the system requires a reboot as a result.
The reboot type has a when property whose default value is refreshed, which means, "only perform a reboot as a result of being refreshed by another resource." As a result, we can omit the when property for refresh-based reboots.
In the example above, the reboot resource subscribes to a package resource, but it could just as easily subscribe to a dism, registry_value, or other resource.
The second aspect of reboots described above could be modeled as:
reboot { 'Puppet needs to reboot the system':
  when   => pending,
  before => Package['somepackage'],
}

package { 'somepackage':
  ensure => installed,
}
Here, the reboot resource is evaluated prior to the package. We've also set the when property of the reboot resource to pending, which means, "puppet should check if a reboot is pending, and if so, reboot the system." After the system comes back up, puppet installs somepackage.
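The two values of the when property can be summarized as a small decision function. This is only a sketch: the method name and its arguments are hypothetical, and it assumes that a refresh event alone is enough to trigger a refresh-based reboot (a real provider might additionally check whether the system actually requires one).

```ruby
# Hypothetical decision helper; none of these names come from Puppet itself.
#
# when_value: :refreshed (the default) or :pending
# refreshed:  true if a subscribed resource sent a refresh event this run
# pending:    true if the system reports a reboot-pending state
def reboot_now?(when_value: :refreshed, refreshed: false, pending: false)
  case when_value
  when :refreshed then refreshed   # only react to events from other resources
  when :pending   then pending     # actively manage the reboot-pending state
  else
    raise ArgumentError, "unknown when value: #{when_value.inspect}"
  end
end
```

With the default value, a pending reboot that nothing refreshed is ignored; with pending, a refresh event on its own does nothing.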
It is possible that puppet may need to reboot before and after installing a package, which is a composition of the two examples above:
reboot { 'Puppet needs to reboot the system':
  when   => pending,
  before => Package['somepackage'],
}

package { 'somepackage':
  ensure => installed,
}

reboot { 'Puppet needs to reboot the system to complete installation':
  subscribe => Package['somepackage'],
}
Some users don't want puppet to ever reboot the system. Instead, they want to handle that out-of-band during change windows, etc. This could be accomplished using mcollective:
$ mco rpc puppetral create type=reboot name='mco initiated reboot' when=pending
However, this type of unfiltered query means every puppet agent in the collective would receive the message, and have to act on it.
A more scalable approach is to specify a fact-based filter:
$ mco rpc puppetral create type=reboot name='mco initiated reboot' when=pending -F reboot_pending=true
where reboot_pending is a fact. But this approach is also a bit verbose. It would probably be better to add this logic to the puppet plugin:
$ mco puppet reboot -F reboot_pending=true
If the provider requests a system reboot, puppet will need to skip the remaining resources, send its report, and gracefully exit. It is important that the report contain references to the skipped resources, otherwise, from a reporting standpoint, it's unclear why those resources were not evaluated -- were they omitted from the catalog? Puppet can accomplish this by marking each remaining resource as noop.
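The skipping behavior can be illustrated with a minimal sketch. The Resource struct, the report hashes, and the method name are hypothetical stand-ins, not Puppet's real catalog or report classes; the point is only that resources after the reboot request still appear in the report, marked as noop.

```ruby
# Hypothetical stand-in for a catalog resource; Puppet's real classes are not used.
Resource = Struct.new(:ref, :requests_reboot)

# Walks the resources in order. Once one of them requests a reboot, every
# remaining resource is recorded as :noop instead of being left out of the
# report. Returns the report entries and whether a reboot was requested.
def apply_until_reboot(resources)
  reboot_requested = false
  report = resources.map do |res|
    if reboot_requested
      { ref: res.ref, status: :noop }
    else
      reboot_requested = true if res.requests_reboot
      { ref: res.ref, status: :applied }
    end
  end
  [report, reboot_requested]
end

catalog = [
  Resource.new('Package[somepackage]', false),
  Resource.new('Reboot[complete installation]', true),
  Resource.new('Package[otherpackage]', false)
]

report, reboot = apply_until_reboot(catalog)
report.each { |entry| puts "#{entry[:ref]}: #{entry[:status]}" }
puts "reboot requested: #{reboot}"
```

Because the last resource is still listed, a report consumer can tell "not evaluated because of a reboot" apart from "not in the catalog".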
It is also important that puppet resumes when the machine boots up. On Windows, we get this behavior for free by running as an Automatic service. On Unix, we get this behavior by running as an /etc/init.d daemon. If puppet runs as a Manual service, or via cron, the system won't reach consistency until puppet next runs.
After puppet applies the catalog, it sends a report containing events about each processed resource in the catalog. The report is processed synchronously on the puppet master, and will take varying amounts of time. As a result, there is no guarantee that puppet will exit cleanly before the reboot occurs.
A more robust solution would allow providers to register commands to be executed after sending the report and prior to exiting, similar in idea to the postrun_command setting, though likely different in implementation.
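One way to picture such a registration mechanism is a small hook registry that the agent drains after the report has been sent. This is a sketch under assumptions: the PostReportHooks class and its methods are hypothetical and do not correspond to an existing Puppet API.

```ruby
# Hypothetical registry of deferred commands; not an existing Puppet API.
class PostReportHooks
  def initialize
    @hooks = []
  end

  # Providers would call this during the run to defer work until after the
  # report has been sent.
  def register(description, &block)
    raise ArgumentError, 'a block is required' unless block

    @hooks << [description, block]
    self
  end

  # The agent would call this after sending the report and just before
  # exiting. Hooks run in registration order, and a failing hook does not
  # prevent later hooks from running.
  def run_all
    @hooks.map do |description, block|
      begin
        block.call
        [description, :ok]
      rescue StandardError => e
        [description, :failed, e.message]
      end
    end
  end
end

hooks = PostReportHooks.new
hooks.register('reboot') { puts 'would run: shutdown.exe /r /t 30' }
# ... catalog is applied and the report is sent here ...
hooks.run_all
```

Deferring the reboot command this way keeps it from racing with report submission, since it is only issued once the report has gone out.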
We do not plan on supporting batching of reboot requests, e.g. install 5 packages and only reboot at the end. There are some core issues in puppet that make this hard, e.g. see http://projects.puppetlabs.com/issues/2198. Also, it's not clear what should happen if 4 packages install successfully, but 1 fails.
- We could create a rebootable property (like ensurable) and add it to the package, dism, registry_key, etc. types, but restrict it to providers that implement the manages_reboot feature. For example, the Windows package provider would implement the manages_reboot feature, and mix in code for performing a reboot.

  The benefit of this approach is that the provider is in the best position to know under what conditions a reboot is necessary. For example, if a package is installed, then a reboot is only necessary if msiexec returns 3010, but not if it returns 0.

  The downside is that it tightly couples reboot behavior with specific types and providers, which could be problematic when using third-party modules. For this reason, I don't recommend this approach.
- In order to eliminate the race condition mentioned earlier, we could create a watcher process that waits for puppet to exit and then executes the desired commands. Doing so has its own issues, such as the lack of fork on Windows, which makes it harder to serialize the block of code to execute in the watcher process.
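For the first alternative above, the provider-side check on the msiexec exit code might look like the following sketch. The method name and the returned symbols are hypothetical. The codes 0 and 3010 come from the text; 1641 (ERROR_SUCCESS_REBOOT_INITIATED) is an addition here, on the assumption that a real provider would also want to recognize it.

```ruby
# Windows Installer exit codes. 0 and 3010 are named in the text above;
# 1641 is included as an assumption about what a provider would also handle.
MSI_SUCCESS                  = 0
MSI_SUCCESS_REBOOT_INITIATED = 1641
MSI_SUCCESS_REBOOT_REQUIRED  = 3010

# Hypothetical helper: maps an msiexec exit code to an install outcome.
def install_outcome(exit_code)
  case exit_code
  when MSI_SUCCESS                  then :installed
  when MSI_SUCCESS_REBOOT_REQUIRED  then :installed_reboot_required
  when MSI_SUCCESS_REBOOT_INITIATED then :installed_reboot_in_progress
  else :failed
  end
end

# Only a "reboot required" result should lead the provider to request a reboot.
def reboot_required?(exit_code)
  install_outcome(exit_code) == :installed_reboot_required
end
```

This is the kind of knowledge that lives naturally in the package provider, which is the argument in favor of that alternative.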
We're assuming that users actually want puppet to reboot their systems. If this is not the case, then providing a reboot_pending fact would be sufficient. But based on user feedback, I believe some users do want puppet to reboot their systems, mostly for the refresh-based reboot scenario.
If the system does not clear its reboot-pending state after a reboot (for example, if the PendingFileRenameOperations registry value persists), then the system could get into a reboot loop.
Both puppet and facter would need to be able to detect if a reboot is pending. The detection code would have to be either shared or copied. Another option is for puppet to force facter to look up the reboot_pending fact afresh each time, e.g. as a volatile fact.
This feature will enable users to manage a critical piece of configuration management on Windows.