RFC | Title | Author | Status | Type |
---|---|---|---|---|
35 | Audit-Mode for Chef Client | Claire McQuin <[email protected]> | Accepted | Standards Track |
Audit mode is a new phase in Chef which allows you to evaluate custom rules, defined in your recipes, on every node during each chef-client run. Use audits to ensure nodes fall into existing "known states" categories even before Chef converges, and to validate your infrastructure after Chef converges.
As an inheritor of a non-Chef-managed infrastructure
I want to run chef-client and collect data on each node without converging
so that I can determine the existing state of the inherited infrastructure.
As a maintainer of a Chef-managed infrastructure
I want to write custom rules defining expected state
so that I can validate my infrastructure.
Audits are evaluated in their own phase. During a full chef-client run, auditing occurs independently after the client converges.
By default, the client will converge the node and execute audits. Chef can be configured to skip audit mode via the command line flag `--no-audit-mode` or the configuration file option `audit_mode :disabled`. Alternatively, converge can be skipped via the command line flag `--audit-mode` or the configuration file option `audit_mode :audit_only`.
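The flags and options above amount to three modes. As a minimal sketch in plain Ruby (a model, not Chef's implementation), the mapping from mode to executed phases could look like the following. The helper name `phases_for` and the `:enabled` symbol for the default are assumptions for illustration; only `:disabled` and `:audit_only` are named in this RFC.

```ruby
# Hypothetical helper: models which phases run for each audit_mode value.
# :enabled is an assumed name for the default; it is not defined in this RFC.
def phases_for(audit_mode = :enabled)
  case audit_mode
  when :enabled    then %i[converge audit] # default: converge, then audit
  when :disabled   then %i[converge]       # --no-audit-mode
  when :audit_only then %i[audit]          # --audit-mode
  else raise ArgumentError, "unknown audit_mode: #{audit_mode.inspect}"
  end
end

puts phases_for(:audit_only).inspect # prints [:audit]
```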
As controls are evaluated during the audit phase, results will be streamed to `Chef::Config[:log_location]` in an easy-to-read format using RSpec's documentation formatter.
The `Chef::EventDispatch::Base` will be updated to support the following events:
Event Name | Context |
---|---|
`converge_failed(error)` | client did not converge successfully with `error` |
`audit_phase_start(run_status)` | audit phase started |
`audit_phase_complete` | audit phase finished |
`audit_phase_failed(exception)` | an uncaught exception occurred during the audit phase |
`control_group_started(name)` | signifies the start of a `controls` group with a defined name |
`control_example_success(control_group_name, example_data)` | an example in a `control_group_name` group completed successfully |
`control_example_failure(control_group_name, example_data, error)` | an example in a `control_group_name` group failed with `error` |
The `example_data` hash contains the informational fields
- the `name` of the evaluated audit rule
- the full `desc` of the evaluated audit rule (includes `name`)
- the `resource_type` evaluated, if any
- the name of the evaluated resource, `resource_name`
- any containing scope is saved in `context`
- the `line_number` of the failed audit
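To illustrate how a consumer might use these events, here is a minimal stand-alone sketch. It assumes that a real handler would subclass `Chef::EventDispatch::Base`; it is written as a plain class so it runs without Chef. The class name `AuditSummary`, its accessors, the symbol keys in `example_data`, and the sample values are all assumptions for illustration.

```ruby
# Stand-alone sketch of an audit event consumer. In a real client this would
# subclass Chef::EventDispatch::Base; it is a plain class here so it runs
# without Chef. AuditSummary and its accessors are hypothetical names.
class AuditSummary
  attr_reader :passed, :failed

  def initialize
    @passed = Hash.new { |hash, group| hash[group] = [] }
    @failed = Hash.new { |hash, group| hash[group] = [] }
  end

  # Called when an example in the named control group succeeds.
  def control_example_success(control_group_name, example_data)
    @passed[control_group_name] << example_data[:desc]
  end

  # Called when an example in the named control group fails with error.
  def control_example_failure(control_group_name, example_data, error)
    @failed[control_group_name] <<
      "#{example_data[:desc]} (line #{example_data[:line_number]}): #{error.message}"
  end

  # True when no failure has been recorded for any control group.
  def success?
    @failed.values.all?(&:empty?)
  end
end

# Sample event data; symbol keys and these values are assumptions.
summary = AuditSummary.new
example = { name: "has nothing listening",
            desc: 'Port "111" has nothing listening',
            resource_type: "Port", resource_name: "111",
            context: ["port compliance"], line_number: 6 }
summary.control_example_failure(
  "port compliance", example,
  RuntimeError.new('expected Port "111" not to be listening')
)
puts summary.failed["port compliance"]
```

Grouping results by control group name keeps the summary aligned with the `controls` blocks that produced them.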
Audits are written inside recipe files. Audits can be written in a separate recipe or can be added into recipes defining resources. Audits are collected within a named `controls` block, which does not get evaluated until the audit phase begins.
Audit rules are defined within a `controls` group using RSpec's `it` syntax. Rules can be grouped together using the `control` method, or any other RSpec example group method (e.g., `describe` or `context`). RSpec's built-in matchers are available, as well as Serverspec types and matchers. The use of `:should` is explicitly disabled, as this is deprecated in RSpec 3.
Audits can be written to help ensure compliance requirements, such as asserting nothing is listening on port 111. Depending on your distribution and its version, your portmap service may be named "portmap" or "rpcbind", and could be renamed after a version bump. Your recipe may use the correct service provider but the init script may have been removed, preventing any service resource `:stop` action from completing successfully.
The `ports::audit` recipe ensures nothing is listening on port 111:

    # cookbook: ports
    # recipe: audit

    controls "port compliance" do
      control port(111) do
        it "has nothing listening" do
          expect(port(111)).to_not be_listening
        end
      end
    end
When `ports::audit` is added to the run-list and `chef-client` is run with audit mode enabled, you would expect the log output to contain

    port compliance
      Port "111"
        has nothing listening

    Finished in 0.08615 seconds (files took 0.67889 seconds to load)
    1 example, 0 failures
When an audit fails, the failed example is marked in the log output for debugging. At the end of the client run, Chef will exit with exit status 1.
Suppose port 111 was not shut down correctly and something is still listening on it. When the `ports::audit` recipe is run, the log output would contain something similar to

    port compliance
      Port "111"
        has nothing listening (FAILED - 1)

    Failures:

      1) port compliance Port "111" has nothing listening
         Failure/Error: expect(port(111)).to_not be_listening
           expected Port "111" not to be listening
         # cookbooks/ports/recipes/audit.rb:7:in `block (3 levels) in from_file'

    Finished in 0.12515 seconds (files took 0.70174 seconds to load)
    1 example, 1 failure

    Failed examples:

    rspec cookbooks/ports/recipes/audit.rb:6 # port compliance Port "111" has nothing listening
These exceptions can be raised during a client run due to errors in the audits included in recipes in the run list:
- `Chef::Exceptions::AuditNameMissing`: Raised when `controls` is declared without a name.
- `Chef::Exceptions::NoAuditsProvided`: Raised when `controls` is declared but defines no audits.
- `Chef::Exceptions::AuditControlGroupDuplicate`: Raised when two `controls` are declared with the same name. Multiple `controls` groups can be defined in the same recipe, as this may happen when using `include_recipe`. However, no two `controls` groups in the run list can have the same name.
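The three rules above can be modeled in a few lines of plain Ruby. This is a stand-alone sketch, not Chef's implementation: the error classes only mirror the exception names in this RFC, `ControlGroupRegistry` is a hypothetical name, and treating "defines no audits" as "no block given" is an assumption.

```ruby
# Stand-alone model of the three validation rules; not Chef's implementation.
# The error classes mirror the exception names in this RFC, and
# ControlGroupRegistry is a hypothetical name.
class AuditNameMissing < StandardError; end
class NoAuditsProvided < StandardError; end
class AuditControlGroupDuplicate < StandardError; end

class ControlGroupRegistry
  def initialize
    @groups = {}
  end

  # Registers a controls group, deferring its block until the audit phase.
  def controls(name = nil, &block)
    raise AuditNameMissing, "controls requires a name" if name.nil? || name.to_s.empty?
    # Assumption: "defines no audits" is modeled as "no block given".
    raise NoAuditsProvided, "controls '#{name}' defines no audits" if block.nil?
    raise AuditControlGroupDuplicate, "controls '#{name}' already defined" if @groups.key?(name)

    @groups[name] = block
  end

  # Names of all registered groups, in declaration order.
  def names
    @groups.keys
  end
end

registry = ControlGroupRegistry.new
registry.controls("port compliance") { :audits_go_here }
begin
  # A second group with the same name is rejected.
  registry.controls("port compliance") { :audits_go_here }
rescue AuditControlGroupDuplicate => e
  puts e.message
end
```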
Errors occurring in the converge phase do not affect the execution of the audit phase. Similarly, errors occurring in the audit phase do not affect later phases. Errors are collected to be provided to the appropriate error handlers once each phase completes.
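The phase isolation described above can be sketched as follows. This is a plain-Ruby illustration under stated assumptions, not Chef's run loop: `run_phases` is a hypothetical helper, and deriving a single exit status of 1 from any collected error is an assumption based on the audit-failure behavior described earlier.

```ruby
# Hypothetical runner: each phase is isolated, and its error is recorded
# instead of aborting the phases that follow.
def run_phases(phases)
  errors = {}
  phases.each do |name, body|
    begin
      body.call
    rescue StandardError => e
      errors[name] = e # kept for the error handlers once the phase completes
    end
  end
  errors
end

ran = []
errors = run_phases({
  converge: -> { raise "resource failed" },
  audit:    -> { ran << :audit }
})

# Assumption: any collected error results in exit status 1.
exit_status = errors.empty? ? 0 : 1
puts "audit ran: #{ran.include?(:audit)}, exit status: #{exit_status}"
```

Here the audit phase still runs even though converge raised, and the converge error remains available afterwards under its phase name.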
Cookbooks support versioning and are an effective medium for distributing code. Including audits in recipes helps to maintain a flat directory structure, and doesn't require the addition of a new server segment.
Even though it's possible to build this logic as external libraries (see minitest-chef-handler), building it as a first-class citizen with config options, CLI options, and hooks for event handlers, and maintaining it over time, will be a challenge.
Also, to achieve usability, any TDI (test-driven infrastructure) related logic should be available out of the box inside the Client omnibus packages. As long as the functionality is available out of the box, building it into core as an alpha feature versus implementing it as an external gem is only an implementation choice. This doesn't change any compatibility commitments.
In the future we can definitely come up with a better DSL than RSpec. But we would like to reuse the awesome tool Serverspec and its practices, and we would also like to provide a generic interface for power users.
The use case for auditing without converging is to support an existing Chef customer absorbing a non-Chef managed infrastructure. In this instance, they can only run audits until cookbooks have been prepared for the new infrastructure. Similarly, an audit-only phase can help new users convert their unmanaged infrastructure to a Chef-managed infrastructure.
Audits still run when converge fails so that you can validate that your infrastructure is still in a state consistent with your expectations. Ideally, when converge fails your audits should still pass. It's reassuring to have that sanity check.
This work is in the public domain. In jurisdictions that do not allow for this, this work is available under CC0. To the extent possible under law, the person who associated CC0 with this work has waived all copyright and related or neighboring rights to this work.