Skip to content

Commit

Permalink
(feat): Preserve trusted cpu/mem resource configurations (#381)
Browse files Browse the repository at this point in the history
Preserve trusted cpu/mem resource configurations
  • Loading branch information
bsquizz authored Jul 9, 2024
1 parent 11860b2 commit a1151a1
Show file tree
Hide file tree
Showing 5 changed files with 317 additions and 61 deletions.
36 changes: 33 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ As an example, typing `bonfire deploy host-inventory` leads to the host-inventor
- [Examples for Common Use Cases](#examples-for-common-use-cases)
- [Commonly Used CLI Options](#commonly-used-cli-options)
- [Deploying/Processing](#deployingprocessing)
- [Trusted/Untrusted Resource Configurations](#trusteduntrusted-resource-configurations)
- [Interactions with Ephemeral Namespace Operator](#interactions-with-ephemeral-namespace-operator)
- [Interactions with Clowder Operator](#interactions-with-clowder-operator)
- [ClowdEnvironments](#clowdenvironments)
Expand Down Expand Up @@ -120,7 +121,7 @@ Once logged in, you should be able to type `bonfire deploy advisor`

This will cause `bonfire` to do the following:
1. reserve a namespace
2. fetch the templates for all components in the 'advisor' app and process them. By default, bonfire will look up the template and commit hash using the 'main'/'master' branch of each component defined in the app. It will look up the git commit hash for this branch and automatically set the `IMAGE_TAG` parameter passed to the templates.
2. fetch the templates for all components in the 'advisor' app and process them. By default, bonfire will look up the template and commit hash using the 'main'/'master' branch of each component defined in the app. It will look up the git commit hash for this branch and automatically set the `IMAGE_TAG` parameter passed to the templates.
3. if ClowdApp resources are found, figure out which additional ClowdApps to fetch and process
4. apply all the processed template resources into your reserved namespace
5. wait for all the resources to reach a 'ready' state
Expand Down Expand Up @@ -210,6 +211,35 @@ bonfire deploy advisor --set-image-tag quay.io/cloudservices/advisor-backend=my_
* `--optional-deps-method <hybrid|all|none>` -- change the way that bonfire processes ClowdApp optional dependencies (see "Dependency Processing" section)
* `--prefer PARAM_NAME=PARAM_VALUE` -- in cases where bonfire finds more than one deployment target, use this to set the parameter names and values that should be used to select a "preferred" deployment target. This option can be passed in multiple times. `bonfire` will select the target with the highest amount of "preferred parameters" on it. Default is currently set to `ENV_NAME=frontends` to select "stable" frontends in the consoledot environments.

## Trusted/Untrusted Resource Configurations

When templates are processed, bonfire removes all untrusted CPU/Memory requests and limits from ClowdApp/ClowdJob/ClowdJobInvocation objects within OpenShift templates (this is because the default value for `--remove-resources` is `all`). If the resource config is unset in a ClowdApp, Clowder will set the deployments to use default values defined in the “ClowdEnvironment” to take effect.

The reason for this behavior is because many app templates tend to request far more CPU/memory in their template compared to what they actually need in a test environment.

You can use the `--no-remove-resources` option to turn this behavior off for certain apps or components but the preferred approach is to audit your CPU/memory configurations and set up a resource configuration that bonfire trusts.

Once you have audited your container CPU/mem usage in test environments and have determined efficient values that you want to use, you can indicate that bonfire should trust the CPU/Memory requests and limits by adhering to the following requirements:

1. Any ClowdApp/ClowdJob/ClowdJobInvocation in your template asking for CPU/mem using requests or limits should use a parameter with a name formatted like this:
* `CPU_REQUEST_<NAME>`
* `CPU_LIMIT_<NAME>`
* `MEM_REQUEST_<NAME>`
* `MEM_LIMIT_<NAME>`

Where `<NAME>` can be whatever arbitrary name you'd like to use containing `A-Z`, `0-9,` and `_`. For example, either of these would be valid:
* `CPU_REQUEST_MQ_CONSUMER`
* `MEM_LIMIT_API`

2. Your deployment configuration for the component must explicitly define values for these parameters. For ephemeral environments normally this will mean that your parameters are defined under the ephemeral deploy target in app-interface.

If these steps are followed then your resource config is considered "trusted" and bonfire will not remove the requests/limits configurations.


**NOTE:** a couple things that will change in the future:
* The "host-inventory" app is trusted by default, but once their configurations are tweaked to be trusted we will remove this
* Object types such as Deployment/StatefulSet/DaemonSet/etc. are not currently analyzed. In future once teams have audited all container resource configurations, we will begin to have bonfire check for trusted configurations on all object types.


# Interactions with Ephemeral Namespace Operator

Expand Down Expand Up @@ -243,7 +273,7 @@ When bonfire processes templates, if it finds a ClowdApp, it will do the followi
* You will end up with all components of `app-a`, `app-b-clowdapp`, AND `app-c-clowdapp` deployed into the namespace.
* `none`: `bonfire` will ignore the `optionalDependencies` on all ClowdApps that it encounters

# Configuration Details
# Configuration Details

> NOTE: for information related to app-interface configurations, see the internal [ConsoleDot Docs](https://consoledot.pages.redhat.com/)
Expand All @@ -266,7 +296,7 @@ By default, if app components use `ClowdApp` resources in their templates, then

`bonfire` ships with a [default config](bonfire/resources/default_config.yaml) that should be enough to get started for most internal Red Hat employees.

By default, the configuration file will be stored in `~/.config/bonfire/config.yaml`.
By default, the configuration file will be stored in `~/.config/bonfire/config.yaml`.

If you wish to override any app configurations, you can edit your local configuration file by typing `bonfire config edit`. You can then define an app under the `apps` key of the config. You can reset the config to default at any time using `bonfire config write-default`.

Expand Down
16 changes: 8 additions & 8 deletions bonfire/bonfire.py
Original file line number Diff line number Diff line change
Expand Up @@ -545,9 +545,9 @@ def _app_or_component_selector(ctx, param, this_value):
click.option(
"--remove-resources",
help=(
"Remove resource limits and requests on ClowdApp configs "
"for specific components or apps. Prefix the app name with "
"'app:', otherwise specify the component name. (default: all)"
"Remove untrusted (defined in README) resource limits/requests on "
"ClowdApp/ClowdJob/CJI objects for specific components or apps. Prefix the app name "
"with 'app:', otherwise specify the component name. (default: 'all')"
),
type=str,
multiple=True,
Expand All @@ -556,9 +556,9 @@ def _app_or_component_selector(ctx, param, this_value):
click.option(
"--no-remove-resources",
help=(
"Don't remove resource limits and requests on ClowdApp configs "
"for specific components or apps. Prefix the app name with "
"'app:', otherwise specify the component name. (default: none)"
"Preserve resource limits/requests even if untrusted (defined in README) on "
"ClowdApp/ClowdJob/CJI objects for specific components or apps. Prefix the app name "
"with 'app:', otherwise specify the component name. (default: none)"
),
type=str,
multiple=True,
Expand Down Expand Up @@ -1416,9 +1416,9 @@ def _err_handler(err):
ns_url = f"{url}/k8s/cluster/projects/{ns}"
log.info("namespace url: %s", ns_url)
log.info(
"resource usage dashboard for namespace '%s': https://grafana.app-sre.devshift.net/d/jRY7KLnVz?var-namespace=%s",
ns,
"resource usage dashboard for namespace '%s': %s",
ns,
conf.RESOURCE_DASHBOARD_URL.format(namespace=ns),
)
click.echo(ns)

Expand Down
14 changes: 14 additions & 0 deletions bonfire/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,20 @@
if os.getenv("BONFIRE_TRUSTED_COMPONENTS"):
TRUSTED_COMPONENTS = os.getenv("BONFIRE_TRUSTED_COMPONENTS").split(",")

# regexes used to check for trusted resource request/limit
TRUSTED_REGEX_FOR_PATH = {
"resources.requests.cpu": r"\${(CPU_REQUEST[A-Z0-9_]+)}",
"resources.limits.cpu": r"\${(CPU_LIMIT[A-Z0-9_]+)}",
"resources.requests.memory": r"\${(MEM_REQUEST[A-Z0-9_]+)}",
"resources.limits.memory": r"\${(MEM_LIMIT[A-Z0-9_]+)}",
}

TRUSTED_CHECK_KINDS = ["ClowdApp", "ClowdJob", "ClowdJobInvocation"]

RESOURCE_DASHBOARD_URL = (
"https://grafana.app-sre.devshift.net/d/jRY7KLnVz?var-namespace={namespace}"
)

ELASTICSEARCH_HOST = os.getenv("ELASTICSEARCH_HOST", DEFAULT_ELASTICSEARCH_HOST)
ELASTICSEARCH_INDEX = os.getenv("ELASTICSEARCH_INDEX", DEFAULT_ELASTICSEARCH_INDEX)
ELASTICSEARCH_APIKEY = os.getenv("ELASTICSEARCH_APIKEY")
Expand Down
105 changes: 84 additions & 21 deletions bonfire/processor.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,23 +30,85 @@ def _process_template(*args, **kwargs):
return processed_template


def _remove_resource_config(items):
for i in items:
if i["kind"] != "ClowdApp":
def _is_trusted_config(value, regex, component_params, path):
"""
Check for presence of a trusted param being used for the value.
A trusted parameter is defined as:
1. matching the expected regex pattern
2. the parameter value is set in the component's deploy config
Returns True if we determine this is a trusted value
"""
if not regex or not isinstance(regex, str):
raise ValueError("string value for 'regex' must be supplied")

match = False
in_params = False

if value:
match = re.match(regex, value)
if match and match.groups()[0] in component_params:
in_params = True

log.debug(
"value '%s', regex r'%s', matches=%s, in params=%s", value, regex, bool(match), in_params
)

return match and in_params


def _remove_untrusted_configs(data, params, path="", current_dict=None, current_key=None):
"""
Locate configurations within 'data' and remove them if not trusted.
Checks to see if any config matching a path listed in 'config.TRUSTED_PARAM_REGEX_FOR_PATH' is
found within 'data' dictionary and ensures the regex matches and that the parameter value is
set on the component's deploy config.
"""
if isinstance(data, dict):
for key, value in copy.copy(data).items():
_remove_untrusted_configs(
value, params, path + f".{key}", current_dict=data, current_key=key
)

elif isinstance(data, list):
for index, value in enumerate(copy.copy(data)):
_remove_untrusted_configs(value, params, path + f"[{index}]")

# check if this value is at a path where we need to see a certain parameter in use
# in order to preserve it
for path_end, regex in conf.TRUSTED_REGEX_FOR_PATH.items():
# only check values if the path end is listed in TRUSTED_PARAM_REGEX_FOR_PATH
if not path.endswith(path_end):
continue

removed = False
for d in i["spec"].get("deployments", []):
if "resources" in d["podSpec"]:
del d["podSpec"]["resources"]
removed = True
for p in i["spec"].get("pods", []):
if "resources" in p:
del p["resources"]
removed = True
if not _is_trusted_config(data, regex, params, path):
del current_dict[current_key]
log.debug("deleted untrusted config at '%s'", path)

if removed:
log.debug("removed resources from ClowdApp '%s'", i["metadata"]["name"])

def _remove_untrusted_configs_for_template(template, params):
"""
Removes untrusted configurations within the template.
A configuration is trusted if:
1. the value is defined using a template parameter
2. the parameter follows the proper syntax convention
3. the parameter is defined on the component in its deployment config
Checks template to see if any resources of kind listed in 'config.TRUSTED_CHECK_KINDS'
are found. If so, analyzes the config on those resources to see if any need to be removed.
"""
for obj in template.get("objects", []):
kind = obj.get("kind")
if kind not in conf.TRUSTED_CHECK_KINDS:
continue

name = obj.get("metadata", {}).get("name")
log.debug("checking resources on %s '%s'", kind, name)

_remove_untrusted_configs(obj, params)


def _remove_dependency_config(items):
Expand Down Expand Up @@ -580,15 +642,10 @@ def _get_component_items(self, component_name):
self._sub_params(component_name, params)
log.debug("parameters for component '%s': %s", component_name, params)

new_items = _process_template(template, params, self.local)["items"]

# override the tags for all occurences of an image if requested
new_items = self._sub_image_tags(new_items)

# evaluate --remove-resources/--no-remove-resources
app_name = self._get_app_for_component(component_name)

# if app/component is trusted, do not remove its resources
# if entire app or component is marked trusted, do not remove any resources
should_remove_resources = False
if app_name in conf.TRUSTED_APPS:
log.debug("should_remove: app '%s' listed in trusted apps", app_name)
Expand All @@ -604,7 +661,13 @@ def _get_component_items(self, component_name):
)
log.debug("should_remove_resources evaluates to %s", should_remove_resources)
if should_remove_resources:
_remove_resource_config(new_items)
_remove_untrusted_configs_for_template(template, params)

# process the template
new_items = _process_template(template, params, self.local)["items"]

# override the tags for all occurences of an image if requested
new_items = self._sub_image_tags(new_items)

# evaluate --remove-dependencies/--no-remove-dependencies
should_remove_deps = _should_remove(
Expand Down
Loading

0 comments on commit a1151a1

Please sign in to comment.