Merge pull request #744 from minrk/wip-slug
add 'safe' slug scheme
consideRatio authored Aug 1, 2024
2 parents cb200bd + 845f3d8 commit f77cfc3
Showing 7 changed files with 864 additions and 133 deletions.
3 changes: 2 additions & 1 deletion docs/source/index.md
@@ -13,7 +13,7 @@ management of containerized applications. If you want to run a JupyterHub
setup that needs to scale across multiple nodes (anything with over ~50
simultaneous users), Kubernetes is a wonderful way to do it. Features include:

- Easily and elasticly run anywhere between 2 and thousands of nodes with the
- Easily and elastically run anywhere between 2 and thousands of nodes with the
same set of powerful abstractions. Scale up and down as required by simply
adding or removing nodes.

@@ -81,5 +81,6 @@ utils
```{toctree}
:maxdepth: 2
:caption: Reference
templates
changelog
```
6 changes: 3 additions & 3 deletions docs/source/ssl.md
@@ -8,16 +8,16 @@ If enabled, the Kubespawner will mount the internal_ssl certificates as Kubernet

To enable, use the following settings:

```
```python
c.JupyterHub.internal_ssl = True

c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'
```

Further configuration can be specified with the following (listed with their default values):

```
c.KubeSpawner.secret_name_template = "jupyter-{username}{servername}"
```python
c.KubeSpawner.secret_name_template = "{pod_name}"

c.KubeSpawner.secret_mount_path = "/etc/jupyterhub/ssl/"
```
157 changes: 157 additions & 0 deletions docs/source/templates.md
@@ -0,0 +1,157 @@
(templates)=

# Templated fields

Several fields in KubeSpawner can be resolved as string templates,
so each user server can get distinct values from the same configuration.

String templates use the Python formatting convention of `f"{fieldname}"`,
so for example the default `pod_name_template` of `"jupyter-{user_server}"` will produce:

| username | server name | pod name |
| ---------------- | ----------- | ---------------------------------------------- |
| `user` | `''` | `jupyter-user` |
| `user` | `server` | `jupyter-user--server` |
| `user@email.com` | `Some Name` | `jupyter-user-email-com--some-name---0c1fe94b` |
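
As a minimal sketch of how such a template resolves, assuming expansion behaves like Python's `str.format` and taking the `user_server` values directly from the table above (in a real deployment KubeSpawner computes them):

```python
# Field values are copied from the table above for illustration only.
template = "jupyter-{user_server}"

# Default (unnamed) server: user_server is just the username slug.
default_server = template.format(user_server="user")

# Named server: user_server joins username and server name with "--".
named_server = template.format(user_server="user--server")

print(default_server)  # jupyter-user
print(named_server)    # jupyter-user--server
```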

## Templated properties

Some common templated fields:

- [pod_name_template](#KubeSpawner.pod_name_template)
- [pvc_name_template](#KubeSpawner.pvc_name_template)
- [working_dir](#KubeSpawner.working_dir)

## Fields

The following fields are available in templates:

| field | description |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `{username}` | the username passed through the configured slug scheme |
| `{servername}` | the name of the server passed through the configured slug scheme (`''` for the user's default server) |
| `{user_server}` | the username and servername together as a single slug. This should be used most places for a unique string for a given user's server (new in kubespawner 7). |
| `{unescaped_username}` | the actual username without escaping (no guarantees about value, except as enforced by your Authenticator) |
| `{unescaped_servername}` | the actual server name without escaping (no guarantees about value) |
| `{pod_name}` | the resolved pod name, often a good choice if you need a starting point for other resources (new in kubespawner 7) |
| `{pvc_name}` | the resolved PVC name (new in kubespawner 7) |
| `{namespace}` | the kubernetes namespace of the server (new in kubespawner 7) |
| `{hubnamespace}` | the kubernetes namespace of the Hub |

Because there are two escaping schemes for `username`, `servername`, and `user_server`, you can explicitly select one or the other on a per-template-field basis with the prefix `safe_` or `escaped_`:

| field | description |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `{escaped_username}` | the username passed through the old 'escape' slug scheme (new in kubespawner 7) |
| `{escaped_servername}` | the server name passed through the 'escape' slug scheme (new in kubespawner 7) |
| `{escaped_user_server}` | the username and servername together as a single slug, identical to `"{escaped_username}--{escaped_servername}".rstrip("-")` (new in kubespawner 7) |
| `{safe_username}` | the username passed through the 'safe' slug scheme (new in kubespawner 7) |
| `{safe_servername}` | the server name passed through the 'safe' slug scheme (new in kubespawner 7) |
| `{safe_user_server}` | the username and server name together as a 'safe' slug (new in kubespawner 7) |

These may be useful during a transition upgrading a deployment from an earlier version of kubespawner.

The value of the unprefixed `username`, etc. is governed by the [](#KubeSpawner.slug_scheme) configuration, and always matches exactly one of these values.

## Template tips

In general, these guidelines should help you pick fields to use in your template strings:

- use `{user_server}` when a string should be unique _per server_ (e.g. pod name)
- use `{username}` when it should be unique per user, but shared across named servers (sometimes chosen for PVCs)
- use the `escaped_` prefix (e.g. `{escaped_username}`) if you need to keep certain values unchanged in a deployment upgrading from kubespawner \< 7
- `{pod_name}` can be re-used anywhere you want to create more resources associated with a given pod,
to avoid repeating yourself
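
These guidelines can be illustrated with a small sketch. It assumes templates expand like `str.format`, and the helper names `expand` and `resolve_names` as well as the `claim-{username}` PVC template are hypothetical, chosen only for this example:

```python
def expand(template: str, fields: dict) -> str:
    """Expand one template with str.format (assumed expansion mechanism)."""
    return template.format(**fields)


def resolve_names(username: str, user_server: str) -> dict:
    """Resolve a few illustrative resource names for one user server."""
    fields = {"username": username, "user_server": user_server}
    # Resolve the pod name first so that later templates can reuse it.
    fields["pod_name"] = expand("jupyter-{user_server}", fields)
    return {
        "pod": fields["pod_name"],
        # Reusing {pod_name} avoids repeating the pod template.
        "secret": expand("{pod_name}", fields),
        # Hypothetical per-user template: shared across named servers.
        "pvc": expand("claim-{username}", fields),
    }


default = resolve_names("user", "user")
named = resolve_names("user", "user--server")

print(default["pod"], named["pod"])  # distinct per server
print(default["pvc"], named["pvc"])  # shared per user
```

With `{user_server}` the two servers get different pod names, while the `{username}`-based PVC name is the same for both.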

## Changing template configuration

Changing configuration should not generally affect _running_ servers.
However, when changing a property that may need to persist across user server restarts, special consideration may be required.
For example, changing `pvc_name` or `working_dir` could result in disconnecting a user's server from data loaded in previous sessions.
This may be your intention or not! KubeSpawner cannot know.

`pvc_name` is handled specially, to avoid losing access to data.
If `KubeSpawner.remember_pvc_name` is True, once a server has started, a server's PVC name cannot be changed by configuration.
Any future launch will use the previous `pvc_name`, regardless of change in configuration.
If you _want_ to change the names of mounted PVCs, you can set

```python
c.KubeSpawner.remember_pvc_name = False
```

This handling isn't general for PVCs, only specifically the default `pvc_name`.
If you have defined your own volumes, you need to handle changes to these yourself.
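
The remembering behaviour described above can be pictured with the following sketch. The function `resolve_pvc_name` and its arguments are hypothetical and do not reflect KubeSpawner's internal API; only the decision rule is taken from the text:

```python
from typing import Optional


def resolve_pvc_name(
    templated_name: str,
    remembered_name: Optional[str],
    remember_pvc_name: bool = True,
) -> str:
    """Pick the PVC name for a launch (illustrative decision rule only)."""
    # A name recorded by an earlier launch wins while remembering is enabled.
    if remember_pvc_name and remembered_name:
        return remembered_name
    # Otherwise the current template result is used.
    return templated_name


# Configuration changed from "claim-user" to "data-user" after a first launch.
print(resolve_pvc_name("data-user", "claim-user"))         # claim-user
print(resolve_pvc_name("data-user", "claim-user", False))  # data-user
print(resolve_pvc_name("data-user", None))                 # data-user
```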

## Upgrading from kubespawner \< 7

Prior to kubespawner 7, an escaping scheme was used that ensured values were _unique_,
but did not always ensure fields were _valid_.
In particular:

- start/end rules were not enforced
- length was not enforced

This meant that, for example, usernames that start with a capital letter or are very long could result in servers failing to start because the escaping scheme produced an invalid label.
To solve this, kubespawner 7 adds a new 'safe' scheme for computing template strings,
which aims to always produce valid object names and labels.
The new scheme is the default in kubespawner 7.

You can select the scheme with:

```python
c.KubeSpawner.slug_scheme = "escape" # no changes from kubespawner 6
c.KubeSpawner.slug_scheme = "safe" # default for kubespawner 7
```

The new scheme has the following rules:

- the length of any _single_ template field is limited to 48 characters (the total length of the string is not enforced)
- the result will only contain lowercase ascii letters, numbers, and `-`
- it will always start and end with a letter or number
- if a name is 'safe', it is used unmodified
- if any escaping is required, a truncated safe subset of characters is used, followed by `---{hash}` where `{hash}` is a checksum of the original input string
- `-` shall not occur in sequences of more than one consecutive `-`, except where inserted by the escaping mechanism
- if no safe characters are present, 'x' is used for the 'safe' subset
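
The rules above can be sketched in a few lines. This is a simplified stand-in written for illustration, not the `kubespawner.slugs` implementation: the name `safe_slug_sketch` is hypothetical, and the validity check only accepts lowercase letters, digits, and single hyphens.

```python
import hashlib
import re

# Lowercase alphanumerics separated by single hyphens; no leading or trailing '-'.
_SAFE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
_HASH_LEN = 8


def safe_slug_sketch(name: str, max_length: int = 48) -> str:
    """Return `name` unchanged if already safe, else a stripped name plus hash."""
    if _SAFE.fullmatch(name) and len(name) <= max_length:
        return name
    # Short checksum of the original input keeps distinct inputs distinct.
    digest = hashlib.sha256(name.encode("utf8")).hexdigest()[:_HASH_LEN]
    # Reserve room for the '---' separator and the hash.
    budget = max_length - (_HASH_LEN + 3)
    # Lowercase, then collapse every run of unsafe characters into one '-'.
    stripped = re.sub(r"[^a-z0-9]+", "-", name.lower())
    # Trim hyphens from both ends, truncate, and fall back to 'x' if empty.
    stripped = stripped.lstrip("-")[:budget].rstrip("-") or "x"
    return f"{stripped}---{digest}"


for example in ("username", "has-hyphen", "Capital", "!!!"):
    print(example, "->", safe_slug_sketch(example))
```

Names that are already safe pass through untouched, which is why most existing deployments see no change in their resolved names.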

Since length requirements are applied on a per-field basis, a new `{user_server}` field is added,
which computes a single valid slug, following the above rules, that is unique for a given user server.
The general form is:

```
{username}--{servername}---{hash}
```

where

- `--{servername}` is only present for non-empty server names
- `---{hash}` is only present if escaping is required for _either_ username or servername, and hashes the combination of user and server.
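
A rough sketch of this general form follows. It is illustrative only: `user_server_sketch` is a hypothetical name, length limits are left out for brevity, and hashing the parts with a `\xFF` byte between them is an assumption modelled on `multi_slug` in `kubespawner/slugs.py`.

```python
import hashlib
import re

# Lowercase alphanumerics separated by single hyphens.
_SAFE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def _strip(name: str) -> str:
    """Lowercase and replace runs of unsafe characters with a single '-'."""
    stripped = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return stripped or "x"


def user_server_sketch(username: str, servername: str = "") -> str:
    """Combine username and optional server name into one slug."""
    # '--{servername}' is only present for non-empty server names.
    parts = [username] + ([servername] if servername else [])
    if all(_SAFE.fullmatch(part) for part in parts):
        # No escaping needed, so no hash suffix.
        return "--".join(parts)
    # One hash over both parts; the delimiter byte cannot appear in UTF-8 text.
    joined = b"\xff".join(part.encode("utf8") for part in parts)
    digest = hashlib.sha256(joined).hexdigest()[:8]
    return "--".join(_strip(part) for part in parts) + f"---{digest}"


print(user_server_sketch("user"))            # user
print(user_server_sketch("user", "server"))  # user--server
print(user_server_sketch("user@email.com", "Some Name"))
```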

This `{user_server}` is the recommended value to use in pod names, etc.
In the escape scheme, `{user_server}` is identical to the previous value used in default templates: `{username}--{servername}`,
so it should be safe to upgrade previous templates using `{username}--{servername}` to `{user_server}` or `{escaped_user_server}`.

In the vast majority of cases (where no escaping is required), the 'safe' scheme produces identical results to the 'escape' scheme.
Probably the most common case where the two differ is in the presence of single `-` characters, which the `escape` scheme escaped to `-2d`, while the 'safe' scheme does not.
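
For comparison, the 'escape' behaviour can be approximated as follows. This is a sketch under the assumption that every character outside lowercase ASCII letters and digits is replaced by `-` followed by the lowercase hex value of each of its UTF-8 bytes; `escape_sketch` is a hypothetical name, not a KubeSpawner function.

```python
import string

_SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def escape_sketch(name: str) -> str:
    """Approximate the 'escape' scheme: unsafe characters become -<hex>."""
    out = []
    for char in name:
        if char in _SAFE_CHARS:
            out.append(char)
        else:
            # One '-xx' group per UTF-8 byte of the unsafe character.
            out.extend(f"-{byte:02x}" for byte in char.encode("utf8"))
    return "".join(out)


print(escape_sketch("has-hyphen"))  # has-2dhyphen
print(escape_sketch("Capital"))     # -43apital
print(escape_sketch("ALLCAPS"))     # -41-4c-4c-43-41-50-53
```

The second and third results begin with `-`, which is why such names were unique but not valid as Kubernetes labels.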

Examples:

| name | escape scheme | safe scheme |
| ------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| `username` | `username` | `username` |
| `has-hyphen` | `has-2dhyphen` | `has-hyphen` |
| `Capital` | `-43apital` (error) | `capital---1a1cf792` |
| `user@email.com` | `user-40email-2ecom` | `user-email-com---0925f997` |
| `a-very-long-name-that-is-too-long-for-sixty-four-character-labels` | `a-2dvery-2dlong-2dname-2dthat-2dis-2dtoo-2dlong-2dfor-2dsixty-2dfour-2dcharacter-2dlabels` (error) | `a-very-long-name-that-is-too-long-for---29ac5fd2` |
| `ALLCAPS` | `-41-4c-4c-43-41-50-53` (error) | `allcaps---27c6794c` |

Most changed names won't have a practical effect.
However, to avoid `pvc_name` changing even though KubeSpawner 6 didn't persist it,
on first launch (for each server) after upgrade KubeSpawner checks if:

1. `pvc_name_template` produces a different result with `scheme='escape'`
1. a pvc with the old 'escaped' name exists

and if such a pvc exists, the older name is used instead of the new one (it is then remembered for subsequent launches, according to `remember_pvc_name`).
This is an attempt to respect the `remember_pvc_name` configuration, even though the old name is not technically recorded.
We can infer the old value, as long as configuration has not changed.
This will only work if upgrading KubeSpawner does not _also_ coincide with a change in the `pvc_name_template` configuration.
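
The upgrade check can be summarised with a short sketch. The function `choose_pvc_name` and its arguments are hypothetical; in particular, the set of existing PVC names stands in for a lookup against the Kubernetes API.

```python
def choose_pvc_name(new_name: str, escaped_name: str, existing_pvcs: set) -> str:
    """Prefer an existing PVC created under the old 'escape' naming."""
    # Both conditions must hold: the schemes disagree, and the old PVC exists.
    if escaped_name != new_name and escaped_name in existing_pvcs:
        return escaped_name
    return new_name


existing = {"claim-has-2dhyphen"}

# The old PVC exists under the escaped name, so it keeps being used.
print(choose_pvc_name("claim-has-hyphen", "claim-has-2dhyphen", existing))

# No old PVC for this user, so the new 'safe' name is used.
print(choose_pvc_name("claim-other-user", "claim-other-2duser", existing))
```
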
192 changes: 192 additions & 0 deletions kubespawner/slugs.py
@@ -0,0 +1,192 @@
"""Tools for generating slugs like k8s object names and labels
Requirements:
- always valid for arbitary strings
- no collisions
"""

import hashlib
import re
import string

_alphanum = tuple(string.ascii_letters + string.digits)
_alphanum_lower = tuple(string.ascii_lowercase + string.digits)
_lower_plus_hyphen = _alphanum_lower + ('-',)

# patterns _do not_ need to cover length or start/end conditions,
# which are handled separately
_object_pattern = re.compile(r'^[a-z0-9\.-]+$')
# '-' is placed last in the class so it is a literal hyphen, not a range from '.' to '_'
_label_pattern = re.compile(r'^[a-z0-9\._-]+$', flags=re.IGNORECASE)

# match anything that's not lowercase alphanumeric (will be stripped, replaced with '-')
_non_alphanum_pattern = re.compile(r'[^a-z0-9]+')

# length of hash suffix
_hash_length = 8


def _is_valid_general(
    s, starts_with=None, ends_with=None, pattern=None, min_length=None, max_length=None
):
    """General is_valid check

    Checks rules: length bounds, allowed start/end characters, and pattern match.
    """
    if min_length and len(s) < min_length:
        return False
    if max_length and len(s) > max_length:
        return False
    if starts_with and not s.startswith(starts_with):
        return False
    if ends_with and not s.endswith(ends_with):
        return False
    if pattern and not pattern.match(s):
        return False
    return True


def is_valid_object_name(s):
    """is_valid check for object names"""
    # object rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
    return _is_valid_general(
        s,
        starts_with=_alphanum_lower,
        ends_with=_alphanum_lower,
        pattern=_object_pattern,
        max_length=255,
        min_length=1,
    )


def is_valid_label(s):
    """is_valid check for label values"""
    # label rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
    if not s:
        # empty strings are valid labels
        return True
    return _is_valid_general(
        s,
        starts_with=_alphanum,
        ends_with=_alphanum,
        pattern=_label_pattern,
        max_length=63,
    )


def is_valid_default(s):
    """Strict is_valid

    Returns True if it's valid for _all_ our known uses

    So we can more easily have a single is_valid check.

    - object names have stricter character rules, but have longer max length
    - labels have short max length, but allow uppercase
    """
    return _is_valid_general(
        s,
        starts_with=_alphanum_lower,
        ends_with=_alphanum_lower,
        pattern=_object_pattern,
        min_length=1,
        max_length=63,
    )


def _extract_safe_name(name, max_length):
    """Generate safe substring of a name

    Guarantees:

    - always starts and ends with a lowercase letter or number
    - never more than one hyphen in a row (no '--')
    - only contains lowercase letters, numbers, and hyphens
    - length at least 1 ('x' if other rules strips down to empty string)
    - max length not exceeded
    """
    # compute safe slug from name (don't worry about collisions, hash handles that)
    # cast to lowercase
    # replace any sequence of non-alphanumeric characters with a single '-'
    safe_name = _non_alphanum_pattern.sub("-", name.lower())
    # truncate to max_length chars, strip '-' off ends
    safe_name = safe_name.lstrip("-")[:max_length].rstrip("-")
    if not safe_name:
        # make sure it's non-empty
        safe_name = 'x'
    return safe_name


def strip_and_hash(name, max_length=32):
    """Generate an always-safe, unique string for any input

    truncates name to max_length - len(hash_suffix) to fit in max_length
    after adding hash suffix
    """
    name_length = max_length - (_hash_length + 3)
    if name_length < 1:
        raise ValueError(f"Cannot make safe names shorter than {_hash_length + 4}")
    # quick, short hash to avoid name collisions
    name_hash = hashlib.sha256(name.encode("utf8")).hexdigest()[:_hash_length]
    safe_name = _extract_safe_name(name, name_length)
    # due to stripping of '-' in _extract_safe_name,
    # the result will always have _exactly_ '---', never '--' nor '----'
    # use '---' to avoid colliding with `{username}--{servername}` template join
    return f"{safe_name}---{name_hash}"


def safe_slug(name, is_valid=is_valid_default, max_length=None):
    """Always generate a safe slug

    is_valid should be a callable that returns True if a given string follows appropriate rules,
    and False if it does not.

    Given a string, if it's already valid, use it.
    If it's not valid, follow a safe encoding scheme that ensures:

    1. validity, and
    2. no collisions
    """
    if '--' in name:
        # don't accept any names that could collide with the safe slug
        return strip_and_hash(name, max_length=max_length or 32)
    # allow max_length override for truncated sub-strings
    if is_valid(name) and (max_length is None or len(name) <= max_length):
        return name
    else:
        return strip_and_hash(name, max_length=max_length or 32)


def multi_slug(names, max_length=48):
    """multi-component slug with single hash on the end

    same as strip_and_hash, but name components are joined with '--',
    so it looks like:

    {name1}--{name2}---{hash}

    In order to avoid hash collisions on boundaries, use `\\xFF` as delimiter
    """
    hasher = hashlib.sha256()
    hasher.update(names[0].encode("utf8"))
    for name in names[1:]:
        # \xFF can't occur as a start byte in UTF8
        # so use it as a word delimiter to make sure overlapping words don't collide
        hasher.update(b"\xFF")
        hasher.update(name.encode("utf8"))
    hash = hasher.hexdigest()[:_hash_length]

    name_slugs = []
    available_chars = max_length - (_hash_length + 1)
    # allocate equal space per name
    # per_name accounts for '{name}--', so really two less
    per_name = available_chars // len(names)
    name_max_length = per_name - 2
    if name_max_length < 2:
        raise ValueError(f"Not enough characters for {len(names)} names: {max_length}")
    for name in names:
        name_slugs.append(_extract_safe_name(name, name_max_length))

    # by joining names with '--', this cannot collide with single-hashed names,
    # which can only contain '-' and the '---' hash delimiter once
    return f"{'--'.join(name_slugs)}---{hash}"