Commit
Merge pull request #744 from minrk/wip-slug
add 'safe' slug scheme
Showing 7 changed files with 864 additions and 133 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
(templates)= | ||
|
||
# Templated fields | ||
|
||
Several fields in KubeSpawner can be resolved as string templates, | ||
so each user server can get distinct values from the same configuration. | ||
|
||
String templates use Python's `str.format` convention of `"{fieldname}"`, | ||
so for example the default `pod_name_template` of `"jupyter-{user_server}"` will produce: | ||
|
||
| username | server name | pod name | | ||
| ---------------- | ----------- | ---------------------------------------------- | | ||
| `user` | `''` | `jupyter-user` | | ||
| `user` | `server` | `jupyter-user--server` | | ||
| `user@email.com` | `Some Name` | `jupyter-user-email-com--some-name---0c1fe94b` | | ||
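
As a rough illustration of the expansion (not KubeSpawner's own code), the rows above can be reproduced with plain `str.format`. The `expand` helper and the `namespace="jhub"` value are hypothetical; the `user_server` values are copied from the table.

```python
# Minimal sketch of template expansion with str.format.
# `expand` is a hypothetical helper, not part of KubeSpawner's API.
def expand(template, **fields):
    """Fill `{fieldname}` placeholders; fields the template does not use are ignored."""
    return template.format(**fields)


pod_name_template = "jupyter-{user_server}"

# user_server values as listed in the table above
for user_server in ("user", "user--server", "user-email-com--some-name---0c1fe94b"):
    # namespace is an illustrative extra field that this template never references
    print(expand(pod_name_template, user_server=user_server, namespace="jhub"))
```

Because `str.format` ignores unused keyword arguments, the same set of fields can be offered to every template, and each template picks only what it needs.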
|
||
## templated properties | ||
|
||
Some common templated fields: | ||
|
||
- [pod_name_template](#KubeSpawner.pod_name_template) | ||
- [pvc_name_template](#KubeSpawner.pvc_name_template) | ||
- [working_dir](#KubeSpawner.working_dir) | ||
|
||
## fields | ||
|
||
The following fields are available in templates: | ||
|
||
| field | description | | ||
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | | ||
| `{username}` | the username passed through the configured slug scheme | | ||
| `{servername}` | the name of the server passed through the configured slug scheme (`''` for the user's default server) | | ||
| `{user_server}` | the username and servername together as a single slug. This should be used most places for a unique string for a given user's server (new in kubespawner 7). | | ||
| `{unescaped_username}` | the actual username without escaping (no guarantees about value, except as enforced by your Authenticator) | | ||
| `{unescaped_servername}` | the actual server name without escaping (no guarantees about value) | | ||
| `{pod_name}` | the resolved pod name, often a good choice if you need a starting point for other resources (new in kubespawner 7) | | ||
| `{pvc_name}` | the resolved PVC name (new in kubespawner 7) | | ||
| `{namespace}` | the kubernetes namespace of the server (new in kubespawner 7) | | ||
| `{hubnamespace}` | the kubernetes namespace of the Hub | | ||
|
||
Because there are two escaping schemes for `username`, `servername`, and `user_server`, you can explicitly select one or the other on a per-template-field basis with the prefix `safe_` or `escaped_`: | ||
|
||
| field | description | | ||
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| `{escaped_username}` | the username passed through the old 'escape' slug scheme (new in kubespawner 7) | | ||
| `{escaped_servername}` | the server name passed through the 'escape' slug scheme (new in kubespawner 7) | | ||
| `{escaped_user_server}` | the username and servername together as a single slug, identical to `"{escaped_username}--{escaped_servername}".rstrip("-")` (new in kubespawner 7) | | ||
| `{safe_username}` | the username passed through the 'safe' slug scheme (new in kubespawner 7) | | ||
| `{safe_servername}` | the server name passed through the 'safe' slug scheme (new in kubespawner 7) | | ||
| `{safe_user_server}` | the username and server name together as a 'safe' slug (new in kubespawner 7) | | ||
|
||
These may be useful during a transition upgrading a deployment from an earlier version of kubespawner. | ||
|
||
The value of the unprefixed `username`, etc. is governed by the [](#KubeSpawner.slug_scheme) configuration, and always matches exactly one of these values. | ||
|
||
## Template tips | ||
|
||
In general, these guidelines should help you pick fields to use in your template strings: | ||
|
||
- use `{user_server}` when a string should be unique _per server_ (e.g. pod name) | ||
- use `{username}` when it should be unique per user, but shared across named servers (sometimes chosen for PVCs) | ||
- use the `escaped_` prefix (e.g. `{escaped_username}`) if you need to keep certain values unchanged in a deployment upgrading from kubespawner \< 7 | ||
- `{pod_name}` can be re-used anywhere you want to create more resources associated with a given pod, | ||
to avoid repeating yourself | ||
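
The first two guidelines can be illustrated with a small sketch. The field values are written out by hand here (KubeSpawner computes them itself), and `claim-{username}` is only an example of a per-user template, not a recommended setting.

```python
# Sketch: one user with a default server and a named server.
# Field values are hand-written for illustration.
servers = [
    {"username": "user", "servername": "", "user_server": "user"},
    {"username": "user", "servername": "server", "user_server": "user--server"},
]

# {user_server} gives one distinct name per server
pod_names = ["jupyter-{user_server}".format(**fields) for fields in servers]
# {username} gives the same name for every server of the user
claim_names = ["claim-{username}".format(**fields) for fields in servers]

print(pod_names)
print(claim_names)
```

With the per-user template, both servers would resolve to the same claim name, which is the sharing behaviour described in the second guideline.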
|
||
## Changing template configuration | ||
|
||
Changing configuration should not generally affect _running_ servers. | ||
However, when changing a property that may need to persist across user server restarts, special consideration may be required. | ||
For example, changing `pvc_name` or `working_dir` could result in disconnecting a user's server from data loaded in previous sessions. | ||
This may be your intention or not! KubeSpawner cannot know. | ||
|
||
`pvc_name` is handled specially, to avoid losing access to data. | ||
If `KubeSpawner.remember_pvc_name` is True, once a server has started, a server's PVC name cannot be changed by configuration. | ||
Any future launch will use the previous `pvc_name`, regardless of change in configuration. | ||
If you _want_ to change the names of mounted PVCs, you can set | ||
|
||
```python | ||
c.KubeSpawner.remember_pvc_name = False | ||
``` | ||
|
||
This handling applies only to the default `pvc_name`, not to PVCs in general. | ||
If you have defined your own volumes, you need to handle changes to these yourself. | ||
|
||
## Upgrading from kubespawner \< 7 | ||
|
||
Prior to kubespawner 7, an escaping scheme was used that ensured values were _unique_, | ||
but did not always ensure fields were _valid_. | ||
In particular: | ||
|
||
- start/end rules were not enforced | ||
- length was not enforced | ||
|
||
This meant that e.g. usernames that started with a capital letter or were very long could result in servers failing to start because the escaping scheme produced an invalid label. | ||
To solve this, a new 'safe' scheme has been added in kubespawner 7 for computing template strings, | ||
which aims to always produce valid object names and labels. | ||
The new scheme is the default in kubespawner 7. | ||
|
||
You can select the scheme with: | ||
|
||
```python | ||
c.KubeSpawner.slug_scheme = "escape" # no changes from kubespawner 6 | ||
c.KubeSpawner.slug_scheme = "safe" # default for kubespawner 7 | ||
``` | ||
|
||
The new scheme has the following rules: | ||
|
||
- the length of any _single_ template field is limited to 48 characters (the total length of the string is not enforced) | ||
- the result will only contain lowercase ascii letters, numbers, and `-` | ||
- it will always start and end with a letter or number | ||
- if a name is 'safe', it is used unmodified | ||
- if any escaping is required, a truncated safe subset of characters is used, followed by `---{hash}` where `{hash}` is a checksum of the original input string | ||
- `-` never occurs more than once in a row, except where inserted by the escaping mechanism | ||
- if no safe characters are present, 'x' is used for the 'safe' subset | ||
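
A few of these rules are easy to check mechanically. The sketch below is a hypothetical checker written for this page (`looks_safe` is not a KubeSpawner function); it covers the character set, start/end, and length rules only, and does not check the consecutive-hyphen rule.

```python
import re

# Lowercase ASCII letters, digits, and '-', starting and ending with a letter or digit.
_SAFE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def looks_safe(slug, max_length=48):
    """Return True if `slug` passes the character, start/end, and length rules."""
    return len(slug) <= max_length and _SAFE.fullmatch(slug) is not None


print(looks_safe("username"))
print(looks_safe("capital---1a1cf792"))
print(looks_safe("-43apital"))
```

The 48-character default mirrors the per-field limit stated above; pass a different `max_length` to check against another limit.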
|
||
Since length requirements are applied on a per-field basis, a new `{user_server}` field is added, | ||
which computes a single valid slug following the above rules which is unique for a given user server. | ||
The general form is: | ||
|
||
``` | ||
{username}--{servername}---{hash} | ||
``` | ||
|
||
where | ||
|
||
- `--{servername}` is only present for non-empty server names | ||
- `---{hash}` is only present if escaping is required for _either_ username or servername, and hashes the combination of user and server. | ||
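
Hashing a combination of strings needs some care: plain concatenation cannot distinguish `("ab", "c")` from `("a", "bc")`. One way to avoid this, sketched here under the assumption of a SHA-256 hash truncated to 8 hex characters, is to feed a separator byte between components. The byte `0xFF` never appears in valid UTF-8, so it cannot be confused with input text. Both function names are hypothetical.

```python
import hashlib


def combined_hash(names, length=8):
    """Hash several strings together, separated by a 0xFF byte."""
    hasher = hashlib.sha256()
    for i, name in enumerate(names):
        if i:
            # 0xFF cannot occur in UTF-8 encoded text, so it marks a clean boundary
            hasher.update(b"\xff")
        hasher.update(name.encode("utf8"))
    return hasher.hexdigest()[:length]


def naive_hash(names, length=8):
    """Hash the plain concatenation; component boundaries are lost."""
    return hashlib.sha256("".join(names).encode("utf8")).hexdigest()[:length]


print(naive_hash(["ab", "c"]) == naive_hash(["a", "bc"]))
print(combined_hash(["ab", "c"]) == combined_hash(["a", "bc"]))
```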
|
||
This `{user_server}` is the recommended value to use in pod names, etc. | ||
In the escape scheme, `{user_server}` is identical to the previous value used in default templates: `{username}--{servername}`, | ||
so it should be safe to upgrade previous templates using `{username}--{servername}` to `{user_server}` or `{escaped_user_server}`. | ||
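
The escape-scheme join is simple enough to write out. This sketch uses the expression given in the field table, `"{escaped_username}--{escaped_servername}".rstrip("-")`, wrapped in a hypothetical helper; the inputs are assumed to be already escaped.

```python
def escaped_user_server(escaped_username, escaped_servername):
    """Join already-escaped names; the trailing '--' is dropped for the default server."""
    return f"{escaped_username}--{escaped_servername}".rstrip("-")


print(escaped_user_server("user", ""))
print(escaped_user_server("user", "server"))
```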
|
||
In the vast majority of cases (where no escaping is required), the 'safe' scheme produces identical results to the 'escape' scheme. | ||
Probably the most common case where the two differ is in the presence of single `-` characters, which the `escape` scheme escaped to `-2d`, while the 'safe' scheme does not. | ||
|
||
Examples: | ||
|
||
| name | escape scheme | safe scheme | | ||
| ------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- | | ||
| `username` | `username` | `username` | | ||
| `has-hyphen` | `has-2dhyphen` | `has-hyphen` | | ||
| `Capital` | `-43apital` (error) | `capital---1a1cf792` | | ||
| `user@email.com` | `user-40email-2ecom` | `user-email-com---0925f997` | | ||
| `a-very-long-name-that-is-too-long-for-sixty-four-character-labels` | `a-2dvery-2dlong-2dname-2dthat-2dis-2dtoo-2dlong-2dfor-2dsixty-2dfour-2dcharacter-2dlabels` (error) | `a-very-long-name-that-is-too-long-for---29ac5fd2` | | ||
| `ALLCAPS` | `-41-4c-4c-43-41-50-53` (error) | `allcaps---27c6794c` | | ||
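
The 'escape scheme' column above follows a simple pattern: every character outside lowercase letters and digits becomes `-` followed by the hex value of its bytes. The function below is an approximation written for this example, assuming UTF-8 bytes and two-digit lowercase hex. It is not KubeSpawner's implementation, which may differ in details.

```python
import string

_SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def escape_sketch(name):
    """Replace each unsafe character with '-' plus the hex of its UTF-8 bytes."""
    out = []
    for char in name:
        if char in _SAFE_CHARS:
            out.append(char)
        else:
            out.extend(f"-{byte:02x}" for byte in char.encode("utf8"))
    return "".join(out)


for name in ("username", "has-hyphen", "Capital", "user@email.com", "ALLCAPS"):
    print(name, "->", escape_sketch(name))
```

The result is unique for each input, but as the table shows it can start with `-` or grow past the length limit, which is what the 'safe' scheme addresses.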
|
||
Most changed names won't have a practical effect. | ||
However, to avoid `pvc_name` changing even though KubeSpawner 6 didn't persist it, | ||
on first launch (for each server) after upgrade KubeSpawner checks if: | ||
|
||
1. `pvc_name_template` produces a different result with `scheme='escape'` | ||
1. a pvc with the old 'escaped' name exists | ||
|
||
and if such a pvc exists, the older name is used instead of the new one (it is then remembered for subsequent launches, according to `remember_pvc_name`). | ||
This is an attempt to respect the `remember_pvc_name` configuration, even though the old name is not technically recorded. | ||
We can infer the old value, as long as configuration has not changed. | ||
This will only work if upgrading KubeSpawner does not _also_ coincide with a change in the `pvc_name_template` configuration. |
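
The two checks above amount to a small decision, sketched here with hypothetical names. In this sketch `existing_pvcs` is just a set of names; how KubeSpawner actually looks up existing PVCs is not shown.

```python
def choose_pvc_name(new_name, old_escaped_name, existing_pvcs):
    """Pick the PVC name for a server's first launch after upgrade.

    new_name: result of the template under the current scheme
    old_escaped_name: result of the same template under the 'escape' scheme
    existing_pvcs: names of PVCs that already exist (assumed to be known)
    """
    if old_escaped_name != new_name and old_escaped_name in existing_pvcs:
        # keep using the PVC created before the upgrade
        return old_escaped_name
    return new_name


print(choose_pvc_name("claim-has-hyphen", "claim-has-2dhyphen", {"claim-has-2dhyphen"}))
print(choose_pvc_name("claim-has-hyphen", "claim-has-2dhyphen", set()))
```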
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,192 @@ | ||
"""Tools for generating slugs like k8s object names and labels | ||
Requirements: | ||
- always valid for arbitrary strings | ||
- no collisions | ||
""" | ||
|
||
import hashlib | ||
import re | ||
import string | ||
|
||
_alphanum = tuple(string.ascii_letters + string.digits) | ||
_alphanum_lower = tuple(string.ascii_lowercase + string.digits) | ||
_lower_plus_hyphen = _alphanum_lower + ('-',) | ||
|
||
# patterns _do not_ need to cover length or start/end conditions, | ||
# which are handled separately | ||
_object_pattern = re.compile(r'^[a-z0-9\.-]+$') | ||
# '-' goes last in the character class so it is a literal hyphen, not a range | ||
_label_pattern = re.compile(r'^[a-z0-9\._-]+$', flags=re.IGNORECASE) | ||
|
||
# match anything that's not lowercase alphanumeric (will be stripped, replaced with '-') | ||
_non_alphanum_pattern = re.compile(r'[^a-z0-9]+') | ||
|
||
# length of hash suffix | ||
_hash_length = 8 | ||
|
||
|
||
def _is_valid_general( | ||
    s, starts_with=None, ends_with=None, pattern=None, min_length=None, max_length=None | ||
): | ||
    """General is_valid check | ||
    Checks rules: | ||
    - length is within min_length / max_length, if given | ||
    - s starts with one of starts_with and ends with one of ends_with, if given | ||
    - s matches pattern, if given | ||
    """ | ||
    if min_length and len(s) < min_length: | ||
        return False | ||
    if max_length and len(s) > max_length: | ||
        return False | ||
    if starts_with and not s.startswith(starts_with): | ||
        return False | ||
    if ends_with and not s.endswith(ends_with): | ||
        return False | ||
    if pattern and not pattern.match(s): | ||
        return False | ||
    return True | ||
|
||
|
||
def is_valid_object_name(s): | ||
    """is_valid check for object names""" | ||
    # object rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | ||
    return _is_valid_general( | ||
        s, | ||
        starts_with=_alphanum_lower, | ||
        ends_with=_alphanum_lower, | ||
        pattern=_object_pattern, | ||
        max_length=255, | ||
        min_length=1, | ||
    ) | ||
|
||
|
||
def is_valid_label(s): | ||
    """is_valid check for label values""" | ||
    # label rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set | ||
    if not s: | ||
        # empty strings are valid labels | ||
        return True | ||
    return _is_valid_general( | ||
        s, | ||
        starts_with=_alphanum, | ||
        ends_with=_alphanum, | ||
        pattern=_label_pattern, | ||
        max_length=63, | ||
    ) | ||
|
||
|
||
def is_valid_default(s): | ||
    """Strict is_valid | ||
    Returns True if it's valid for _all_ our known uses | ||
    So we can more easily have a single is_valid check. | ||
    - object names have stricter character rules, but have longer max length | ||
    - labels have short max length, but allow uppercase | ||
    """ | ||
    return _is_valid_general( | ||
        s, | ||
        starts_with=_alphanum_lower, | ||
        ends_with=_alphanum_lower, | ||
        pattern=_object_pattern, | ||
        min_length=1, | ||
        max_length=63, | ||
    ) | ||
|
||
|
||
def _extract_safe_name(name, max_length): | ||
    """Generate safe substring of a name | ||
    Guarantees: | ||
    - always starts and ends with a lowercase letter or number | ||
    - never more than one hyphen in a row (no '--') | ||
    - only contains lowercase letters, numbers, and hyphens | ||
    - length at least 1 ('x' if other rules strips down to empty string) | ||
    - max length not exceeded | ||
    """ | ||
    # compute safe slug from name (don't worry about collisions, hash handles that) | ||
    # cast to lowercase | ||
    # replace any sequence of non-alphanumeric characters with a single '-' | ||
    safe_name = _non_alphanum_pattern.sub("-", name.lower()) | ||
    # truncate to max_length chars, strip '-' off ends | ||
    safe_name = safe_name.lstrip("-")[:max_length].rstrip("-") | ||
    if not safe_name: | ||
        # make sure it's non-empty | ||
        safe_name = 'x' | ||
    return safe_name | ||
|
||
|
||
def strip_and_hash(name, max_length=32): | ||
    """Generate an always-safe, unique string for any input | ||
    truncates name to max_length - len(hash_suffix) to fit in max_length | ||
    after adding hash suffix | ||
    """ | ||
    name_length = max_length - (_hash_length + 3) | ||
    if name_length < 1: | ||
        raise ValueError(f"Cannot make safe names shorter than {_hash_length + 4}") | ||
    # quick, short hash to avoid name collisions | ||
    name_hash = hashlib.sha256(name.encode("utf8")).hexdigest()[:_hash_length] | ||
    safe_name = _extract_safe_name(name, name_length) | ||
    # due to stripping of '-' in _extract_safe_name, | ||
    # the result will always have _exactly_ '---', never '--' nor '----' | ||
    # use '---' to avoid colliding with `{username}--{servername}` template join | ||
    return f"{safe_name}---{name_hash}" | ||
|
||
|
||
def safe_slug(name, is_valid=is_valid_default, max_length=None): | ||
    """Always generate a safe slug | ||
    is_valid should be a callable that returns True if a given string follows appropriate rules, | ||
    and False if it does not. | ||
    Given a string, if it's already valid, use it. | ||
    If it's not valid, follow a safe encoding scheme that ensures: | ||
    1. validity, and | ||
    2. no collisions | ||
    """ | ||
    if '--' in name: | ||
        # don't accept any names that could collide with the safe slug | ||
        return strip_and_hash(name, max_length=max_length or 32) | ||
    # allow max_length override for truncated sub-strings | ||
    if is_valid(name) and (max_length is None or len(name) <= max_length): | ||
        return name | ||
    else: | ||
        return strip_and_hash(name, max_length=max_length or 32) | ||
|
||
|
||
def multi_slug(names, max_length=48): | ||
    """multi-component slug with single hash on the end | ||
    same as strip_and_hash, but name components are joined with '--', | ||
    so it looks like: | ||
    {name1}--{name2}---{hash} | ||
    In order to avoid hash collisions on boundaries, use `\\xFF` as delimiter | ||
    """ | ||
    hasher = hashlib.sha256() | ||
    hasher.update(names[0].encode("utf8")) | ||
    for name in names[1:]: | ||
        # \xFF can't occur as a start byte in UTF8 | ||
        # so use it as a word delimiter to make sure overlapping words don't collide | ||
        hasher.update(b"\xFF") | ||
        hasher.update(name.encode("utf8")) | ||
    hash = hasher.hexdigest()[:_hash_length] | ||
|
||
    name_slugs = [] | ||
    available_chars = max_length - (_hash_length + 1) | ||
    # allocate equal space per name | ||
    # per_name accounts for '{name}--', so really two less | ||
    per_name = available_chars // len(names) | ||
    name_max_length = per_name - 2 | ||
    if name_max_length < 2: | ||
        raise ValueError(f"Not enough characters for {len(names)} names: {max_length}") | ||
    for name in names: | ||
        name_slugs.append(_extract_safe_name(name, name_max_length)) | ||
|
||
    # by joining names with '--', this cannot collide with single-hashed names, | ||
    # which can only contain '-' and the '---' hash delimiter once | ||
    return f"{'--'.join(name_slugs)}---{hash}" |