Skip to content

Commit

Permalink
doc: add QoS policy and role documentation to the dCache Book
Browse files Browse the repository at this point in the history
Motivation:

Should have been added with the last policy engine patch
(my bad).

Modification:

Updates the qos page and adds a cookbook section.

Result:

Users are a little less lost.

Target: master
Request: 9.2
Patch: https://rb.dcache.org/r/14128/
Requires-notes:  yes (just say documentation now available)
Acked-by: Dmitry
  • Loading branch information
alrossi authored and lemora committed Oct 12, 2023
1 parent 0ffd722 commit 151b4cb
Show file tree
Hide file tree
Showing 5 changed files with 333 additions and 10 deletions.
44 changes: 34 additions & 10 deletions docs/TheBook/src/main/markdown/config-qos-engine.md
Original file line number Diff line number Diff line change
Expand Up @@ -194,16 +194,40 @@ is indefinitely set to "sticky". To change the file back to cached, a second
modification request is required. In the future, this may be done via a time-bound
set of rules given to the engine (not yet implemented).

File QoS modification can be achieved through the RESTful frontend for single files,

[dCache Frontend Service/A Note on the RESTful resource for QoS transitions](config-frontend.md/)

of through the Bulk service for file sets.

[dCache Bulk Service/Job plugins](config-bulk.md)

In addition, the administrator can issue transition requests directly through the admin interface
for the Bulk service using ``request submit``.
File QoS modification can be achieved through the RESTful frontend, either
for single files using `/api/v1/namespace`, or in bulk, using `/api/v1/bulk-requests`,
the latter communicating with the [dCache Bulk Service](config-bulk.md). Please
refer to the SWAGGER pages at (`https://example.org:3880/api/v1`) for a description
of the available RESTful resources. Admins can also submit and control bulk qos
transitions directly through the admin shell commands for the Bulk service.

#### QoS file policy (since 9.2)

With version 9.2, a "rule engine" capability has been added to the QoS Engine. The
way this works is as follows:

1. A QoS Policy is defined (it is expressed in JSON).
2. The policy is uploaded through the Frontend REST API (`qos-policy`). This stores the policy
in the namespace. The API also allows one to list current policies, view a policy's JSON,
and remove the policy. Adding and removal require admin privileges.
3. Directories can be tagged using the `QosPolicy` tag, which should indicate the
name of the policy. All files written to this directory will be associated with this policy.
4. The policy defines a set of transitions (media states), each having a specific duration.
The QoS Engine keeps track of the current transition and its expiration. Upon expiration,
it consults the policy to see what the next state is, and asks the QoS Verifier to apply it.
When a file has reached the final state of its policy, it is no longer checked by the QoS Engine;
however, if the file's final state includes `ONLINE` access latency, the QoS Scanner will
check it during the periodic online scans; on the other hand, if the file's final state
includes `NEARLINE` access latency, but its retention policy is `CUSTODIAL`, the QoS Scanner
will check to make sure it has a tape location. *_Note that there is no requirement for a file
in dCache to have a QoS policy._*
5. The Bulk service's `UPDATE_QOS` activity now allows for transitioning files both by
`targetQos` (`disk`, `tape`, `disk+tape`), but also by `qosPolicy` (associate with the
file with a policy by this name); in addition, it is possible to skip transitions in that
policy using the `qosState` argument to indicate which index of the transition
list to begin at (0 by default).

For more information on policies, with some examples, see the QoS Policy cookbook.

### QoS and "resilience"

Expand Down
294 changes: 294 additions & 0 deletions docs/TheBook/src/main/markdown/cookbook-qos-policies.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,294 @@
QoS Policies
=================================

With dCache 9.2, file QoS can be managed automatically by the `QoS Engine` using a policy.
The following details the structure of a policy, how to manage policies and how to
associate a policy with a file.

-----
[TOC bullet hierarchy]
-----

## QoS Policy Definition

QoS policies determine how files should be stored on various storage media over time.

QoS policies are of predefined types. Only users with administrative privileges
can add or remove policy definitions.

A policy is defined by a JSON object consisting of an identifying name and array of states,
each with a duration value (ISO 8601) and an array of media directives. The policy name is
arbitrary; meaningful names could indicate something about the storage policy itself.
For instance, one might choose “public-resilient” to denote files only on disk and which
have 20 replicas apiece (as at FNAL). However, these names should be unique within the local
dCache installation.

A file may be associated with a QoS policy. Files having a defined policy are verified
by the dCache QoS system and appropriate action taken to guarantee that the file's presence
on the various media conform with that policy. Files without a policy will be treated
in the usual manner.

The policy definitions serve as templates from which an individual file's transitions are derived.
The template becomes a set of QoS transitions to be applied in succession by transforming state
duration into a timestamp denoting the expiration of that state for that file.

The basic structure for the JSON file is as follows:

- *name* - String identifier. Required.
- *states* - Ordered list (array) of states. Required.

Each state consists of:

- *duration* - How long the state should last. Optional. No duration means the same as INF.
Expressed using ISO 8601 duration notation.
- *media* - Ordered list (array) of storage element descriptions. At least one is required.

Each storage element description consists of:

- *storageMedium* - currently one of (DISK, HSM). Required.
- *numberOfCopies* - currently supported for DISK only.
- *type* - String. Could describe a disk type or the hsm system name, for example. Optional, currently
unused.
- *instance* - String URI for the system instance (HSM only).
- *partitionKeys* - list (array) of values used to distribute copies across pools (DISK only).
Analogous to the storage unit attribute `onlyOneCopyPer` (pool tags) used for resilient files.
Optional.

A simple example:

```
{
"name": "TEST",
"states": [
{
"duration": "P10D",
"media": [
{
"storageMedium": "DISK",
"numberOfCopies": 2
}
]
},
{
"duration": "P1M",
"media": [
{
"storageMedium": "DISK",
"numberOfCopies": 1
},
{
"storageMedium": "HSM",
"numberOfCopies": 1
}
]
},
{
"media": [
{
"storageMedium": "HSM",
"numberOfCopies": 1
}
]
}
]
}
```

Here we have a policy with three states. When the file with this policy is initially
written, it is given two disk replicas. After ten days, the file is flushed to tape
and one of the replicas is cached. One month after that, the single replica is
also cached and the file is only guaranteed to be on tape.

There is theoretically no limit to the transitions making up the states array.
Currently, only two types of storage media are recognized, `DISK` and `HSM`, and
only one `HSM` copy at a time is supported (this may change in the future).

No duration need be given on the final state entry; the QoS Engine will stop tracking this file
after all transitions have completed. The QoS Scanner component, however, will still periodically
check that `ONLINE` files indeed have their necessary number of replicas, and, if the file
is `NEARLINE CUSTODIAL` and has a QoS policy, that it has been flushed (these are two separate
scans that can be scheduled with different periods in the QoS Scanner).

## Managing Policies

As stated above, only admins are given permission to define policies. These may be set, queried
and deleted. Once a policy is uploaded, however, it cannot be modified. One would instead have to
create a new policy definition and upload it, then delete the old one when one is sure files are no
longer associated with it.

The Frontend provides a RESTful resource for policy management. The SWAGGER page contains
more details. These are found under `qos-policy`:

![QoS Policy REST API](images/qos-policy-rest.png)

Policies are stored in the namespace (Chimera), but are also cached by the QoS Engine. There
are admin shell commands in the namespace available for listing and viewing policies:

```
admin > \sn help qos policies
NAME
qos policies -- List qos policy names
SYNOPSIS
qos policies
DESCRIPTION
Show list of policy names
admin > \sn help show qos policy
NAME
show qos policy -- Print qos policy
SYNOPSIS
show qos policy policy
DESCRIPTION
Display qos policy
ARGUMENTS
policy
The policy name.
```

## Applying Policies to Files

Files can be given a policy at the time of initial write via a directory tag.

```
[arossi@fndcatemp1 persistent]$ echo "TEST" > ".(tag)(QosPolicy)"
[arossi@fndcatemp1 persistent]$ grep "" $(cat ".(tags)()")
.(tag)(AccessLatency):ONLINE
.(tag)(file_family):dcache-devel-test
.(tag)(file_family_width):10
.(tag)(QosPolicy):TEST
.(tag)(RetentionPolicy):REPLICA
.(tag)(storage_group):persistent
```

When the `QosPolicy` tag is set, it overrides the `AccessLatency` and `RetentionPolicy`
attributes for the purposes of QoS verification. When written, the file will be placed
in the first state of the policy by the QoS Verifier, and the QoS Engine will record
the expiration based on the duration indicated. The QoS Engine periodically checks the expiration
timestamps of the files it has registered, and promotes them to the next state
accordingly.

Should one wish at some point to change the QoS policy for one or more files,
this can be achieved via the RESTful bulk resource `/api/v1/bulk-requests`.

For example,

```
{"activity":"UPDATE_QOS",
"arguments": {"qosPolicy":"TEST"},
"target":["/pnfs/fs/usr/example-user/my_scratch_dir"],
"expandDirectories":"TARGETS"}
```

represents the JSON parameter to the POST call on `bulk-requests` which will give
all the files in `my_scratch_dir` the `TEST` policy.

If `TEST` has more than one state, and you only wish to apply the transitions
from a certain index in the state list forward, you can also specify the `qosState`
attribute. For instance, in the `TEST` example above, the second state
(at index 1) flushes the file immediately and keeps only one disk copy.

Thus:

```
{"activity":"UPDATE_QOS",
"arguments": {"qosPolicy":"TEST", "qosState":1},
"target":["/pnfs/fs/usr/example-user/my_scratch_dir"],
"expandDirectories":"TARGETS"}
```

would skip the first state (two disk copies only) and immediately store the file on tape,
also making its disk copy persistent (sticky). Then after 30 days, that replica
would be cached.

The older arguments for the Bulk `UPDATE_QOS` activity type are still valid;
that is, one can still choose to do a one-time transition using

`"targetQos":"tape|disk+tape|disk"`

as before. In this case, there is no policy assigned; the file is simply transitioned
to that QoS state. One can also transition between policy and this kind of
simple QoS operation without restriction. For instance, say you wanted to
disassociate a file from its current policy, making it merely `NEARLINE CUSTODIAL`.
You could do a bulk request using `"targetQos":"tape"`; this would cache all disk
copies, and eliminate the file from QoS tracking by the QoS Engine.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTE: Policies cannot be assigned to single files using the /api/v1/namespace
resource (but one can still issue disk, tape and disk+tape transitions this
way). There is nothing, however, preventing one from changing a single file
by using the /api/v1/bulk-requests resource; just make the "target":[] array
contain a single file path.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> Users need special authorization in order to transition files. This is
> achieved via roles defined in the multimap plugin configuration file. There are three
> roles available: admin, qos-user and qos-group. The first grants privileges
> on all files; the second, on files whose owner matches the user's uid; the
> last, on files whose group matches the user's primary gid. Example:
>
> dn:"/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Henry Higgins/CN=UID:higgs" username:higgs uid:8342 gid:4211,true roles:qos-user,qos-group
>
> This will give Prof. Higgins privileges on files he owns or which have his primary gid
> as group.
A file's policy and its current transition state are stored in the basic attributes
table of the namespace (Chimera). Inspection of file QoS policies is available both
through the REST interface and the admin shell.

The `/api/v1/namespace/<path>` resource will return the file's policy attributes if given
the `optional=true` parameter. As can be seen from the SWAGGER pages, the `/qos-policy`
resource also supports retrieval of file policy information:

![QoS File Policy REST API](images/qos-file-policy-rest.png)

These correspond to the namespace admin shell commands:

```
\sn help file policy
NAME
file policy -- shows qos policy info
SYNOPSIS
file policy <pathorid>
DESCRIPTION
Reports policy name and state for the file, if defined.
ARGUMENTS
<pathorid>
<pnfsid>|<path>.
\sn help file policy stats
NAME
file policy stats -- shows summary of qos policy info
SYNOPSIS
file policy stats
DESCRIPTION
Gives a list of policy names, states and respective file counts.
```

Additionally, the QoS Engine has a spot-check command to see if it is tracking
a given file:

```
\s qos-engine help qos
NAME
qos -- print qos info for a file if it is being tracked
SYNOPSIS
qos <pnfsid>
ARGUMENTS
<pnfsid>
The unique identifier of the file within dCache.
```
5 changes: 5 additions & 0 deletions docs/TheBook/src/main/markdown/cookbook.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,11 @@ This part contains guides for specific tasks a system administrator might want t
- [Maven Archetype](cookbook-writing-hsm-plugins.md#maven-archetype)
- [Examples](cookbook-writing-hsm-plugins.md#examples)

- [QoS Policies](cookbook-qos-policies.md)
- [QoS Policy Schema](cookbook-qos-policies.md#qos-policy-definition)
- [QoS Policy Management](cookbook-qos-policies.md#managing-policies)
- [Applying a QoS Policy to a file](cookbook-qos-policies.md#applying-policies-to-files)

- [Advanced Tuning](cookbook-advanced.md)
- [Multiple Queues for Movers in each Pool](cookbook-advanced.md#multiple-queues-for-movers-in-each-pool)
- [Description](cookbook-advanced.md#description)
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 151b4cb

Please sign in to comment.