The AMD Pensando Policy and Services Manager (PSM) is the central management component of the AMD Pensando Distributed Services Platform.
It provides a programmable, secure microservices-based infrastructure to control global policies and distributed services in a data center, such as network, security and storage services.
The PSM platform consists of either one or three quorum controller nodes that provide consistent services to a number of physical devices utilizing AMD Pensando data processing units (DPUs).
The following figure is an architectural diagram of the PSM platform.
PSM nodes run as virtual appliances, deployed as OVA or QCOW2 images.
For greater fault tolerance, the VMs should be hosted on separate physical servers.
The key-value (KV) store is based on etcd and distributed across all PSM nodes. Maintaining low latency between the nodes is critical to cluster performance.
All PSM components use gRPC with TLS to communicate with DPU-based devices and other PSM nodes.
In the PSM, the API gateways are distributed and run on all PSM nodes.
At any point, there is a single instance of the API Server running.
The API Server is highly available and will be restarted elsewhere by Kubernetes in case of a node failure.
All cluster services such as network, storage, and controllers, including the API Server, are distributed on the PSM nodes,
and are managed and scheduled by the Kubernetes Controller.
The PSM REST API is asynchronous in nature.
A client will describe a desired state (intent) via the API.
The PSM then applies the desired state to the components as needed.
The observed state (actual state) is maintained and updated by the PSM and describes the current state of each component.
In the PSM API, as with Kubernetes, a user specifies the intent, operations are idempotent, and status describes the actual state.
Below are the primary attributes used to describe various managed objects.
- Kind: type of the object
- Version: object version (optional)
- Meta: object metadata (common to all objects, all fields are optional)
  - Name: string name of the object (user-provided unique string for this kind of object)
  - Tenant: tenant name of the object (optional)
  - Labels: arbitrary tags associated with an object
- Spec: object-specific schema
- Status: object-specific status schema
Example:
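A minimal sketch of how these attributes fit together, written as a Python dictionary. The key spellings (`api-version`, lowercase names), the sample values, and the empty `spec` and `status` bodies are assumptions for illustration; see the API reference for the actual schema of each kind.

```python
import json

# Illustrative only: key spellings and values are assumptions,
# not taken from the PSM API reference.
example_object = {
    "kind": "Network",               # Kind: type of the object
    "api-version": "v1",             # Version: optional
    "meta": {
        "name": "example-network",   # Name: unique for this kind of object
        "tenant": "default",         # Tenant: optional
        "labels": {"env": "lab"},    # Labels: arbitrary tags
    },
    "spec": {},    # desired state (intent); schema is object-specific
    "status": {},  # observed state; maintained by the PSM
}

print(json.dumps(example_object, indent=2))
```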
There are two kinds of objects: cluster-wide and tenant-wide.
Cluster-wide objects are global to the cluster, while tenant-wide objects are specific to each tenant.
Some examples of cluster-wide objects are `Cluster`, `Node` and `AuthenticationPolicy`.
Tenant-wide objects include `Network` and `SecurityGroup`.
See the API reference for more details on individual objects.
Each CRUD call receives a response message indicating the return status of the call.
The corresponding HTTPS response code should be used as the primary indicator of success or failure for the operation.
The JSON payload response contains return status and details.
Below are the response status-codes that the PSM will use to indicate the operation result:
Status Code | Description |
---|---|
200 | OK |
400 | Bad request parameters |
401 | Unauthorized request |
409 | Conflict while processing request |
412 | Pre-condition failed |
500 | Internal Server Error |
501 | Request not implemented |
503 | Service Unavailable |
NOTE: The PSM API Gateway returns a JSON-structured payload which includes the details of the response.
Best practice is to first check the HTTPS response code, and then parse the JSON data to get the details.
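The check-the-code-first practice can be sketched with the standard library alone. The helper name `interpret_response` and the `message` key in the usage line are hypothetical; the status codes and descriptions are the ones listed in the table above.

```python
import json

# Status codes and descriptions as listed in the table above.
PSM_STATUS_CODES = {
    200: "OK",
    400: "Bad request parameters",
    401: "Unauthorized request",
    409: "Conflict while processing request",
    412: "Pre-condition failed",
    500: "Internal Server Error",
    501: "Request not implemented",
    503: "Service Unavailable",
}


def interpret_response(status_code, body_text):
    """Check the HTTPS status code first, then parse the JSON payload."""
    ok = status_code == 200
    summary = PSM_STATUS_CODES.get(status_code, "Unlisted status code")
    try:
        details = json.loads(body_text) if body_text else {}
    except json.JSONDecodeError:
        # Keep the raw text if the payload is not valid JSON.
        details = {"raw": body_text}
    return ok, summary, details


# Hypothetical payload: the "message" key is an assumption.
ok, summary, details = interpret_response(409, '{"message": "already exists"}')
print(ok, summary, details)
```

The function takes the status code and body as plain values, so it can sit behind whichever HTTP client is in use.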
To confirm that policy propagation has completed as intended for a given managed object, the programmer must follow any add/update operation with a corresponding get/list of the object to verify both of the following:
- The propagation status has been marked `Propagation Complete`
- The `generation-id` field in `propagation-status` has the same value as the `generation-id` in the object’s `meta` section.
Below is an example of ensuring that propagation is complete:
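A minimal sketch of the two checks, operating on an object that has already been fetched and decoded into a dictionary. It assumes that `propagation-status` sits under the object's `status` section and that the completion marker is held in a field named `status` inside it; both placements are assumptions, as is the helper name `propagation_complete`.

```python
def propagation_complete(obj):
    """Return True when both propagation conditions hold for a fetched object."""
    meta = obj.get("meta", {})
    # Assumed location of the propagation status inside the object.
    prop = obj.get("status", {}).get("propagation-status", {})

    # Condition 1: propagation has been marked complete.
    marked_complete = prop.get("status") == "Propagation Complete"
    # Condition 2: the status refers to the same generation as the meta section.
    same_generation = (
        "generation-id" in meta
        and prop.get("generation-id") == meta["generation-id"]
    )
    return marked_complete and same_generation


# Hypothetical objects as they might look after a get/list call.
fresh = {
    "meta": {"generation-id": "4"},
    "status": {"propagation-status": {"generation-id": "4",
                                      "status": "Propagation Complete"}},
}
stale = {
    "meta": {"generation-id": "5"},
    "status": {"propagation-status": {"generation-id": "4",
                                      "status": "Propagation Complete"}},
}

print(propagation_complete(fresh), propagation_complete(stale))
```

Comparing the generation values matters because a status marked complete may still describe an earlier version of the object.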
Username and password credentials are referenced at runtime through the `PSM_USER` and `PSM_PASSWORD` environment variables.
In cases where PSM is running without SSL certificates, developers can set `configuration.verify_ssl = False` to avoid `Unauthorized request` responses.
Login is implicit within the `configuration.Configuration` method, as shown below:
```python
import os
import warnings

from pensando_cloud.psm import configuration, api_client
from pensando_cloud.psm.apis import ClusterV1Api

# Suppress all Python warnings.
warnings.simplefilter("ignore")

# Creating the Configuration performs the login implicitly.
configuration = configuration.Configuration(
    psm_config_path=os.environ['HOME'] + "/.psm/config.json",
    interactive_mode=True
)
# Only for a PSM running without SSL certificates.
configuration.verify_ssl = False

client = api_client.ApiClient(configuration)
api_instance = ClusterV1Api(client)
response = api_instance.get_cluster()
[...]
```
Following successful login, the PSM IP address and login token are stored for the user in `$HOME/.psm/config.json`.
There is no need to perform a token refresh. A token is valid for six days.
There is no need to perform a logout. When a client no longer needs access it can simply discard the current token.
However, deleting a user or changing RBAC privileges for a user will immediately affect any existing valid token the user might have.
A staging buffer allows for multiple operations to be consolidated into one single function call.
When staging multiple calls, each individual call is authorized and validated and accumulated in the staging buffer.
A commit operation on the staging buffer then applies and persists the contents of the staging buffer in a single function call.
Buffer commit operations have “all-or-nothing” semantics, meaning that either all changes accumulated in the buffer are applied together, or none is.
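The "all-or-nothing" commit can be illustrated with a toy in-memory model. This is a sketch of the semantics only, not the PSM implementation: the class name, the operation names, and the dictionary used as a store are all assumptions, and validation is done at commit time here for brevity.

```python
import copy


class StagingBufferModel:
    """Toy in-memory model of a staging buffer; not the PSM implementation."""

    def __init__(self, store):
        self.store = store   # dict: object name -> object
        self.staged = []     # list of (operation, name, payload)

    def stage(self, operation, name, payload=None):
        self.staged.append((operation, name, payload))

    def commit(self):
        # Apply to a copy so that a failure leaves the real store untouched.
        working = copy.deepcopy(self.store)
        for operation, name, payload in self.staged:
            if operation == "create":
                if name in working:
                    raise ValueError(f"{name} already exists")
                working[name] = payload
            elif operation == "update":
                if name not in working:
                    raise KeyError(f"{name} does not exist")
                working[name] = payload
            elif operation == "delete":
                if name not in working:
                    raise KeyError(f"{name} does not exist")
                del working[name]
            else:
                raise ValueError(f"unknown operation {operation!r}")
        # Every staged operation succeeded: publish all changes together.
        self.store.clear()
        self.store.update(working)
        self.staged.clear()


store = {"net-a": {"vlan": 10}}
buf = StagingBufferModel(store)
buf.stage("create", "net-b", {"vlan": 20})
buf.stage("delete", "net-missing")   # invalid: the object does not exist
try:
    buf.commit()
except KeyError:
    pass
# The valid create was not applied either; store is unchanged.
print(store)
```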
The following shows the flow when using configuration staging.
The staging buffer is represented as an object, identified by an ID which can be either user-provided (a “named buffer”), or automatically generated by the system. All staging operations will then use this identifier. The buffer ID becomes part of the URL for the CRUD operations on the buffer contents. There are two categories of actions related to the staging buffer:
- Operations on the staging buffer itself
- Staged operations in the staging buffer
All actions on the staging buffer go through RBAC authorization.
In addition to privileges for CRUD operations for objects that are staged,
the following privileges are associated with the staging buffer object itself:
- Create
- Update
- List Buffer Contents
- Commit
- Delete
- List Staging Buffers
Only operations for which the user has privileges will be allowed on the staging buffer. The Update, List and Commit Operations have additional considerations for the staging buffer.
Create: Creates a new staging buffer. The identifier can be user-specified or system-generated. This identifier is used for all subsequent operations on the staging buffer.
Update: Updating the staging buffer involves adding API object CRUD operations to the staging buffer. The CRUD operations themselves are authorized separately. To update the staging buffer with an API operation, the user needs both the update privilege for the staging buffer and the privilege for the CRUD operation being staged.
List Buffer Contents: Verifies and shows the contents of the staging buffer. This involves both semantic and syntactic validation at the API server. Each object in the buffer is validated, considering all the other objects in the staging buffer plus the current state of the KV store. The validation result specifies “success” or “failure”, and lists any objects that failed validation along with the associated failure messages.
Commit: The commit operation applies the changes in the staging buffer to the system. The user performing the commit must have both commit privileges on the staging buffer and privileges for the object operations stored in the staging buffer.
Delete: Deletes the staging buffer and its contents. This has no effect on the actual objects referenced in the staging buffer.
List Staging Buffers: Lists active staging buffers. Requires Read privileges on the staging buffer itself.
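The two-part privilege requirement for updating and committing a staging buffer can be sketched as follows. The representation of privileges as `(kind, action)` tuples, the `StagingBuffer` kind name, and both function names are hypothetical; they only illustrate that the buffer privilege and the object privilege must both be present.

```python
def can_stage(privileges, kind, operation):
    """Staging needs update on the buffer AND the privilege for the staged operation."""
    buffer_ok = ("StagingBuffer", "update") in privileges
    object_ok = (kind, operation) in privileges
    return buffer_ok and object_ok


def can_commit(privileges, staged_operations):
    """Committing needs commit on the buffer AND every staged object privilege."""
    if ("StagingBuffer", "commit") not in privileges:
        return False
    return all((kind, op) in privileges for kind, op in staged_operations)


# Hypothetical privilege set for one user.
privileges = {
    ("StagingBuffer", "update"),
    ("StagingBuffer", "commit"),
    ("Network", "create"),
}

print(can_stage(privileges, "Network", "create"))         # both privileges held
print(can_stage(privileges, "SecurityGroup", "create"))   # object privilege missing
print(can_commit(privileges, [("Network", "create")]))
```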
The REST API will allow the following operations on entries within the staging buffer:
- Create
- Update
- Delete
Create: Creates a new staging buffer entry. The identifier can be user-specified or system-generated. This identifier is used for all subsequent operations on the staging buffer entry.
Update: Modifies an existing entry in the buffer. Elements to be updated are identified in the request using a URI.
Delete: Clears one or more entries in the staging buffer. The “clear” can apply to any staged operation (Create/Update/Delete). Elements to be cleared are identified in the request using a URI.
Each operation goes through the same RBAC authorization process as non-staged operations. If the operation passes RBAC, the object is validated and added to the staging buffer.
Only one operation is supported per API object referenced in the staging buffer; for example, “delete” and “modify” on the same object are not permitted together. If a conflicting operation is staged for the same object, the latest operation overwrites the earlier one. A create followed by a modify changes the initial create request.
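The one-operation-per-object rule can be pictured as a mapping keyed by object URI. The sketch below is an in-memory illustration under stated assumptions: the class name, the operation names, and the example URIs are hypothetical and do not reflect actual PSM paths.

```python
class StagedEntries:
    """Toy model: at most one staged operation per object URI."""

    def __init__(self):
        self._entries = {}   # URI -> (operation, payload)

    def stage(self, uri, operation, payload=None):
        previous = self._entries.get(uri)
        if previous and previous[0] == "create" and operation == "update":
            # A create followed by an update modifies the initial create request.
            self._entries[uri] = ("create", payload)
        else:
            # Otherwise the latest operation replaces the earlier one.
            self._entries[uri] = (operation, payload)

    def clear(self, *uris):
        # Clearing removes the staged entry, whatever operation it held.
        for uri in uris:
            self._entries.pop(uri, None)

    def entries(self):
        return dict(self._entries)


staged = StagedEntries()
staged.stage("/example/networks/net-a", "create", {"vlan": 10})
staged.stage("/example/networks/net-a", "update", {"vlan": 11})  # folds into the create
staged.stage("/example/networks/net-b", "update", {"vlan": 20})
staged.stage("/example/networks/net-b", "delete")                # latest operation wins
staged.stage("/example/networks/net-c", "create", {"vlan": 30})
staged.clear("/example/networks/net-c")
print(staged.entries())
```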
PSM clients can use the staging buffer “bulk edit” feature to create, update, or delete multiple objects in one call, simplifying the task of staging large numbers of configuration changes. A typical sequence would be:
- Create a staging buffer
- Stage a single bulk edit, containing multiple operations
- Get the buffer to review its content
- Commit the buffer, executing all staged changes
- Delete the buffer
Bulk edit operations are currently supported only on config objects. Bulk edit operations are not supported on archive logs, configuration snapshots, rollout images or fwlogs.
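The bulk edit sequence above can be sketched against an in-memory stand-in. `FakeStagingClient` and all of its method names are hypothetical and exist only to show the order of calls; they are not part of any PSM client library.

```python
class FakeStagingClient:
    """In-memory stand-in used only to show call order; not a PSM client."""

    def __init__(self):
        self.buffers = {}
        self.committed = []

    def create_buffer(self, name):
        self.buffers[name] = []
        return name

    def bulk_edit(self, name, operations):
        self.buffers[name].extend(operations)

    def get_buffer(self, name):
        return list(self.buffers[name])

    def commit_buffer(self, name):
        self.committed.extend(self.buffers[name])
        self.buffers[name] = []

    def delete_buffer(self, name):
        del self.buffers[name]


def run_bulk_edit(client, name, operations):
    buffer_id = client.create_buffer(name)    # 1. create a staging buffer
    client.bulk_edit(buffer_id, operations)   # 2. stage a single bulk edit
    staged = client.get_buffer(buffer_id)     # 3. get the buffer to review it
    client.commit_buffer(buffer_id)           # 4. commit all staged changes
    client.delete_buffer(buffer_id)           # 5. delete the buffer
    return staged


client = FakeStagingClient()
operations = [
    ("create", "net-a", {"vlan": 10}),
    ("create", "net-b", {"vlan": 20}),
    ("delete", "net-old", None),
]
reviewed = run_bulk_edit(client, "example-buffer", operations)
print(len(reviewed), len(client.committed), client.buffers)
```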