Ace readme update (#266)
* Updated ACE readme (cli/scripts/ace_readme.md) to minimize content and remove overlap with our online docs; also added links

* Adding OS docs folder with expanded docs and README file for ACE with TOC to access content in docs folder.
susan-pgedge authored Feb 19, 2025
1 parent 3b5ce38 commit 624fbfc
Showing 5 changed files with 817 additions and 55 deletions.
73 changes: 18 additions & 55 deletions cli/scripts/ace_readme.md
@@ -1,65 +1,28 @@
## ACE (Anti-Chaos Engine) User Guide

ACE is a powerful tool designed to ensure and maintain consistency across nodes in a pgEdge cluster. It helps identify and resolve data inconsistencies, schema differences, and replication configuration mismatches across nodes in a cluster.
# ACE (Anti-Chaos Engine)

Key features include:
- Table-level data comparison and repair
- Replication set level verification
- Automated repair capabilities
- Schema comparison
- Spock configuration validation
ACE is a powerful tool designed to ensure and maintain consistency across nodes in a replication cluster managed by the [Spock extension](https://github.com/pgEdge/spock). It helps identify and resolve data inconsistencies, schema differences, and replication configuration mismatches across nodes in a cluster.

ACE is installed with the pgEdge Platform installer. The following commands describe installing ACE on a management system that is not a member of a replication cluster:
## Table of Contents
- [Building the ACE Extension](README.md#building-the-ace-extension)
- [Managing Data Consistency with ACE](docs/ace_overview.md)
- [Using ACE Functions](docs/ace_functions.md)
- [ACE API Endpoints](docs/ace_api.md)
- [Scheduling ACE Operations](docs/ace_schedule.md)

1. Navigate to the directory where ACE will be installed.
ACE is installed by the CLI installer. The following commands describe installing ACE on a management system that is not a member of a replication cluster:

2. Invoke the pgEdge installer in that location with the command:
1. After installing the [CLI](https://github.com/pgEdge/cli), navigate to the directory where ACE will be installed.

`python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"`
2. Invoke the CLI [UM installer](https://github.com/pgEdge/cli/blob/REL25_01/docs/functions/um-install.md) in that location with the command:

3. Create a directory named `cluster` in the `pgedge` directory created by the pgEdge installer.
`pgedge um install ace`

4. [Create and update a cluster_name.json file](https://docs.pgedge.com/platform/installing_pgedge/manage_json), and place the file in `cluster/cluster_name/cluster_name.json` on the ACE host. For example, if your cluster name is `us_eu_backend`, the cluster definition file for this should be placed in `/pgedge/cluster/us_eu_backend/us_eu_backend.json`. The .json file must:
3. Create a directory named `cluster` in the `pgedge` directory.

4. [Create and update a .json file](https://github.com/pgEdge/cli/blob/REL25_01/docs/functions/cluster-json-template.md), and place the file in `cluster/cluster_name/cluster_name.json` on the ACE host. For example, if your cluster name is `us_eu_backend`, the cluster definition file for this should be placed in `/pgedge/cluster/us_eu_backend/us_eu_backend.json`. The .json file must:

## ACE Functions

To review online help about the ACE commands and syntax available for your ACE version, use the command:

`./pgedge ace [command_name] --help`

For detailed information about usage, recommended use cases, and optional arguments, see [the pgEdge documentation](https://designing.pgedge-docs-sandbox.pages.dev/platform/ace/using_ace#ace-commands).


## API Reference

ACE provides a REST API for programmatic access. The API server runs on `localhost:5000` by default. An SSH tunnel is required to access the API from outside the host machine for security purposes.

For detailed information about API usage, recommended use cases, and optional arguments, see [the pgEdge documentation](https://designing.pgedge-docs-sandbox.pages.dev/platform/ace/ace_api).


## Scheduling ACE Operations (Beta)

ACE supports scheduling of automated table-diff and auto-repair operations through configuration settings in the `ace_config.py` file. This allows for regular consistency checks and remediations without manual intervention.

The `ace_config.py` file is located in `${PGEDGE_HOME}/hub/scripts/ace_config.py`. Within the file, you define jobs and their schedules with key/value pairs in the following sections:

* The `schedule_jobs` section contains information about ACE diff jobs.
* The `schedule_config` section contains information about the run frequency.
* The `auto_repair_config` section contains information about scheduled repair jobs.

The ACE scheduler runs the jobs defined in the `ace_config.py` file automatically when ACE is started, or you can control them manually. Use the following commands to start and stop the scheduler manually:

To start the scheduler:

```bash
./pgedge start ace
```

To stop the scheduler:

```bash
./pgedge stop ace
```

For detailed information about using ACE scheduling functionality, [see the pgEdge documentation](https://designing.pgedge-docs-sandbox.pages.dev/platform/ace/schedule_ace).
* Contain connection information for each node in the cluster in the node's stanza.
* Identify the user that will be invoking ACE commands in the `db_user` property. This user must also be the table owner.

After ensuring that the .json file describes your cluster connections and identifies the ACE user, you're ready to use [ACE commands](docs/ace_functions.md).
299 changes: 299 additions & 0 deletions cli/scripts/docs/ace_api.mdx
@@ -0,0 +1,299 @@

## ACE API Endpoints

ACE includes API endpoints for some of its most frequently used functions.

## API Reference

ACE provides a REST API for programmatic access. The API server runs on localhost:5000 by default. An SSH tunnel is required to access the API from outside the host machine for security purposes. You must also configure client-based certificate authentication before using the ACE API.
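As a sketch of the SSH tunnel requirement, the function below forwards a local port to port 5000 on the ACE host, where the API listens by default. `ace_open_tunnel` is a hypothetical helper, not part of ACE, and the user, host, and local port are placeholders.

```bash
# Hypothetical helper (not part of ACE): forward a local port to the ACE API,
# which listens on localhost:5000 on the ACE host by default.
# Usage: ace_open_tunnel USER@ACE_HOST [LOCAL_PORT]
ace_open_tunnel() {
  local target="$1"
  local local_port="${2:-5000}"
  # -N: run no remote command; -f: go to the background once connected;
  # -L: forward local_port on this machine to localhost:5000 on the ACE host.
  ssh -N -f -L "${local_port}:localhost:5000" "$target"
}
```

With the tunnel open (for example, `ace_open_tunnel admin@ace-host.example.com`), the `curl` requests in this document can target `http://localhost:5000` from your workstation, still passing `--cert` and `--key`.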

Please refer to [Cert Auth with EasyRSA](https://docs.google.com/document/d/17SmNVx2Ootdc32ZEuW9qXlNpGIyhHOBfpJGRbXwHeP0/edit?usp=sharing) to set up server and client certificates.

Create a separate client certificate for ACE itself, issued to a user that has all necessary privileges on the tables, schemas, and databases you want to use with ACE. This user should preferably be a superuser, since ACE may need elevated privileges during diffs and repairs. Each external user can have their own client certificate, typically with lower privileges. Each external user's role must then be granted to the ACE user. For example, if the ACE user (with higher privileges) has a certificate with `ace_user` as the common name, and the external user has a certificate with `external_user` as the common name, then the `external_user` role must be granted to `ace_user`:

```sql
GRANT external_user TO ace_user;
```

This is required because ACE attempts to use `SET ROLE` to switch to the external user's role before performing any operation, which ensures that diffs and repairs run with the external user's privileges.
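Before calling the API as an external user, you may want to confirm that the grant took effect. The sketch below assumes `psql` is on the PATH and that `ACE_DB_URL` (a hypothetical variable) holds a connection string for one node; it uses PostgreSQL's `pg_has_role` function with the role names from the example above.

```bash
# Hypothetical check (not part of ACE): confirm ace_user is a member of
# external_user, so that ACE can SET ROLE to it.
# ACE_DB_URL is an assumed variable holding a connection string for one node.
# Role names are interpolated directly into SQL, so pass trusted values only.
ace_check_role_grant() {
  local ace_role="${1:-ace_user}"
  local ext_role="${2:-external_user}"
  local result
  # -A: unaligned output, -t: tuples only, so psql prints just "t" or "f".
  result=$(psql "${ACE_DB_URL:?set ACE_DB_URL}" -Atc \
    "SELECT pg_has_role('${ace_role}', '${ext_role}', 'MEMBER')") || return 2
  if [ "$result" = "t" ]; then
    echo "${ace_role} can SET ROLE ${ext_role}"
  else
    echo "missing: GRANT ${ext_role} TO ${ace_role};" >&2
    return 1
  fi
}
```

Because the grant must exist on every node ACE touches, running the check once per node connection string is a reasonable habit.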


### The table-diff API

Initiates a table diff operation.

**Endpoint:** `POST /ace/table-diff`

**Request Body:**
```json
{
"cluster_name": "my_cluster", // required
"table_name": "public.users", // required
"dbname": "mydb", // optional
"block_rows": 10000, // optional, default: 10000
"max_cpu_ratio": 0.8, // optional, default: 0.6
"output": "json", // optional, default: "json"
"nodes": "all", // optional, default: "all"
"batch_size": 50, // optional, default: 1
"table_filter": "id < 1000", // optional
"quiet": false // optional, default: false
}
```

**Parameters:**
- `cluster_name` (required): Name of the cluster
- `table_name` (required): Fully qualified table name (schema.table)
- `dbname` (optional): Database name
- `block_rows` (optional): Number of rows per block (default: 10000)
- `max_cpu_ratio` (optional): Maximum CPU usage ratio (default: 0.6)
- `output` (optional): Output format ["json", "csv", "html"] (default: "json")
- `nodes` (optional): Nodes to include ("all" or comma-separated list) (default: "all")
- `batch_size` (optional): Batch size for processing (default: 1)
- `table_filter` (optional): SQL WHERE clause to filter rows for comparison
- `quiet` (optional): Suppress output (default: false)

**Example Request:**
```bash
curl -X POST "http://localhost:5000/ace/table-diff" \
-H "Content-Type: application/json" \
--cert /path/to/client.crt \
--key /path/to/client.key \
-d '{
"cluster_name": "my_cluster",
"table_name": "public.users",
"output": "html"
}'
```

**Example Response:**
```json
{
"task_id": "td_20240315_123456",
"submitted_at": "2024-03-15T12:34:56.789Z"
}
```
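Because this endpoint returns a `task_id` rather than the diff itself, scripts typically capture that ID for later status checks. The sketch below wraps the example request above. The function name, the `ACE_URL`, `ACE_CERT`, and `ACE_KEY` variables, and the `sed`-based parsing (used to avoid a `jq` dependency) are assumptions, not part of ACE.

```bash
# Hypothetical settings; adjust to your environment.
ACE_URL="${ACE_URL:-http://localhost:5000}"
ACE_CERT="${ACE_CERT:-/path/to/client.crt}"
ACE_KEY="${ACE_KEY:-/path/to/client.key}"

# Hypothetical helper: submit a table-diff and print only the task_id.
# Values are not JSON-escaped; names must not contain quotes or backslashes.
# Usage: ace_submit_table_diff CLUSTER SCHEMA.TABLE [OUTPUT_FORMAT]
ace_submit_table_diff() {
  local body response
  body=$(printf '{"cluster_name": "%s", "table_name": "%s", "output": "%s"}' \
    "$1" "$2" "${3:-json}")
  response=$(curl -sS -X POST "${ACE_URL}/ace/table-diff" \
    -H "Content-Type: application/json" \
    --cert "$ACE_CERT" --key "$ACE_KEY" \
    -d "$body") || return 1
  # Pull the value of "task_id" out of the JSON response.
  printf '%s\n' "$response" | sed -n 's/.*"task_id": *"\([^"]*\)".*/\1/p'
}
```

For example, `task_id=$(ace_submit_table_diff my_cluster public.users html)` stores the ID to pass to the task-status endpoint.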

### The table-repair API

Initiates a table repair operation.

**Endpoint:** `POST /ace/table-repair`

**Request Body:**
```json
{
"cluster_name": "my_cluster", // required
"diff_file": "/path/to/diff.json", // required
"source_of_truth": "primary", // required unless fix_nulls is true
"table_name": "public.users", // required
"dbname": "mydb", // optional
"dry_run": false, // optional, default: false
"quiet": false, // optional, default: false
"generate_report": false, // optional, default: false
"upsert_only": false, // optional, default: false
"insert_only": false, // optional, default: false
"bidirectional": false, // optional, default: false
"fix_nulls": false, // optional, default: false
"fire_triggers": false // optional, default: false
}
```

**Parameters:**
- `cluster_name` (required): Name of the cluster
- `diff_file` (required): Path to the diff file
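- `source_of_truth` (required unless `fix_nulls` is true): Source node for repairs
- `table_name` (required): Fully qualified table name
- `dbname` (optional): Database name
- `dry_run` (optional): Simulate repairs (default: false)
- `quiet` (optional): Suppress output (default: false)
- `generate_report` (optional): Create detailed report (default: false)
- `upsert_only` (optional): Skip deletions (default: false)
- `insert_only` (optional): Only insert missing rows; skip updates and deletions (default: false)
- `bidirectional` (optional): Used with `insert_only` to insert missing rows in both directions between nodes (default: false)
- `fix_nulls` (optional): Fill NULL column values using non-NULL values from other nodes (default: false)
- `fire_triggers` (optional): Fire table triggers while repairs are applied (default: false)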

**Example Request:**
```bash
curl -X POST "http://localhost:5000/ace/table-repair" \
-H "Content-Type: application/json" \
--cert /path/to/client.crt \
--key /path/to/client.key \
-d '{
"cluster_name": "my_cluster",
"diff_file": "/path/to/diff.json",
"source_of_truth": "primary",
"table_name": "public.users"
}'
```

**Example Response:**
```json
{
"task_id": "tr_20240315_123456",
"submitted_at": "2024-03-15T12:34:56.789Z"
}
```
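Since `dry_run` simulates repairs, a cautious workflow submits the same repair twice: once with `dry_run` set to `true` to review what ACE would change, then again to apply it. The helper below is a hypothetical sketch of that pattern using only the endpoint and fields documented above; the function name and environment variables are placeholders.

```bash
# Hypothetical settings; adjust to your environment.
ACE_URL="${ACE_URL:-http://localhost:5000}"
ACE_CERT="${ACE_CERT:-/path/to/client.crt}"
ACE_KEY="${ACE_KEY:-/path/to/client.key}"

# Hypothetical helper: submit a table-repair; DRY_RUN defaults to true.
# Usage: ace_submit_repair CLUSTER SCHEMA.TABLE DIFF_FILE SOURCE_NODE [DRY_RUN]
ace_submit_repair() {
  local dry_run="${5:-true}"   # default to the safe, simulated mode
  case "$dry_run" in
    true|false) ;;
    *) echo "DRY_RUN must be true or false" >&2; return 2 ;;
  esac
  curl -sS -X POST "${ACE_URL}/ace/table-repair" \
    -H "Content-Type: application/json" \
    --cert "$ACE_CERT" --key "$ACE_KEY" \
    -d "$(printf '{"cluster_name": "%s", "table_name": "%s", "diff_file": "%s", "source_of_truth": "%s", "dry_run": %s}' \
      "$1" "$2" "$3" "$4" "$dry_run")"
}

# Review first, then apply:
#   ace_submit_repair my_cluster public.users /path/to/diff.json primary true
#   ace_submit_repair my_cluster public.users /path/to/diff.json primary false
```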

### The table-rerun API

Reruns a previous table diff operation.

**Endpoint:** `POST /ace/table-rerun`

**Request Body:**
```json
{
"cluster_name": "my_cluster", // required
"diff_file": "/path/to/diff.json", // required
"table_name": "public.users", // required
"dbname": "mydb", // optional
"quiet": false, // optional, default: false
"behavior": "multiprocessing" // optional, default: "multiprocessing"
}
```

**Parameters:**
- `cluster_name` (required): Name of the cluster
- `diff_file` (required): Path to the previous diff file
- `table_name` (required): Fully qualified table name
- `dbname` (optional): Database name
- `quiet` (optional): Suppress output (default: false)
- `behavior` (optional): Processing behavior ["multiprocessing", "hostdb"] (default: "multiprocessing")

**Example Request:**
```bash
curl -X POST "http://localhost:5000/ace/table-rerun" \
-H "Content-Type: application/json" \
--cert /path/to/client.crt \
--key /path/to/client.key \
-d '{
"cluster_name": "my_cluster",
"diff_file": "/path/to/diff.json",
"table_name": "public.users"
}'
```

**Example Response:**
```json
{
"task_id": "tr_20240315_123456",
"submitted_at": "2024-03-15T12:34:56.789Z"
}
```

### The task-status API

Retrieves the status of a submitted task.

**Endpoint:** `GET /ace/task-status?task_id=<task_id>`

**Parameters:**
- `task_id` (required): The ID of the task to check

**Example Request:**
```bash
curl "http://localhost:5000/ace/task-status?task_id=td_20240315_123456" \
--cert /path/to/client.crt \
--key /path/to/client.key
```

**Example Response:**
```json
{
"task_id": "td_20240315_123456",
"task_type": "table-diff",
"status": "COMPLETED",
"started_at": "2024-03-15T12:34:56.789Z",
"finished_at": "2024-03-15T12:35:01.234Z",
"time_taken": 4.445,
"result": {
"diff_file": "/path/to/output.json",
"total_rows": 10000,
"mismatched_rows": 5,
"summary": {
// Additional task-specific details
}
}
}
```
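Because table-diff, table-repair, and table-rerun return only a `task_id`, clients poll this endpoint until the task finishes. The loop below is a sketch: `COMPLETED` is the only status value shown above, so treating `FAILED` as the failure state is an assumption, and any other value is treated as still in progress. It uses the query-string form shown in the example request; the function and variable names are hypothetical.

```bash
# Hypothetical settings; adjust to your environment.
ACE_URL="${ACE_URL:-http://localhost:5000}"
ACE_CERT="${ACE_CERT:-/path/to/client.crt}"
ACE_KEY="${ACE_KEY:-/path/to/client.key}"

# Hypothetical helper: poll task-status until the task completes or fails.
# Usage: ace_wait_for_task TASK_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns: 0 completed, 1 failed, 2 request error, 3 gave up waiting.
ace_wait_for_task() {
  local task_id="$1" max="${2:-60}" interval="${3:-5}"
  local attempt=1 response status
  while [ "$attempt" -le "$max" ]; do
    response=$(curl -sS "${ACE_URL}/ace/task-status?task_id=${task_id}" \
      --cert "$ACE_CERT" --key "$ACE_KEY") || return 2
    # Take the first "status" field in the response (avoids a jq dependency).
    status=$(printf '%s\n' "$response" | grep -o '"status": *"[^"]*"' \
      | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/')
    case "$status" in
      COMPLETED) printf '%s\n' "$response"; return 0 ;;
      FAILED) printf '%s\n' "$response" >&2; return 1 ;;  # assumed status name
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "task ${task_id} still ${status:-unknown} after ${max} checks" >&2
  return 3
}
```

On success the full response is printed, so the `diff_file` path in `result` can be passed on to table-repair or table-rerun.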

### The update-spock-exception API

Updates the status of a Spock exception.

**Endpoint:** `POST /ace/update-spock-exception`

**Request Body:**
```json
{
"cluster_name": "my_cluster", // required
"node_name": "node1", // required
"dbname": "mydb", // optional
"exception_details": { // required
"remote_origin": "origin_oid", // required
"remote_commit_ts": "2024-03-15T12:34:56Z", // required
"remote_xid": "123456", // required
"command_counter": 1, // optional
"status": "RESOLVED", // required
"resolution_details": { // optional
"details": "Issue fixed"
}
}
}
```

**Parameters:**
- `cluster_name` (required): Name of the cluster
- `node_name` (required): The name of the node
- `dbname` (optional): The name of the database
- `exception_details` (required): Details identifying the exception
  - `remote_origin` (required): The OID of the origin
  - `remote_commit_ts` (required): The timestamp of the exception
  - `remote_xid` (required): The XID of the transaction
  - `command_counter` (optional): The number of commands executed
  - `status` (required): The current state of the exception
  - `resolution_details` (optional)
    - `details` (optional): Details about the resolution

**Example Request:**
```bash
curl -X POST "http://localhost:5000/ace/update-spock-exception" \
-H "Content-Type: application/json" \
--cert /path/to/client.crt \
--key /path/to/client.key \
-d '{
"cluster_name": "my_cluster",
"node_name": "node1",
"exception_details": {
"remote_origin": "origin1",
"remote_commit_ts": "2024-03-15T12:34:56Z",
"remote_xid": "123456",
"status": "RESOLVED"
}
}'
```

**Example Response:**
```json
{
"message": "Exception status updated successfully"
}
```
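For scripted cleanup, the request body can be assembled from shell arguments. The sketch below sends only the fields marked required in the request body above and always sets `status` to `RESOLVED`. The function name and environment variables are hypothetical, and values are not JSON-escaped, so it assumes inputs without double quotes or backslashes.

```bash
# Hypothetical settings; adjust to your environment.
ACE_URL="${ACE_URL:-http://localhost:5000}"
ACE_CERT="${ACE_CERT:-/path/to/client.crt}"
ACE_KEY="${ACE_KEY:-/path/to/client.key}"

# Hypothetical helper: mark one Spock exception as RESOLVED.
# Usage: ace_resolve_exception CLUSTER NODE REMOTE_ORIGIN REMOTE_COMMIT_TS REMOTE_XID
ace_resolve_exception() {
  if [ "$#" -ne 5 ]; then
    echo "usage: ace_resolve_exception CLUSTER NODE ORIGIN COMMIT_TS XID" >&2
    return 2
  fi
  curl -sS -X POST "${ACE_URL}/ace/update-spock-exception" \
    -H "Content-Type: application/json" \
    --cert "$ACE_CERT" --key "$ACE_KEY" \
    -d "$(printf '{"cluster_name": "%s", "node_name": "%s", "exception_details": {"remote_origin": "%s", "remote_commit_ts": "%s", "remote_xid": "%s", "status": "RESOLVED"}}' \
      "$1" "$2" "$3" "$4" "$5")"
}
```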

## API Error Responses

ACE API endpoints return error responses in the following format:

```json
{
"error": "Description of what went wrong"
}
```

Common HTTP status codes:
- 200: Success
- 400: Bad Request (missing or invalid parameters)
- 401: Unauthorized (missing or invalid client certificate)
- 415: Unsupported Media Type (request body is not JSON)
- 500: Internal Server Error
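A client can combine the HTTP status code with the `error` field to report failures. This sketch uses curl's `-w '\n%{http_code}'` write-out option to append the status code on its own line after the response body, then splits the two. The helper name, its default URL and certificate paths, and the `sed` parsing are assumptions, not part of ACE.

```bash
# Hypothetical helper: POST a JSON body to an ACE endpoint. Prints the body on
# HTTP 200; otherwise prints the status code and "error" message to stderr.
# Usage: ace_post PATH JSON_BODY   (for example: ace_post /ace/table-diff '{...}')
ace_post() {
  local out code body err
  out=$(curl -sS -X POST "${ACE_URL:-http://localhost:5000}$1" \
    -H "Content-Type: application/json" \
    --cert "${ACE_CERT:-/path/to/client.crt}" --key "${ACE_KEY:-/path/to/client.key}" \
    -d "$2" -w '\n%{http_code}') || return 2
  code=$(printf '%s\n' "$out" | tail -n 1)   # last line: HTTP status code
  body=$(printf '%s\n' "$out" | sed '$d')    # everything before it: body
  if [ "$code" = "200" ]; then
    printf '%s\n' "$body"
    return 0
  fi
  err=$(printf '%s\n' "$body" | sed -n 's/.*"error": *"\([^"]*\)".*/\1/p' | head -n 1)
  echo "ACE API returned HTTP ${code}: ${err:-no error message}" >&2
  return 1
}
```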



