Commit
* Updated ACE readme (cli/scripts/ace_readme.md) to minimize content and remove overlap with our online docs; also added links
* Adding OS docs folder with expanded docs and README file for ACE with TOC to access content in docs folder.
1 parent 3b5ce38, commit 624fbfc
Showing 5 changed files with 817 additions and 55 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,65 +1,28 @@ | ||
## ACE (Anti-Chaos Engine) User Guide | ||
|
||
ACE is a powerful tool designed to ensure and maintain consistency across nodes in a pgEdge cluster. It helps identify and resolve data inconsistencies, schema differences, and replication configuration mismatches across nodes in a cluster. | ||
# ACE (Anti-Chaos Engine) | ||
|
||
Key features include: | ||
- Table-level data comparison and repair | ||
- Replication set level verification | ||
- Automated repair capabilities | ||
- Schema comparison | ||
- Spock configuration validation | ||
ACE is a powerful tool designed to ensure and maintain consistency across nodes in a replication cluster managed by the [Spock extension](https://github.com/pgEdge/spock). It helps identify and resolve data inconsistencies, schema differences, and replication configuration mismatches across nodes in a cluster. | ||
|
||
ACE is installed with the pgEdge Platform installer. The following commands describe installing ACE on a management system that is not a member of a replication cluster: | ||
## Table of Contents | ||
- [Building the ACE Extension](README.md#building-the-ace-extension) | ||
- [Managing Data Consistency with ACE](docs/ace_overview.md) | ||
- [Using ACE Functions](docs/ace_functions.md) | ||
- [ACE API Endpoints](docs/ace_api.md) | ||
- [Scheduling ACE Operations](docs/ace_schedule.md) | ||
|
||
1. Navigate to the directory where ACE will be installed. | ||
ACE is installed by the CLI installer. The following commands describe installing ACE on a management system (that is not a member of a replication cluster): | ||
|
||
2. Invoke the pgEdge installer in that location with the command: | ||
1. After installing the [CLI](https://github.com/pgEdge/cli), navigate to the directory where ACE will be installed. | ||
|
||
`python3 -c "$(curl -fsSL https://pgedge-download.s3.amazonaws.com/REPO/install.py)"`
2. Invoke the CLI [UM installer](https://github.com/pgEdge/cli/blob/REL25_01/docs/functions/um-install.md) in that location with the command: | ||
|
||
3. Create a directory named `cluster` in the `pgedge` directory created by the pgEdge installer. | ||
`pgedge um install ace` | ||
|
||
4. [Create and update a cluster_name.json file](https://docs.pgedge.com/platform/installing_pgedge/manage_json), and place the file in `cluster/cluster_name/cluster_name.json` on the ACE host. For example, if your cluster name is `us_eu_backend`, the cluster definition file for this should be placed in `/pgedge/cluster/us_eu_backend/us_eu_backend.json`. The .json file must: | ||
3. Create a directory named `cluster` in the `pgedge` directory. | ||
|
||
4. [Create and update a .json file](https://github.com/pgEdge/cli/blob/REL25_01/docs/functions/cluster-json-template.md), and place the file in `cluster/cluster_name/cluster_name.json` on the ACE host. For example, if your cluster name is `us_eu_backend`, the cluster definition file for this should be placed in `/pgedge/cluster/us_eu_backend/us_eu_backend.json`. The .json file must: | ||
|
||
## ACE Functions | ||
|
||
To review online help about the ACE commands and syntax available for your ACE version, use the command: | ||
|
||
`./pgedge ace [command_name] --help` | ||
|
||
For detailed information about usage, recommended use cases, and optional arguments, see [the pgEdge documentation](https://designing.pgedge-docs-sandbox.pages.dev/platform/ace/using_ace#ace-commands). | ||
|
||
|
||
## API Reference | ||
|
||
ACE provides a REST API for programmatic access. The API server runs on `localhost:5000` by default. An SSH tunnel is required to access the API from outside the host machine for security purposes. | ||
|
||
For detailed information about API usage, recommended use cases, and optional arguments, see [the pgEdge documentation](https://designing.pgedge-docs-sandbox.pages.dev/platform/ace/ace_api). | ||
|
||
|
||
## Scheduling ACE Operations (Beta) | ||
|
||
ACE supports scheduling of automated table-diff and auto-repair operations through configuration settings in the `ace_config.py` file. This allows for regular consistency checks and remediations without manual intervention. | ||
|
||
The `ace_config.py` file is located in `${PGEDGE_HOME}/hub/scripts/ace_config.py`. Within the file, you define jobs and their schedules with key/value pairs in the following sections:
|
||
* the `schedule_jobs` section of the file contains information about ACE diff jobs. | ||
* the `schedule_config` section of the file contains information about the run frequency. | ||
* the `auto_repair_config` section of the file contains information about scheduled repair jobs. | ||
|
||
The ACE scheduler runs the jobs defined in the `ace_config.py` file automatically when ACE is started, or you can control it manually. Use the following commands to manually start and stop the scheduler:
|
||
To start the scheduler: | ||
|
||
```bash | ||
./pgedge start ace | ||
``` | ||
|
||
To stop the scheduler: | ||
|
||
```bash | ||
./pgedge stop ace | ||
``` | ||
|
||
For detailed information about using ACE scheduling functionality, [see the pgEdge documentation](https://designing.pgedge-docs-sandbox.pages.dev/platform/ace/schedule_ace). | ||
* Contain connection information for each node in the cluster in the node's stanza. | ||
* Identify the user that will be invoking ACE commands in the `db_user` property. This user must also be the table owner. | ||
|
||
After ensuring that the .json file describes your cluster connections and identifies the ACE user, you're ready to use [ACE commands](docs/ace_functions.md). |
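
The file-placement rule in step 4 can be sketched in a few lines of Python. This is only an illustration of the `cluster/cluster_name/cluster_name.json` convention described above; the helper name `cluster_definition_path` is hypothetical and is not part of ACE or the CLI.

```python
from pathlib import PurePosixPath


def cluster_definition_path(pgedge_home: str, cluster_name: str) -> PurePosixPath:
    """Return <pgedge_home>/cluster/<cluster_name>/<cluster_name>.json."""
    if not cluster_name or "/" in cluster_name:
        raise ValueError("cluster_name must be a non-empty name without '/'")
    # The directory and the file share the cluster name.
    return PurePosixPath(pgedge_home) / "cluster" / cluster_name / f"{cluster_name}.json"


path = cluster_definition_path("/pgedge", "us_eu_backend")
print(path)
```

For the example cluster named `us_eu_backend`, this yields the same location given in step 4.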
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,299 @@ | ||
|
||
## ACE API Endpoints | ||
|
||
ACE includes API endpoints for some of its most frequently used functions. | ||
|
||
## API Reference | ||
|
||
ACE provides a REST API for programmatic access. The API server runs on localhost:5000 by default. For security purposes, an SSH tunnel is required to access the API from outside the host machine. You must also configure client-based certificate authentication before using the ACE API.
|
||
Please refer to [Cert Auth with EasyRSA](https://docs.google.com/document/d/17SmNVx2Ootdc32ZEuW9qXlNpGIyhHOBfpJGRbXwHeP0/edit?usp=sharing) to set up server and client certificates. | ||
|
||
Create a separate client certificate for the ACE user, and give that user all necessary privileges on the tables, schemas, and databases that you want to use with ACE. This user should preferably be a superuser, since ACE may need elevated privileges during diffs and repairs. Each external user can have their own client certificate, typically with lower privileges. The external user's role must then be granted to the ACE user. For example, if the ACE user's certificate has `ace_user` as its common name and the external user's certificate has `external_user` as its common name, grant the `external_user` role to `ace_user`:
|
||
```sql | ||
GRANT external_user TO ace_user; | ||
``` | ||
|
||
This is required since ACE will attempt to use `SET ROLE` to switch to the external user's role before performing any operations, thus ensuring that diffs and repairs happen with the external user's privileges. | ||
|
||
|
||
### The table-diff API | ||
|
||
Initiates a table diff operation. | ||
|
||
**Endpoint:** `POST /ace/table-diff`
|
||
**Request Body:** | ||
```json | ||
{ | ||
"cluster_name": "my_cluster", // required | ||
"table_name": "public.users", // required | ||
"dbname": "mydb", // optional | ||
"block_rows": 10000, // optional, default: 10000 | ||
"max_cpu_ratio": 0.8, // optional, default: 0.6 | ||
"output": "json", // optional, default: "json" | ||
"nodes": "all", // optional, default: "all" | ||
"batch_size": 50, // optional, default: 1 | ||
"table_filter": "id < 1000", // optional | ||
"quiet": false // optional, default: false | ||
} | ||
``` | ||
|
||
**Parameters:** | ||
- `cluster_name` (required): Name of the cluster | ||
- `table_name` (required): Fully qualified table name (schema.table) | ||
- `dbname` (optional): Database name | ||
- `block_rows` (optional): Number of rows per block (default: 10000) | ||
- `max_cpu_ratio` (optional): Maximum CPU usage ratio (default: 0.6)
- `output` (optional): Output format ["json", "csv", "html"] (default: "json") | ||
- `nodes` (optional): Nodes to include ("all" or comma-separated list) | ||
- `batch_size` (optional): Batch size for processing (default: 1)
- `table_filter` (optional): SQL WHERE clause to filter rows for comparison | ||
- `quiet` (optional): Suppress output (default: false) | ||
|
||
**Example Request:** | ||
```bash | ||
curl -X POST "http://localhost:5000/ace/table-diff" \ | ||
-H "Content-Type: application/json" \ | ||
--cert /path/to/client.crt \ | ||
--key /path/to/client.key \ | ||
-d '{ | ||
"cluster_name": "my_cluster", | ||
"table_name": "public.users", | ||
"output": "html" | ||
}' | ||
``` | ||
|
||
**Example Response:** | ||
```json | ||
{ | ||
"task_id": "td_20240315_123456", | ||
"submitted_at": "2024-03-15T12:34:56.789Z" | ||
} | ||
``` | ||
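
The curl request above can also be assembled from Python. The following is a minimal sketch using only the standard library; it builds the request object without sending it. The function name `build_table_diff_request` is hypothetical, and the base URL is the default address stated in this document.

```python
import json
import urllib.request

ACE_BASE_URL = "http://localhost:5000"  # default address stated in this document


def build_table_diff_request(cluster_name, table_name, **options):
    """Build (but do not send) a POST request for /ace/table-diff."""
    if not cluster_name or not table_name:
        raise ValueError("cluster_name and table_name are required")
    payload = {"cluster_name": cluster_name, "table_name": table_name}
    # Forward only the options the caller set; the server applies its own defaults.
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        f"{ACE_BASE_URL}/ace/table-diff",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_table_diff_request("my_cluster", "public.users", output="html")
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Sending the request (for example with `urllib.request.urlopen`) and presenting the client certificate are left out of the sketch, since they depend on how certificate authentication is set up on your host.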
|
||
### The table-repair API | ||
|
||
Initiates a table repair operation. | ||
|
||
**Endpoint:** `POST /ace/table-repair`
|
||
**Request Body:** | ||
```json | ||
{ | ||
"cluster_name": "my_cluster", // required | ||
"diff_file": "/path/to/diff.json", // required | ||
"source_of_truth": "primary", // required unless fix_nulls is true | ||
"table_name": "public.users", // required | ||
"dbname": "mydb", // optional | ||
"dry_run": false, // optional, default: false | ||
"quiet": false, // optional, default: false | ||
"generate_report": false, // optional, default: false | ||
"upsert_only": false, // optional, default: false | ||
"insert_only": false, // optional, default: false | ||
"bidirectional": false, // optional, default: false | ||
"fix_nulls": false, // optional, default: false | ||
"fire_triggers": false // optional, default: false | ||
} | ||
``` | ||
|
||
**Parameters:** | ||
- `cluster_name` (required): Name of the cluster | ||
- `diff_file` (required): Path to the diff file | ||
- `source_of_truth` (required unless `fix_nulls` is true): Source node for repairs
- `table_name` (required): Fully qualified table name | ||
- `dbname` (optional): Database name | ||
- `dry_run` (optional): Simulate repairs (default: false) | ||
- `quiet` (optional): Suppress output (default: false) | ||
- `generate_report` (optional): Create detailed report (default: false) | ||
- `upsert_only` (optional): Skip deletions (default: false) | ||
- `insert_only` (optional): Only insert missing rows (default: false)
- `bidirectional` (optional): Apply inserts in both directions (default: false)
- `fix_nulls` (optional): Fix NULL mismatches without a source of truth (default: false)
- `fire_triggers` (optional): Fire triggers while repairing (default: false)
|
||
**Example Request:** | ||
```bash | ||
curl -X POST "http://localhost:5000/ace/table-repair" \ | ||
-H "Content-Type: application/json" \ | ||
--cert /path/to/client.crt \ | ||
--key /path/to/client.key \ | ||
-d '{ | ||
"cluster_name": "my_cluster", | ||
"diff_file": "/path/to/diff.json", | ||
"source_of_truth": "primary", | ||
"table_name": "public.users" | ||
}' | ||
``` | ||
|
||
**Example Response:** | ||
```json | ||
{ | ||
"task_id": "tr_20240315_123456", | ||
"submitted_at": "2024-03-15T12:34:56.789Z" | ||
} | ||
``` | ||
|
||
### The table-rerun API | ||
|
||
Reruns a previous table diff operation. | ||
|
||
**Endpoint:** `POST /ace/table-rerun` | ||
|
||
**Request Body:** | ||
```json | ||
{ | ||
"cluster_name": "my_cluster", // required | ||
"diff_file": "/path/to/diff.json", // required | ||
"table_name": "public.users", // required | ||
"dbname": "mydb", // optional | ||
"quiet": false, // optional, default: false | ||
"behavior": "multiprocessing" // optional, default: "multiprocessing" | ||
} | ||
``` | ||
|
||
**Parameters:** | ||
- `cluster_name` (required): Name of the cluster | ||
- `diff_file` (required): Path to the previous diff file | ||
- `table_name` (required): Fully qualified table name | ||
- `dbname` (optional): Database name | ||
- `quiet` (optional): Suppress output (default: false) | ||
- `behavior` (optional): Processing behavior ["multiprocessing", "hostdb"] | ||
|
||
**Example Request:** | ||
```bash | ||
curl -X POST "http://localhost:5000/ace/table-rerun" \ | ||
-H "Content-Type: application/json" \ | ||
--cert /path/to/client.crt \ | ||
--key /path/to/client.key \ | ||
-d '{ | ||
"cluster_name": "my_cluster", | ||
"diff_file": "/path/to/diff.json", | ||
"table_name": "public.users" | ||
}' | ||
``` | ||
|
||
**Example Response:** | ||
```json | ||
{ | ||
"task_id": "tr_20240315_123456", | ||
"submitted_at": "2024-03-15T12:34:56.789Z" | ||
} | ||
``` | ||
|
||
### The task-status API | ||
|
||
Retrieves the status of a submitted task. | ||
|
||
**Endpoint:** `GET /ace/task-status?task_id=<task_id>`
|
||
**Parameters:** | ||
- `task_id` (required): The ID of the task to check | ||
|
||
**Example Request:** | ||
```bash | ||
curl "http://localhost:5000/ace/task-status?task_id=td_20240315_123456" \ | ||
--cert /path/to/client.crt \ | ||
--key /path/to/client.key | ||
``` | ||
|
||
**Example Response:** | ||
```json | ||
{
  "task_id": "td_20240315_123456",
  "task_type": "table-diff",
  "status": "COMPLETED",
  "started_at": "2024-03-15T12:34:56.789Z",
  "finished_at": "2024-03-15T12:35:01.234Z",
  "time_taken": 4.445,
  "result": {
    "diff_file": "/path/to/output.json",
    "total_rows": 10000,
    "mismatched_rows": 5,
    "summary": {
      // Additional task-specific details
    }
  }
}
``` | ||
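
Because table-diff, table-repair, and table-rerun return a `task_id` rather than a result, a client typically polls the task-status endpoint until the task finishes. The sketch below shows one way to do that. The HTTP call is injected as `fetch_status` so the loop can be exercised without a server. Only `COMPLETED` appears in this document; the `RUNNING` and `FAILED` values and the function name `wait_for_task` are assumptions made for illustration.

```python
import time
from typing import Callable

# "COMPLETED" comes from the example response; "FAILED" is an assumed value.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_task(fetch_status: Callable[[str], dict], task_id: str,
                  interval: float = 2.0, max_polls: int = 30) -> dict:
    """Poll until the task reports a terminal status or the polls run out."""
    for attempt in range(max_polls):
        status = fetch_status(task_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")


# Stand-in for an HTTP call to the task-status endpoint ("RUNNING" is assumed).
_responses = iter([
    {"task_id": "td_20240315_123456", "status": "RUNNING"},
    {"task_id": "td_20240315_123456", "status": "COMPLETED"},
])
result = wait_for_task(lambda _id: next(_responses), "td_20240315_123456", interval=0)
print(result["status"])
```

In real use, `fetch_status` would issue the GET request shown above and return the decoded JSON body.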
|
||
### Spock Exception Update API | ||
|
||
Updates the status of a Spock exception. | ||
|
||
**Endpoint:** `POST /ace/update-spock-exception` | ||
|
||
**Request Body:** | ||
```json | ||
{ | ||
"cluster_name": "my_cluster", // required | ||
"node_name": "node1", // required | ||
"dbname": "mydb", // optional | ||
"exception_details": { // required | ||
"remote_origin": "origin_oid", // required | ||
"remote_commit_ts": "2024-03-15T12:34:56Z", // required | ||
"remote_xid": "123456", // required | ||
"command_counter": 1, // optional | ||
"status": "RESOLVED", // required | ||
"resolution_details": { // optional | ||
"details": "Issue fixed" | ||
} | ||
} | ||
} | ||
``` | ||
|
||
**Parameters:** | ||
- `cluster_name` (required): Name of the cluster | ||
- `node_name` (required): The name of the node | ||
- `dbname` (optional): The name of the database | ||
- `exception_details` (required)
  - `remote_origin` (required): The OID of the origin
  - `remote_commit_ts` (required): The timestamp of the exception
  - `remote_xid` (required): The XID of the transaction
  - `command_counter` (optional): The number of commands executed
  - `status` (required): The current state of the exception
  - `resolution_details` (optional)
    - `details` (optional): Details about the exception
|
||
**Example Request:** | ||
```bash | ||
curl -X POST "http://localhost:5000/ace/update-spock-exception" \ | ||
-H "Content-Type: application/json" \ | ||
--cert /path/to/client.crt \ | ||
--key /path/to/client.key \ | ||
-d '{ | ||
"cluster_name": "my_cluster", | ||
"node_name": "node1", | ||
"exception_details": { | ||
"remote_origin": "origin1", | ||
"remote_commit_ts": "2024-03-15T12:34:56Z", | ||
"remote_xid": "123456", | ||
"status": "RESOLVED" | ||
} | ||
}' | ||
``` | ||
|
||
**Example Response:** | ||
```json | ||
{ | ||
"message": "Exception status updated successfully" | ||
} | ||
``` | ||
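
Since this payload is nested, it can help to check it on the client before sending. The sketch below follows the required/optional annotations in the request body above; the helper name `missing_fields` is hypothetical, and the server's own validation may differ.

```python
# Required keys, taken from the annotations in the request body above.
REQUIRED_TOP_LEVEL = ("cluster_name", "node_name", "exception_details")
REQUIRED_DETAILS = ("remote_origin", "remote_commit_ts", "remote_xid", "status")


def missing_fields(payload: dict) -> list:
    """Return dotted names of required fields that are absent from the payload."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in payload]
    details = payload.get("exception_details")
    if isinstance(details, dict):
        missing += [f"exception_details.{key}"
                    for key in REQUIRED_DETAILS if key not in details]
    return missing


payload = {
    "cluster_name": "my_cluster",
    "node_name": "node1",
    "exception_details": {
        "remote_origin": "origin1",
        "remote_commit_ts": "2024-03-15T12:34:56Z",
        "remote_xid": "123456",
    },
}
print(missing_fields(payload))
```

An empty list means every field marked required is present; the check does not validate the values themselves.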
|
||
## API Error Responses | ||
|
||
ACE API endpoints return error responses in the following format: | ||
|
||
```json | ||
{ | ||
"error": "Description of what went wrong" | ||
} | ||
``` | ||
|
||
Common HTTP status codes: | ||
- 200: Success | ||
- 400: Bad Request (missing or invalid parameters) | ||
- 401: Unauthorized (missing or invalid client certificate) | ||
- 415: Unsupported Media Type (request body is not JSON) | ||
- 500: Internal Server Error | ||
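
A client can turn this error format into exceptions. The following is a minimal sketch that assumes only what is stated above: a 200 status means success, and failures carry a JSON body with an `error` key. The names `AceApiError` and `parse_response` are hypothetical.

```python
import json


class AceApiError(Exception):
    """Raised when an ACE API call returns a non-200 status."""

    def __init__(self, status_code, message):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.message = message


def parse_response(status_code: int, body: str):
    """Return the decoded JSON body on 200; raise AceApiError otherwise."""
    try:
        data = json.loads(body) if body else {}
    except json.JSONDecodeError:
        data = {}
    if status_code == 200:
        return data
    # Fall back to a generic message if the body has no "error" key.
    message = data.get("error") if isinstance(data, dict) else None
    raise AceApiError(status_code, message or "no error description returned")


print(parse_response(200, '{"message": "Exception status updated successfully"}'))
try:
    parse_response(400, '{"error": "Description of what went wrong"}')
except AceApiError as exc:
    print(exc.status_code, exc.message)
```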
|
||
|
||
|
||
|