create dms module #6625

Open: wants to merge 16 commits into base: main
16 changes: 16 additions & 0 deletions .markdownlint.json
@@ -0,0 +1,16 @@
{
  "MD033": {
    "allowed_elements": [
      "a",
      "br",
      "p",
      "pre",
      "table",
      "tbody",
      "td",
      "th",
      "thead",
      "tr"
    ]
  }
}
38 changes: 38 additions & 0 deletions terraform/aws/modules/data-engineering/dms/.terraform-docs.yaml
@@ -0,0 +1,38 @@
# .terraform-docs.yaml
formatter: markdown table
sections:
  hide-all: true
  show:
    - providers
    - inputs
    - outputs
    - resources

output:
  file: README.md
  mode: replace
  template: |-
    <!-- BEGIN_TF_DOCS -->
    {{ .Content }}
    <!-- END_TF_DOCS -->
    {{- printf "\n" -}}

content: |-
  # RDS Export Terraform Module

  ## Example

  ```hcl
  {{ include "examples/example-readme/main.tf" }}
  ```

  ## Note

  Update the mappings.json to specify the mappings for the DMS task.
  This will be used to select the tables to be migrated.

  {{ .Inputs }}

  {{ .Outputs }}

  {{ .Resources }}
105 changes: 105 additions & 0 deletions terraform/aws/modules/data-engineering/dms/DB_SETUP.md
@@ -0,0 +1,105 @@
# Database Setup

Follow the instructions below to set up the database for the DMS pipeline.
For background, see the [AWS Documentation for Oracle Database Setup](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html).

## Steps

- Create a user for DMS, grant the necessary permissions, and enable the required options.
  Replace the example password below with a strong password of your own.

```SQL
CREATE USER DMS IDENTIFIED BY "StrongPassword123!";
GRANT CREATE SESSION TO DMS;
GRANT CONNECT, RESOURCE TO DMS;

ALTER USER DMS quota unlimited on USERS;

GRANT SELECT ANY TRANSACTION to DMS;
GRANT SELECT on DBA_TABLESPACES to DMS;
GRANT EXECUTE on rdsadmin.rdsadmin_util to DMS;
GRANT LOGMINING to DMS;

exec rdsadmin.rdsadmin_master_util.create_archivelog_dir;
exec rdsadmin.rdsadmin_master_util.create_onlinelog_dir;

GRANT READ ON DIRECTORY ONLINELOG_DIR TO DMS;
GRANT READ ON DIRECTORY ARCHIVELOG_DIR TO DMS;

exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_VIEWS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_TAB_PARTITIONS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_INDEXES', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_OBJECTS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_TABLES', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_USERS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_CATALOG', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_CONSTRAINTS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_CONS_COLUMNS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_TAB_COLS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_IND_COLUMNS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_LOG_GROUPS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$ARCHIVED_LOG', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOG', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGFILE', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$DATABASE', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$THREAD', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$PARAMETER', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$NLS_PARAMETERS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$TIMEZONE_NAMES', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$TRANSACTION', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$CONTAINERS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('DBA_REGISTRY', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('OBJ$', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('ALL_ENCRYPTED_COLUMNS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_LOGS', 'DMS', 'SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$LOGMNR_CONTENTS','DMS','SELECT');
exec rdsadmin.rdsadmin_util.grant_sys_object('DBMS_LOGMNR', 'DMS', 'EXECUTE');

-- (as of Oracle versions 12.1 and higher)
exec rdsadmin.rdsadmin_util.grant_sys_object('REGISTRY$SQLPATCH', 'DMS', 'SELECT');

-- (for Amazon RDS Active Dataguard Standby (ADG))
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$STANDBY_LOG', 'DMS', 'SELECT');

-- (for transparent data encryption (TDE))

exec rdsadmin.rdsadmin_util.grant_sys_object('ENC$', 'DMS', 'SELECT');

-- (for validation with LOB columns)
exec rdsadmin.rdsadmin_util.grant_sys_object('DBMS_CRYPTO', 'DMS', 'EXECUTE');

-- (for binary reader)
exec rdsadmin.rdsadmin_util.grant_sys_object('DBA_DIRECTORIES','DMS','SELECT');

-- Required when the source database is Oracle Data guard
-- and Oracle Standby is used in the latest release of
-- DMS version 3.4.6, version 3.4.7, and higher.
exec rdsadmin.rdsadmin_util.grant_sys_object('V_$DATAGUARD_STATS', 'DMS', 'SELECT');

exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours',24);
commit;

exec rdsadmin.rdsadmin_util.alter_supplemental_logging('ADD');
exec rdsadmin.rdsadmin_util.alter_supplemental_logging('ADD','PRIMARY KEY');
```

- Grant the DMS user permissions to read from the source tables.

```SQL
GRANT SELECT ON <SCHEMA_NAME>.<TABLE_NAME> TO DMS;
```

- Create a secret in AWS Secrets Manager for the DMS user with the following details:

```json
{
  "username": "<DMS_USER>",
  "password": "<DMS_PASSWORD>",
  "port": "1521",
  "host": "<DB_HOST>"
}
```

- Define the `dms` Terraform module block with the required input parameters
  (see the [example in the README](./README.md)), then apply the Terraform.
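
The secret described above could also be managed in Terraform rather than created by hand. The following is a minimal sketch, not part of this module: the variable names `dms_password` and `db_host` and the resource label `dms_user` are hypothetical, and the username `DMS` is assumed from the `CREATE USER` statement above. Note that the secret value will be stored in the Terraform state, so the state backend needs to be protected accordingly.

```hcl
# Hypothetical inputs; supply these from a secure source and never commit them.
variable "dms_password" {
  type      = string
  sensitive = true
}

variable "db_host" {
  type = string
}

# Secret container; the name is a placeholder.
resource "aws_secretsmanager_secret" "dms_user" {
  name = "dms-user-secret"
}

# Secret value using the same four keys as the JSON document above.
resource "aws_secretsmanager_secret_version" "dms_user" {
  secret_id = aws_secretsmanager_secret.dms_user.id
  secret_string = jsonencode({
    username = "DMS"
    password = var.dms_password
    port     = "1521"
    host     = var.db_host
  })
}
```

Under these assumptions, `aws_secretsmanager_secret.dms_user.arn` could then be passed as `secrets_manager_arn` in the module's `dms_source` input.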
106 changes: 106 additions & 0 deletions terraform/aws/modules/data-engineering/dms/README.md
@@ -0,0 +1,106 @@
<!-- BEGIN_TF_DOCS -->
# RDS Export Terraform Module

## Example

```hcl
data "aws_availability_zones" "available" {}

locals {
  name = "test-dms"
  tags = {
    business-unit    = "HMPPS"
    application      = "Data Engineering"
    environment-name = "sandbox"
    is-production    = "False"
    owner            = "DMET"
    team-name        = "DMET"
    namespace        = "dmet-test"
  }
}

module "dms" {
  source = "github.com/ministryofjustice/analytical-platform//terraform/aws/modules/data-engineering/dms?ref=66a7d870"

  environment = local.tags.environment-name
  vpc_id      = module.vpc.vpc_id
  db          = aws_db_instance.dms_test.identifier

  dms_replication_instance = {
    replication_instance_id    = aws_db_instance.dms_test.identifier
    subnet_ids                 = module.vpc.private_subnets
    subnet_group_name          = local.name
    allocated_storage          = 20
    availability_zone          = data.aws_availability_zones.available.names[0]
    engine_version             = "3.5.4"
    multi_az                   = false
    replication_instance_class = "dms.t2.micro"
    inbound_cidr               = module.vpc.vpc_cidr_block
  }

  dms_source = {
    engine_name                 = "oracle"
    secrets_manager_arn         = "arn:aws:secretsmanager:eu-west-1:123456789012:secret:dms-user-secret"
    sid                         = aws_db_instance.dms_test.db_name
    extra_connection_attributes = "addSupplementalLogging=N;useBfile=Y;useLogminerReader=N;"
    cdc_start_time              = "2025-01-29T11:00:00Z"
  }

  replication_task_id = {
    full_load = "${aws_db_instance.dms_test.identifier}-full-load"
    cdc       = "${aws_db_instance.dms_test.identifier}-cdc"
  }

  dms_mapping_rules     = file("${path.module}/mappings.json")
  landing_bucket        = aws_s3_bucket.landing.bucket
  landing_bucket_folder = "${local.tags.team-name}/${aws_db_instance.dms_test.identifier}"

  tags = local.tags
}
```

## Note

Update the mappings.json to specify the mappings for the DMS task.
This will be used to select the tables to be migrated.
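
As a rough illustration of the note above, a single selection rule in the AWS DMS table-mapping format might look like the following. It is written with `jsonencode` so that it stays in HCL; the same object written as plain JSON is what would go in `mappings.json`. The schema name `EXAMPLE_SCHEMA`, the rule name, and the local name `example_mapping_rules` are placeholders rather than values from this module, and the sketch assumes the module accepts the rules as a JSON string, as the `file(...)` call in the example suggests.

```hcl
locals {
  # Hypothetical rule set: include every table in one schema.
  example_mapping_rules = jsonencode({
    rules = [
      {
        "rule-type" = "selection"
        "rule-id"   = "1"
        "rule-name" = "include-example-schema"
        "object-locator" = {
          "schema-name" = "EXAMPLE_SCHEMA" # placeholder schema
          "table-name"  = "%"              # wildcard: all tables
        }
        "rule-action" = "include"
      }
    ]
  })
}
```

Narrowing `table-name` to a specific table, or adding further rules with unique `rule-id` values, restricts which tables the full-load and CDC tasks pick up.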

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_db"></a> [db](#input\_db) | The database name | `string` | n/a | yes |
| <a name="input_dms_mapping_rules"></a> [dms\_mapping\_rules](#input\_dms\_mapping\_rules) | The path to the mapping rules file | `string` | n/a | yes |
| <a name="input_dms_replication_instance"></a> [dms\_replication\_instance](#input\_dms\_replication\_instance) | n/a | <pre>object({<br/> replication_instance_id = string<br/> subnet_group_id = optional(string)<br/> subnet_group_name = optional(string)<br/> subnet_ids = optional(list(string))<br/> allocated_storage = number<br/> availability_zone = string<br/> engine_version = string<br/> kms_key_arn = optional(string)<br/> multi_az = bool<br/> replication_instance_class = string<br/> inbound_cidr = string<br/> })</pre> | n/a | yes |
| <a name="input_dms_source"></a> [dms\_source](#input\_dms\_source) | extra\_connection\_attributes: Extra connection attributes to be used in the connection string<br/> cdc\_start\_time: The start time for the CDC task. This must be set to a time after the Oracle database setup has been completed (to ensure the logs are available) | <pre>object({<br/> engine_name = string,<br/> secrets_manager_arn = string,<br/> sid = string,<br/> extra_connection_attributes = optional(string)<br/> cdc_start_time = optional(string)<br/> })</pre> | n/a | yes |
| <a name="input_environment"></a> [environment](#input\_environment) | The environment name | `string` | n/a | yes |
| <a name="input_landing_bucket"></a> [landing\_bucket](#input\_landing\_bucket) | The S3 bucket name where the output data will be stored | `string` | n/a | yes |
| <a name="input_landing_bucket_folder"></a> [landing\_bucket\_folder](#input\_landing\_bucket\_folder) | The S3 bucket folder where the output data will be stored | `string` | n/a | yes |
| <a name="input_replication_task_id"></a> [replication\_task\_id](#input\_replication\_task\_id) | n/a | <pre>object({<br/> full_load = string<br/> cdc = string<br/> })</pre> | n/a | yes |
| <a name="input_s3_target_config"></a> [s3\_target\_config](#input\_s3\_target\_config) | n/a | <pre>object({<br/> add_column_name = bool<br/> max_batch_interval = number<br/> min_file_size = number<br/> timestamp_column_name = string<br/> })</pre> | <pre>{<br/> "add_column_name": true,<br/> "max_batch_interval": 3600,<br/> "min_file_size": 32000,<br/> "timestamp_column_name": "EXTRACTION_TIMESTAMP"<br/>}</pre> | no |
| <a name="input_tags"></a> [tags](#input\_tags) | n/a | `map(string)` | n/a | yes |
| <a name="input_vpc_id"></a> [vpc\_id](#input\_vpc\_id) | The VPC ID | `string` | n/a | yes |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_terraform_rules"></a> [terraform\_rules](#output\_terraform\_rules) | n/a |

## Resources

| Name | Type |
|------|------|
| [aws_dms_endpoint.source](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dms_endpoint) | resource |
| [aws_dms_replication_instance.instance](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dms_replication_instance) | resource |
| [aws_dms_replication_subnet_group.replication_subnet_group](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dms_replication_subnet_group) | resource |
| [aws_dms_replication_task.cdc_replication_task](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dms_replication_task) | resource |
| [aws_dms_replication_task.full_load_replication_task](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dms_replication_task) | resource |
| [aws_dms_s3_endpoint.s3_target](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dms_s3_endpoint) | resource |
| [aws_iam_role.dms](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
| [aws_iam_role.dms_source](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
| [aws_iam_role_policy.dms](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource |
| [aws_iam_role_policy.dms_source](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource |
| [aws_security_group.replication_instance](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group) | resource |
| [aws_vpc_security_group_egress_rule.replication_instance_outbound](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_security_group_egress_rule) | resource |
| [aws_vpc_security_group_ingress_rule.replication_instance_inbound](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_security_group_ingress_rule) | resource |
<!-- END_TF_DOCS -->