Author: Adam Leszczyński <[email protected]>, version: 1.6.1, date: 2024-07-16
This document describes configuration parameters and usage of OpenLogReplicator.
The OpenLogReplicator program is non-interactive. The only parameters accepted are:

- -f|--file <config file> — configuration file name (default: "OpenLogReplicator.json"),
- -p <process name> — process name (default: "OpenLogReplicator") displayed in the process list; useful when multiple instances are running,
- -r|--root — disable root check on startup (default: root check is enabled); allows running the program as root, which is not recommended and should be used only in special cases,
- -v|--version — print version and exit.
All parameters are defined in the OpenLogReplicator.json config file, which should be placed in the same directory.
The file must be in JSON format.
To get started, check the example config files in the scripts folder.
Refer to the full parameter list below for more details.
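For orientation, below is a minimal sketch of a configuration file. It follows the structure of the example files in the scripts folder; the aliases, database name, credentials, connection string and filter rule are placeholders only (JSON does not allow comments, so the placeholders are indicated by their values), and the exact set of supported parameters should be verified against the examples shipped with your version.

```json
{
  "version": "1.6.1",
  "source": [
    {
      "alias": "S1",
      "name": "DB1",
      "reader": {
        "type": "online",
        "user": "REPLICATOR",
        "password": "CHANGE_ME",
        "server": "//127.0.0.1:1521/DB1"
      },
      "format": {"type": "json"},
      "filter": {"table": [{"owner": "USR1", "table": ".*"}]}
    }
  ],
  "target": [
    {
      "alias": "T1",
      "source": "S1",
      "writer": {"type": "file"}
    }
  ]
}
```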
All output messages are sent to the stderr stream. Optionally, JSON output can be sent to the stdout stream when the test parameter is set to a nonzero value.
The only language used in the documentation, error messages, and the program itself is English.
At regular intervals, the program writes a checkpoint file which contains information about the last processed transaction (sent to the Kafka output).
OpenLogReplicator should have read, write and execute permissions for the checkpoint directory.
It creates and deletes files named <database>-chkpt.json and <database>-chkpt-<scn>.json, where <database> is the database name defined in the OpenLogReplicator.json file and <scn> is a database SCN number.
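For example, assuming the database name defined in the config file is DB1 and the last checkpointed SCN is 45237861 (both values are placeholders), the checkpoint directory could contain:

```
DB1-chkpt.json
DB1-chkpt-45237861.json
```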
The configuration file should contain a single JSON object with the following parameters:
Parameter | Specification | Notes |
---|---|---|
|
list of source elements, mandatory |
The list should contain just one source element. |
|
list of target elements, mandatory |
The list should contain just one target element. |
|
string, max length: 256, mandatory |
The value must be equal to TIP: This is a safety check to ensure that the content of the JSON configuration file is reviewed during a program upgrade. During an upgrade, always check the documentation for parameter changes and verify that the JSON configuration file is correct. |
|
string, max length: 256, default: |
The location where the NOTE: This parameter is only valid when |
|
number, min: 0, max: 1, default: 0 |
Print hex dump of vector data for all dumped OP codes. Possible values are:
NOTE: This parameter is only valid when |
|
number, min: 0, max: 2, default: 0 |
Create output similar to Possible values are:
CAUTION: The result may not fully match the results of |
|
number, min: 0, max: 4, default: 2 |
Messages verbose level. All messages are sent to stderr output stream. Possible values are:
|
|
number, min: 0, max: 524287, default: 0 |
Print debug information. The value is a sum of various trace parameters, please refer to source code for details. CAUTION: The codes can change without prior notice. |
Parameter | Specification | Notes |
---|---|---|
|
string, max length: 256, mandatory |
The name of the source — referenced later in a target element. TIP: This is just a logical name used in the config file. It doesn’t have to match the actual database SID. |
|
element of format, mandatory |
Configuration of output data. |
|
string, max length: 256, mandatory |
This name is used for identifying database connection. This name is mentioned in the output and in the checkpoint files. WARNING: After starting replication, the value shouldn’t change, otherwise the checkpoint files would not be properly read. TIP: This is just a logical name used in the config file. It doesn’t have to match the actual database SID. |
|
element of reader, mandatory |
Configuration of redo log reader. |
|
string, max length: 256, default is |
Way of getting an archive redo log file list. Possible values are:
TIP: This parameter is only valid for |
|
number, default: 10000000 |
Time to sleep between two attempts to read an archived redo log list. Number in microseconds. |
|
number, max: 1000000000, default: 10 |
Number of retries to read an archived redo log list before failing. |
|
element of debug |
Group of options used for debugging. |
|
element of filter |
Group of options used to filter the contents of the database and define which tables are replicated. CAUTION: The filter is applied only to the data, not to the DDL operations. IMPORTANT: During the first run, the schema is read only for tables which are selected by the filter. If the filter is changed, the schema would not update. Startup would fail because the set of users present in the checkpoint files would not match the set of users defined in the config file. The schema is updated only when the program is reset (i.e., the checkpoint files are removed and recreation is forced). |
|
element of metrics |
Group of options used for collecting metrics of OpenLogReplicator. |
number, min: 0, max: 524287, default: 0 |
A sum of various flags. Flags define various options for the program. Possible values are:
CAUTION: This option can delay data replication. When the redo log files are big or a redo log group switch occurs, occasional delays can appear. Transactions would not be read until the redo log group is switched.
NOTE: Refer to the User Manual for details.
NOTE: Refer to the User Manual for details.
TIP: Direct IO bypasses the disk caching mechanism. Using this option is not recommended and should be used only in special cases.
CAUTION: This option is not recommended. It is useful only for debugging. For most cases when the program fails, it is better to stop the program and fix the problem. The program is not designed to continue after error as this can lead to schema data inconsistency and nondeterministic data can be sent to output.
TIP: Incomplete transactions are transactions that have started before replication was set up. Some starting elements of such transactions may be missing in the output. By default, such transactions are ignored.
TIP: The checkpoint records are useful to monitor the progress of replication. They’re also used to detect the last processed transaction. If the checkpoint records are hidden and there is low activity of data changes, it may be challenging to detect OpenLogReplicator failure.
TIP: The number of checkpoint files left is defined by parameter
|
|
|
element of memory |
Configuration of memory settings. |
|
number, min: 0, default: 50000 |
The amount of time the program would sleep when all data from the online redo log has been read and the program is waiting for more transactions. Number in microseconds. IMPORTANT: The default setting is 50,000 microseconds, which is equal to 1/20 s, or 50 ms. This means that 20 times a second OpenLogReplicator polls the disk for new changes (until there is no activity — after new data appears, it is read sequentially to the end). With the default setting, in the worst case the read process would notice after 50 ms that new data is ready. This is fast enough and a proper setting for most cases. If this delay is potentially too big, the value can be decreased, but this would increase CPU usage. |
|
number, min: 0, default: 0 |
When this parameter is set to a non-zero value, the redo log file data is read a second time for verification after the defined delay. Double read mode applies only to online redo log files. Number in microseconds. IMPORTANT: Some filesystems (like ext4 or btrfs) can share the disk read cache between multiple processes. This can cause problems when the redo log files are read by multiple processes. It can lead to read inconsistencies when the database process is writing to the same memory buffer that the OpenLogReplicator process is reading. The checksum for disk blocks is just two bytes, so it is impossible to detect whether the data is corrupted. The only way to detect this is to read the data again and compare it. This parameter defines the time delay after which the redo log file data is read a second time for verification. CAUTION: Instead of double read, it is recommended to use Direct IO disk operations. Direct IO disables the disk read cache and guarantees that the data is read from disk. Use this option only as a workaround in case Direct IO is not possible. |
|
number, min: 0, default: 10000000 |
During online redo log reading, a new redo log group could be created and the program would need to refresh the list of redo log groups. The list is also refreshed when the old redo log file has been completely processed but no new group has been created yet. Number in microseconds. |
Parameter | Specification | Notes |
---|---|---|
|
number, min: 16, default: 1024 |
The maximum amount of memory the program can allocate. Number in megabytes. IMPORTANT: This number doesn’t include memory allocated for sending big JSON messages to Kafka; that memory is allocated on demand separately. It also does not include memory used for LOB processing. |
|
number, min: 16, max: |
Amount of memory allocated at startup and the desired amount of allocated memory during operation.
If more memory is dynamically allocated, it is released as soon as it is no longer required.
See notes for Number in megabytes. |
|
number, min: 1, max: |
Size of memory buffer used for disk read. Number in megabytes. IMPORTANT: Greater buffer size increases performance, but also increases memory usage.
Disk buffer memory is part of the main memory (controlled by |
Parameter | Specification | Notes |
---|---|---|
|
string, max length: 256, default |
Possible values are:
Example config file:
Example config file:
Example config file: IMPORTANT: Batch mode is intended to be used only for testing and troubleshooting purposes.
Using this mode for continuous replication might lead to errors.
It is not guaranteed that after batch completion, the checkpoint file contains a proper schema which could be used for further processing.
Using the checkpoint files which are created at the end of batch processing for CAUTION: Providing redo log files one by one and running the program in batch mode is different from running |
|
signed number, min: -32768, max: 32767, default: -1 |
Define the container ID for the database. This is used for multi-tenant databases. TIP: -1 is the default value and means that the database is single-tenant. |
|
string, default: database DBTIMEZONE value |
Overwrites the database DBTIMEZONE value. Timezone should be in format The time zone is used only as the base time zone for values of the TIMESTAMP WITH LOCAL TZ type. |
|
number, min: 0, max: 15, default: 0 |
A sum of numbers:
NOTE: This field is valid only for IMPORTANT: This might increase performance a bit, but it is not recommended to use this option.
NOTE: For performance reasons, the user might disable those checks. They are recommended to be enabled in a production environment, especially since field names can change during program upgrades. Referring to old, invalid field names might cause the program to fail. |
|
string, default: time zone of OpenLogReplicator host |
Time zone used by the host where the database is running. Timezone should be in format If OpenLogReplicator is running on a host with a different time zone, adjust this parameter to the proper value. |
|
string, max length: 4000 |
Format of expected archived redo log files. This parameter defines how to parse the redo log file name to read the sequence number. When FRA is configured, the format of files is expected to be |
|
string, default: time zone of OpenLogReplicator host |
Time zone used for logging messages. Timezone should be in format By default, log messages are printed in the local time zone of the host where OpenLogReplicator is being run. To print log messages in the UTC time zone, set the value to '+00:00'. The log time zone in use is printed on startup. IMPORTANT: The value of this parameter can be configured by setting the environment variable |
|
string, max length: 128 |
Password for connecting to the database instance. NOTE: This field is valid only for CAUTION: The password is stored as an unencrypted string in the configuration file. |
|
list of string pairs, max length: 2048 |
List of pairs of files NOTE: This field is valid only for TIP: The parameter is useful when OpenLogReplicator operates on a different host than the one where the database server is running and the paths differ.
For example, the path may be: |
|
string, max length: 2048 |
Debugging parameter which allows copying all contents of processed redo log files to a defined folder. TIP: This parameter is useful for diagnosing disk-read related problems.
When consistency errors are detected, the redo log file is copied to the defined folder.
The file name is in format: |
|
list of string, max length: 2048 |
List of redo log files which should be processed in batch mode. Elements can be files or folders. In the latter case, all files in the folder are processed. NOTE: This field is valid only for Example config file: |
string, max length: 4096 |
Connect string for connecting to the database instance.
The format should be like: NOTE: This field is valid only for |
|
|
number, min: 0 |
The first SCN number to be processed. If not specified, the program will start from the current SCN. CAUTION: Setting a very low value of starting SCN might cause problems during program startup if the schema has changed since this SCN and the schema is not available to read using database flashback. In such a case, the program will not be able to read the metadata and will stop. IMPORTANT: Setting this parameter to some value would mean that transactions started before this SCN would not be processed. |
|
number, min: 0 |
First sequence number to be processed. IMPORTANT: If not specified, the first sequence would be determined by reading SCN boundaries assigned to particular redo log files and matched to starting SCN. |
|
number, min: 0 |
Determine starting SCN by relative time.
The value is relative to the current time using Number in seconds. NOTE: This field is valid only for CAUTION: It is invalid to use this parameter when |
|
string, max length: 256 |
Determine a starting SCN value by absolute time.
The value is in format NOTE: This field is valid only for CAUTION: It is invalid to use this parameter when |
|
element of state |
Configuration of state settings to store checkpoint information. |
|
string, max length: 128 |
Database user for connecting to database instance. NOTE: This field is valid only for |
|
number, min: 0, default: 0 |
An upper limit for transaction size. If the transaction size is greater than this value, the transaction is split into multiple transactions. Number in megabytes. CAUTION: This parameter is intended for debugging purposes only. It is not recommended to use it in a production environment. The transaction splitting is intended to limit memory usage and assumes that the transaction is committed while splitting is performed. If the transaction is not committed, the first part of the transaction would be sent to output anyway. If the transaction contains a large number of partially rolled back DML operations, they might appear in the output in spite of the rollback. |
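As a sketch only, a source entry for batch processing could look like the fragment below. The reader type value ("batch") and the name of the file list parameter ("redo-log") are assumptions based on the descriptions above and may differ in your version; the file paths are placeholders.

```json
{
  "alias": "S1",
  "name": "DB1",
  "reader": {
    "type": "batch",
    "redo-log": [
      "/opt/archive/db1_arch_100.arc",
      "/opt/archive/db1_arch_101.arc"
    ]
  },
  "format": {"type": "json"}
}
```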
Parameter | Specification | Notes |
---|---|---|
|
number, min: 0, default: 500 |
Threshold of processed redo log data after which checkpoint file is created. Number in megabytes. |
|
number, min: 0, default: 600 |
Threshold of processed redo log data time after which a checkpoint file is created. Number in seconds. IMPORTANT: The time refers not to processing time by OpenLogReplicator but to the time of the redo log data. For example, with the default setting of 600 seconds, if the last checkpoint was created after processing redo log data created at 10:40, a new checkpoint file is created when the processing reaches data created at 10:50. |
|
number, min: 0, default: 100 |
Number of checkpoint files which should be kept. The oldest checkpoint files are deleted. TIP: Value TIP: Keeping a larger number of checkpoint files allows adjusting the starting SCN more precisely. It provides more security in case of filesystem corruption and the last checkpoint file not being available. CAUTION: The number of checkpoint files may actually be larger than this parameter (exactly up to |
|
string, max length: 2048, default: |
The path to store checkpoint files. NOTE: This field is valid only for IMPORTANT: The path should be accessible for writing by the user which runs the program. |
|
number, min: 0, default: 20 |
To increase operating speed, not all checkpoint files would contain the full schema of the database. In case the schema didn’t change, it is not necessary to repeat the schema in every checkpoint file. The value determines the consecutive number of checkpoint files which may not contain the full schema. TIP: The value of |
|
string, max length: 256, default: |
Only |
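A sketch of a state element, assuming the parameter names "type" (with value "disk") and "path" (pointing at the checkpoint directory) — verify against the example config files before use:

```json
"state": {
  "type": "disk",
  "path": "checkpoint"
}
```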
Parameter | Specification | Notes |
---|---|---|
|
number, min: 0, default: 0 |
For debug purposes only. Stop program after specified number of log switches. |
|
number, min: 0, default: 0 |
For debug purposes only. Stop program after specified number of LWN checkpoints. |
|
number, min: 0, default: 0 |
For debug purposes only. Stop program after specified number of transactions. |
|
string, max length: 128 |
Owner of the debug table. |
|
string, max length: 128 |
This is a technical parameter primarily used for running test cases; it defines the table name. If any DML transactions occur on this table (like insert, update or delete), the program would stop. The transaction doesn’t necessarily need to be committed. |
Parameter | Specification | Notes |
---|---|---|
|
string, max length: 256, required |
Possible values are:
Refer to the output format chapter for details. CAUTION: Protocol buffer support is in an experimental state. It is not fully tested and might not work properly. Don’t use it for production without testing. |
number, min: 0, max: 7, default: 0 |
Transaction attributes location. Field value is a sum of:
|
|
number, min: 0, max: 3, default: 0 |
Format for (n)char, (n)varchar(2) and clob column types. By default, the value is written in Unicode format, using UTF-8 to code characters. Field value is a sum of:
|
|
numeric, min: 0, max: 2, default: 0 |
Column duplicate specification.
TIP: This is the format that takes less space. There is an assumption that if the column doesn’t appear in the INSERT or DELETE statement, its value is NULL. CAUTION: For LOB columns the before value is not available in the REDO stream. Therefore, the column is not included in the output. Only the after value is included.
CAUTION: It is technically not possible to differentiate if the column was actually mentioned by UPDATE DML command or not.
|
|
number, min: 0, max: 3, default: 0 |
Present database name in payload. Value is a sum of:
|
|
|
numeric, min: 0, default: 1048576 |
Number of bytes after which the output buffer is flushed. When set to |
number, min: 0, max: 10, default: 0 |
INTERVAL DAY TO SECONDS field format. Possible values are:
|
|
number, min: 0, max: 4, default: 0 |
INTERVAL YEAR TO MONTH field format. Possible values are:
*
|
|
number, min: 0, max: 31, default: 0 |
Message format specification. Value is a sum of:
TIP: By default, the transaction is split into many messages: begin, DML, DML, …, commit. Using this flag combines all messages into one. For performance reasons, this is not recommended when using Kafka, where transactions can be hundreds of megabytes in size.
For JSON only target, the following additional flags are available:
|
|
number, min: 0, max: 1, default: 0 |
Add Possible values are:
|
|
|
number, min: 0, max: 7, default: 0 |
Schema format sent to output. By default, the schema is not sent to output. Example output:
The field is a sum of values:
TIP: This optimization is based on the fact that it is meaningless to attach the same schema definition every time if it didn’t change. It is assumed that the client would cache the schema and would not request it again. If the schema changes, the first message where new schema is used would contain the full schema. Example output:
TIP: Remember to use flag
Example output:
|
number, min: 0, max: 3, default: 0 |
SCN field format. By default, every DML operation contains Possible values are:
|
|
number, min: 0, max: 1, default: 0 |
Include Possible values are:
|
|
number, min: 0, max: 15, default: 0 |
Format of timestamp values. The following timestamp is used as an example in the description below:
NOTE: This format is also used for type |
|
number, min: 0, max: 4, default: 0 |
Format of timestamp with time zone values. The following timestamp with time zone is used as an example in the description below: Possible values are:
|
|
number, min: 0, max: 1, default: 0 |
Include Possible values are:
|
|
|
number, min: 0, max: 1, default: 0 |
Unknown value reporting.
For unknown values Possible values are:
|
number, min: 0, max: 2, default: 0 |
Format of the Transaction ID (XID). Possible values are:
|
Parameter | Specification | Notes |
---|---|---|
|
list of a table element |
List of table regex rules which should be tracked in the redo log stream and sent to output. A table that matches at least one of the rules is tracked, thus the rules can overlap. Example:
|
list of string elements, max length: 32 |
List of transaction IDs which should be skipped.
The format of XID should be one of: Example: |
|
|
|
list of string elements, max length: 32 |
Debug option to dump internals about a certain XID to stderr. The format is the same as for skip-xid. |
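A sketch of a filter element with two table rules; the owner and table values are placeholders, and a skip-xid list (described above) could be added alongside the table list:

```json
"filter": {
  "table": [
    {"owner": "USR1", "table": ".*"},
    {"owner": "HR", "table": "EMPLOYEES"}
  ]
}
```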
Parameter | Specification | Notes |
---|---|---|
|
string, max length: 128, mandatory |
Name of the metrics module. Currently only |
|
string, max length: 128, mandatory for |
Network address used to bind the metrics module for Prometheus.
The format is Example:
|
|
string, max length: 128 |
Define tags for Possible values are:
|
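A sketch of a metrics element for Prometheus; the parameter names "type" and "bind" and the address value are assumptions for illustration — check the example config files for the exact names supported by your version:

```json
"metrics": {
  "type": "prometheus",
  "bind": "0.0.0.0:9161"
}
```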
Parameter | Specification | Notes |
---|---|---|
|
string, max length: 128, mandatory |
Regex pattern for matching owner name. The pattern is case-sensitive. |
|
string, max length: 128, mandatory |
Regex pattern for matching table name. The pattern is case-sensitive. |
|
string, max length: 4096 |
A string field with a list of columns which should be used as a primary key. The columns are separated by commas. The column names are case-sensitive. TIP: If a table doesn’t contain a primary key, a custom set of columns can be treated as a primary key. |
|
string, max length: 16384 |
An expression which should be evaluated for every row. The format of the field is C-like. Example:
The expression is evaluated from left to right. The following tokens can be used:
The expression can contain the following tokens, which have names derived from the attribute list of the transaction:
|
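A sketch of a single table rule element; the regex patterns are placeholders, and the "key" parameter name for the custom primary key columns is an assumption for illustration:

```json
{
  "owner": "USR1",
  "table": "ORDERS.*",
  "key": "ORDER_ID, ORDER_DATE"
}
```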
Parameter | Specification | Notes |
---|---|---|
|
string, max length: 256, mandatory |
A logical name of the target used in JSON file for referencing. |
|
string, max length: 256, mandatory |
A logical name of the source which this target should be connected with. |
|
element of a writer, mandatory |
Configuration of output processor. |
Parameter | Specification | Notes |
---|---|---|
|
string, max length: 256, mandatory |
Name of a Kafka topic used to send transactions as JSON messages. NOTE: This field is valid only for |
|
string, max length: 256, mandatory |
Possible values are:
Performs all actions like parsing the redo log and producing messages, but the messages are discarded and not sent to any target. TIP: This target is useful for testing purposes, to verify if redo log file parsing works correctly. This writer does not accept any parameters.
This mode assumes that OpenLogReplicator acts as a server. A client connects to the server and receives the messages. If the client disconnects, the server will wait for a new client to connect and buffer transactions while no client connection is present.
TIP: Technically this is the same as |
|
string, max length: 256, mandatory |
For network writer type: For zeromq writer type: NOTE: This field is valid only for |
|
number, min: 0, max: 1, default: 1 |
If the defined output file for transactions exists, append to it. If not, create a new file. NOTE: This field is valid only for CAUTION: Parameter |
|
number, min: 1, max: 953, default: 100 |
Maximum size of a message sent to Kafka. Number in megabytes. CAUTION: Memory for this buffer is allocated independently of memory defined as NOTE: This field is valid only for |
|
number, min: 0, default: 0 |
Maximum file size for output file.
The size can be defined only when NOTE: This field is valid only for |
|
number, min: 0, max: 2, default: 0 |
Put a new line after each transaction. Possible values are:
NOTE: This field is valid only for |
|
string, max length: 256 |
Format of output file.
The format is the same as for The following placeholders are supported:
NOTE: There should be only one placeholder in the format.
When using NOTE: This field is valid only for |
|
number, min: 100, max: 3600000000, default: 100000 |
Interval for polling for new messages. Number in microseconds. TIP: This parameter defines how often the client library checks for new messages. The smaller the value, the more often the client library checks for new messages. The larger the value, the more messages are buffered in the client library. NOTE: This field is valid only for |
|
map of string to string |
Additional properties for Kafka producer. Refer to librdkafka documentation for full list of parameters. Typically used parameters are:
This field also allows setting custom Kafka security-related parameters like authentication, encryption, etc. CAUTION: You should not set the NOTE: This field is valid only for |
|
number, min: 1, max: 1000000, default: 65536 |
Size of the message queue. TIP: This parameter defines how many messages can be sent to the output. If the message transport offers a level of parallelism, messages can be sent in parallel. If it doesn’t, messages are sent one by one. The larger the value, the more messages can be sent in parallel. |
|
string, max length: 256, default: |
Format of timestamp (defined using placeholder NOTE: This field is valid only for |
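A sketch of a target entry using the Kafka writer; the writer type value, the "topic" and "properties" parameter names, the topic name and the broker list are illustrative assumptions (bootstrap.servers is a standard librdkafka property) — consult the Kafka example config file in the scripts folder for the exact parameter set:

```json
{
  "alias": "T1",
  "source": "S1",
  "writer": {
    "type": "kafka",
    "topic": "DB1_CHANGES",
    "properties": {
      "bootstrap.servers": "kafka1:9092,kafka2:9092"
    }
  }
}
```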