db-migration-tool

This tool can be used to migrate the data in a given collection from a source database in one Fauna Region Group to a target database, which can be in an entirely different Region Group. It uses Fauna's temporality feature to replicate all write events that occurred in the source collection after a given timestamp into the same collection in the target database. This tool can be used in combination with Fauna's Database Copy feature to achieve a database migration. By itself, this tool does not copy or migrate indexes or keys.

The general procedure for migrating from one Region Group to another is to create a backup snapshot of the source database and then create a new database from that snapshot in the desired Region Group. Once the database copy is complete, use this tool to synchronize the writes that have occurred on the source database since the snapshot was taken. More guidance on migration is included at the end of this README.

Prerequisites

Installation

Install Node.js

Verify Node.js installation

$ node --version

Verify npm installation

$ npm --version

Clone the repository

$ cd to/my/directory
$ git clone https://github.com/fauna-labs/db-migration-tool.git

Install dependencies

$ cd to/my/directory/db-migration-tool
$ npm install

Prepare your databases

  1. Enable history_days on all collections that need to be migrated.
    1. Update each collection's document as required.

      Important: To avoid gaps in data, set history_days to a period longer than the time you expect the migration to take. For example, if you plan to take 3 days from the time of the snapshot to complete the migration, set history_days to a value greater than 3.

  2. Schedule a snapshot of the source databases.
    1. Please refer to the Backups documentation for more information.
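
As a rough planning aid, the history_days rule above can be written as a small JavaScript function. This is a minimal sketch, not part of db-migration-tool: minHistoryDays and historyDaysPayload are hypothetical names, and the snapshot and expected finish times are assumed to be JavaScript Date objects. It returns the smallest whole number of days strictly greater than the planned migration time and builds the object you would use when updating a collection document.

```javascript
// Hypothetical helpers, not part of db-migration-tool.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Smallest whole number of days strictly greater than the time between the
// snapshot and the expected end of the migration (both Date objects).
function minHistoryDays(snapshotTime, expectedFinishTime) {
  const elapsedDays = (expectedFinishTime - snapshotTime) / MS_PER_DAY;
  if (!(elapsedDays >= 0)) {
    throw new RangeError("expectedFinishTime must not be earlier than snapshotTime");
  }
  // Math.floor(x) + 1 is always strictly greater than x.
  return Math.floor(elapsedDays) + 1;
}

// Collection update payload. With the FQL v4 JavaScript driver (faunadb),
// this object would be the second argument to q.Update(q.Collection(name), ...).
function historyDaysPayload(days) {
  return { history_days: days };
}
```

For a plan that ends exactly 3 days after the snapshot, minHistoryDays returns 4, in line with the example in the note above.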

Using the tool

Running the script

Open a terminal, navigate to the project directory, and run main.js with the necessary options.

Example:

$ cd to/my/directory/db-migration-tool
$ node main.js --source $SOURCE_KEY --target $TARGET_KEY --collections $C1_NAME $C2_NAME $C3_NAME --timestamp $START_TIMESTAMP

CLI Options:

  -v, --version                  output the version number
  -s, --source <string>          admin secret for the source DB
  -t, --target <string>          admin secret for the target DB
  -d, --timestamp <number>       the timestamp from which to start syncing
  -c, --collections <string...>  [optional] the list of Collection names to be sync'ed. If not provided, all collections will be sync'ed.
  -i, --indexes <string...>      [optional] the list of Index names to be used with the respective Collections listed
  -p, --parallelism <number>     [optional] apply up to N events per transaction (default: 10)
  --validate <number>            [optional] paginate through documents N at a time (1000 max) and compare source to target; WARNING: this
                                 could take a long time and will accrue additional read ops
  --endpoint <string>            [optional] the endpoint to use for the source and target DBs (default: "https://db.fauna.com")
  -y, --yes                      [optional] skip confirmation prompts and run the migration
  -h, --help                     display help for command
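
To make "up to N events per transaction" concrete, the sketch below groups an ordered list of events into consecutive batches of at most N. It is an illustration under our own assumptions, not the tool's actual code: batchEvents is a hypothetical name, and events are treated as opaque values. A smaller --parallelism means more, smaller transactions, which is presumably why the Troubleshooting section suggests lowering it when you are rate-limited.

```javascript
// Illustration only, not the tool's implementation: split an ordered list of
// events into consecutive batches of at most `parallelism` events, so that
// each batch could be applied in one transaction.
function batchEvents(events, parallelism = 10) {
  if (!Number.isInteger(parallelism) || parallelism < 1) {
    throw new RangeError("parallelism must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < events.length; i += parallelism) {
    // slice() keeps the original order and never runs past the end.
    batches.push(events.slice(i, i + parallelism));
  }
  return batches;
}
```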

Indexes

This tool automatically creates a new index on each source collection; the index is needed to sync data from that collection correctly. How long an autogenerated index takes to build depends on the size of the collection, and the index must be active before the tool can sync the collection's data.

Indexes of the following shape will be created:

{
  name: "IndexName",
  source: Collection("CollectionName"),
  terms: [],
  values: [ { field: "ts" }, { field: "ref" } ]
}

If you would like to use an existing index, specify its name with the -i, --indexes option. The index must have the same shape as shown above.

Warning

This tool does not verify the index shape before using it. If you specify a custom index, it is your responsibility to ensure that the index is of the correct shape.
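
Because the tool does not check a custom index, you may want to check it yourself before passing it with -i. The following is a minimal sketch, not part of the tool: hasMigrationIndexShape and fieldPath are hypothetical helpers that take a plain JavaScript object shaped like the index definition above (for example, an index document you have already read with your driver of choice). It accepts fields written either as "ts" or as a path such as ["ts"], rejects reversed values, and does not check the source collection.

```javascript
// Hypothetical pre-flight check; db-migration-tool does not run it.
// Returns the field path of one index entry, accepting "ts" or ["ts"].
function fieldPath(entry) {
  if (!entry || entry.field === undefined) return undefined;
  return Array.isArray(entry.field) ? entry.field.join(".") : entry.field;
}

// True when the index has no terms and its values are exactly [ts, ref],
// in that order and not reversed. The `source` collection is not checked.
function hasMigrationIndexShape(index) {
  const terms = index.terms ?? [];
  const values = index.values ?? [];
  return (
    Array.isArray(terms) &&
    terms.length === 0 &&
    Array.isArray(values) &&
    values.length === 2 &&
    fieldPath(values[0]) === "ts" &&
    fieldPath(values[1]) === "ref" &&
    !values.some((entry) => entry.reverse === true)
  );
}
```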

Converting between ISO time strings and Timestamps

A Fauna Timestamp, an integer representing the microseconds since the epoch, is needed to define the start of the synchronization operation (passed in with the -d, --timestamp option). You can use Fauna to convert between ISO time strings and Timestamps.

String to Timestamp

// FQL v4
ToMicros(Time("2023-11-03T00:00:00Z")) // 1698969600000000

Timestamp to String

// FQL v4
Epoch(1698969600000000, "microseconds") // Time("2023-11-03T00:00:00Z")
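
If you prefer not to make a database round trip, the same conversions can be approximated in plain JavaScript. This sketch assumes the input is a valid ISO 8601 string; isoToFaunaMicros and faunaMicrosToIso are hypothetical names. JavaScript Date values have only millisecond precision, so any sub-millisecond digits are dropped, and results stay exact only up to Number.MAX_SAFE_INTEGER microseconds (well past the year 2200).

```javascript
// Convert between ISO 8601 strings and Fauna Timestamps (microseconds since
// the Unix epoch). Dates carry milliseconds, so sub-millisecond digits are lost.
function isoToFaunaMicros(iso) {
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) throw new RangeError(`Invalid ISO time string: ${iso}`);
  return ms * 1000;
}

function faunaMicrosToIso(micros) {
  // Truncate to whole milliseconds before building the Date.
  return new Date(Math.floor(micros / 1000)).toISOString();
}
```

isoToFaunaMicros("2023-11-03T00:00:00Z") returns 1698969600000000, the same value as the FQL v4 example, and faunaMicrosToIso converts it back to "2023-11-03T00:00:00.000Z".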

Configuring tool constants

Several constants are defined in main.js that can be adjusted to optimize the performance of the tool.

  • DURATION - (default: 30) Time span (in minutes) to gather events
    • Increase the DURATION constant to widen the window of events that are processed at once, or decrease it to avoid rate limiting.
  • ITERATIONS - (default: 20) Number of iterations to run the tool
    • The overall window of events that are processed is DURATION * ITERATIONS minutes.
  • WAIT_TIME - (default: 10) Wait time between iterations in seconds
    • Decrease the WAIT_TIME constant to take less time between iterations, or increase it to avoid rate limiting.
  • DEFAULT_PAGE_SIZE - (default: 64) The default page size for retrieving documents from the custom index
    • If your documents are very large, you may want to decrease this value to limit the amount of data processed per transaction. Otherwise, you should not need to change this. The tool paginates through all events in the duration window. There is typically little benefit to increasing or decreasing the page size.
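
The arithmetic behind "DURATION * ITERATIONS minutes" can be checked with a few lines of JavaScript. This sketch rests on two assumptions that this README does not spell out: that the ITERATIONS windows are laid end to end starting at the --timestamp value, and that WAIT_TIME is spent only between iterations. coveredWindow is a hypothetical name.

```javascript
// Default values from the list above.
const DEFAULT_CONSTANTS = { DURATION: 30, ITERATIONS: 20, WAIT_TIME: 10 };
const MICROS_PER_MINUTE = 60 * 1000000;

// Assumes the ITERATIONS windows of DURATION minutes are laid end to end
// starting at the --timestamp value (in microseconds), and that WAIT_TIME
// seconds pass only between iterations, not after the last one.
function coveredWindow(startMicros, constants = DEFAULT_CONSTANTS) {
  const { DURATION, ITERATIONS, WAIT_TIME } = constants;
  const windowMinutes = DURATION * ITERATIONS;
  return {
    windowMinutes,
    endMicros: startMicros + windowMinutes * MICROS_PER_MINUTE,
    minWaitSeconds: Math.max(0, ITERATIONS - 1) * WAIT_TIME,
  };
}
```

With the defaults, the window is 600 minutes (10 hours) and at least 190 seconds are spent waiting. Under the first assumption, a start timestamp more than 10 hours in the past would not be fully covered by a single run with default settings.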

These constants are optimized for migrating databases with a large number of events in each 30-minute window. To tune performance, start with the DURATION constant and adjust the others as needed.

For example, if you are on the Startup plan, with a throughput limit of 1000 writes per second, choose a DURATION that keeps the number of events in each window under 1000.
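
One way to apply this guideline is to estimate how many events per minute the source collection receives and pick the longest whole-minute DURATION that keeps each window below the limit. The sketch assumes you supply that estimate yourself (for example, from your own write metrics); chooseDuration is a hypothetical name, and the default of 1000 only mirrors the Startup plan example.

```javascript
// Hypothetical helper: the longest whole-minute DURATION for which
// eventsPerMinute * DURATION stays strictly below maxEvents.
// Returns 0 if even a one-minute window would reach maxEvents.
function chooseDuration(eventsPerMinute, maxEvents = 1000) {
  if (!(eventsPerMinute > 0)) {
    throw new RangeError("eventsPerMinute must be a positive number");
  }
  // Largest integer d with d * eventsPerMinute < maxEvents.
  return Math.max(0, Math.ceil(maxEvents / eventsPerMinute) - 1);
}
```

At 50 events per minute it returns 19 (950 events per window), and at 30 events per minute it returns 33 (990 events per window).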

Best Practices

  • To avoid gaps in synchronization, you should use a start timestamp less than the timestamp of the last synced write on the target collection.
  • To reduce the overall time to sync an entire database, run one instance of this tool for each collection, in parallel.
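
Running one instance per collection can be scripted from Node.js. The sketch below is hypothetical (perCollectionRuns and runInParallel are not part of the tool) and uses only options listed under CLI Options. It passes --yes so that the parallel runs do not each stop at a confirmation prompt, so review your settings first. It assumes the secrets come from your environment and that main.js is run from the repository directory; note that secrets passed on a command line can be visible to other users of the same machine in the process list.

```javascript
const { spawn } = require("child_process");

// Hypothetical helper: build one argument list per collection, using only
// options listed under CLI Options. --yes skips the confirmation prompt.
function perCollectionRuns(collections, { source, target, timestamp, parallelism }) {
  if (!source || !target) throw new Error("source and target secrets are required");
  return collections.map((collection) => {
    const args = [
      "main.js",
      "--source", source,
      "--target", target,
      "--collections", collection,
      "--timestamp", String(timestamp),
      "--yes",
    ];
    if (parallelism !== undefined) args.push("--parallelism", String(parallelism));
    return { collection, args };
  });
}

// Start all runs at once (one node process per collection) and wait for all
// of them. Each result is { status: "fulfilled", value: collection } or
// { status: "rejected", reason: Error }.
function runInParallel(runs, cwd) {
  return Promise.allSettled(
    runs.map(
      ({ collection, args }) =>
        new Promise((resolve, reject) => {
          const child = spawn(process.execPath, args, { cwd, stdio: "inherit" });
          child.on("error", reject);
          child.on("close", (code) => {
            // Only the collection name is reported, so secrets in args are not logged.
            if (code === 0) resolve(collection);
            else reject(new Error(`sync of ${collection} exited with code ${code}`));
          });
        })
    )
  );
}

// Example (not executed here):
// runInParallel(
//   perCollectionRuns(["C1", "C2", "C3"], {
//     source: process.env.SOURCE_KEY,
//     target: process.env.TARGET_KEY,
//     timestamp: 1698969600000000,
//   }),
//   "to/my/directory/db-migration-tool"
// ).then((results) => console.log(results));
```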

Process for Migration

1. Copy your database to the desired Region Group

  1. Check that all prerequisites listed earlier in this README have been met.
  2. Create a database copy (the target database) in the target Region Group from the latest available snapshot of the source database.

2. Synchronization

Note

If you paused writes to your database before the snapshot time and intend to keep them paused for the duration of the migration, this script is not needed, because no data needs to be synchronized. You can skip to Application cutover.

The script is idempotent, so it can safely be run multiple times. We recommend running it at least once before pausing your application's writes for cutover. This gives you the chance to measure how long the sync operation takes and plan ahead.

Monitor how long each sync takes: because writes must be paused during the final sync, this is the theoretical minimum downtime that can be achieved during cutover. You can run the sync on a regular basis to establish a typical baseline for the time needed to apply the writes made since the tool last ran.

  1. Generate admin keys for the source database and target database.
  2. Run the script, providing the required authentication keys, the name of the collection to be synchronized, and the timestamp from which write history is to be applied to the target database.
    1. The first time you run the script, the timestamp should be a time shortly before the snapshot was taken. This will sync all the writes that occurred after the snapshot was taken.
    2. Subsequent executions of the script can use the timestamp of the last successful execution. This will sync all the writes that occurred since the last successful execution.
  3. Wait for the required indexes to become active. The script will stop if the index is not active and prompt you to try again later. Once the indexes are active, the script will proceed to sync the collection data.
  4. Repeat the operation for all collections in all databases that need to be migrated.
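
The timestamp guidance in step 2 above is easy to get wrong by hand, so you may want to keep a record of each run. The sketch below is one hypothetical way to do it: nextStartTimestamp and currentMicros are not part of the tool, the five-minute safety margin is an arbitrary assumption, and "the timestamp of the last successful execution" is interpreted as the moment that run started. Starting a little early is safe because the script is idempotent, whereas starting late could leave gaps.

```javascript
// Hypothetical bookkeeping helpers, not part of db-migration-tool.
// All values are Fauna Timestamps (microseconds since the Unix epoch).
const DEFAULT_MARGIN_MICROS = 5 * 60 * 1000000; // assumed 5-minute safety margin

// First run: start shortly before the snapshot time.
// Later runs: start shortly before the previous successful run started.
function nextStartTimestamp({ snapshotMicros, lastRunStartMicros }, marginMicros = DEFAULT_MARGIN_MICROS) {
  const base = lastRunStartMicros ?? snapshotMicros;
  if (base === undefined || base === null) {
    throw new Error("snapshotMicros is required for the first run");
  }
  return base - marginMicros;
}

// Capture this just before launching a run; if the run succeeds, store it
// and pass it as lastRunStartMicros next time.
function currentMicros() {
  return Date.now() * 1000;
}
```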

3. Application cutover

Important

It is recommended that this be scheduled for a window when the downtime least impacts your application workload.

Application cutover is the act of switching your application from the source database to the target database. The strategy for your cutover involves changing the access keys your application uses to connect to Fauna: cutover occurs when you have replaced the keys that connect your application to the source database with keys that connect to the target database.

  1. Disable writes from the application.
  2. Confirm no new writes are occurring.
  3. Run a final execution of this tool to sync the latest writes to the target database.
  4. Update access keys in the application with keys pointing to the target database.
  5. Re-enable writes in your application.
  6. Confirm that writes are being applied to the target database.

Troubleshooting

What if I receive a 429 status code?

Fauna returns 429 errors when you exceed your throughput limits (that is, your account is being rate-limited). To adjust for this:

  • Reduce parallelism using the --parallelism option.
  • Adjust the constants in main.js: decrease DURATION to reduce the number of events processed per interval, or increase WAIT_TIME to wait longer between iterations.
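
If you keep hitting 429 errors, one simple, hypothetical heuristic is to halve --parallelism and DURATION and double WAIT_TIME before trying again. The tool does not do this automatically; reduceLoad is an illustrative name, and the halving and doubling factors are arbitrary choices, not values from Fauna. Because the script is idempotent, rerunning it from the same start timestamp after adjusting the settings should be safe.

```javascript
// Hypothetical heuristic, not built into the tool: after a 429, halve
// --parallelism and DURATION (never below 1) and double WAIT_TIME.
function reduceLoad({ parallelism, DURATION, WAIT_TIME }) {
  return {
    parallelism: Math.max(1, Math.floor(parallelism / 2)),
    DURATION: Math.max(1, Math.floor(DURATION / 2)),
    WAIT_TIME: WAIT_TIME * 2,
  };
}
```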

Limitations

  • Documents with over 100,000 events will only have the first set of 100,000 events copied over.
  • Any new schema documents (collections, indexes) created after the snapshot was copied will not be migrated. Using this tool while schema documents are being modified is not recommended.
  • Creates, updates, and deletes applied after the snapshot was taken are replayed in order by this script, but at the current time. In other words, the value of the ts field is not preserved.
  • History manipulation is incompatible with this script. Because the script only reads events going forward in time order, it will miss events that are added, changed, or removed at points in the past.

Testing

The test suite requires a running instance of the Fauna Dev Docker image. See the Fauna documentation for installation and setup instructions.

https://docs.fauna.com/fauna/current/tools/dev

Once the Fauna Dev docker image is running, you can run the test suite with the following command:

npm run test:migration