This repository has been archived by the owner on Apr 11, 2024. It is now read-only.

Editing pass on README.md #43

Open · wants to merge 2 commits into base `main`
74 changes: 44 additions & 30 deletions README.md
@@ -8,71 +8,87 @@ transfers made easy<br><br>

[![CI](https://github.com/astronomer/apache-airflow-provider-transfers/actions/workflows/ci-uto.yaml/badge.svg)](https://github.com/astronomer/apache-airflow-provider-transfers)

The **Universal Transfer Operator** simplifies how users transfer data from a source to a destination using [Apache Airflow](https://airflow.apache.org/). It offers a consistent agnostic interface, improving the users' experience so they do not need to use explicitly specific providers or operators.
The **UniversalTransferOperator** simplifies how you transfer data from a source to a destination using [Apache Airflow](https://airflow.apache.org/). Its agnostic interface eliminates the need to use specific providers or operators.

At the moment, it supports transferring data between [file locations](https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L26-L32) and [databases](https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L72-L74) (in both directions) and cross-database transfers.
At the moment, it supports transferring data between [file locations](https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L26-L32) and [databases](https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L72-L74), as well as cross-database transfers.

This project is maintained by [Astronomer](https://astronomer.io).

## Installation

```
```sh
pip install apache-airflow-provider-transfers
```


## Example DAGs

Checkout the [example_dags](./example_dags) folder for examples of how the UniversalTransfeOperator can be used.
See the [example_dags](./example_dags) folder for examples of how you can use the UniversalTransferOperator.


## How Universal Transfer Operator Works
## How the UniversalTransferOperator works

![Approach](./docs/images/approach.png)

With Universal Transfer Operator, users can perform data transfers using the following transfer modes:
The purpose of the UniversalTransferOperator is to move data from a source dataset to a destination dataset. Your datasets can be defined as `Files` or as `Tables`.
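
For example, a source or a destination can be declared along these lines — a minimal sketch that assumes the `File` and `Table` dataset classes and the import paths shown below; the example DAGs linked later in this README are the canonical reference:

```python
from universal_transfer_operator.datasets.file.base import File
from universal_transfer_operator.datasets.table import Table

# A file-based dataset: the path scheme (s3://, gs://, ...) selects the file
# location, and conn_id points at the matching Airflow connection.
source_dataset = File(path="s3://<your-bucket>/uto/", conn_id="aws_default")

# A database-backed dataset, addressed by table name and Airflow connection.
destination_dataset = Table(name="uto_example_table", conn_id="snowflake_default")
```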

Instead of using different operators for each of your transfers, the UniversalTransferOperator supports three universal transfer types:

1. Non-native
2. Native
3. Third-party
- Non-native transfers
- Native transfers
- Third-party transfers

### Non-native transfers
**Author:**
Is it too late to rename these transfer types / is this the common term for this type of transfer? This was unintuitive to me, because I considered a worker-driven transfer to be native, but from the perspective that it's native to Airflow. I'd prefer if we called these:

- Worker transfer
- Dataset transfer
- Third-party transfer

**Collaborator:**

@jwitz worker_transfer makes sense

But dataset_transfer doesn't make sense to me. Can you elaborate on this, please? I still feel that native transfers signify natively transferring without involving the worker node. Open to suggestions for better naming for this.

**Collaborator:**

@jwitz I think we can still rename these transfers, as we have not done a major release yet.
IMO,

  1. Worker transfer - Makes sense to me as well
  2. For Native transfer - We can maybe use - direct / peer to peer
  3. Third-party transfer - This is perfect as it is.

Also, maybe we can lose the transfer from names since we already call them transfer modes.
WDYT?

**Collaborator:**

IMO - we should rename them to the below to be more easily understood by users:

1."local"
2."optimized"
3."third-party"

**Author:**

I'm not sure "optimized" will help users understand exactly what's going on. Optimized can mean many things, and some might consider third party to be the "optimized" solution for their use case.

I like "peer-to-peer" or "direct" @sunank200 !

@utkarsharma2 I think we should still keep "Transfer", because it helps us communicate what these modes are in documentation. If you want to remove "transfer" at the code level in terms of how people specify these modes in the operator parameters, I think that's fine.

**Collaborator (utkarsharma2, Mar 30, 2023):**

@jwitz Sure we can keep the transfer in docs, but we can remove it from the code. @sunank200 @phanikumv WDYT?

Also, if the choice is between peer-to-peer and direct, I prefer peer-to-peer since it's a well known concept, and would be obvious to users.


### Non-native transfer
In a non-native transfer, you transfer data from a source to a destination through Airflow workers. Chunking is applied where possible. This method can be suitable for datasets smaller than 2 GB. However, the performance of this method depends on the worker's memory, disk, processor, and network configuration.

Non-native transfers rely on transferring the data through the Airflow worker node. Chunking is applied where possible. This method can be suitable for datasets smaller than 2GB, depending on the source and target. The performance of this method is highly dependent upon the worker's memory, disk, processor and network configuration.
To use this type of transfer, you provide the UniversalTransferOperator with:

Internally, the steps involved are:
- Retrieve the dataset data in chunks from dataset storage to the worker node.
- Send data to the cloud dataset from the worker node.
- A `task_id`.
- A `source_dataset`, defined as a `File` or `Table`.
- A `destination_dataset`, defined as a `File` or `Table`.

When you initiate the transfer, the following happens in Airflow:

- The worker retrieves the dataset in chunks from the data source.
- The worker sends data to the destination dataset.

Following is an example of a non-native transfer between Google Cloud Storage and SQLite:

https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/example_dags/example_universal_transfer_operator.py#L37-L41
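
For reference, here is a condensed sketch of that non-native transfer — the import paths, connection IDs, and bucket path are illustrative assumptions; the linked example DAG is the canonical version:

```python
from datetime import datetime

from airflow import DAG

from universal_transfer_operator.constants import FileType
from universal_transfer_operator.datasets.file.base import File
from universal_transfer_operator.datasets.table import Table
from universal_transfer_operator.universal_transfer_operator import UniversalTransferOperator

with DAG(dag_id="example_non_native_transfer", start_date=datetime(2023, 1, 1), schedule=None):
    # The Airflow worker pulls the CSV files from GCS in chunks and loads them
    # into a SQLite table; no external transfer service is involved.
    transfer_non_native_gs_to_sqlite = UniversalTransferOperator(
        task_id="transfer_non_native_gs_to_sqlite",
        source_dataset=File(
            path="gs://<your-bucket>/uto/",  # hypothetical bucket path
            conn_id="google_cloud_default",
            filetype=FileType.CSV,
        ),
        destination_dataset=Table(name="uto_gs_to_sqlite_table", conn_id="sqlite_default"),
    )
```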

### Native transfers

### Improving bottlenecks by using native transfer
In a native transfer, Airflow relies on the mechanisms and tools offered by your data source and destination to facilitate the transfer. For example, when you use a native transfer to move data from object storage to a Snowflake database, Airflow calls on Snowflake to run the ``COPY INTO`` command. Similarly, when loading data from S3 to BigQuery, the UniversalTransferOperator calls on the GCP Storage Transfer Service to facilitate the data transfer.

An alternative to using the Non-native transfer method is the native method. The native transfers rely on mechanisms and tools offered by the data source or data target providers. In the case of moving from object storage to a Snowflake database, for instance, a native transfer consists in using the built-in ``COPY INTO`` command. When loading data from S3 to BigQuery, the Universal Transfer Operator uses the GCP Storage Transfer Service.
The benefit of native transfers is that they can perform better for larger datasets (over 2 GB) and don't rely on the Airflow worker node hardware configuration. Airflow worker nodes are used only as orchestrators and don't perform any data operations. The speed depends exclusively on the service being used and the bandwidth between the source and destination.

The benefit of native transfers is that they will likely perform better for larger datasets (2 GB) and do not rely on the Airflow worker node hardware configuration. With this approach, the Airflow worker nodes are used as orchestrators and do not perform the transfer. The speed depends exclusively on the service being used and the bandwidth between the source and destination.
When you initiate the transfer, the following happens in Airflow:

Steps:
- Request destination dataset to ingest data from the source dataset.
- Destination dataset requests source dataset for data.
- The worker calls on the destination dataset to ingest data from the source dataset.
- The destination dataset runs the necessary steps to request and ingest data from the source dataset.

> **_NOTE:_**
The Native method implementation is in progress and will be available in future releases.
> **Note**
> The Native method implementation is in progress and will be available in future releases.
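
Once native transfers are available, usage is expected to mirror the other modes. The following is a speculative sketch only — it assumes a `TransferMode.NATIVE` value exists alongside `TransferMode.THIRDPARTY`, and the connection IDs, table name, and path are placeholders:

```python
from universal_transfer_operator.constants import TransferMode
from universal_transfer_operator.datasets.file.base import File
from universal_transfer_operator.datasets.table import Table
from universal_transfer_operator.universal_transfer_operator import UniversalTransferOperator

# Speculative: ask Snowflake to ingest directly from S3 (e.g. via COPY INTO),
# so no data passes through the Airflow worker.
transfer_native_s3_to_snowflake = UniversalTransferOperator(
    task_id="transfer_native_s3_to_snowflake",
    source_dataset=File(path="s3://<your-bucket>/uto/", conn_id="aws_default"),
    destination_dataset=Table(name="uto_s3_to_snowflake_table", conn_id="snowflake_default"),
    transfer_mode=TransferMode.NATIVE,
)
```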

### Third-party transfers

### Transfer using a third-party tool
The Universal Transfer Operator can also offer an interface to generic third-party services that transfer data, similar to Fivetran.
In a third-party transfer, the UniversalTransferOperator calls on a third-party service to facilitate your data transfer, such as Fivetran.

Here is an example of how to use Fivetran for transfers:
To complete a third-party transfer, you provide the UniversalTransferOperator with:

https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/example_dags/example_dag_fivetran.py#L52-L58
- A source dataset, defined as a `Table` or `File`.
- A destination dataset, defined as a `Table` or `File`.
- The parameter `transfer_mode=TransferMode.THIRDPARTY`.
- `transfer_params` for the third-party tool.

When you initiate the transfer, the following happens in Airflow:

- The worker calls on the third-party tool to facilitate the data transfer.

Currently, Fivetran is the only supported third-party tool. See [`fivetran.py`](https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/integrations/fivetran.py) for a complete list of parameters that you can set to determine how Fivetran completes the transfer.

Here is an example of how to use Fivetran for transfers:

https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/example_dags/example_dag_fivetran.py#L52-L58
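
A condensed sketch of the linked Fivetran example follows — the `FiveTranOptions` fields, connection IDs, and connector ID shown here are illustrative assumptions; `fivetran.py` and the example DAG are the authoritative reference:

```python
from universal_transfer_operator.constants import TransferMode
from universal_transfer_operator.datasets.file.base import File
from universal_transfer_operator.datasets.table import Table
from universal_transfer_operator.integrations.fivetran import FiveTranOptions
from universal_transfer_operator.universal_transfer_operator import UniversalTransferOperator

# Third-party transfer: the Airflow worker only orchestrates; Fivetran moves
# the data from S3 into Snowflake using an existing connector.
transfer_fivetran = UniversalTransferOperator(
    task_id="transfer_fivetran_with_connector_id",
    source_dataset=File(path="s3://<your-bucket>/uto/", conn_id="aws_default"),
    destination_dataset=Table(name="fivetran_test_table", conn_id="snowflake_default"),
    transfer_mode=TransferMode.THIRDPARTY,
    transfer_params=FiveTranOptions(conn_id="fivetran_default", connector_id="<your-connector-id>"),
)
```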

## Supported technologies

@@ -84,7 +100,6 @@ https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/exampl

https://github.com/astronomer/apache-airflow-provider-transfers/blob/main/src/universal_transfer_operator/constants.py#L26-L32


## Documentation

The documentation is a work in progress -- we aim to follow the [Diátaxis](https://diataxis.fr/) system.
@@ -93,7 +108,6 @@ The documentation is a work in progress -- we aim to follow the [Diátaxis](http

- **[Getting Started Tutorial](https://apache-airflow-provider-transfers.readthedocs.io/en/latest/getting-started/GETTING_STARTED.html)**: A hands-on introduction to the Universal Transfer Operator


## Changelog

The **Universal Transfer Operator** follows semantic versioning for releases. Check the [changelog](/docs/CHANGELOG.md) for the latest changes.