This repository has been archived by the owner on Apr 11, 2024. It is now read-only.

Split the Table to multiple Files #25

Open
10 tasks
sunank200 opened this issue Mar 27, 2023 · 0 comments

Comments

@sunank200
Collaborator

Please describe the feature you'd like to see

Writing a table out to multiple files would be good to have, but it is not a must-have feature.
1. Allow multiple output files when the table holds gigabytes of data.
2. Decide on a naming scheme, for example: test.csv -> test_1.csv, test_2.csv
3. Assume a default file_size_threshold; once it is reached, split the data into multiple files.

    ```
    transfer_non_native_bigquery_to_sqlite = UniversalTransferOperator(
        task_id="transfer_non_native_bigquery_to_sqlite",
        source_dataset=Table(
            name="uto_s3_to_bigquery_table", conn_id="google_cloud_default", metadata=Metadata(schema="astro")
        ),
        destination_dataset=File(name="uto_bigquery_to_sqlite_table", type=FileType.PARQUET, conn_id="sqlite_default"),
        # threshold_file_size=500MB
    )
    
    Assume:
    threshold_file_size=1GB
    
    OUTPUT:
    uto_bigquery_to_sqlite_table_1 <- 1GB
    uto_bigquery_to_sqlite_table_2 <- 100MB
    ```
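The splitting and naming rules listed above can be sketched independently of the operator. This is only an illustrative sketch, not the project's implementation: `part_path`, `encode_row`, and `split_rows_to_files` are hypothetical helpers, the threshold is assumed to be given in bytes, and CSV is used because its on-disk size is easy to measure row by row (Parquet would need a different size estimate).

```python
import csv
import io
from pathlib import Path
from typing import Iterable, List, Sequence


def part_path(base: Path, index: int) -> Path:
    """Hypothetical naming helper: test.csv -> test_1.csv, test_2.csv, ..."""
    return base.with_name(f"{base.stem}_{index}{base.suffix}")


def encode_row(row: Sequence[object]) -> bytes:
    """Serialise one row as a CSV line so its size in bytes can be measured."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue().encode("utf-8")


def split_rows_to_files(
    rows: Iterable[Sequence[object]],
    base: Path,
    threshold_file_size: int,  # assumed to be a byte count
) -> List[Path]:
    """Write rows to numbered files, rolling over before a file would exceed the threshold."""
    written: List[Path] = []
    handle = None
    current_size = 0
    try:
        for row in rows:
            line = encode_row(row)
            # Start a new file for the first row, or when adding this row
            # would push a non-empty file past the threshold.
            needs_new_file = handle is None or (
                current_size > 0 and current_size + len(line) > threshold_file_size
            )
            if needs_new_file:
                if handle is not None:
                    handle.close()
                path = part_path(base, len(written) + 1)
                handle = path.open("wb")
                written.append(path)
                current_size = 0
            handle.write(line)
            current_size += len(line)
    finally:
        if handle is not None:
            handle.close()
    return written
```

With this sketch, a base name without an extension such as `uto_bigquery_to_sqlite_table` yields `uto_bigquery_to_sqlite_table_1`, `uto_bigquery_to_sqlite_table_2`, and so on. A single row larger than the threshold is still written whole to its own file, so a file can exceed the threshold in that one case.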

Describe the solution you'd like
Exporting a huge table into multiple smaller files allows users to effectively parallelise the transformation afterwards, using tools like Spark and Beam.

Additional context
More details at: notion doc

Acceptance Criteria

  • Test case with valid end-to-end transfer from a Table to multiple files
  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for all methods, classes, functions, and module-level attributes (including an Example DAG showing how the feature should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How-to guide for the feature (with an example)