This repository has been archived by the owner on Apr 11, 2024. It is now read-only.

File to Dataframe support - GCS/S3 to Pandas Dataframe #21

Open
9 tasks
sunank200 opened this issue Feb 7, 2023 · 0 comments

sunank200 (Collaborator) commented Feb 7, 2023

Please describe the feature you'd like to see

  • Define interfaces

    • Use the Airflow 2.4 Dataset concept to build more types of Datasets:
      • Dataframe
    • Dataframe DataProviders
      • Add an interface for Dataframe DataProvider.
      • Add an interface for DataframeProviders.
      • Add read and write methods to DataframeDataProviders, using a context manager.
  • Non-native transfers

    • Add a transfer workflow for S3 to Pandas Dataframe using a non-native approach.
    • Add a transfer workflow for GCS to Pandas Dataframe using a non-native approach.
    • Add an example DAG for the S3 to Pandas Dataframe implementation.
    • Add an example DAG for the GCS to Pandas Dataframe implementation.
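
As an illustration of the interface and non-native transfer items above, here is a minimal sketch of how read and write methods could be exposed through context managers. Every name in it (`FileDataProvider`, `InMemoryProvider`, `file_to_dataframe`) is hypothetical and not taken from this repository. The in-memory provider only stands in for an S3 or GCS backend, and the file is assumed to be CSV.

```python
import io
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import BinaryIO, ContextManager, Dict, Iterator


class FileDataProvider(ABC):
    """Hypothetical interface: exposes a remote file as a binary stream."""

    @abstractmethod
    def read(self, uri: str) -> ContextManager[BinaryIO]:
        """Return a context manager yielding a readable stream for ``uri``."""

    @abstractmethod
    def write(self, uri: str) -> ContextManager[BinaryIO]:
        """Return a context manager yielding a writable stream for ``uri``."""


class InMemoryProvider(FileDataProvider):
    """Stand-in for an S3 or GCS provider (assumption); keeps objects in a dict."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    @contextmanager
    def read(self, uri: str) -> Iterator[BinaryIO]:
        stream = io.BytesIO(self.objects[uri])
        try:
            yield stream
        finally:
            stream.close()

    @contextmanager
    def write(self, uri: str) -> Iterator[BinaryIO]:
        stream = io.BytesIO()
        try:
            yield stream
            # Persist only if the with-body finished without raising.
            self.objects[uri] = stream.getvalue()
        finally:
            stream.close()


def file_to_dataframe(provider: FileDataProvider, uri: str):
    """Non-native transfer: stream the file through the worker into pandas."""
    import pandas as pd  # imported lazily; the interface itself does not need it

    with provider.read(uri) as stream:
        # Assumes CSV content; other formats would need a different reader.
        return pd.read_csv(stream)
```

With this shape, a caller would write bytes inside `with provider.write(uri) as out:` and then call `file_to_dataframe(provider, uri)` to get a Pandas Dataframe. "Non-native" here means the data passes through the worker's memory instead of being moved by a service-side copy.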

Acceptance Criteria

  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an example DAG showing how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant docs)
  • How-to guide for the feature (with an example)
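
The docstring, exception-handling, and logging criteria above could look roughly like the sketch below. It is an assumption about style, not code from this project: `TransferError` and `run_transfer` are hypothetical names, and the `transfer` callable stands in for whatever does the actual work.

```python
import logging

logger = logging.getLogger(__name__)


class TransferError(Exception):
    """Hypothetical error type raised when a transfer fails."""


def run_transfer(source, destination, transfer):
    """Run a transfer, logging its source and destination.

    :param source: URI of the source dataset.
    :param destination: URI or name of the destination dataset.
    :param transfer: Callable invoked as ``transfer(source, destination)``.
    :raises TransferError: If the underlying callable raises any exception.
    :return: Whatever ``transfer`` returns.
    """
    logger.info("Transferring %s to %s", source, destination)
    try:
        result = transfer(source, destination)
    except Exception as exc:
        # Log with traceback, then re-raise with both endpoints in the message.
        logger.exception("Transfer from %s to %s failed", source, destination)
        raise TransferError(
            f"Could not transfer {source} to {destination}"
        ) from exc
    logger.info("Finished transferring %s to %s", source, destination)
    return result
```

Chaining with `from exc` keeps the original error available as `__cause__`, so the user sees both the failing endpoints and the underlying reason.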
@sunank200 sunank200 transferred this issue from astronomer/astro-sdk Mar 27, 2023
@utkarsharma2 utkarsharma2 mentioned this issue Apr 17, 2023
2 tasks
utkarsharma2 added a commit that referenced this issue Aug 25, 2023
# Description
## What is the current behavior?
Currently, the dataframe dataset is not supported.

closes:
#18
#21

## What is the new behavior?
Added dataframe dataset.

## Does this introduce a breaking change?
Nope


### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary

---------

Co-authored-by: Wei Lee <[email protected]>