Add support for chunked reading of Parquet and CSV files in Reader class #47

Open: butkeraites-hotglue (Contributor) wants to merge 2 commits into base: master

- Implemented a `get_in_chunks` method that reads large datasets in manageable chunks.
- Enhanced handling of Parquet files that have catalog types, including a mapping from pandas dtypes to pyarrow types.
- Updated CSV reading so that date fields are parsed correctly.
- Added error handling for unsupported file types and clearer user feedback on method usage.

This change improves memory efficiency and flexibility when processing large ETL files.
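
For reviewers, here is a minimal sketch of the chunked-read idea described above. It is not the code in this PR: the function name `read_in_chunks_sketch` and its `chunksize` and `date_columns` parameters are hypothetical, and it assumes pandas for CSV and pyarrow for Parquet.

```python
import os
from typing import Iterator, Sequence

import pandas as pd


def read_in_chunks_sketch(
    path: str,
    chunksize: int = 1000,
    date_columns: Sequence[str] = (),
) -> Iterator[pd.DataFrame]:
    """Yield DataFrames of at most `chunksize` rows from a CSV or Parquet file.

    Hypothetical helper for illustration only. Because this is a generator,
    the unsupported-type error is raised when iteration starts, not at call time.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        # With chunksize set, read_csv returns an iterator of DataFrames
        # instead of loading the whole file into memory.
        yield from pd.read_csv(
            path,
            chunksize=chunksize,
            parse_dates=list(date_columns) or False,
        )
    elif ext == ".parquet":
        # Imported lazily so the CSV path works without pyarrow installed.
        import pyarrow.parquet as pq

        parquet_file = pq.ParquetFile(path)
        # iter_batches streams record batches of up to batch_size rows.
        for batch in parquet_file.iter_batches(batch_size=chunksize):
            yield batch.to_pandas()
    else:
        raise ValueError(f"Unsupported file type: {ext!r}")
```

With this sketch, a 5-row CSV read with `chunksize=2` yields chunks of 2, 2, and 1 rows, so peak memory is bounded by the chunk size rather than the file size.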

- Consolidated logic for reading Parquet and CSV files into dedicated methods: `_process_parquet`, `_process_csv`, `_process_parquet_in_chunks`, and `_process_csv_in_chunks`.
- Introduced helper methods for schema creation and type conversion, enhancing code readability and maintainability.
- Improved error handling when parsing catalog types, ensuring robustness in data processing.
- Streamlined the handling of data types and date parsing for both file formats.

This refactor enhances the clarity and efficiency of the Reader class, making it easier to manage file input and type handling.