-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chore: New example sync to snowflake Cortex #350
base: main
Are you sure you want to change the base?
Conversation
WalkthroughWalkthroughThe changes involve enhancements to error handling across multiple files in the Airbyte framework. Modifications include the introduction of specific exception classes, adjustments to exception propagation, and refinements in how errors are logged and reported. A new example script for integrating with Snowflake as a destination has also been added, demonstrating the setup and configuration processes for data integration. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant AirbyteConnector
participant Subprocess
participant ExceptionHandler
User->>AirbyteConnector: Initiate check
AirbyteConnector->>ExceptionHandler: Handle AirbyteConnectorFailedError
ExceptionHandler->>AirbyteConnector: Log original_exception
AirbyteConnector->>Subprocess: Execute subprocess
Subprocess-->>AirbyteConnector: Return exit_code
AirbyteConnector->>ExceptionHandler: Handle AirbyteSubprocessFailedError
ExceptionHandler->>AirbyteConnector: Log exit_code and original_exception
Wdyt? Recent review detailsConfiguration used: CodeRabbit UI Files selected for processing (1)
Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range, codebase verification and nitpick comments (4)
examples/run_snowflake_destination.py (3)
8-34
: Consider parameterizing the GCP project name?The script looks great for demonstrating Snowflake integration! I noticed the GCP project name is hardcoded. For a demo script, this is totally fine, but what do you think about making it a parameter? This could make the script more flexible for different demo scenarios. Wdyt?
If you agree, here's a simple way to do it:
import os AIRBYTE_INTERNAL_GCP_PROJECT = os.getenv("GCP_PROJECT", "dataline-integration-testing")This way, you can easily override the project name when running the script, like:
GCP_PROJECT=my-demo-project poetry run python examples/run_snowflake_destination.py
74-87
: Streamline the main flow?I like how you're checking the destinations and source before writing. That's a good practice to show in a demo. However, I noticed there's some commented-out code for writing to the Snowflake destination. For a clear demo flow, what do you think about removing the commented sections and focusing on the main Cortex write operation? This could make the script's purpose more immediately clear to readers. Wdyt?
If you agree, we could simplify this section to something like:
snowflake_destination.check() cortex_destination.check() source.check() cortex_write_result = cortex_destination.write( source, cache=False, )This keeps the focus on the main flow of the demo.
89-92
: Clarify the purpose of the commented section?I noticed there's a commented-out section at the end for reading from the cache and printing stream information. While this could be useful for debugging or extending the demo, it might be a bit confusing in the main script. What do you think about either removing this section or adding a clear comment explaining its purpose and how it could be used? This could help readers understand the full capabilities of the script without cluttering the main flow. Wdyt?
If you decide to keep it, maybe add a comment like:
# Uncomment the following lines to read from the cache and print stream information # result = source.read(cache) # for name in ["products"]: # print(f"Stream {name}: {len(list(result[name]))} records")This way, it's clear how this section could be used without distracting from the main demo.
airbyte/_executors/base.py (1)
45-48
: Nice addition of specific error handling for BrokenPipeError!This change improves the robustness of the input pumping process. Would you consider adding a brief comment explaining why we're treating BrokenPipeError differently? It might help future maintainers understand the rationale. Something like:
# BrokenPipeError is expected when the subprocess closes its input stream # before we finish writing. We handle this gracefully to allow normal termination.What do you think? This could make the code even more self-documenting. WDYT?
cache = SnowflakeCache( | ||
account=secret_config["account"], | ||
username=secret_config["username"], | ||
password=secret_config["password"], | ||
database=secret_config["database"], | ||
warehouse=secret_config["warehouse"], | ||
role=secret_config["role"], | ||
) | ||
snowflake_destination = ab.get_destination( | ||
"destination-snowflake", | ||
config={ | ||
**snowflake_destination_secret, | ||
"default_schema": "pyairbyte_tests", | ||
}, | ||
) | ||
cortex_destination_secret["processing"]["text_fields"] = [ | ||
"make", | ||
"model", | ||
"name", | ||
"gender", | ||
] | ||
cortex_destination_secret["indexing"]["default_schema"] = "pyairbyte_tests" | ||
cortex_destination = ab.get_destination( | ||
"destination-snowflake-cortex", | ||
config=cortex_destination_secret, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider focusing on one destination type?
The setup for both Snowflake and Cortex destinations is comprehensive. For a demo script, though, we might want to keep things as simple as possible. What do you think about focusing on just one destination type? Maybe we could have separate example scripts for Snowflake and Cortex? This could make each example more focused and easier to follow. Wdyt?
If you agree, we could either:
- Keep just the Snowflake destination and remove the Cortex-specific code.
- Keep just the Cortex destination and remove the regular Snowflake destination code.
- Create two separate example scripts, one for each destination type.
airbyte/_executors/base.py
Outdated
if exception_holder.exception and not isinstance( | ||
exception_holder.exception, exc.AirbyteConnectorBrokenPipeError | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great refinement of the error handling logic!
I like how you've improved the control flow by checking for AirbyteConnectorBrokenPipeError
. This change enhances the overall error management.
For the conditional inclusion of the original exception, would you consider extracting the condition into a separate variable for improved readability? Something like:
is_reportable_exception = (
exception_holder.exception
and not isinstance(exception_holder.exception, exc.AirbyteConnectorBrokenPipeError)
)
raise exc.AirbyteSubprocessFailedError(
run_args=args,
exit_code=exit_code,
original_exception=exception_holder.exception if is_reportable_exception else None,
)
This could make the logic a bit clearer at a glance. What are your thoughts on this suggestion?
Also applies to: 142-148
/fix-pr
|
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores