fix: fallback to json if orjson cannot serialize value #210
Signed-off-by: Artem Inzhyyants <[email protected]>
📝 Walkthrough
The changes focus on improving error handling in the
Changes
Sequence Diagram
sequenceDiagram
participant Entrypoint as AirbyteEntrypoint
participant Serializer as AirbyteMessageSerializer
participant OrJSON as orjson
participant StdJSON as json
Entrypoint->>Serializer: dump(airbyte_message)
Serializer-->>Entrypoint: serialized_message
Entrypoint->>OrJSON: dumps(serialized_message)
alt Serialization Successful
OrJSON-->>Entrypoint: JSON string
else Serialization Failed
Entrypoint->>StdJSON: dumps(serialized_message)
StdJSON-->>Entrypoint: JSON string
end
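The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the CDK's code: `DemoRecord` and `limited_dumps` are hypothetical stand-ins for the Airbyte message and for `orjson.dumps`, so the snippet runs with the standard library only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict


@dataclass
class DemoRecord:
    """Hypothetical stand-in for an Airbyte record message."""
    stream: str
    data: Dict[str, Any] = field(default_factory=dict)


def limited_dumps(obj: Dict[str, Any]) -> bytes:
    """Hypothetical stand-in for a fast encoder with a 64-bit integer limit.

    Simplified check: rejects any integer whose absolute value needs
    more than 64 bits.
    """
    for value in obj.get("data", {}).values():
        if isinstance(value, int) and value.bit_length() > 64:
            raise TypeError("Integer exceeds 64-bit range")
    return json.dumps(obj).encode()


def message_to_string(
    message: DemoRecord,
    fast_dumps: Callable[[Dict[str, Any]], bytes] = limited_dumps,
) -> str:
    serialized = asdict(message)  # "Serializer.dump" step in the diagram
    try:
        return fast_dumps(serialized).decode()  # fast path
    except TypeError:
        return json.dumps(serialized)  # fallback path


small = message_to_string(DemoRecord("stream", {"n": 1}))
large = message_to_string(DemoRecord("stream", {"n": 7046723166326052303072}))
```

Both calls return a JSON string; only the second one goes through the fallback branch.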
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/entrypoint.py (1)
295-299: How about adding some improvements to the error handling? wdyt?
The fallback logic works, but we could make it more robust:
- Consider catching specific exceptions instead of using a bare except
- Maybe add debug logging when falling back to help with troubleshooting?
- Would it be helpful to add a comment about the ~30% performance impact when falling back?
Here's a suggestion:
     serialized_message = AirbyteMessageSerializer.dump(airbyte_message)
     try:
         return orjson.dumps(serialized_message).decode()
    -except:
    +except (orjson.JSONEncodeError, ValueError) as e:
    +    logger.debug(f"Failed to serialize with orjson, falling back to json: {str(e)}")
    +    # Note: Fallback to json.dumps may result in ~30% slower serialization
         return json.dumps(serialized_message)
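To illustrate why the bare except matters, here is a small standard-library sketch. The helper names are made up for the example. It shows that a bare `except:` also traps `KeyboardInterrupt`, while a clause limited to `TypeError` and `ValueError` lets it propagate. As far as I know, `orjson.JSONEncodeError` is documented as a subclass of `TypeError`, so a narrowed clause would still cover it; treat that as an assumption to check against the installed version.

```python
from typing import Callable


def guarded_bare(action: Callable[[], None]) -> str:
    """Run action; a bare except turns *any* exception into a fallback."""
    try:
        action()
    except:  # noqa: E722 - also traps KeyboardInterrupt and SystemExit
        return "fallback"
    return "ok"


def guarded_narrow(action: Callable[[], None]) -> str:
    """Run action; only encoding-style errors trigger the fallback."""
    try:
        action()
    except (TypeError, ValueError):
        return "fallback"
    return "ok"


def bad_encode() -> None:
    # Hypothetical encoder failure.
    raise TypeError("Integer exceeds 64-bit range")


def interrupt() -> None:
    # Simulates the user pressing Ctrl-C during serialization.
    raise KeyboardInterrupt
```

With these helpers, `guarded_bare(interrupt)` quietly returns `"fallback"`, whereas `guarded_narrow(interrupt)` re-raises the interrupt and `guarded_narrow(bad_encode)` still falls back.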
🔇 Additional comments (2)
airbyte_cdk/entrypoint.py (2)
8-8: LGTM! The json import is correctly placed with other standard library imports.
295-299: Let's verify the usage and impact of this change. The changes look good, but let's verify the impact across the codebase.
✅ Verification successful
The serialization change looks good! 👍
The change aligns well with the codebase's existing patterns where orjson is already widely used for serialization. The fallback to json.dumps ensures backward compatibility in case of any edge cases.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
    #!/bin/bash
    # Find all usages of airbyte_message_to_string to understand the impact
    echo "Searching for airbyte_message_to_string usage:"
    rg "airbyte_message_to_string" -B 2 -A 2
    # Look for any performance-related tests that might need updating
    echo -e "\nSearching for performance tests:"
    rg -i "performance|benchmark" --type python
Length of output: 3510
Script:
    #!/bin/bash
    # Check for orjson import and usage
    echo "Checking orjson imports:"
    rg "import.*orjson" -B 2 -A 2
    # Look for serialization-related tests
    echo -e "\nChecking for serialization tests:"
    rg -l "test.*serial|serial.*test" --type-add 'python:*.py' --type python
    # Check for other json.dumps usage
    echo -e "\nChecking other json.dumps usage:"
    rg "json\.dumps" --type-add 'python:*.py' --type python -B 2 -A 2
Length of output: 70089
Thanks for the analysis!
I've added a log and a test. Can you have a look at it? If this is fine for you, I'll release the CDK and klaviyo
Actionable comments posted: 0
🧹 Nitpick comments (1)
unit_tests/test_entrypoint.py (1)
773-788: Enhance test assertions and documentation?
The test verifies the fallback mechanism when orjson fails to serialize large integers. Consider these improvements, wdyt?
- Add docstring explaining the test's purpose and the specific scenario being tested
- Verify the content of the serialized messages to ensure data integrity
- Add assertions to confirm that the fallback to json actually occurred
Here's a suggested implementation:
     def test_given_serialization_error_using_orjson_then_fallback_on_json(entrypoint: AirbyteEntrypoint, mocker, spec_mock, config_mock):
    +    """
    +    Test that when orjson fails to serialize a value (e.g., integers > int64),
    +    the system falls back to the standard json library successfully.
    +    """
         parsed_args = Namespace(
             command="read", config="config_path", state="statepath", catalog="catalogpath"
         )
         record = AirbyteMessage(record=AirbyteRecordMessage(stream="stream", data={"data": 7046723166326052303072}, emitted_at=1), type=Type.RECORD)
    +    # Mock logger to verify fallback warning
    +    mock_logger = mocker.patch("airbyte_cdk.entrypoint.logger")
         mocker.patch.object(MockSource, "read_state", return_value={})
         mocker.patch.object(MockSource, "read_catalog", return_value={})
         mocker.patch.object(
             MockSource, "read", return_value=[record, record]
         )
         messages = list(entrypoint.run(parsed_args))
         # There will be multiple messages here because the fixture `entrypoint` sets a control message. We only care about records here
         record_messages = list(filter(lambda message: "RECORD" in message, messages))
         assert len(record_messages) == 2
    +    # Verify that the fallback warning was logged
    +    mock_logger.warning.assert_called_once_with("Failed to serialize message with orjson, falling back to json")
    +    # Verify the content of serialized messages
    +    for message in record_messages:
    +        deserialized = json.loads(message)
    +        assert deserialized["record"]["data"]["data"] == 7046723166326052303072
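The two assertions suggested above (the fallback actually happened, and the value survived intact) can also be tried in isolation. The sketch below uses only the standard library; `dumps_with_fallback` is a hypothetical helper standing in for the entrypoint logic, and the fast encoder is a `Mock` that always fails.

```python
import json
from unittest.mock import Mock


def dumps_with_fallback(payload, fast_dumps, warn):
    """Hypothetical helper: try fast_dumps, warn and use json on TypeError."""
    try:
        return fast_dumps(payload).decode()
    except TypeError:
        warn("fast encoder failed, falling back to json")
        return json.dumps(payload)


def run_fallback_check() -> str:
    big_value = 7046723166326052303072  # same value as in the PR's test
    payload = {"record": {"data": {"data": big_value}}}
    failing_encoder = Mock(side_effect=TypeError("Integer exceeds 64-bit range"))
    warn = Mock()

    text = dumps_with_fallback(payload, failing_encoder, warn)

    failing_encoder.assert_called_once_with(payload)  # fast path was attempted
    warn.assert_called_once()  # fallback was signalled
    assert json.loads(text) == payload  # integer round-trips without loss
    return text


result = run_fallback_check()
```

Python's `json` module handles arbitrary-precision integers, which is why the round-trip comparison holds for a value this large.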
What
Resolves https://github.com/airbytehq/oncall/issues/7226
Values that are too big (> int64) cannot be serialized by the orjson library: ijl/orjson#116
How
Fall back to json if orjson cannot serialize a value.
Test
Prerequisites
{ "count": 10000000 }
Test results
{"field5": 111111111111111111111} in DummyFields
Conclusion
Summary by CodeRabbit
Bug Fixes
Tests