fix: add request count with eviction for not retriable requests to fix memory problem #92
Conversation
LGTM for this part. I'm mostly wondering on the other changes in maxi297/test_closing_requests_objects now
/autofix
There seems to be a test failing but apart from that, the change looks good to me. Thanks for hopping onto this one Aldo! (note that you will need to auto approve as I was the one opening this PR)
Following the release of this, can we release source-klaviyo with an updated CDK version?
Auto approve is my passion. Yes, I will look at the test and later follow up with a Klaviyo release.
📝 Walkthrough

The changes in this pull request primarily enhance the HttpClient, evicting request attempt counts once no further retries are needed to curb memory growth.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Client
participant HttpClient
participant Server
Client->>HttpClient: Send request
HttpClient->>Server: Forward request
Server-->>HttpClient: Response (success/error)
HttpClient->>HttpClient: Handle response
alt Error response
HttpClient->>HttpClient: Check if retry is needed
HttpClient->>HttpClient: Evict key if no retry
end
HttpClient-->>Client: Return response
What do you think of these changes? Do they align with your expectations for improving the HttpClient's memory usage?

📜 Recent review details

Configuration used: CodeRabbit UI

📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
pyproject.toml (1)
Line 97: Consider pinning the pympler version for reproducible builds

Hey! I see you've added pympler for memory profiling, which aligns perfectly with the memory optimization goals 🎯. However, using "*" might lead to unexpected behavior if a breaking change is introduced in future versions. Would you consider pinning to a specific version or version range (e.g., "^1.0.0") for more predictable builds? wdyt?
unit_tests/sources/streams/http/test_http_client.py (2)
Lines 416-417: Consider extracting magic numbers as constants for better maintainability.

The values 1000 and 250 could be more meaningful as named constants. What do you think about something like this?

+TOTAL_TEST_REQUESTS = 1000
+MAX_MEMORY_THRESHOLD_BYTES = 250  # Empirically determined threshold

-    requests_to_make = 1000
+    requests_to_make = TOTAL_TEST_REQUESTS
     for requests_count in range(requests_to_make):
         ...

-    assert size_of_request_count_store < 250
+    assert size_of_request_count_store < MAX_MEMORY_THRESHOLD_BYTES

Also applies to: 423-424
Line 422: The memory comparison comment could be more descriptive.

The comment about 0.5 MB could be clearer about how this number was determined. Maybe something like:

-    # for the number of requests_to_make if we didn't evict the requests we could see increase the value ~0.5 MB
+    # Without request eviction, storing 1000 requests would consume ~0.5 MB (empirically measured)

wdyt?
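The unbounded-growth effect described in that comment can be sketched with a small stdlib experiment. This uses sys.getsizeof as a rough stand-in for pympler's asizeof, and the dicts here are illustrative stand-ins for the CDK's attempt-count store, not its actual code:

```python
import sys

# Two dicts standing in for the request attempt-count store: one keeps
# every key, the other evicts each key as soon as its "request" completes.
unbounded = {}
evicting = {}

for i in range(1000):
    key = f"request-{i}"
    unbounded[key] = 1
    evicting[key] = 1
    del evicting[key]  # eviction on completion keeps the hash table small

# sys.getsizeof measures only the dict's own allocation (not its keys),
# which is enough to show the table itself growing without eviction.
print(sys.getsizeof(unbounded), sys.getsizeof(evicting))
```

Note that pympler's asizeof would additionally follow references into the stored keys and values, so the real gap is even larger than this stdlib measurement shows.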
airbyte_cdk/sources/streams/http/http_client.py (1)
Lines 356-358: Are we evicting keys in all necessary scenarios?

We currently call _evict_key when response_action is not in _ACTIONS_TO_RETRY_ON. Should we also consider evicting keys for other response actions to prevent potential memory leaks? What do you think?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
- airbyte_cdk/sources/streams/http/http_client.py (5 hunks)
- pyproject.toml (1 hunks)
- unit_tests/sources/streams/http/test_http_client.py (2 hunks)
🔇 Additional comments (4)
pyproject.toml (1)
Line 97: Verify pympler usage in test files
The addition of pympler in dev dependencies looks good! To ensure it's being used effectively for memory profiling, let's verify its implementation:
✅ Verification successful
Pympler is used appropriately for memory profiling, but could we expand its usage? 🤔
The dependency is correctly placed in dev dependencies and is being used to measure memory allocation in HTTP client tests. I see it's specifically checking that the request attempt count store stays under 250 bytes - neat!
Would you consider adding more memory checks in other critical areas of the codebase where memory usage is important? Just a thought since we have the tool available now! wdyt?
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for pympler imports and usage in test files
echo "Checking pympler usage in test files:"
rg -l "from pympler" unit_tests/
rg -l "import pympler" unit_tests/
Length of output: 208
Script:
#!/bin/bash
# Let's check the actual implementation to understand how pympler is being used
echo "Checking pympler usage in http_client test:"
rg -A 5 "pympler" unit_tests/sources/streams/http/test_http_client.py
# Let's also check if there are any other memory-related tests
echo -e "\nChecking for memory-related tests:"
rg -i "memory|mem_|size|allocation" unit_tests/sources/streams/http/test_http_client.py
Length of output: 645
unit_tests/sources/streams/http/test_http_client.py (2)
Line 5: LGTM! Clean import addition.

The asizeof import from pympler is well-placed and essential for memory measurements.
Lines 378-425: Consider potential test flakiness due to system-dependent memory measurements.
The test relies on absolute memory measurements which could vary across different systems or CI environments. Would you consider either:
- Making the threshold relative (e.g., compare initial vs final memory usage)
- Adding a tolerance factor to the assertion
- Skipping the test in CI environments
Let's check if this test is already flaky in CI:
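As an aside, the relative-threshold idea mentioned above could look roughly like this; the helper name and the tolerance value are made up for illustration, not part of the PR:

```python
import sys

def assert_bounded_growth(before: int, after: int, tolerance: float = 5.0) -> None:
    """Fail when `after` exceeds `before` by more than `tolerance`x.

    Comparing growth to a baseline instead of an absolute byte count
    makes the check less sensitive to interpreter or platform details.
    """
    assert after <= before * tolerance, (
        f"store grew from {before} to {after} bytes (more than {tolerance}x)"
    )

store: dict = {}
before = sys.getsizeof(store)
for i in range(1000):
    store[i] = 1
    del store[i]  # evict immediately, mimicking cleanup of completed requests
after = sys.getsizeof(store)
assert_bounded_growth(before, after)
```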
airbyte_cdk/sources/streams/http/http_client.py (1)
Line 385: Can we ensure filter_secrets sanitizes all sensitive data?

While we use filter_secrets on the error_message before logging, are we confident it effectively sanitizes all sensitive information, including any secrets in request.url and request.body? Should we verify this to prevent accidental leakage? Wdyt?
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
unit_tests/sources/streams/http/test_http_client.py (1)
Lines 379-426: Consider enhancing the test's maintainability and clarity

A few suggestions to make this test more robust and maintainable:
- The magic number 250 for the size assertion could be a named constant at the module level, what do you think?
- The test documentation could better explain the memory management strategy being tested.
- We could parameterize this test to cover different scenarios (e.g., different request counts, size thresholds).
- The nonlocal variable usage could be simplified.

Here's a possible refactor, wdyt?

+# Maximum allowed size in bytes for the request count store
+MAX_REQUEST_COUNT_STORE_SIZE = 250

-@pytest.mark.usefixtures("mock_sleep")
-def test_evicting_requests_for_request_count():
+@pytest.mark.usefixtures("mock_sleep")
+@pytest.mark.parametrize("requests_to_make,max_size", [
+    (1000, MAX_REQUEST_COUNT_STORE_SIZE),
+    (500, MAX_REQUEST_COUNT_STORE_SIZE // 2),
+])
+def test_evicting_requests_for_request_count(requests_to_make, max_size):
+    """Test that the request count store's memory usage remains bounded.
+
+    This test verifies that the HttpClient properly evicts completed requests
+    from its request count store to prevent unbounded memory growth.
+    """
     mocked_session = MagicMock(spec=requests.Session)
     valid_response = MagicMock(spec=requests.Response)
     valid_response.status_code = 200
     valid_response.ok = True
     valid_response.headers = {}
-    call_count = 3
+    retry_count = 3

     def update_response(*args, **kwargs):
-        nonlocal call_count
-        if http_client._session.send.call_count == call_count:
-            call_count += 3
-            return valid_response
-        else:
-            retry_response = MagicMock(spec=requests.Response)
-            retry_response.ok = False
-            retry_response.status_code = 408
-            retry_response.headers = {}
-            return retry_response
+        if http_client._session.send.call_count % retry_count == 0:
+            return valid_response
+        retry_response = MagicMock(spec=requests.Response)
+        retry_response.ok = False
+        retry_response.status_code = 408
+        retry_response.headers = {}
+        return retry_response

     # ... rest of the test ...

-    assert size_of_request_count_store < 250
+    assert size_of_request_count_store < max_size
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- airbyte_cdk/sources/streams/http/http_client.py (3 hunks)
- pyproject.toml (1 hunks)
- unit_tests/sources/streams/http/test_http_client.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
🔇 Additional comments (5)
airbyte_cdk/sources/streams/http/http_client.py (4)
Line 79: Nice addition of the retry actions constant! 👍
The set-based approach for retry actions improves readability and maintainability. The constant effectively encapsulates the retry logic in one place.
Lines 382-384: Clean integration of the eviction logic! 🎯
The eviction is well-placed to occur when we're certain no more retries are needed. The code reads naturally and maintains the existing error handling flow.
Lines 363-373: 🛠️ Refactor suggestion
Consider enhancing thread safety for the eviction method
The implementation looks good and the docstring is very informative! However, in concurrent scenarios, the check-then-delete pattern might lead to race conditions. What do you think about using a thread-safe data structure for _request_attempt_count? For example:
- self._request_attempt_count: Dict[requests.PreparedRequest, int] = {}
+ from threading import Lock
+ from collections import defaultdict
+ self._request_attempt_count_lock = Lock()
+ self._request_attempt_count = defaultdict(int)
Then in _evict_key:
def _evict_key(self, prepared_request: requests.PreparedRequest) -> None:
- if prepared_request in self._request_attempt_count:
- del self._request_attempt_count[prepared_request]
+ with self._request_attempt_count_lock:
+ if prepared_request in self._request_attempt_count:
+ del self._request_attempt_count[prepared_request]
This would ensure thread-safe operations. Wdyt? 🤔
Likely invalid or redundant comment.
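For illustration, the lock-guarded pattern suggested in that comment could look like the following self-contained sketch. The class and method names are hypothetical, not the actual airbyte_cdk code:

```python
import threading

class RequestAttemptStore:
    """Hypothetical lock-guarded attempt-count store."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: dict = {}

    def increment(self, key) -> int:
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]

    def evict(self, key) -> None:
        # Holding the lock across the whole check-then-delete avoids the
        # race where two threads both pass the membership check and one
        # raises KeyError on the delete.
        with self._lock:
            self._counts.pop(key, None)

store = RequestAttemptStore()

def worker() -> None:
    store.increment("GET /lists")
    store.evict("GET /lists")

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that dict.pop(key, None) also collapses the membership check and the delete into a single call, which simplifies the eviction even before any locking is added.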
Line range hint 363-384: Verify memory optimization effectiveness
Let's confirm that the memory usage is indeed optimized with these changes.
✅ Verification successful
Memory optimization implementation looks solid! 🎯
The implementation is well-tested and effectively addresses the memory consumption issue:
- Found a memory-specific test in unit_tests/sources/streams/http/test_http_client.py that verifies the memory footprint stays under 250 bytes
- The _request_attempt_count is properly managed with eviction when requests are completed
- Implementation is used consistently across concurrent scenarios (verified through test coverage)
- Test cases cover both regular retries and exponential backoff scenarios
The memory optimization appears to be working as intended, with proper test coverage and implementation. The size assertion in tests provides a concrete guarantee against memory bloat.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for memory-related tests and verify the implementation
# Look for memory-related tests
echo "Searching for memory-related tests..."
rg -l "test.*memory|test.*evict" --type python
# Check for concurrent testing scenarios
echo "Searching for concurrent testing patterns..."
rg -l "concurrent|threading|asyncio" --type python
# Look for other request count implementations that might need similar optimization
echo "Searching for other request counting patterns..."
ast-grep --pattern 'request_attempt_count = {$$$}'
Length of output: 507
Script:
#!/bin/bash
# Let's try again with correct file type specifications
# Look for memory-related tests
echo "Searching for memory-related tests..."
rg -l "test.*memory|test.*evict"
# Check for concurrent testing scenarios
echo "Searching for concurrent testing patterns..."
rg -l "concurrent|threading|asyncio"
# Look for the actual implementation of request counting
echo "Searching for request counting implementation..."
rg "_request_attempt_count" -A 2 -B 2
# Check for related test files
echo "Looking for related test files..."
fd "test.*http.*\.py$"
Length of output: 12692
unit_tests/sources/streams/http/test_http_client.py (1)
Line 5: LGTM!
The import statement is well-placed and properly used for memory size measurement.
APPROVED
I will release it on 2024-12-02.
what
Addresses high memory consumption when enabling concurrency in https://github.com/airbytehq/oncall/issues/6821.
The _request_attempt_count attribute keeps growing as multiple requests are made using the same http_client. To mitigate this issue, we evict keys for completed requests once we confirm that no further retries are needed.
This helps manage memory usage more efficiently while maintaining the necessary logic for retry attempts.
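As a rough sketch of that strategy: the names _request_attempt_count and _ACTIONS_TO_RETRY_ON mirror the PR discussion, but the surrounding class, method, and action strings are simplified stand-ins, not the real CDK implementation:

```python
# Response actions that mean another attempt will follow; any other
# action is terminal, so its bookkeeping can be dropped.
_ACTIONS_TO_RETRY_ON = {"RETRY", "RATE_LIMITED"}

class HttpClientSketch:
    def __init__(self) -> None:
        # Grows by one key per distinct in-flight request.
        self._request_attempt_count: dict = {}

    def _handle_response_action(self, request_key, response_action) -> None:
        if response_action in _ACTIONS_TO_RETRY_ON:
            self._request_attempt_count[request_key] = (
                self._request_attempt_count.get(request_key, 0) + 1
            )
        else:
            # Terminal outcome: evict the key so the store stays bounded
            # even when one client issues many thousands of requests.
            self._request_attempt_count.pop(request_key, None)

client = HttpClientSketch()
client._handle_response_action("GET /profiles", "RETRY")
client._handle_response_action("GET /profiles", "SUCCESS")  # key evicted here
```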
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores
- Added a development dependency (pympler) for improved memory profiling.