Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(agents-api): Use blob store for objects that are too big #645

Merged
merged 13 commits into from
Oct 16, 2024

Conversation

creatorrr
Copy link
Contributor

@creatorrr creatorrr commented Oct 14, 2024

Fixes #579

Signed-off-by: Diwank Singh Tomer [email protected]


Important

Integrates blob store for large object handling in agents-api with S3 client and updated serialization logic.

  • Blob Store Integration:
    • Adds s3.py for S3 client operations: get_s3_client(), list_buckets(), setup(), exists(), add_object(), get_object(), delete_object(), and add_object_with_hash().
    • Updates env.py to include blob store configurations: use_blob_store_for_temporal, blob_store_bucket, blob_store_cutoff_kb, s3_endpoint, s3_access_key, s3_secret_key.
    • Modifies codec.py to use blob store for serialization if use_blob_store_for_temporal is true and object size exceeds blob_store_cutoff_kb.
  • Functionality:
    • Decorator auto_blob_store in storage_handler.py to automatically store large outputs in blob store and load inputs from blob store if they are RemoteObject.
    • Applies auto_blob_store to multiple functions in activities and task_steps to handle large data.
  • Configuration:
    • Updates .env.example and docker-compose.yml to include blob store environment variables.
    • Adds boto3 and xxhash to pyproject.toml dependencies.

This description was created by Ellipsis for 2ea0155. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to cb591eb in 31 seconds

More details
  • Looked at 209 lines of code in 6 files
  • Skipped 1 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. agents-api/agents_api/clients/s3.py:27
  • Draft comment:
    Consider caching the list_buckets() function to avoid redundant calls to the S3 client, which can improve performance.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The setup function in s3.py uses list_buckets() which calls get_s3_client(). This results in multiple calls to get_s3_client() which is cached. However, the list_buckets() function itself is not cached, leading to potential inefficiencies if called multiple times. Caching list_buckets() could improve performance.
2. agents-api/agents_api/worker/codec.py:43
  • Draft comment:
    Replace print statements with proper logging to avoid cluttering the output in production environments.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The print statements in serialize function are likely for debugging purposes. They should be removed or replaced with proper logging to avoid cluttering the output in production environments.
3. agents-api/agents_api/clients/s3.py:71
  • Draft comment:
    Using a dynamic maxsize for lru_cache based on blob_store_cutoff_kb can lead to unexpected cache sizes. Consider using a fixed size or document this behavior clearly.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The get_object function in s3.py uses lru_cache with a dynamic maxsize based on blob_store_cutoff_kb. This could lead to unexpected cache sizes if blob_store_cutoff_kb changes. It's better to use a fixed cache size or document this behavior clearly.

Workflow ID: wflow_ThcEpvmCn9iPZDua


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

agents-api/agents_api/clients/s3.py Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on f577137 in 16 seconds

More details
  • Looked at 14 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. agents-api/agents_api/worker/codec.py:41
  • Draft comment:
    Remove commented-out print statements for cleaner code.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The print statements are unnecessary and should be removed for cleaner code.

Workflow ID: wflow_dOqOL6qJ4o2hWU3E


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 2ea0155 in 49 seconds

More details
  • Looked at 637 lines of code in 27 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_ppsUeJrzVMMn4m95


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@creatorrr creatorrr merged commit cd6045d into dev Oct 16, 2024
4 checks passed
@creatorrr creatorrr deleted the f/blob-store-temporal branch October 16, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Temporal: Input Exceeds Size Limit While Running Tasks With Large Data
2 participants