Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Split payload content by smaller batches for embedding #653

Merged
merged 2 commits into from
Oct 15, 2024

Conversation

whiterabbit1983
Copy link
Contributor

@whiterabbit1983 whiterabbit1983 commented Oct 15, 2024

Important

embed_docs in embed_docs.py now processes payloads in smaller batches asynchronously using batched and asyncio, with a new test case added.

  • Behavior:
    • embed_docs in embed_docs.py now processes payload content in smaller batches using batched from itertools.
    • Introduces max_batch_size parameter to control batch size, defaulting to 100.
    • Uses asyncio.wait for asynchronous embedding of batches.
  • Functions:
    • Adds embed_batch inner function to process each batch of indices and snippets.
    • Modifies embed_docs to use embed_batch for batch processing.
  • Imports:
    • Adds asyncio and batched imports to support new batching logic.
  • Tests:
    • Adds test case in test_activities.py to verify embed_docs with batching logic using unittest.mock.patch.

This description was created by Ellipsis for 30b26be. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 54bb2e8 in 33 seconds

More details
  • Looked at 70 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_FIAa0RNmiZNq7CCF


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

agents-api/agents_api/activities/embed_docs.py Outdated Show resolved Hide resolved
@whiterabbit1983 whiterabbit1983 force-pushed the x/embedding-batch-size-limit branch from 54bb2e8 to ec62ad4 Compare October 15, 2024 09:50
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on ec62ad4 in 21 seconds

More details
  • Looked at 116 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_I3kx3ykCwpJQk4Y6


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

agents-api/tests/test_activities.py Outdated Show resolved Hide resolved
@whiterabbit1983 whiterabbit1983 force-pushed the x/embedding-batch-size-limit branch from 86cf911 to 7505347 Compare October 15, 2024 09:53
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 7505347 in 24 seconds

More details
  • Looked at 116 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_UqXhNfmt1sPs48pp


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

agents-api/agents_api/activities/embed_docs.py Outdated Show resolved Hide resolved
@whiterabbit1983 whiterabbit1983 force-pushed the x/embedding-batch-size-limit branch from bc46948 to 0476f38 Compare October 15, 2024 13:36
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 0476f38 in 25 seconds

More details
  • Looked at 77 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_WbBquX6idAR3tRpX


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

agents-api/agents_api/activities/embed_docs.py Outdated Show resolved Hide resolved
@whiterabbit1983 whiterabbit1983 force-pushed the x/embedding-batch-size-limit branch from 0476f38 to cb9eb91 Compare October 15, 2024 13:54
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on cb9eb91 in 45 seconds

More details
  • Looked at 60 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. agents-api/agents_api/activities/embed_docs.py:19
  • Draft comment:
    The indices variable is created but not used in embed_snippets_query. Ensure that this is intentional and that indices are not needed.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. agents-api/agents_api/activities/embed_docs.py:49
  • Draft comment:
    The max_batch_size parameter in mock_embed_docs is unused and can be removed for clarity.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The mock_embed_docs function has a max_batch_size parameter that is not used. This is unnecessary and should be removed for clarity.

Workflow ID: wflow_qzPp0Wty8HHM5R3Q


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@whiterabbit1983 whiterabbit1983 force-pushed the x/embedding-batch-size-limit branch from cb9eb91 to 30b26be Compare October 15, 2024 14:06
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 30b26be in 24 seconds

More details
  • Looked at 60 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 drafted comments based on config settings.

Workflow ID: wflow_j5MnZy7166y1WaFm


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@creatorrr
Copy link
Contributor

lgtm. just double check the tests and typechecks (maybe add a mock embedding test with 1000 entries or something)

@whiterabbit1983 whiterabbit1983 merged commit 3dacacd into dev Oct 15, 2024
7 of 11 checks passed
@whiterabbit1983 whiterabbit1983 deleted the x/embedding-batch-size-limit branch October 15, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants