Commit
fix: Split payload content by smaller batches for embedding (#653)
> [!IMPORTANT]
> `embed_docs` in `embed_docs.py` now processes payloads in smaller batches asynchronously using `batched` and `asyncio`, with a new test case added.
>
> - **Behavior**:
>   - `embed_docs` in `embed_docs.py` now processes payload content in smaller batches using `batched` from `itertools`.
>   - Introduces `max_batch_size` parameter to control batch size, defaulting to 100.
>   - Uses `asyncio.wait` for asynchronous embedding of batches.
> - **Functions**:
>   - Adds `embed_batch` inner function to process each batch of indices and snippets.
>   - Modifies `embed_docs` to use `embed_batch` for batch processing.
> - **Imports**:
>   - Adds `asyncio` and `batched` imports to support new batching logic.
> - **Tests**:
>   - Adds test case in `test_activities.py` to verify `embed_docs` with batching logic using `unittest.mock.patch`.
>
> This description was created by [Ellipsis](https://www.ellipsis.dev?ref=julep-ai%2Fjulep&utm_source=github&utm_medium=referral) for 30b26be. It will automatically update as commits are pushed.

Co-authored-by: Diwank Singh Tomer <[email protected]>
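The batching pattern described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `fake_embed` and `embed_in_batches` are hypothetical names, the embedding call is a stub, and the fallback `batched` helper is included only because `itertools.batched` requires Python 3.12 or later.

```python
import asyncio
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        # Fallback with the same behavior: yield tuples of at most n items.
        it = iter(iterable)
        while chunk := tuple(islice(it, n)):
            yield chunk


async def fake_embed(texts):
    # Hypothetical stand-in for a real embedding client:
    # returns one single-element vector per input text.
    await asyncio.sleep(0)
    return [[float(len(t))] for t in texts]


async def embed_in_batches(snippets, max_batch_size=100):
    # Pre-size the output so each batch can write its results by index,
    # preserving input order even if batches finish out of order.
    results = [None] * len(snippets)

    async def embed_batch(indexed):
        # `indexed` is a tuple of (index, snippet) pairs for one batch.
        indices, texts = zip(*indexed)
        vectors = await fake_embed(list(texts))
        for i, vec in zip(indices, vectors):
            results[i] = vec

    tasks = [
        asyncio.create_task(embed_batch(chunk))
        for chunk in batched(enumerate(snippets), max_batch_size)
    ]
    if tasks:  # asyncio.wait raises ValueError on an empty collection
        done, _ = await asyncio.wait(tasks)
        for task in done:
            task.result()  # re-raise any exception from a batch
    return results
```

Calling `asyncio.run(embed_in_batches(["a", "bb", "ccc"], max_batch_size=2))` splits the input into two batches and returns the vectors in the original order.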