feat(agents-api): Optimize Search Queries NLP processing pipeline (#735)
> [!IMPORTANT]
> Optimized NLP processing in `nlp.py` with caching, batch processing, and enhanced query building, and switched deployment to Gunicorn.
>
> - **Performance Optimization**:
>   - Introduced `KeywordMatcher` singleton with batch processing in `nlp.py` for efficient keyword matching.
>   - Added `lru_cache` to `clean_keyword()` and `_create_pattern()` for caching results.
>   - Optimized `extract_keywords()` to process spans in a single pass and count frequencies efficiently.
> - **Functionality Changes**:
>   - Modified `paragraph_to_custom_queries()` to include `min_keywords` parameter for filtering low-value queries.
>   - Enhanced `find_proximity_groups()` with sorted positions and union-find for efficient grouping.
>   - Improved `build_query()` with cached patterns for query construction.
> - **Deployment**:
>   - Changed `ENTRYPOINT` in `Dockerfile` to use Gunicorn with `gunicorn_conf.py`.
>   - Added `gunicorn_conf.py` for Gunicorn configuration.
>   - Updated `pyproject.toml` to include `gunicorn` and `uvloop` dependencies.
> - **Miscellaneous**:
>   - Precompiled regex patterns for whitespace and non-alphanumeric characters in `nlp.py`.
>   - Disabled unused components in spaCy pipeline for performance.
>
> This description was created by [Ellipsis](https://www.ellipsis.dev?ref=julep-ai%2Fjulep&utm_source=github&utm_medium=referral) for 0f4c4e0. It will automatically update as commits are pushed.

---------

Signed-off-by: Diwank Singh Tomer <[email protected]>
Co-authored-by: HamadaSalhab <[email protected]>
Co-authored-by: Diwank Singh Tomer <[email protected]>
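For readers unfamiliar with the techniques named in the description, here is a minimal sketch of how a cached `clean_keyword()` with precompiled patterns and a union-find based `find_proximity_groups()` could look. It is not the code from `nlp.py`: the signatures, the `positions` mapping, the window parameter `n`, and the constant names `NON_ALPHANUMERIC` and `WHITESPACE` are all assumptions made for illustration.

```python
import re
from functools import lru_cache

# Hypothetical patterns, compiled once at import time instead of on every call.
NON_ALPHANUMERIC = re.compile(r"[^\w\s]")
WHITESPACE = re.compile(r"\s+")


@lru_cache(maxsize=10_000)
def clean_keyword(kw: str) -> str:
    """Strip punctuation and collapse whitespace; repeated inputs hit the cache."""
    return WHITESPACE.sub(" ", NON_ALPHANUMERIC.sub("", kw)).strip()


def find_proximity_groups(
    positions: dict[str, list[int]], n: int = 10
) -> list[set[str]]:
    """Group keywords whose occurrences lie within n tokens of each other.

    `positions` (assumed shape) maps each keyword to its token offsets.
    """
    parent = {kw: kw for kw in positions}

    def find(x: str) -> str:
        # Walk to the root, shortening the path along the way (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Sort all occurrences once; only neighbours in sorted order need comparing,
    # because any two occurrences within n tokens are linked through the
    # occurrences that sit between them.
    occurrences = sorted((p, kw) for kw, ps in positions.items() for p in ps)
    for (p1, k1), (p2, k2) in zip(occurrences, occurrences[1:]):
        if p2 - p1 <= n:
            parent[find(k1)] = find(k2)

    groups: dict[str, set[str]] = {}
    for kw in positions:
        groups.setdefault(find(kw), set()).add(kw)
    return list(groups.values())
```

Sorting first keeps the grouping pass linear in the number of occurrences after an O(m log m) sort, rather than comparing every pair of keywords.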