feat(agents-api): Optimize Search Queries NLP processing pipeline (#735)
> [!IMPORTANT]
> Optimized NLP processing in `nlp.py` with caching, batch processing, and enhanced query building, and switched deployment to Gunicorn.
>
>   - **Performance Optimization**:
>     - Introduced `KeywordMatcher` singleton with batch processing in `nlp.py` for efficient keyword matching.
>     - Added `lru_cache` to `clean_keyword()` and `_create_pattern()` for caching results.
>     - Optimized `extract_keywords()` to process spans in a single pass and count frequencies efficiently.
>   - **Functionality Changes**:
>     - Modified `paragraph_to_custom_queries()` to include `min_keywords` parameter for filtering low-value queries.
>     - Enhanced `find_proximity_groups()` with sorted positions and union-find for efficient grouping.
>     - Improved `build_query()` with cached patterns for query construction.
>   - **Deployment**:
>     - Changed `ENTRYPOINT` in `Dockerfile` to use Gunicorn with `gunicorn_conf.py`.
>     - Added `gunicorn_conf.py` for Gunicorn configuration.
>     - Updated `pyproject.toml` to include `gunicorn` and `uvloop` dependencies.
>   - **Miscellaneous**:
>     - Precompiled regex patterns for whitespace and non-alphanumeric characters in `nlp.py`.
>     - Disabled unused components in spaCy pipeline for performance.
>
> <sup>This description was created by [Ellipsis](https://www.ellipsis.dev?ref=julep-ai%2Fjulep&utm_source=github&utm_medium=referral) for 0f4c4e0. It will automatically update as commits are pushed.</sup>

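The caching and precompiled-regex items above can be illustrated with a minimal sketch. The actual patterns and cache size used in `nlp.py` are not shown in this commit view, so the regexes, the `maxsize` value, and the function body below are assumptions chosen for illustration only.

```python
import re
from functools import lru_cache

# Assumed patterns: compiled once at import time instead of on every call.
NON_ALNUM_RE = re.compile(r"[^\w\s]+")
WHITESPACE_RE = re.compile(r"\s+")


@lru_cache(maxsize=10_000)  # assumed size; repeated keywords skip the regex work
def clean_keyword(kw: str) -> str:
    """Drop punctuation, collapse runs of whitespace, and trim the ends."""
    without_punct = NON_ALNUM_RE.sub("", kw)
    return WHITESPACE_RE.sub(" ", without_punct).strip()
```

Because `lru_cache` keys on the argument, this only pays off when the same keyword strings recur across queries, which is the likely case for keyword extraction over similar documents.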

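The "sorted positions and union-find" approach mentioned for `find_proximity_groups()` might look roughly like the sketch below. The real signature and data layout are not visible here, so the `positions` mapping (keyword to token offsets) and the window parameter `n` are hypothetical. The idea is that after sorting every occurrence by position, only adjacent occurrences need to be compared, and union-find merges the keywords transitively.

```python
from collections import defaultdict


def find_proximity_groups(positions: dict[str, list[int]], n: int = 10) -> list[set[str]]:
    """Group keywords whose occurrences lie within n tokens of each other.

    Hypothetical signature: positions maps each keyword to its token offsets.
    """
    parent = {kw: kw for kw in positions}

    def find(x: str) -> str:
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # One sort, then a single linear scan over neighbouring occurrences.
    occurrences = sorted((pos, kw) for kw, plist in positions.items() for pos in plist)
    for (p1, k1), (p2, k2) in zip(occurrences, occurrences[1:]):
        if p2 - p1 <= n:
            union(k1, k2)

    groups: dict[str, set[str]] = defaultdict(set)
    for kw in positions:
        groups[find(kw)].add(kw)
    return list(groups.values())
```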
---------

Signed-off-by: Diwank Singh Tomer <[email protected]>
Co-authored-by: HamadaSalhab <[email protected]>
Co-authored-by: Diwank Singh Tomer <[email protected]>
3 people authored Oct 24, 2024
1 parent 7f7fce9 commit 65e912d
Showing 8 changed files with 361 additions and 173 deletions.
2 changes: 1 addition & 1 deletion agents-api/Dockerfile
@@ -41,4 +41,4 @@ RUN poetry install --no-dev --no-root

 COPY . ./

-ENTRYPOINT ["python", "-m", "agents_api.web", "--host", "0.0.0.0", "--port", "8080"]
+ENTRYPOINT ["gunicorn", "agents_api.web:app", "-c", "gunicorn_conf.py"]
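
The contents of `gunicorn_conf.py` are not shown in this view. Since the new `ENTRYPOINT` no longer passes `--host` and `--port`, the bind address presumably moves into that file. A minimal sketch of what such a config could contain follows. Every value is an assumption: the `WEB_CONCURRENCY` variable, the timeout, and the Uvicorn worker class (which presumes `agents_api.web:app` is an ASGI app and that `uvicorn` is installed) are illustrative, not taken from the commit.

```python
# gunicorn_conf.py (illustrative sketch, not the file from this commit)
import multiprocessing
import os

# Same address the previous ENTRYPOINT passed on the command line.
bind = "0.0.0.0:8080"

# Assumed policy: one worker per CPU core unless WEB_CONCURRENCY overrides it.
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count()))

# Assumption: the app is ASGI, so Gunicorn delegates to Uvicorn workers.
worker_class = "uvicorn.workers.UvicornWorker"

# Assumed timeout in seconds before an unresponsive worker is restarted.
timeout = 120
```

Gunicorn reads these module-level names as settings when the file is passed with `-c`, so no function calls are needed in the config itself.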