feat(agents-api): Optimize Search Queries NLP processing pipeline (#735) · julep-ai/julep@65e912d


feat(agents-api): Optimize Search Queries NLP processing pipeline (#735)

<!-- ELLIPSIS_HIDDEN -->


> [!IMPORTANT]
> Optimized NLP processing in `nlp.py` with caching, batch processing, and enhanced query building, and switched deployment to Gunicorn.
> 
>   - **Performance Optimization**:
>     - Introduced `KeywordMatcher` singleton with batch processing in `nlp.py` for efficient keyword matching.
>     - Added `lru_cache` to `clean_keyword()` and `_create_pattern()` for caching results.
>     - Optimized `extract_keywords()` to process spans in a single pass and count frequencies efficiently.
>   - **Functionality Changes**:
>     - Modified `paragraph_to_custom_queries()` to include `min_keywords` parameter for filtering low-value queries.
>     - Enhanced `find_proximity_groups()` with sorted positions and union-find for efficient grouping.
>     - Improved `build_query()` with cached patterns for query construction.
>   - **Deployment**:
>     - Changed `ENTRYPOINT` in `Dockerfile` to use Gunicorn with `gunicorn_conf.py`.
>     - Added `gunicorn_conf.py` for Gunicorn configuration.
>     - Updated `pyproject.toml` to include `gunicorn` and `uvloop` dependencies.
>   - **Miscellaneous**:
>     - Precompiled regex patterns for whitespace and non-alphanumeric characters in `nlp.py`.
>     - Disabled unused components in spaCy pipeline for performance.
> 
> <sup>This description was created by </sup>[<img alt="Ellipsis" src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=julep-ai%2Fjulep&utm_source=github&utm_medium=referral)<sup> for 0f4c4e0. It will automatically update as commits are pushed.</sup>


<!-- ELLIPSIS_HIDDEN -->
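For readers without `nlp.py` at hand, here is a minimal sketch of two of the techniques the summary names: memoizing `clean_keyword()` with `lru_cache` over precompiled regexes, and grouping keywords by proximity using sorted positions plus union-find. Only the function names come from the summary. The signatures, regex patterns, cache size, and the `positions` input shape are assumptions made for illustration and may differ from the actual code in this commit.

```python
from __future__ import annotations

import re
from functools import lru_cache

# Compiled once at import time. The exact patterns are assumptions.
_WHITESPACE = re.compile(r"\s+")
_NON_ALNUM = re.compile(r"[^\w\s]+")


@lru_cache(maxsize=10_000)  # cache size is an arbitrary illustrative choice
def clean_keyword(kw: str) -> str:
    """Drop punctuation and collapse whitespace; repeated inputs hit the cache."""
    return _WHITESPACE.sub(" ", _NON_ALNUM.sub("", kw)).strip()


def find_proximity_groups(positions: dict[str, list[int]], n: int = 10) -> list[set[str]]:
    """Group keywords that are transitively within `n` tokens of each other.

    `positions` maps each keyword to its token offsets (hypothetical shape).
    """
    parent = {kw: kw for kw in positions}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # After sorting, it is enough to compare neighbours: if two occurrences
    # are within n, every gap between them is also within n.
    flat = sorted((p, kw) for kw, ps in positions.items() for p in ps)
    for (p1, k1), (p2, k2) in zip(flat, flat[1:]):
        if p2 - p1 <= n:
            union(k1, k2)

    groups: dict[str, set[str]] = {}
    for kw in positions:
        groups.setdefault(find(kw), set()).add(kw)
    return list(groups.values())
```

Sorting first replaces an all-pairs comparison with a single linear scan over neighbouring occurrences, which is presumably the kind of saving the summary refers to.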

---------

Signed-off-by: Diwank Singh Tomer <[email protected]>
Co-authored-by: HamadaSalhab <[email protected]>
Co-authored-by: Diwank Singh Tomer <[email protected]>


3 people authored Oct 24, 2024

1 parent 7f7fce9 commit 65e912d

agents-api/Dockerfile

            
    @@ -41,4 +41,4 @@ RUN poetry install --no-dev --no-root

     COPY . ./

    -ENTRYPOINT ["python", "-m", "agents_api.web", "--host", "0.0.0.0", "--port", "8080"]
    +ENTRYPOINT ["gunicorn", "agents_api.web:app", "-c", "gunicorn_conf.py"]
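The new `ENTRYPOINT` no longer passes `--host` and `--port`, so those settings presumably move into `gunicorn_conf.py`, which this view does not show. The following is a hedged sketch of what such a file could contain, not the file from this commit. It assumes `agents_api.web:app` is an ASGI application served through Uvicorn's Gunicorn worker, which the new `uvloop` dependency hints at but does not confirm. The `GUNICORN_WORKERS` environment variable and all values are hypothetical.

```python
# gunicorn_conf.py (illustrative sketch; values and env var are assumptions)
import multiprocessing
import os

# Same address the old ENTRYPOINT passed via --host/--port.
bind = "0.0.0.0:8080"

# Commonly used starting point: (2 x cores) + 1, overridable via a
# hypothetical GUNICORN_WORKERS environment variable.
workers = int(os.environ.get("GUNICORN_WORKERS", multiprocessing.cpu_count() * 2 + 1))

# Assumes an ASGI app; this worker class is provided by the uvicorn package.
worker_class = "uvicorn.workers.UvicornWorker"

# Seconds before an unresponsive worker is killed and restarted.
timeout = 120

# Seconds to wait for further requests on a keep-alive connection.
keepalive = 5
```

Gunicorn reads module-level names such as `bind`, `workers`, `worker_class`, `timeout`, and `keepalive` from the file given with `-c`, so no further command-line flags are needed in the `ENTRYPOINT`.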
