Fix 'fast' ingestion, type errors #1957

NolanTrem · 2025-02-10T22:03:41Z

Important

Fix type errors, add tests for 'fast' ingestion mode, and update type annotations across multiple files.

Behavior:
- Add tests for 'fast' ingestion mode in DocumentsIntegrationSuperUser.test.ts.
- Comment out community-related tests in GraphsIntegrationSuperUser.test.ts.
- Remove compose.test.yaml file.
Functions:
- Add createSample() method in documents.ts with updated URL handling.
- Modify reasoning_agent() in retrieval.ts to handle streaming and non-streaming requests.
- Update _websocket_auth_wrapper() in auth.py to raise exceptions instead of returning None.
Type Annotations:
- Update type annotations in retrieval_service.py, jwt.py, r2r_auth.py, supabase.py, documents.py, postgres.py, users.py, base_client.py, async_client.py, sync_client.py, document.py, graph.py, responses.py, base_utils.py, and utils/__init__.py.
- Change return type of list_user_api_keys() to list[dict] in auth.py and supabase.py.
Misc:
- Remove to_async_generator from various files.
- Fix minor issues in async_client.py and sync_client.py related to request handling.

^{This description was created by}^{for 8e7c84e. It will automatically update as commits are pushed.}

ellipsis-dev

👍 Looks good to me! Reviewed everything up to aad9dae in 1 minute and 12 seconds

More details

Looked at 551 lines of code in 18 files
Skipped 0 files when reviewing.
Skipped posting 14 drafted comments based on config settings.

1. shared/utils/base_utils.py:36

Draft comment:
The _expand_citation_span_to_sentence function uses a simple heuristic. Consider clarifying its limitations (e.g. edge cases with punctuation) in the docstring.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

2. shared/utils/base_utils.py:70

Draft comment:
In my_extract_citations, the regex pattern matches bracket numbers; consider handling potential false positives if the text contains similar patterns outside citations.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

3. shared/utils/base_utils.py:107

Draft comment:
In reassign_citations_in_order, ensure that updating the citation indices via re-extraction does not lose snippet context if new citation numbers differ from original ones.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. shared/utils/base_utils.py:183

Draft comment:
In my_map_citations_to_sources, the mapping of citations to sources assumes order alignment; validate that the ordering of flat_source_list consistently reflects the intended citation order.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

5. shared/utils/base_utils.py:265

Draft comment:
In format_search_results_for_llm, the try/except block in the graph search section could log more details on malformed results to ease debugging.
Reason this comment was not posted:
Confidence changes required: 40% <= threshold 50%
None

6. shared/utils/base_utils.py:421

Draft comment:
The deep_update function implementation is taken from Pydantic; consider citing the source URL in the docstring to attribute the code.
Reason this comment was not posted:
Confidence changes required: 40% <= threshold 50%
None

7. shared/utils/base_utils.py:481

Draft comment:
increment_version and decrement_version functions assume version strings have single digit at the end; consider documenting this limitation.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

8. sdk/asnyc_methods/documents.py:1

Draft comment:
Consider renaming the directory/file from 'asnyc_methods' to 'async_methods' to fix the typographical error and improve clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

9. shared/api/models/retrieval/responses.py:221

Draft comment:
WrappedDocumentSearchResponse is defined as R2RResults[list[DocumentResponse]], but the FIXME note indicates it should return DocumentSearchResult. Please resolve this type mismatch.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

10. shared/api/models/retrieval/responses.py:124

Draft comment:
The example for the Citation model includes a 'uri' field, but the model definition does not include it. Consider either removing 'uri' from the example or adding it to the model.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

11. shared/utils/base_utils.py:497

Draft comment:
Consider using a more robust merging strategy in update_settings_from_dict. Directly setting nested attributes may lead to unexpected behavior when the settings have deeper nested structures.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

12. shared/abstractions/document.py:120

Draft comment:
The table_name() and id_column() methods for IngestionStatus and GraphExtractionStatus return different values. Verify that this discrepancy (e.g. 'document_id' vs 'id') is intentional.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

13. shared/utils/base_utils.py:436

Draft comment:
The generate_extraction_id function concatenates document_id, iteration, and version to generate a UUID. Ensure that this approach sufficiently prevents collisions and meets your versioning requirements.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

14. sdk/async_client.py:60

Draft comment:
The _make_request method raises an exception if accessing 'https://api.cloud.sciphi.ai' without credentials. Please confirm that this behavior is intentional and consider documenting it for consistency in client usage.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_xT8xAhl8bgPpf9Ob

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 8e7c84e in 56 seconds

More details

Looked at 644 lines of code in 17 files
Skipped 0 files when reviewing.
Skipped posting 12 drafted comments based on config settings.

1. py/shared/abstractions/graph.py:88

Draft comment:
In the Community init, you check for a key named 'embedding' instead of 'description_embedding' (the field defined in the class). Verify that this is intended or update to match the field name.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

2. py/shared/utils/base_utils.py:122

Draft comment:
In reassign_citations_in_order, the logic re-extracts citations after in-place text replacement. This works but may be performance‐intensive for large texts. Consider adding comments on expected input sizes or optimizing if necessary.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

3. py/shared/utils/base_utils.py:536

Draft comment:
The deep_update function is clear but note that using recursion for deep merging may become a bottleneck with very deeply nested dictionaries. This is acceptable if input sizes are moderate.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

4. py/core/main/services/retrieval_service.py:145

Draft comment:
When iterating over search_settings.filters and converting UUIDs to strings, using list(filters.items()) is acceptable; just ensure this mutation is well-documented.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 50%
None

5. py/shared/utils/base_utils.py:416

Draft comment:
generate_id uses uuid5 with NAMESPACE_DNS which is deterministic. Ensure this behavior is intended for generating unique run IDs.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

6. py/shared/utils/base_utils.py:516

Draft comment:
In _get_vector_column_str, returning an empty string for invalid dimensions may be fine, but consider documenting why this is desired behavior.
Reason this comment was not posted:
Confidence changes required: 40% <= threshold 50%
None

7. py/shared/utils/base_utils.py:70

Draft comment:
The citation extraction regex in my_extract_citations assumes simple [n] formats. If more complex references are expected, consider extending the regex.
Reason this comment was not posted:
Confidence changes required: 40% <= threshold 50%
None

8. py/shared/utils/base_utils.py:44

Draft comment:
Good implementation of sentence expansion. Consider handling the case when no sentence ender is found at all, to avoid an infinite loop or unexpected boundary results.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

9. py/shared/utils/base_utils.py:73

Draft comment:
In my_extract_citations, the regex pattern r"[(\d+)]" works for simple cases but may miss nested cases or non-numeric content. Ensure this matches your expected citation formats.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

10. py/shared/utils/base_utils.py:116

Draft comment:
In reassign_citations_in_order, replacing characters in-place on a large string via list(text) may have performance implications for very long texts. Consider using more efficient string replacement methods if performance becomes an issue.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
None

11. py/shared/utils/base_utils.py:446

Draft comment:
The increment_version and decrement_version functions assume the version string has a consistent format. Adding validation to ensure the version string meets expected criteria could prevent run-time errors.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

12. py/shared/utils/base_utils.py:490

Draft comment:
In update_settings_from_dict, the nested setattr call may fail if the attribute accessed is None. Consider adding a check or fallback if getattr(settings, key) returns None.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_X3Qk3ZgsBv5cGlcJ

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Clean up type errors

aad9dae

ellipsis-dev bot reviewed Feb 10, 2025

View reviewed changes

Fix fast ingestion

8e7c84e

NolanTrem changed the title ~~Fix Type Errors~~ Fix 'fast' ingestion, type errors Feb 11, 2025

ellipsis-dev bot reviewed Feb 11, 2025

View reviewed changes

NolanTrem mentioned this pull request Feb 11, 2025

SDK client.documents.create does not respect ingestion_mode setting #1844

Closed

NolanTrem linked an issue Feb 11, 2025 that may be closed by this pull request

SDK client.documents.create does not respect ingestion_mode setting #1844

Closed

NolanTrem merged commit 9062135 into main Feb 11, 2025
14 of 15 checks passed

NolanTrem deleted the Nolan/TypeErrors branch February 11, 2025 03:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix 'fast' ingestion, type errors #1957

Fix 'fast' ingestion, type errors #1957

NolanTrem commented Feb 10, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Fix 'fast' ingestion, type errors #1957

Fix 'fast' ingestion, type errors #1957

Conversation

NolanTrem commented Feb 10, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

NolanTrem commented Feb 10, 2025 •

edited by ellipsis-dev bot

Loading