Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 'fast' ingestion, type errors #1957

Merged
merged 2 commits into from
Feb 11, 2025
Merged

Fix 'fast' ingestion, type errors #1957

merged 2 commits into from
Feb 11, 2025

Conversation

NolanTrem
Copy link
Collaborator

@NolanTrem NolanTrem commented Feb 10, 2025

Important

Fix type errors, add tests for 'fast' ingestion mode, and update type annotations across multiple files.

  • Behavior:
    • Add tests for 'fast' ingestion mode in DocumentsIntegrationSuperUser.test.ts.
    • Comment out community-related tests in GraphsIntegrationSuperUser.test.ts.
    • Remove compose.test.yaml file.
  • Functions:
    • Add createSample() method in documents.ts with updated URL handling.
    • Modify reasoning_agent() in retrieval.ts to handle streaming and non-streaming requests.
    • Update _websocket_auth_wrapper() in auth.py to raise exceptions instead of returning None.
  • Type Annotations:
    • Update type annotations in retrieval_service.py, jwt.py, r2r_auth.py, supabase.py, documents.py, postgres.py, users.py, base_client.py, async_client.py, sync_client.py, document.py, graph.py, responses.py, base_utils.py, and utils/__init__.py.
    • Change return type of list_user_api_keys() to list[dict] in auth.py and supabase.py.
  • Misc:
    • Remove to_async_generator from various files.
    • Fix minor issues in async_client.py and sync_client.py related to request handling.

This description was created by Ellipsis for 8e7c84e. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to aad9dae in 1 minute and 12 seconds

More details
  • Looked at 551 lines of code in 18 files
  • Skipped 0 files when reviewing.
  • Skipped posting 14 drafted comments based on config settings.
1. shared/utils/base_utils.py:36
  • Draft comment:
    The _expand_citation_span_to_sentence function uses a simple heuristic. Consider clarifying its limitations (e.g. edge cases with punctuation) in the docstring.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. shared/utils/base_utils.py:70
  • Draft comment:
    In my_extract_citations, the regex pattern matches bracket numbers; consider handling potential false positives if the text contains similar patterns outside citations.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
3. shared/utils/base_utils.py:107
  • Draft comment:
    In reassign_citations_in_order, ensure that updating the citation indices via re-extraction does not lose snippet context if new citation numbers differ from original ones.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. shared/utils/base_utils.py:183
  • Draft comment:
    In my_map_citations_to_sources, the mapping of citations to sources assumes order alignment; validate that the ordering of flat_source_list consistently reflects the intended citation order.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. shared/utils/base_utils.py:265
  • Draft comment:
    In format_search_results_for_llm, the try/except block in the graph search section could log more details on malformed results to ease debugging.
  • Reason this comment was not posted:
    Confidence changes required: 40% <= threshold 50%
    None
6. shared/utils/base_utils.py:421
  • Draft comment:
    The deep_update function implementation is taken from Pydantic; consider citing the source URL in the docstring to attribute the code.
  • Reason this comment was not posted:
    Confidence changes required: 40% <= threshold 50%
    None
7. shared/utils/base_utils.py:481
  • Draft comment:
    increment_version and decrement_version functions assume version strings have single digit at the end; consider documenting this limitation.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
8. sdk/asnyc_methods/documents.py:1
  • Draft comment:
    Consider renaming the directory/file from 'asnyc_methods' to 'async_methods' to fix the typographical error and improve clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. shared/api/models/retrieval/responses.py:221
  • Draft comment:
    WrappedDocumentSearchResponse is defined as R2RResults[list[DocumentResponse]], but the FIXME note indicates it should return DocumentSearchResult. Please resolve this type mismatch.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. shared/api/models/retrieval/responses.py:124
  • Draft comment:
    The example for the Citation model includes a 'uri' field, but the model definition does not include it. Consider either removing 'uri' from the example or adding it to the model.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. shared/utils/base_utils.py:497
  • Draft comment:
    Consider using a more robust merging strategy in update_settings_from_dict. Directly setting nested attributes may lead to unexpected behavior when the settings have deeper nested structures.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. shared/abstractions/document.py:120
  • Draft comment:
    The table_name() and id_column() methods for IngestionStatus and GraphExtractionStatus return different values. Verify that this discrepancy (e.g. 'document_id' vs 'id') is intentional.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
13. shared/utils/base_utils.py:436
  • Draft comment:
    The generate_extraction_id function concatenates document_id, iteration, and version to generate a UUID. Ensure that this approach sufficiently prevents collisions and meets your versioning requirements.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
14. sdk/async_client.py:60
  • Draft comment:
    The _make_request method raises an exception if accessing 'https://api.cloud.sciphi.ai' without credentials. Please confirm that this behavior is intentional and consider documenting it for consistency in client usage.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_xT8xAhl8bgPpf9Ob


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@NolanTrem NolanTrem changed the title Fix Type Errors Fix 'fast' ingestion, type errors Feb 11, 2025
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 8e7c84e in 56 seconds

More details
  • Looked at 644 lines of code in 17 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 drafted comments based on config settings.
1. py/shared/abstractions/graph.py:88
  • Draft comment:
    In the Community init, you check for a key named 'embedding' instead of 'description_embedding' (the field defined in the class). Verify that this is intended or update to match the field name.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. py/shared/utils/base_utils.py:122
  • Draft comment:
    In reassign_citations_in_order, the logic re-extracts citations after in-place text replacement. This works but may be performance‐intensive for large texts. Consider adding comments on expected input sizes or optimizing if necessary.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. py/shared/utils/base_utils.py:536
  • Draft comment:
    The deep_update function is clear but note that using recursion for deep merging may become a bottleneck with very deeply nested dictionaries. This is acceptable if input sizes are moderate.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
4. py/core/main/services/retrieval_service.py:145
  • Draft comment:
    When iterating over search_settings.filters and converting UUIDs to strings, using list(filters.items()) is acceptable; just ensure this mutation is well-documented.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50%
    None
5. py/shared/utils/base_utils.py:416
  • Draft comment:
    generate_id uses uuid5 with NAMESPACE_DNS which is deterministic. Ensure this behavior is intended for generating unique run IDs.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. py/shared/utils/base_utils.py:516
  • Draft comment:
    In _get_vector_column_str, returning an empty string for invalid dimensions may be fine, but consider documenting why this is desired behavior.
  • Reason this comment was not posted:
    Confidence changes required: 40% <= threshold 50%
    None
7. py/shared/utils/base_utils.py:70
  • Draft comment:
    The citation extraction regex in my_extract_citations assumes simple [n] formats. If more complex references are expected, consider extending the regex.
  • Reason this comment was not posted:
    Confidence changes required: 40% <= threshold 50%
    None
8. py/shared/utils/base_utils.py:44
  • Draft comment:
    Good implementation of sentence expansion. Consider handling the case when no sentence ender is found at all, to avoid an infinite loop or unexpected boundary results.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. py/shared/utils/base_utils.py:73
  • Draft comment:
    In my_extract_citations, the regex pattern r"[(\d+)]" works for simple cases but may miss nested cases or non-numeric content. Ensure this matches your expected citation formats.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. py/shared/utils/base_utils.py:116
  • Draft comment:
    In reassign_citations_in_order, replacing characters in-place on a large string via list(text) may have performance implications for very long texts. Consider using more efficient string replacement methods if performance becomes an issue.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
11. py/shared/utils/base_utils.py:446
  • Draft comment:
    The increment_version and decrement_version functions assume the version string has a consistent format. Adding validation to ensure the version string meets expected criteria could prevent run-time errors.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. py/shared/utils/base_utils.py:490
  • Draft comment:
    In update_settings_from_dict, the nested setattr call may fail if the attribute accessed is None. Consider adding a check or fallback if getattr(settings, key) returns None.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_X3Qk3ZgsBv5cGlcJ


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@NolanTrem NolanTrem linked an issue Feb 11, 2025 that may be closed by this pull request
@NolanTrem NolanTrem merged commit 9062135 into main Feb 11, 2025
14 of 15 checks passed
@NolanTrem NolanTrem deleted the Nolan/TypeErrors branch February 11, 2025 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SDK client.documents.create does not respect ingestion_mode setting
1 participant