-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 'fast' ingestion, type errors #1957
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to aad9dae in 1 minute and 12 seconds
More details
- Looked at
551
lines of code in18
files - Skipped
0
files when reviewing. - Skipped posting
14
drafted comments based on config settings.
1. shared/utils/base_utils.py:36
- Draft comment:
The _expand_citation_span_to_sentence function uses a simple heuristic. Consider clarifying its limitations (e.g. edge cases with punctuation) in the docstring. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. shared/utils/base_utils.py:70
- Draft comment:
In my_extract_citations, the regex pattern matches bracket numbers; consider handling potential false positives if the text contains similar patterns outside citations. - Reason this comment was not posted:
Confidence changes required:50%
<= threshold50%
None
3. shared/utils/base_utils.py:107
- Draft comment:
In reassign_citations_in_order, ensure that updating the citation indices via re-extraction does not lose snippet context if new citation numbers differ from original ones. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. shared/utils/base_utils.py:183
- Draft comment:
In my_map_citations_to_sources, the mapping of citations to sources assumes order alignment; validate that the ordering of flat_source_list consistently reflects the intended citation order. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. shared/utils/base_utils.py:265
- Draft comment:
In format_search_results_for_llm, the try/except block in the graph search section could log more details on malformed results to ease debugging. - Reason this comment was not posted:
Confidence changes required:40%
<= threshold50%
None
6. shared/utils/base_utils.py:421
- Draft comment:
The deep_update function implementation is taken from Pydantic; consider citing the source URL in the docstring to attribute the code. - Reason this comment was not posted:
Confidence changes required:40%
<= threshold50%
None
7. shared/utils/base_utils.py:481
- Draft comment:
increment_version and decrement_version functions assume version strings have single digit at the end; consider documenting this limitation. - Reason this comment was not posted:
Confidence changes required:50%
<= threshold50%
None
8. sdk/asnyc_methods/documents.py:1
- Draft comment:
Consider renaming the directory/file from 'asnyc_methods' to 'async_methods' to fix the typographical error and improve clarity. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. shared/api/models/retrieval/responses.py:221
- Draft comment:
WrappedDocumentSearchResponse is defined as R2RResults[list[DocumentResponse]], but the FIXME note indicates it should return DocumentSearchResult. Please resolve this type mismatch. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. shared/api/models/retrieval/responses.py:124
- Draft comment:
The example for the Citation model includes a 'uri' field, but the model definition does not include it. Consider either removing 'uri' from the example or adding it to the model. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. shared/utils/base_utils.py:497
- Draft comment:
Consider using a more robust merging strategy in update_settings_from_dict. Directly setting nested attributes may lead to unexpected behavior when the settings have deeper nested structures. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. shared/abstractions/document.py:120
- Draft comment:
The table_name() and id_column() methods for IngestionStatus and GraphExtractionStatus return different values. Verify that this discrepancy (e.g. 'document_id' vs 'id') is intentional. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
13. shared/utils/base_utils.py:436
- Draft comment:
The generate_extraction_id function concatenates document_id, iteration, and version to generate a UUID. Ensure that this approach sufficiently prevents collisions and meets your versioning requirements. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
14. sdk/async_client.py:60
- Draft comment:
The _make_request method raises an exception if accessing 'https://api.cloud.sciphi.ai' without credentials. Please confirm that this behavior is intentional and consider documenting it for consistency in client usage. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_xT8xAhl8bgPpf9Ob
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 8e7c84e in 56 seconds
More details
- Looked at
644
lines of code in17
files - Skipped
0
files when reviewing. - Skipped posting
12
drafted comments based on config settings.
1. py/shared/abstractions/graph.py:88
- Draft comment:
In the Community init, you check for a key named 'embedding' instead of 'description_embedding' (the field defined in the class). Verify that this is intended or update to match the field name. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. py/shared/utils/base_utils.py:122
- Draft comment:
In reassign_citations_in_order, the logic re-extracts citations after in-place text replacement. This works but may be performance‐intensive for large texts. Consider adding comments on expected input sizes or optimizing if necessary. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. py/shared/utils/base_utils.py:536
- Draft comment:
The deep_update function is clear but note that using recursion for deep merging may become a bottleneck with very deeply nested dictionaries. This is acceptable if input sizes are moderate. - Reason this comment was not posted:
Confidence changes required:50%
<= threshold50%
None
4. py/core/main/services/retrieval_service.py:145
- Draft comment:
When iterating over search_settings.filters and converting UUIDs to strings, using list(filters.items()) is acceptable; just ensure this mutation is well-documented. - Reason this comment was not posted:
Confidence changes required:30%
<= threshold50%
None
5. py/shared/utils/base_utils.py:416
- Draft comment:
generate_id uses uuid5 with NAMESPACE_DNS which is deterministic. Ensure this behavior is intended for generating unique run IDs. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. py/shared/utils/base_utils.py:516
- Draft comment:
In _get_vector_column_str, returning an empty string for invalid dimensions may be fine, but consider documenting why this is desired behavior. - Reason this comment was not posted:
Confidence changes required:40%
<= threshold50%
None
7. py/shared/utils/base_utils.py:70
- Draft comment:
The citation extraction regex in my_extract_citations assumes simple [n] formats. If more complex references are expected, consider extending the regex. - Reason this comment was not posted:
Confidence changes required:40%
<= threshold50%
None
8. py/shared/utils/base_utils.py:44
- Draft comment:
Good implementation of sentence expansion. Consider handling the case when no sentence ender is found at all, to avoid an infinite loop or unexpected boundary results. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. py/shared/utils/base_utils.py:73
- Draft comment:
In my_extract_citations, the regex pattern r"[(\d+)]" works for simple cases but may miss nested cases or non-numeric content. Ensure this matches your expected citation formats. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. py/shared/utils/base_utils.py:116
- Draft comment:
In reassign_citations_in_order, replacing characters in-place on a large string via list(text) may have performance implications for very long texts. Consider using more efficient string replacement methods if performance becomes an issue. - Reason this comment was not posted:
Confidence changes required:50%
<= threshold50%
None
11. py/shared/utils/base_utils.py:446
- Draft comment:
The increment_version and decrement_version functions assume the version string has a consistent format. Adding validation to ensure the version string meets expected criteria could prevent run-time errors. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. py/shared/utils/base_utils.py:490
- Draft comment:
In update_settings_from_dict, the nested setattr call may fail if the attribute accessed is None. Consider adding a check or fallback if getattr(settings, key) returns None. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_X3Qk3ZgsBv5cGlcJ
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Important
Fix type errors, add tests for 'fast' ingestion mode, and update type annotations across multiple files.
DocumentsIntegrationSuperUser.test.ts
.GraphsIntegrationSuperUser.test.ts
.compose.test.yaml
file.createSample()
method indocuments.ts
with updated URL handling.reasoning_agent()
inretrieval.ts
to handle streaming and non-streaming requests._websocket_auth_wrapper()
inauth.py
to raise exceptions instead of returningNone
.retrieval_service.py
,jwt.py
,r2r_auth.py
,supabase.py
,documents.py
,postgres.py
,users.py
,base_client.py
,async_client.py
,sync_client.py
,document.py
,graph.py
,responses.py
,base_utils.py
, andutils/__init__.py
.list_user_api_keys()
tolist[dict]
inauth.py
andsupabase.py
.to_async_generator
from various files.async_client.py
andsync_client.py
related to request handling.This description was created by
for 8e7c84e. It will automatically update as commits are pushed.