-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up parsers imports #1959
Clean up parsers imports #1959
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to 73bb973 in 1 minute and 23 seconds
More details
- Looked at
1288
lines of code in27
files - Skipped
1
files when reviewing. - Skipped posting
14
drafted comments based on config settings.
1. py/core/parsers/media/audio_parser.py:30
- Draft comment:
Direct assignmentself.atranscription = atranscription
removes the try/except block that provided a user-friendly error. This will raise an ImportError immediately if litellm is missing—ensure this behavior is intended and documented. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The change moves the import to the module level, which is actually a more Pythonic approach. If litellm is missing, Python will already provide a clear ImportError with the missing package name. The old approach was actually anti-pattern - catching ImportError during class initialization is less clear than letting Python handle missing dependencies normally. The change simplifies the code and relies on Python's built-in import system.
The comment raises a valid point about error messaging. Maybe there was a specific reason they wanted custom error handling for this particular import.
Python's built-in import error is sufficient here. If litellm is a required dependency, it should be handled like any other required import - at the module level with standard Python error handling.
The comment should be deleted. The code change is actually an improvement, making the dependency requirements more explicit and following Python conventions.
2. py/core/parsers/media/doc_parser.py:28
- Draft comment:
Directly assigningself.olefile = olefile
without a try/except now causes module import errors at load time instead of a custom message. Ensure dependency docs are updated. - Reason this comment was not posted:
Marked as duplicate.
3. py/core/parsers/media/docx_parser.py:25
- Draft comment:
Removal of the import check for docx leads to an immediate ImportError if python-docx is missing. Confirm that this behavior and dependency instructions are acceptable. - Reason this comment was not posted:
Marked as duplicate.
4. py/core/parsers/media/img_parser.py:32
- Draft comment:
Directly importing and assigning pillow_heif and PIL.Image is cleaner, but ensure thatpillow_heif.register_heif_opener()
is idempotent and that its failure provides clear messages. - Reason this comment was not posted:
Marked as duplicate.
5. py/core/parsers/media/odt_parser.py:24
- Draft comment:
Direct assignment ofself.zipfile
andself.ET
simplifies the code. Ensure that missing XML or zipfile issues are caught elsewhere if needed. - Reason this comment was not posted:
Marked as duplicate.
6. py/core/parsers/media/pdf_parser.py:43
- Draft comment:
Removal of the try/except for importing LiteLLM vision support means the dependency check is removed. Confirm that vision support dependencies are now documented clearly. - Reason this comment was not posted:
Marked as duplicate.
7. py/core/parsers/structured/p7s_parser.py:32
- Draft comment:
The direct assignment of cryptography libraries (x509, pkcs7, NameOID) removes custom error handling. Ensure that users are aware these dependencies must be installed. - Reason this comment was not posted:
Marked as duplicate.
8. py/core/providers/llm/litellm.py:16
- Draft comment:
Settinglitellm.modify_params = True
in the constructor affects global behavior. Verify that this change is safe and does not unintentionally affect other parts of the application. - Reason this comment was not posted:
Marked as duplicate.
9. py/pyproject.toml:75
- Draft comment:
New optional dependencies like 'pillow' and 'pillow-heif' have been added. Ensure that they are correctly versioned and documented as required for the image parsers. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =20%
<= threshold50%
The comment is about new optional dependencies being added and suggests ensuring they are correctly versioned and documented. However, it doesn't ask for a specific action or point out a specific issue with the code. It seems to be more of a general reminder rather than a specific code suggestion or issue. According to the rules, comments should not be about dependencies unless they are specific about a version mismatch or similar issue. This comment doesn't meet those criteria.
10. py/core/parsers/media/audio_parser.py:30
- Draft comment:
Directly assigning atranscription simplifies the code, but it removes runtime ImportError handling. Ensure that litellm is enforced as a dependency. - Reason this comment was not posted:
Marked as duplicate.
11. py/core/parsers/media/doc_parser.py:28
- Draft comment:
Directly assigning olefile assumes the dependency is installed. Confirm that olefile is part of your requirements or extras. - Reason this comment was not posted:
Marked as duplicate.
12. py/core/parsers/media/img_parser.py:32
- Draft comment:
Calling register_heif_opener() directly is fine if idempotent; verify that multiple invocations won't cause unintended side effects. - Reason this comment was not posted:
Confidence changes required:50%
<= threshold50%
None
13. py/core/parsers/media/pdf_parser.py:216
- Draft comment:
Assigning PdfReader directly simplifies the code but removes custom error messaging. Ensure that pypdf is installed to prevent runtime issues. - Reason this comment was not posted:
Marked as duplicate.
14. py/core/parsers/structured/xlsx_parser.py:59
- Draft comment:
Using 'np.where(arr is None)' does not perform element-wise comparison. Consider using 'np.where(arr == None)' or an alternative element-wise check. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_AmODqtieLpU6gXeW
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 9c5efea in 1 minute and 7 seconds
More details
- Looked at
91
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
12
drafted comments based on config settings.
1. py/core/providers/database/chunks.py:23
- Draft comment:
Remove unused custom exceptions import; ValueError is now used instead. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. py/core/providers/database/chunks.py:775
- Draft comment:
Update raise type in docstring from ArgError to ValueError for consistency. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
This comment is purely informative, suggesting a change in the docstring for consistency. It doesn't ask for a specific code change or test, nor does it point out a potential issue with the code itself. Therefore, it violates the rule against making purely informative comments.
3. py/core/providers/database/chunks.py:807
- Draft comment:
Raise ValueError for invalid table name instead of ArgError. - Reason this comment was not posted:
Comment looked like it was already resolved.
4. py/core/providers/database/chunks.py:815
- Draft comment:
Use ValueError for an invalid index method (replacing ArgError). - Reason this comment was not posted:
Marked as duplicate.
5. py/core/providers/database/chunks.py:822
- Draft comment:
Change error on disallowed index parameters when using IndexMethod.auto to ValueError. - Reason this comment was not posted:
Marked as duplicate.
6. py/core/providers/database/chunks.py:829
- Draft comment:
Replace ArgError with ValueError for index build parameters mismatch. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. py/core/providers/database/chunks.py:845
- Draft comment:
Use ValueError for unknown index measure rather than ArgError. - Reason this comment was not posted:
Comment looked like it was already resolved.
8. py/core/providers/database/chunks.py:977
- Draft comment:
In delete_index, update exception type in docstring to ValueError. - Reason this comment was not posted:
Comment looked like it was already resolved.
9. py/core/providers/database/chunks.py:1000
- Draft comment:
Raise ValueError for invalid table name in delete_index (was ArgError). - Reason this comment was not posted:
Marked as duplicate.
10. py/core/providers/database/chunks.py:1020
- Draft comment:
Switch to ValueError for non-existent vector index in delete_index. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
This comment is purely informative and does not provide any actionable feedback or suggestions. It simply states what was done without any context or question. It doesn't ask for confirmation or suggest an improvement.
11. py/core/providers/database/chunks.py:25
- Draft comment:
Removed unused import: 'from .vecs.exc import ArgError, FilterError'. Ensure these exceptions are not used elsewhere. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =30%
<= threshold50%
The comment is pointing out the removal of unused imports, which is a valid observation. However, it asks the author to ensure these exceptions are not used elsewhere, which violates the rule against asking the author to confirm or ensure things. The comment should focus on the fact that the imports are removed and not on asking for confirmation.
12. py/core/providers/database/chunks.py:775
- Draft comment:
Replaced raises using ArgError with ValueError. Verify that external code catching these exceptions is updated accordingly. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_aFJEFB2Zb0Pb7PO6
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Important
Clean up import statements in parsers by removing try-except blocks and directly importing modules, remove unused
vecs
files, and update dependencies.audio_parser.py
,doc_parser.py
,docx_parser.py
,img_parser.py
,odt_parser.py
,pdf_parser.py
,ppt_parser.py
,pptx_parser.py
,rtf_parser.py
,epub_parser.py
,msg_parser.py
,org_parser.py
,p7s_parser.py
,rst_parser.py
,tiff_parser.py
,xls_parser.py
,xlsx_parser.py
.vecs
related files:vecs/__init__.py
,vecs/adapter/__init__.py
,vecs/adapter/base.py
,vecs/adapter/markdown.py
,vecs/adapter/noop.py
,vecs/adapter/text.py
,vecs/exc.py
.ArgError
withValueError
inchunks.py
.pillow
topyproject.toml
dependencies.This description was created by
for 9c5efea. It will automatically update as commits are pushed.