Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up parsers imports #1959

Merged
merged 2 commits into from
Feb 11, 2025
Merged

Clean up parsers imports #1959

merged 2 commits into from
Feb 11, 2025

Conversation

NolanTrem
Copy link
Collaborator

@NolanTrem NolanTrem commented Feb 11, 2025

Important

Clean up import statements in parsers by removing try-except blocks and directly importing modules, remove unused vecs files, and update dependencies.

  • Parsers:
    • Removed try-except blocks for imports in audio_parser.py, doc_parser.py, docx_parser.py, img_parser.py, odt_parser.py, pdf_parser.py, ppt_parser.py, pptx_parser.py, rtf_parser.py, epub_parser.py, msg_parser.py, org_parser.py, p7s_parser.py, rst_parser.py, tiff_parser.py, xls_parser.py, xlsx_parser.py.
    • Directly import necessary modules assuming they are installed.
  • Database:
    • Removed vecs related files: vecs/__init__.py, vecs/adapter/__init__.py, vecs/adapter/base.py, vecs/adapter/markdown.py, vecs/adapter/noop.py, vecs/adapter/text.py, vecs/exc.py.
    • Replaced ArgError with ValueError in chunks.py.
  • Dependencies:
    • Added pillow to pyproject.toml dependencies.

This description was created by Ellipsis for 9c5efea. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 73bb973 in 1 minute and 23 seconds

More details
  • Looked at 1288 lines of code in 27 files
  • Skipped 1 files when reviewing.
  • Skipped posting 14 drafted comments based on config settings.
1. py/core/parsers/media/audio_parser.py:30
  • Draft comment:
    Direct assignment self.atranscription = atranscription removes the try/except block that provided a user-friendly error. This will raise an ImportError immediately if litellm is missing—ensure this behavior is intended and documented.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The change moves the import to the module level, which is actually a more Pythonic approach. If litellm is missing, Python will already provide a clear ImportError with the missing package name. The old approach was actually anti-pattern - catching ImportError during class initialization is less clear than letting Python handle missing dependencies normally. The change simplifies the code and relies on Python's built-in import system.
    The comment raises a valid point about error messaging. Maybe there was a specific reason they wanted custom error handling for this particular import.
    Python's built-in import error is sufficient here. If litellm is a required dependency, it should be handled like any other required import - at the module level with standard Python error handling.
    The comment should be deleted. The code change is actually an improvement, making the dependency requirements more explicit and following Python conventions.
2. py/core/parsers/media/doc_parser.py:28
  • Draft comment:
    Directly assigning self.olefile = olefile without a try/except now causes module import errors at load time instead of a custom message. Ensure dependency docs are updated.
  • Reason this comment was not posted:
    Marked as duplicate.
3. py/core/parsers/media/docx_parser.py:25
  • Draft comment:
    Removal of the import check for docx leads to an immediate ImportError if python-docx is missing. Confirm that this behavior and dependency instructions are acceptable.
  • Reason this comment was not posted:
    Marked as duplicate.
4. py/core/parsers/media/img_parser.py:32
  • Draft comment:
    Directly importing and assigning pillow_heif and PIL.Image is cleaner, but ensure that pillow_heif.register_heif_opener() is idempotent and that its failure provides clear messages.
  • Reason this comment was not posted:
    Marked as duplicate.
5. py/core/parsers/media/odt_parser.py:24
  • Draft comment:
    Direct assignment of self.zipfile and self.ET simplifies the code. Ensure that missing XML or zipfile issues are caught elsewhere if needed.
  • Reason this comment was not posted:
    Marked as duplicate.
6. py/core/parsers/media/pdf_parser.py:43
  • Draft comment:
    Removal of the try/except for importing LiteLLM vision support means the dependency check is removed. Confirm that vision support dependencies are now documented clearly.
  • Reason this comment was not posted:
    Marked as duplicate.
7. py/core/parsers/structured/p7s_parser.py:32
  • Draft comment:
    The direct assignment of cryptography libraries (x509, pkcs7, NameOID) removes custom error handling. Ensure that users are aware these dependencies must be installed.
  • Reason this comment was not posted:
    Marked as duplicate.
8. py/core/providers/llm/litellm.py:16
  • Draft comment:
    Setting litellm.modify_params = True in the constructor affects global behavior. Verify that this change is safe and does not unintentionally affect other parts of the application.
  • Reason this comment was not posted:
    Marked as duplicate.
9. py/pyproject.toml:75
  • Draft comment:
    New optional dependencies like 'pillow' and 'pillow-heif' have been added. Ensure that they are correctly versioned and documented as required for the image parsers.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 20% <= threshold 50%
    The comment is about new optional dependencies being added and suggests ensuring they are correctly versioned and documented. However, it doesn't ask for a specific action or point out a specific issue with the code. It seems to be more of a general reminder rather than a specific code suggestion or issue. According to the rules, comments should not be about dependencies unless they are specific about a version mismatch or similar issue. This comment doesn't meet those criteria.
10. py/core/parsers/media/audio_parser.py:30
  • Draft comment:
    Directly assigning atranscription simplifies the code, but it removes runtime ImportError handling. Ensure that litellm is enforced as a dependency.
  • Reason this comment was not posted:
    Marked as duplicate.
11. py/core/parsers/media/doc_parser.py:28
  • Draft comment:
    Directly assigning olefile assumes the dependency is installed. Confirm that olefile is part of your requirements or extras.
  • Reason this comment was not posted:
    Marked as duplicate.
12. py/core/parsers/media/img_parser.py:32
  • Draft comment:
    Calling register_heif_opener() directly is fine if idempotent; verify that multiple invocations won't cause unintended side effects.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
13. py/core/parsers/media/pdf_parser.py:216
  • Draft comment:
    Assigning PdfReader directly simplifies the code but removes custom error messaging. Ensure that pypdf is installed to prevent runtime issues.
  • Reason this comment was not posted:
    Marked as duplicate.
14. py/core/parsers/structured/xlsx_parser.py:59
  • Draft comment:
    Using 'np.where(arr is None)' does not perform element-wise comparison. Consider using 'np.where(arr == None)' or an alternative element-wise check.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_AmODqtieLpU6gXeW


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 9c5efea in 1 minute and 7 seconds

More details
  • Looked at 91 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 drafted comments based on config settings.
1. py/core/providers/database/chunks.py:23
  • Draft comment:
    Remove unused custom exceptions import; ValueError is now used instead.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. py/core/providers/database/chunks.py:775
  • Draft comment:
    Update raise type in docstring from ArgError to ValueError for consistency.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is purely informative, suggesting a change in the docstring for consistency. It doesn't ask for a specific code change or test, nor does it point out a potential issue with the code itself. Therefore, it violates the rule against making purely informative comments.
3. py/core/providers/database/chunks.py:807
  • Draft comment:
    Raise ValueError for invalid table name instead of ArgError.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. py/core/providers/database/chunks.py:815
  • Draft comment:
    Use ValueError for an invalid index method (replacing ArgError).
  • Reason this comment was not posted:
    Marked as duplicate.
5. py/core/providers/database/chunks.py:822
  • Draft comment:
    Change error on disallowed index parameters when using IndexMethod.auto to ValueError.
  • Reason this comment was not posted:
    Marked as duplicate.
6. py/core/providers/database/chunks.py:829
  • Draft comment:
    Replace ArgError with ValueError for index build parameters mismatch.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. py/core/providers/database/chunks.py:845
  • Draft comment:
    Use ValueError for unknown index measure rather than ArgError.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
8. py/core/providers/database/chunks.py:977
  • Draft comment:
    In delete_index, update exception type in docstring to ValueError.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
9. py/core/providers/database/chunks.py:1000
  • Draft comment:
    Raise ValueError for invalid table name in delete_index (was ArgError).
  • Reason this comment was not posted:
    Marked as duplicate.
10. py/core/providers/database/chunks.py:1020
  • Draft comment:
    Switch to ValueError for non-existent vector index in delete_index.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is purely informative and does not provide any actionable feedback or suggestions. It simply states what was done without any context or question. It doesn't ask for confirmation or suggest an improvement.
11. py/core/providers/database/chunks.py:25
  • Draft comment:
    Removed unused import: 'from .vecs.exc import ArgError, FilterError'. Ensure these exceptions are not used elsewhere.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 30% <= threshold 50%
    The comment is pointing out the removal of unused imports, which is a valid observation. However, it asks the author to ensure these exceptions are not used elsewhere, which violates the rule against asking the author to confirm or ensure things. The comment should focus on the fact that the imports are removed and not on asking for confirmation.
12. py/core/providers/database/chunks.py:775
  • Draft comment:
    Replaced raises using ArgError with ValueError. Verify that external code catching these exceptions is updated accordingly.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_aFJEFB2Zb0Pb7PO6


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@NolanTrem NolanTrem merged commit 4377e0d into main Feb 11, 2025
13 of 14 checks passed
@NolanTrem NolanTrem deleted the Nolan/CleanupParsers branch February 11, 2025 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant