
fix: add support for pyproject.toml and env.yml/env.yaml files #550

Closed
wants to merge 11 commits

Conversation

dylanpulver
Collaborator

@dylanpulver dylanpulver commented Jul 9, 2024

Tested this manually by adding these files to the repo and confirming they are picked up. This supports both variants: env.yml and env.yaml.

(screenshot: scan output showing the new files being picked up)

Original issue: #505

Summary by CodeRabbit

  • New Features

    • Enhanced scanning commands with improved functionality and error handling.
    • Introduced a summary feature for vulnerabilities in scan reports.
    • Added support for reading dependencies from pyproject.toml files.
    • Expanded file handling capabilities to include special files like pyproject.toml and environment files.
  • Improvements

    • Real-time feedback added during dependency scanning for better visibility.
    • Detailed output is now controlled so that only information relevant to the vulnerabilities found is displayed.
    • Streamlined report processing for various output formats, including JSON and SPDX.
    • Refined command structure for better parameter handling.
  • Bug Fixes

    • Improved clarity in error reporting during upload failures.
    • Corrected logic and indentation issues in file handling processes.

@dylanpulver dylanpulver marked this pull request as ready for review July 9, 2024 15:20
@dylanpulver
Collaborator Author

Please note that this PR in the safety_schemas repo is needed for this to work: https://github.com/pyupio/safety_schemas/pull/1


coderabbitai bot commented Sep 16, 2024

Walkthrough

The changes enhance the functionality of scanning commands in the safety/scan/command.py file. Key updates include the addition of print statements for better feedback during file processing, refined error handling in the process_report function, and an updated scan function that checks for an apply_updates flag. The output control has been improved for detailed reporting, and a new get_vulnerability_summary function has been introduced for better organization of vulnerability data. Additionally, the ability to read dependencies from pyproject.toml files has been added, and the FileFinder class has been updated to include special files in its processing.
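The walkthrough mentions get_vulnerability_summary, but the function itself never appears in this thread; only its Dict[str, Any] -> Tuple[int, int] signature and the two counts later passed to print_summary (total issues with duplicates, total ignored issues) are visible in the review comments below. A hedged sketch of a function with that shape might look like this; the nested report keys traversed here are assumptions for illustration, not the PR's actual code:

# Hedged sketch only: tally total and ignored vulnerabilities from a report
# dictionary. The Dict[str, Any] -> Tuple[int, int] signature comes from the
# review comments below; the nested keys used here are assumptions.
from typing import Any, Dict, Tuple


def get_vulnerability_summary(report: Dict[str, Any]) -> Tuple[int, int]:
    total_issues = 0
    ignored_issues = 0
    for project in report.get("scan_results", {}).get("projects", []):
        for file_result in project.get("files", []):
            for dependency in file_result.get("results", {}).get("dependencies", []):
                for spec in dependency.get("specifications", []):
                    vulns = spec.get("vulnerabilities", {}).get("known_vulnerabilities", [])
                    total_issues += len(vulns)
                    ignored_issues += sum(1 for v in vulns if v.get("ignored"))
    return total_issues, ignored_issues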

Changes

File | Change Summary
safety/scan/command.py | Enhanced scanning commands with new print statements, refined error handling, and updated scan function. Added get_vulnerability_summary function and improved output format handling.
safety/scan/ecosystems/python/dependencies.py | Introduced read_pyproject_toml_dependencies function to read dependencies from pyproject.toml files and updated get_dependencies to handle this new file type.
safety/scan/finder/file_finder.py | Modified FileFinder class to include special files (pyproject.toml, env.yml, env.yaml) in the file processing logic.
safety/scan/finder/handlers.py | Updated method signatures in FileHandler and SafetyProjectFileHandler classes, focusing on the download_required_assets method.

Possibly related PRs

  • fix/clarify-vulnerabilities-found/ #608: The changes to the get_vulnerability_summary and process_report functions in safety/scan/command.py are directly connected to PR #608, which introduced its own get_vulnerability_summary function and modified process_report to improve vulnerability reporting in the same file.

Suggested labels

bug

Poem

🐇 In the code where the errors hide,
New prints and checks now stand with pride.
Scanning through files, we take a leap,
For every report, a promise to keep.
With updates in hand, we hop and play,
A safer tomorrow starts today! 🌟




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 15e757c and 333421d.

Files selected for processing (5)
  • pyproject.toml (1 hunks)
  • safety/scan/command.py (2 hunks)
  • safety/scan/ecosystems/python/dependencies.py (3 hunks)
  • safety/scan/finder/file_finder.py (3 hunks)
  • safety/scan/finder/handlers.py (3 hunks)
Additional context used
Ruff
safety/scan/finder/handlers.py

5-5: typing.Tuple imported but unused

Remove unused import: typing.Tuple

(F401)

safety/scan/finder/file_finder.py

11-11: .handlers.PyProjectTomlHandler imported but unused

Remove unused import: .handlers.PyProjectTomlHandler

(F401)


161-161: Use enumerate() for index variable level in for loop

(SIM113)

Additional comments not posted (9)
safety/scan/finder/handlers.py (4)

112-113: LGTM!

The empty implementation of the download_required_assets method is appropriate since there are no required assets to download for Safety project files, as indicated by the docstring.


116-162: LGTM!

The implementation of the PyProjectTomlHandler class looks good:

  • It correctly extends the FileHandler class and sets the ecosystem to Ecosystem.PYPROJECT_TOML.
  • The download_required_assets method fetches both the full and partial Safety databases, consistent with the PythonFileHandler class.
  • The can_handle method correctly checks for the pyproject.toml file name and returns the corresponding FileType.
  • The handle method reads the pyproject.toml file using the toml library and extracts dependencies from the relevant sections.
  • The extracted dependencies are returned as a set, which is an appropriate data structure for unique dependencies.

Great job!
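As a rough, hedged sketch of the handle() flow summarized above (section names and the set-of-strings return shape are inferred from this summary and the diffs later in the thread, not copied from handlers.py):

# Rough sketch only: load pyproject.toml with the toml library and collect
# dependency strings from the PEP 621 [project] and Poetry tables into a set.
# Section names and string formatting here are assumptions.
from pathlib import Path
from typing import Set

import toml


def collect_pyproject_dependencies(file_path: Path) -> Set[str]:
    with open(file_path, "r") as fh:
        data = toml.load(fh)

    dependencies: Set[str] = set()

    project = data.get("project", {})
    dependencies.update(project.get("dependencies", []))
    for group in project.get("optional-dependencies", {}).values():
        dependencies.update(group)

    poetry = data.get("tool", {}).get("poetry", {})
    for name, version in poetry.get("dependencies", {}).items():
        dependencies.add(f"{name}=={version}" if isinstance(version, str) else name)

    return dependencies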


177-177: LGTM!

Adding the PyProjectTomlHandler to the ECOSYSTEM_HANDLER_MAPPING dictionary is necessary and correctly implemented. This associates the handler with the corresponding Ecosystem.PYPROJECT_TOML ecosystem.


Line range hint 1-179: Overall, the changes in this file look great!

The introduction of the PyProjectTomlHandler class is a valuable addition to support handling pyproject.toml files. The implementation is consistent with the existing FileHandler classes and follows the necessary conventions. The class correctly extends the FileHandler class and implements the required methods such as download_required_assets, can_handle, and handle appropriately.

The integration of the PyProjectTomlHandler with the existing codebase is smooth, and the class is correctly added to the ECOSYSTEM_HANDLER_MAPPING dictionary to associate it with the corresponding ecosystem.

Apart from the minor issues mentioned in the previous comments, the overall changes in this file are well-structured and contribute positively to the project's functionality.

Great work!

Tools
Ruff

167-167: Redefinition of unused download_required_assets from line 121

(F811)

safety/scan/finder/file_finder.py (1)

153-160: LGTM!

The new code block correctly handles the special files pyproject.toml, env.yml, and env.yaml by adding them to the files dictionary under their respective file types. This enhances the functionality of the file finder to support these important configuration files.

safety/scan/ecosystems/python/dependencies.py (2)

273-308: LGTM!

The read_pyproject_toml_dependencies function correctly reads dependencies from a pyproject.toml file and yields PythonDependency objects. It handles both versioned and unversioned dependencies in the relevant sections of the TOML file.

The implementation integrates well with the existing PythonDependency model and the PythonSpecification class, enabling support for pyproject.toml files in the project.


330-331: LGTM!

The changes to the get_dependencies function correctly integrate the handling of FileType.PYPROJECT_TOML files by calling the newly introduced read_pyproject_toml_dependencies function.

This modification ensures that dependencies from pyproject.toml files are properly processed and returned as PythonDependency objects, seamlessly integrating with the existing dependency resolution logic.

safety/scan/command.py (2)

48-48: LGTM!

The new member PYPROJECT_TOML is added correctly to the ScannableEcosystems enumeration to support scanning projects defined by a pyproject.toml file.
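For context, a minimal sketch of the enum addition being described (the member value strings are assumptions for illustration, not the actual source):

# Minimal sketch of the ScannableEcosystems change described above.
# Member value strings are assumptions, not copied from safety/scan/command.py.
from enum import Enum


class ScannableEcosystems(Enum):
    PYTHON = "python"
    PYPROJECT_TOML = "pyproject.toml"  # new member added in this PR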


Line range hint 1-1: Skipped reviewing system_scan function.

The system_scan function is not changed in the provided code, so there is nothing to review.

Resolved review threads (outdated): pyproject.toml, safety/scan/finder/handlers.py (2), safety/scan/finder/file_finder.py, safety/scan/command.py

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
safety/scan/finder/handlers.py (3)

112-113: Consider providing a meaningful implementation or documentation.

In the SafetyProjectFileHandler, the download_required_assets method currently contains only a pass statement. If no assets are required, it's a good practice to document this explicitly.

Consider adding a docstring:

 def download_required_assets(self, session):
+    """
+    No required assets to download for Safety project files.
+    """
     pass

138-139: Remove debug print statement in can_handle method.

The print("recognized") statement is likely used for debugging purposes. Consider removing it or replacing it with appropriate logging.

Apply this diff to remove the print statement:

     def can_handle(self, root: str, file_name: str, include_files: Dict[FileType, List[Path]]) -> Optional[FileType]:
         if file_name == 'pyproject.toml':
-            print("recognized")
             return FileType.PYPROJECT_TOML
         return None

146-146: Remove debug print statement in handle method.

The print("printing data", data) statement may not be suitable for production code. Consider removing it or using a logging framework.

Apply this diff to remove the print statement:

         with open(file_path, 'r') as file:
             data = toml.load(file)
-            print("printing data", data)
             dependencies = set()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 958a39e and 3630e88.

📒 Files selected for processing (1)
  • safety/scan/finder/handlers.py (3 hunks)
🧰 Additional context used
🔇 Additional comments (2)
safety/scan/finder/handlers.py (2)

55-55: ⚠️ Potential issue

Correct the use of NotImplementedError in abstract method.

In the FileHandler class, the download_required_assets method should raise NotImplementedError instead of returning it. This ensures that subclasses are properly notified to implement this method.

Apply this diff to fix the issue:

 def download_required_assets(self, session):
     """
     Abstract method to download required assets for handling files. Should be implemented
     by subclasses.

     Args:
         session: The session object for making network requests.

-    Returns:
-        Dict[str, str]: A dictionary of downloaded assets.
     """
-    return NotImplementedError(NOT_IMPLEMENTED)
+    raise NotImplementedError(NOT_IMPLEMENTED)

Likely invalid or redundant comment.


5-5: Verify necessity of all imported types from typing module.

Ensure that all imported types (Dict, List, Optional, Set) are used in the code. Unused imports can be removed to keep the code clean.

Run the following script to check for unused imports:

Resolved review threads (outdated): safety/scan/finder/handlers.py (3)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
safety/scan/finder/handlers.py (1)

Line range hint 55-65: Update docstring to match implementation.

The docstring still indicates a Dict[str, str] return type, but the method signature no longer specifies a return type and raises NotImplementedError. Please update the docstring to reflect the current implementation.

Apply this diff to fix the docstring:

     def download_required_assets(self, session):
         """
         Abstract method to download required assets for handling files. Should be implemented
         by subclasses.

         Args:
             session: The session object for making network requests.
-
-        Returns:
-            Dict[str, str]: A dictionary of downloaded assets.
         """
         return NotImplementedError(NOT_IMPLEMENTED)
safety/scan/finder/file_finder.py (2)

154-160: Consider refactoring special files handling into dedicated handlers.

The current implementation hardcodes special files processing separately from the handler-based approach. This creates maintenance overhead and potential inconsistencies. Consider:

  1. Creating dedicated handlers for these file types
  2. Integrating with the existing handler mechanism
  3. Adding validation for file contents

Example implementation approach:

class PyProjectTomlHandler(FileHandler):
    def can_handle(self, root: str, filename: str, include_files: Optional[Dict]) -> Optional[FileType]:
        if filename == 'pyproject.toml':
            return FileType('pyproject.toml')
        return None

class EnvYamlHandler(FileHandler):
    def can_handle(self, root: str, filename: str, include_files: Optional[Dict]) -> Optional[FileType]:
        if filename in ['env.yml', 'env.yaml']:
            return FileType(filename)
        return None

Then register these handlers in ECOSYSTEM_HANDLER_MAPPING instead of using the hardcoded approach.


161-161: Remove or properly implement the level counter.

The level counter appears to be unused. Either:

  1. Remove it if not needed
  2. Use it meaningfully for tracking directory depth
  3. If tracking iteration count is needed, use enumerate()

If you need to track iterations, consider:

-            level += 1

Or if you need both the index and iteration:

for level, (root, dirs, filenames) in enumerate(os.walk(dir_path)):
    # ... rest of the code
🧰 Tools
🪛 Ruff

161-161: Use enumerate() for index variable level in for loop

(SIM113)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0684758 and db257de.

📒 Files selected for processing (3)
  • safety/scan/ecosystems/python/dependencies.py (3 hunks)
  • safety/scan/finder/file_finder.py (2 hunks)
  • safety/scan/finder/handlers.py (3 hunks)
🧰 Additional context used
🪛 Ruff
safety/scan/finder/file_finder.py

161-161: Use enumerate() for index variable level in for loop

(SIM113)

safety/scan/finder/handlers.py

5-5: typing.Set imported but unused

Remove unused import: typing.Set

(F401)


6-6: toml imported but unused

Remove unused import: toml

(F401)

🔇 Additional comments (5)
safety/scan/finder/handlers.py (1)

Line range hint 116-120: Add PyProjectTomlHandler to ecosystem mapping.

Given the PR objective to add support for pyproject.toml files, the handler for these files should be added to the ECOSYSTEM_HANDLER_MAPPING. Currently, there's no implementation visible for handling pyproject.toml files.

Let's verify if the PyProjectTomlHandler exists in the codebase:

Either:

  1. Implement a new PyProjectTomlHandler class, or
  2. Add the existing handler to the mapping:
 ECOSYSTEM_HANDLER_MAPPING = MappingProxyType({
     Ecosystem.PYTHON: PythonFileHandler,
     Ecosystem.SAFETY_PROJECT: SafetyProjectFileHandler,
+    Ecosystem.PYPROJECT_TOML: PyProjectTomlHandler,  # Add handler for pyproject.toml
 })
safety/scan/finder/file_finder.py (2)

78-78: Remove debugging print statement.

This print statement appears to be for debugging purposes and should be removed before merging.


154-160: Verify integration with safety_schemas.

Since this PR depends on changes in the safety_schemas repository (pyupio/safety_schemas#1), we should verify:

  1. The FileType values match those defined in safety_schemas
  2. The file content validation is properly integrated
✅ Verification successful

Let me gather more information to verify the FileType implementation and its integration.


Based on the gathered information, I can now generate the final response:

FileType integration with safety_schemas is properly implemented

The implementation correctly integrates with safety_schemas:

  • The FileType enum is imported from safety_schemas.models in multiple files
  • The package requires safety_schemas>=0.0.4 as a dependency
  • The special files ('pyproject.toml', 'env.yml', 'env.yaml') are properly handled using the FileType enum from safety_schemas
  • File content validation is handled through the safety_schemas models and validation system
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if FileType values in safety_schemas match our usage
rg -A 1 "class FileType" 
rg "FileType\(" 

# Check for any validation logic for these file types
rg -l "pyproject\.toml|env\.ya?ml" 

Length of output: 331


Script:

#!/bin/bash
# Check FileType enum definition and its values
ast-grep --pattern 'class FileType'

# Check for any validation or schema related code for these files
rg -A 5 "def validate|schema" 

# Look for imports related to safety_schemas
rg "from safety_schemas|import safety_schemas"

Length of output: 25706

safety/scan/ecosystems/python/dependencies.py (2)

5-5: Importing 'toml' module is appropriate

The addition of import toml is necessary for parsing pyproject.toml files.


347-350: Integration with 'pyproject.toml' files is correctly implemented

The get_dependencies function now correctly handles FileType.PYPROJECT_TOML by invoking read_pyproject_toml_dependencies. This enhancement allows the system to parse dependencies from pyproject.toml files.

Resolved review threads (outdated): safety/scan/finder/handlers.py, safety/scan/ecosystems/python/dependencies.py (2)
Comment on lines 310 to 312
dep_name, dep_version = (dep.split("==") + [None])[:2]
print(dep_name)
print(dep_version)

⚠️ Potential issue

Improve dependency parsing to handle various version specifiers

The current approach splits the dependency string on '==' to obtain dep_name and dep_version. This may not correctly handle dependencies with different version specifiers (e.g., '>=', '~=', '!=') or additional markers and extras. Consider using a robust parser, such as packaging.requirements.Requirement, to accurately parse dependency strings.

Apply this diff to parse dependencies using the packaging library:

+    from packaging.requirements import Requirement
...
     for dep in dependencies:
-        dep_name, dep_version = (dep.split("==") + [None])[:2]
-        print(dep_name)
-        print(dep_version)
+        req = Requirement(dep)
+        dep_name = req.name
+        dep_version = str(req.specifier) if req.specifier else None
...
         yield PythonDependency(
             name=dep_name,
             version=dep_version,
             specifications=[
-                PythonSpecification(f"{dep_name}=={dep_version}" if dep_version else dep_name, found=file)
+                PythonSpecification(str(req), found=file)
             ],
...
         )

Committable suggestion was skipped due to low confidence.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (7)
safety/scan/ecosystems/python/dependencies.py (2)

5-5: Consider pinning the toml package version

To ensure consistent behavior across different environments, consider specifying the required version of toml in your project's requirements.


273-320: Consider using established Python packaging libraries

The current implementation could benefit from leveraging established Python packaging libraries:

  1. Use poetry.core.packages for parsing Poetry dependencies
  2. Use tomli (PEP 680) instead of toml for better TOML parsing
  3. Consider supporting PEP 621 build backend interface for more standardized project metadata handling

This would provide more robust parsing and better compatibility with the Python packaging ecosystem.
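To make the tomli suggestion above concrete, a minimal sketch of a PEP 621 parse built on tomllib/tomli plus packaging could look like this (illustrative only, not the PR's implementation):

# Illustrative only: parse [project] dependencies from pyproject.toml using
# tomllib (Python 3.11+) or the tomli backport, then normalize each entry
# with packaging.requirements.Requirement for robust specifier handling.
from pathlib import Path

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for older interpreters

from packaging.requirements import Requirement


def iter_pep621_requirements(pyproject: Path):
    data = tomllib.loads(pyproject.read_text())
    deps = list(data.get("project", {}).get("dependencies", []))
    for group in data.get("project", {}).get("optional-dependencies", {}).values():
        deps.extend(group)
    for raw in deps:
        yield Requirement(raw)  # exposes .name, .specifier, .extras, .marker


if __name__ == "__main__":
    for req in iter_pep621_requirements(Path("pyproject.toml")):
        print(req.name, str(req.specifier) or "<unpinned>")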

safety/scan/command.py (5)

248-248: Remove Unnecessary Blank Line

There is an extra blank line at line 248. Removing it will improve code readability and maintain consistency.

Apply this diff to remove the blank line:

             ):
-
         if not ctx.obj.metadata.authenticated:

373-373: Remove Unnecessary Empty Line

An unnecessary empty line is present at line 373. Consider removing it to keep the code clean.

Apply this diff:

                         lines = []
-
                         if spec.remediation.recommended:

445-454: Improve Argument Alignment in print_summary Call

The arguments in the print_summary function call are misaligned, which affects readability. Align the arguments vertically for better clarity.

Apply this diff:

         print_summary(
-        console=console,
-        total_issues_with_duplicates=total_issues_with_duplicates,
-        total_ignored_issues=total_ignored_issues,
-        project=ctx.obj.project,
-        dependencies_count=count,
-        fixes_count=fixes_count,
-        resolved_vulns_per_fix=total_resolved_vulns,
-        is_detailed_output=detailed_output,
-        ignored_vulns_data=ignored_vulns_data
+            console=console,
+            total_issues_with_duplicates=total_issues_with_duplicates,
+            total_ignored_issues=total_ignored_issues,
+            project=ctx.obj.project,
+            dependencies_count=count,
+            fixes_count=fixes_count,
+            resolved_vulns_per_fix=total_resolved_vulns,
+            is_detailed_output=detailed_output,
+            ignored_vulns_data=ignored_vulns_data
         )

Line range hint 798-806: Update Function Docstring for Accuracy

The docstring for get_vulnerability_summary mentions that it accepts a ReportModel, but the function actually processes a dictionary. Ensure that the docstring accurately reflects the function's input parameter.

Update the docstring to indicate the correct parameter type:

     def get_vulnerability_summary(report: Dict[str, Any]) -> Tuple[int, int]:
         """
-        Summarize vulnerabilities from the given report.
+        Summarize vulnerabilities from the given report dictionary.

         Args:
-            report (ReportModel): The report containing vulnerability data.
+            report (Dict[str, Any]): The report dictionary containing vulnerability data.

         Returns:
             Tuple[int, int]: A tuple containing:

373-373: Consider Combining Consecutive Empty Lines

Multiple empty lines have been added consecutively at various points in the code, such as at line 373. While spacing can improve readability, excessive empty lines may have the opposite effect.

Review the code to ensure that only necessary empty lines are included.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between db257de and 0544d7c.

📒 Files selected for processing (3)
  • safety/scan/command.py (4 hunks)
  • safety/scan/ecosystems/python/dependencies.py (3 hunks)
  • safety/scan/finder/file_finder.py (1 hunks)
🧰 Additional context used
🪛 Ruff
safety/scan/finder/file_finder.py

160-160: Use enumerate() for index variable level in for loop

(SIM113)

🔇 Additional comments (2)
safety/scan/ecosystems/python/dependencies.py (2)

342-344: LGTM!

The implementation follows the existing pattern and cleanly integrates pyproject.toml support.


291-297: ⚠️ Potential issue

Improve handling of poetry dependencies

The current implementation doesn't handle complex dependency specifications in poetry dependencies (e.g., version constraints, extras, markers).

Consider using poetry.core.packages for proper parsing:

     if 'tool' in data and 'poetry' in data['tool'] and 'dependencies' in data['tool']['poetry']:
         for dep, version in data['tool']['poetry']['dependencies'].items():
-            if isinstance(version, str):
-                dependencies.append(f"{dep}=={version}")
-            else:
-                dependencies.append(dep)
+            if isinstance(version, dict):
+                # Handle complex version specifications
+                version_constraint = version.get('version', '')
+                extras = version.get('extras', [])
+                markers = version.get('markers', '')
+                dep_spec = f"{dep}{version_constraint}"
+                if extras:
+                    dep_spec += f"[{','.join(extras)}]"
+                if markers:
+                    dep_spec += f"; {markers}"
+            else:
+                dep_spec = f"{dep}{version}"
+            dependencies.append(dep_spec)

Likely invalid or redundant comment.

Comment on lines 153 to 159
special_files = {'pyproject.toml', 'env.yml', 'env.yaml'}
if file_name in special_files:
    file_type = FileType(file_name)
    inspectable_file = Path(root, file_name)
    if file_type.value not in files or not files[file_type.value]:
        files[file_type.value] = set()
    files[file_type.value].add(inspectable_file)

🛠️ Refactor suggestion

Consider refactoring special files handling into a dedicated handler.

The current implementation duplicates logic and reduces maintainability. Consider these improvements:

  1. Create a dedicated SpecialFileHandler class that implements the FileHandler interface
  2. Move special files set to a configuration constant
  3. Add validation of file contents before processing

Example implementation:

# Define in a central configuration
SPECIAL_FILES = frozenset({'pyproject.toml', 'env.yml', 'env.yaml'})

class SpecialFileHandler(FileHandler):
    def can_handle(self, root: str, file_name: str, include_files: Optional[Dict[FileType, List[Path]]]) -> Optional[FileType]:
        return FileType(file_name) if file_name in SPECIAL_FILES else None

This approach would:

  • Improve maintainability by centralizing special file definitions
  • Follow the existing handler pattern
  • Make it easier to add validation logic
  • Reduce code duplication

💡 Codebase verification

Based on the search results, I can now generate a final response.

The ecosystem handlers need to be updated for the new file types

The search results reveal that while pyproject.toml and environment files (env.yml, env.yaml) are added to the special files list in file_finder.py, there are gaps in the implementation:

  • The FileType enum is used in multiple places but pyproject.toml and environment files are not included in the DEFAULT_FILE_TYPES list in scan/command.py
  • While there's a handler for pyproject.toml in ecosystems/python/dependencies.py (read_pyproject_toml_dependencies), there's no handler for environment files
  • The supported file types for fixes (SUPPORTED_FILE_TYPES in safety.py) need to be updated if these files should support remediation

Required updates:

  • Add environment file types to the DEFAULT_FILE_TYPES list if they should be scanned by default
  • Implement dependency reading for environment files
  • Update SUPPORTED_FILE_TYPES if fixes should be supported for these file types
🔗 Analysis chain

Verify related ecosystem handlers are updated.

The implementation adds support for pyproject.toml and environment files, but we should verify that corresponding handlers in the ecosystem mapping can process these files correctly.


Let me try a different search strategy to verify the ecosystem handlers.


Let me try one final search strategy focusing on the FileType enum and its usage.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for handlers that might need updates for the new file types
echo "Searching for related ecosystem handlers..."
rg -l "pyproject.toml|env\.ya?ml" --type python

# Search for dependency processing functions
echo "Searching for dependency processing functions..."
ast-grep --pattern 'def read_$_dependencies($$$)' 

Length of output: 336


Script:

#!/bin/bash
# Search for files containing ecosystem-related code
echo "Searching for ecosystem-related files..."
fd -e py ecosystem

# Search for files mentioning the new file types
echo "Searching for files mentioning the new file types..."
rg "pyproject\.toml|env\.ya?ml|env\.yml" 

# Search for dependency processing related files
echo "Searching for dependency processing related files..."
fd -e py dependency

Length of output: 514


Script:

#!/bin/bash
# Search for FileType enum definition and usage
echo "Searching for FileType enum..."
rg -A 5 "class FileType" 

# Search for files that handle file type processing
echo "Searching for file type processing..."
rg -B 2 -A 2 "FileType\." 

# Search for any dependency or requirement parsing
echo "Searching for dependency parsing..."
rg -l "parse.*requirements|parse.*dependencies"

Length of output: 4851

Resolved review threads: safety/scan/finder/file_finder.py; safety/scan/ecosystems/python/dependencies.py (outdated)
Comment on lines 306 to 320
for dep in dependencies:
    dep_name, dep_version = (dep.split("==") + [None])[:2]
    yield PythonDependency(
        name=dep_name,
        version=dep_version,
        specifications=[
            PythonSpecification(f"{dep_name}=={dep_version}" if dep_version else dep_name, found=file)
        ],
        found=file,
        insecure_versions=[],
        secure_versions=[],
        latest_version=None,
        latest_version_without_known_vulnerabilities=None,
        more_info_url=None
    )
The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use packaging library for robust dependency parsing

The current approach of splitting on == is fragile and won't handle various version specifiers correctly.

Use the packaging library that's already imported:

     for dep in dependencies:
-        dep_name, dep_version = (dep.split("==") + [None])[:2]
+        try:
+            req = Requirement(dep)
+            dep_name = req.name
+            dep_version = str(req.specifier) if req.specifier else None
+        except InvalidRequirement:
+            logger.warning(f"Invalid requirement format: {dep}")
+            continue
         yield PythonDependency(
             name=dep_name,
             version=dep_version,
             specifications=[
-                PythonSpecification(f"{dep_name}=={dep_version}" if dep_version else dep_name, found=file)
+                PythonSpecification(str(req), found=file)
             ],
             found=file,
             insecure_versions=[],
             secure_versions=[],
             latest_version=None,
             latest_version_without_known_vulnerabilities=None,
             more_info_url=None
         )

Committable suggestion was skipped due to low confidence.

Resolved review thread: safety/scan/command.py

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between bc06211 and 8748d5b.

📒 Files selected for processing (2)
  • safety/scan/ecosystems/python/dependencies.py (1 hunks)
  • safety/scan/finder/file_finder.py (1 hunks)
🧰 Additional context used
🪛 Ruff
safety/scan/finder/file_finder.py

153-153: Use enumerate() for index variable level in for loop

(SIM113)

🔇 Additional comments (2)
safety/scan/finder/file_finder.py (2)

152-155: Remove unused level counter.

The level counter is:

  1. Not used anywhere in the code
  2. Redundant with the existing current_depth variable
  3. Incorrectly tracks iteration count rather than depth

The previous review comment about removing this counter is still valid. Please refer to that comment for the suggested fix.

🧰 Tools
🪛 Ruff

153-153: Use enumerate() for index variable level in for loop

(SIM113)


Line range hint 1-200: Implementation missing for special files support.

The PR objectives mention adding support for pyproject.toml and env.yml/env.yaml files, but the implementation is not visible in this file. The file finder needs to be updated to recognize these special files.

Let's verify if the special files implementation exists elsewhere:

Please ensure that:

  1. Special files are properly registered in the FileType enum
  2. Appropriate handlers are implemented for these file types
  3. The handlers are registered in ECOSYSTEM_HANDLER_MAPPING

Would you like assistance in implementing the special files support?

🧰 Tools
🪛 Ruff

153-153: Use enumerate() for index variable level in for loop

(SIM113)

@@ -285,10 +283,11 @@ def get_dependencies(f: InspectableFile) -> List[PythonDependency]:
         return []
 
     if f.file_type in [FileType.REQUIREMENTS_TXT, FileType.POETRY_LOCK,
-                       FileType.PIPENV_LOCK]:
+                       FileType.PIPENV_LOCK, FileType.PYPROJECT_TOML]:

⚠️ Potential issue

Missing implementation for pyproject.toml parsing

While FileType.PYPROJECT_TOML has been added to the list of supported file types, there's no specialized parsing logic for pyproject.toml files. The current implementation will try to use the generic read_dependencies() function, which may not correctly handle the unique structure of pyproject.toml files (e.g., project.dependencies, tool.poetry.dependencies, etc.).

Add a specialized parser for pyproject.toml files:

def read_pyproject_toml_dependencies(file: Path) -> Generator[PythonDependency, None, None]:
    """
    Reads dependencies from a pyproject.toml file.
    
    Args:
        file (Path): Path to the pyproject.toml file
        
    Returns:
        Generator[PythonDependency, None, None]: Generator of PythonDependency objects
    """
    try:
        import toml
        data = toml.load(file)
        dependencies = []
        
        # Check build-system requires
        if 'build-system' in data and 'requires' in data['build-system']:
            dependencies.extend(data['build-system']['requires'])
            
        # Check project dependencies
        if 'project' in data:
            if 'dependencies' in data['project']:
                dependencies.extend(data['project']['dependencies'])
            if 'optional-dependencies' in data['project']:
                for group in data['project']['optional-dependencies'].values():
                    dependencies.extend(group)
                    
        # Check poetry dependencies
        if 'tool' in data and 'poetry' in data['tool']:
            if 'dependencies' in data['tool']['poetry']:
                for dep, version in data['tool']['poetry']['dependencies'].items():
                    if isinstance(version, dict):
                        # Handle complex version specifications
                        dep_spec = f"{dep}{version.get('version', '')}"
                    else:
                        dep_spec = f"{dep}{version}"
                    dependencies.append(dep_spec)
                    
            if 'dev-dependencies' in data['tool']['poetry']:
                for dep, version in data['tool']['poetry']['dev-dependencies'].items():
                    if isinstance(version, dict):
                        dep_spec = f"{dep}{version.get('version', '')}"
                    else:
                        dep_spec = f"{dep}{version}"
                    dependencies.append(dep_spec)
                    
        # Parse dependencies
        from packaging.requirements import Requirement
        for dep in dependencies:
            try:
                req = Requirement(dep)
                yield PythonDependency(
                    name=req.name,
                    version=str(req.specifier) if req.specifier else None,
                    specifications=[PythonSpecification(str(req), found=file)],
                    found=file,
                    absolute_path=file,
                    insecure_versions=[],
                    secure_versions=[],
                    latest_version=None,
                    latest_version_without_known_vulnerabilities=None,
                    more_info_url=None
                )
            except Exception as e:
                logger.warning(f"Failed to parse dependency {dep}: {str(e)}")
                continue
                
    except Exception as e:
        logger.error(f"Failed to parse {file}: {str(e)}")
        return

Then update the get_dependencies function to use this specialized parser:

     if f.file_type in [FileType.REQUIREMENTS_TXT, FileType.POETRY_LOCK,
                        FileType.PIPENV_LOCK, FileType.PYPROJECT_TOML]:
+        if f.file_type == FileType.PYPROJECT_TOML:
+            return list(read_pyproject_toml_dependencies(Path(f.file.name)))
         return list(read_dependencies(f.file, resolve=True))

⚠️ Potential issue

Add support for env.yml/env.yaml files

According to the PR objectives, support for env.yml and env.yaml files should be added, but this functionality is missing from the implementation.

Add support for conda environment files:

def read_conda_environment_dependencies(file: Path) -> Generator[PythonDependency, None, None]:
    """
    Reads dependencies from a conda environment file (env.yml/env.yaml).
    
    Args:
        file (Path): Path to the environment file
        
    Returns:
        Generator[PythonDependency, None, None]: Generator of PythonDependency objects
    """
    try:
        import yaml
        with open(file) as f:
            data = yaml.safe_load(f)
            
        if not data or 'dependencies' not in data:
            return
            
        for dep in data['dependencies']:
            # Skip nested entries (e.g. the {'pip': [...]} block) and other non-string items
            if isinstance(dep, dict) or not isinstance(dep, str):
                continue
                
            # Skip non-pip packages
            if dep.startswith('pip:'):
                continue
                
            try:
                from packaging.requirements import Requirement
                req = Requirement(dep)
                yield PythonDependency(
                    name=req.name,
                    version=str(req.specifier) if req.specifier else None,
                    specifications=[PythonSpecification(str(req), found=file)],
                    found=file,
                    absolute_path=file,
                    insecure_versions=[],
                    secure_versions=[],
                    latest_version=None,
                    latest_version_without_known_vulnerabilities=None,
                    more_info_url=None
                )
            except Exception as e:
                logger.warning(f"Failed to parse dependency {dep}: {str(e)}")
                continue
                
        # Handle pip dependencies listed under a {'pip': [...]} entry
        pip_deps = [
            pip_dep
            for dep in data['dependencies']
            if isinstance(dep, dict) and 'pip' in dep
            for pip_dep in dep['pip']
        ]
        for dep in pip_deps:
            try:
                from packaging.requirements import Requirement
                req = Requirement(dep)
                yield PythonDependency(
                    name=req.name,
                    version=str(req.specifier) if req.specifier else None,
                    specifications=[PythonSpecification(str(req), found=file)],
                    found=file,
                    absolute_path=file,
                    insecure_versions=[],
                    secure_versions=[],
                    latest_version=None,
                    latest_version_without_known_vulnerabilities=None,
                    more_info_url=None
                )
            except Exception as e:
                logger.warning(f"Failed to parse pip dependency {dep}: {str(e)}")
                continue
                
    except Exception as e:
        logger.error(f"Failed to parse {file}: {str(e)}")
        return

Then update the get_dependencies function to handle these files:

     if f.file_type in [FileType.REQUIREMENTS_TXT, FileType.POETRY_LOCK,
-                       FileType.PIPENV_LOCK, FileType.PYPROJECT_TOML]:
+                       FileType.PIPENV_LOCK, FileType.PYPROJECT_TOML,
+                       FileType.CONDA_ENVIRONMENT]:
+        if f.file_type == FileType.PYPROJECT_TOML:
+            return list(read_pyproject_toml_dependencies(Path(f.file.name)))
+        if f.file_type == FileType.CONDA_ENVIRONMENT:
+            return list(read_conda_environment_dependencies(Path(f.file.name)))
         return list(read_dependencies(f.file, resolve=True))

@dylanpulver
Collaborator Author

After changes in our approach, the code changes we need are much simpler, so it was easier to open them in a new PR: #621

Labels: none yet
Projects: none yet
2 participants