Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handling for failures #12

Merged
merged 2 commits into from
Oct 15, 2024
Merged

Better handling for failures #12

merged 2 commits into from
Oct 15, 2024

Conversation

mr-eyes
Copy link
Member

@mr-eyes mr-eyes commented Oct 15, 2024

No description provided.

Copy link
Contributor

coderabbitai bot commented Oct 15, 2024

Walkthrough

The changes across the specified files enhance functionality, logging, error handling, and testing for genomic data processing. Key updates include new methods for calculating metrics related to sex chromosomes, improved logging and error handling across various components, and modifications to existing methods and properties to support better data management and analysis. Additionally, tests have been expanded and refined to ensure comprehensive coverage of the new and updated functionalities.

Changes

File(s) Change Summary
src/snipe/api/reference_QC.py Updated methods for coverage prediction and chromosome metrics; added calculate_sex_chrs_metrics; improved logging and error handling; code cleanup.
src/snipe/api/snipe_sig.py Updated imports; modified export method to include force parameter; added statistical properties; enhanced logging for export and statistics calculations.
src/snipe/cli/cli_qc.py Added signal handling for graceful shutdowns; improved error handling in sample processing; modified output TSV generation; updated internal logic of qc.
tests/api/test_reference_qc.py Added new tests for predict_coverage; adjusted existing tests; expanded assertions; added error handling tests; enhanced tests for advanced statistics.

Poem

In fields of code where rabbits play,
New metrics hop in bright array.
With logs that sing and errors tamed,
Our genomic quest is now proclaimed!
So let us dance in data's light,
For every change brings joy in sight! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@mr-eyes mr-eyes merged commit 2675856 into main Oct 15, 2024
3 of 4 checks passed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Outside diff range and nitpick comments (5)
src/snipe/cli/cli_qc.py (1)

560-566: Add encoding to CSV export to ensure consistent character handling.

When writing the DataFrame to the file, specifying the encoding parameter can help prevent issues with non-ASCII characters.

Apply this diff to specify the encoding:

         # Write the DataFrame to the file
-        df.to_csv(f, sep='\t', index=False)
+        df.to_csv(f, sep='\t', index=False, encoding='utf-8')
tests/api/test_reference_qc.py (1)

614-616: Consider consolidating redundant tests for predict_coverage.

The test method test_predict_coverage_already_full_coverage seems to overlap with test_predict_coverage_does_not_exceed_one. Both tests verify that the predicted coverage does not exceed 1.0 when using larger extra_fold values. To improve maintainability, consider combining these tests or clarifying their distinct purposes.

src/snipe/api/snipe_sig.py (1)

510-510: Remove commented-out code

Line 510 contains a commented-out reference to sourmash.save_load.SaveSignatures_SigFile, which is unnecessary. Removing commented code improves readability.

Apply this diff:

-# sourmash.save_load.SaveSignatures_SigFile
src/snipe/api/reference_QC.py (2)

Line range hint 397-510: Duplicate definition of _calculate_advanced_stats method

The _calculate_advanced_stats method is defined twice (lines 397-454 and lines 455-510). This duplication can lead to confusion and maintenance issues.

Please remove the duplicate method definition. Ensure that the retained method includes all necessary logic.

-        def _calculate_advanced_stats(self):
-            r"""
-            Calculate advanced statistics, such as median-trimmed metrics.
-            """
-            self.logger.debug("Calculating advanced statistics.")
-            # [Rest of the duplicate method code]

Line range hint 1237-1253: Missing return type in docstring for load_genome_sig_to_dict

The docstring for load_genome_sig_to_dict is missing descriptions for the return type and the functionality.

Please update the docstring to include detailed information about the return value.

Apply this diff to improve the docstring:

 def load_genome_sig_to_dict(self, *, zip_file_path: str, **kwargs) -> Dict[str, 'SnipeSig']:
     """
-    Load a genome signature into a dictionary of SnipeSig instances.
+    Load genome and chromosome signatures from a zip file into a dictionary of SnipeSig instances.
     
     Parameters:
         zip_file_path (str): Path to the zip file containing the genome signatures.
         **kwargs: Additional keyword arguments to pass to the SnipeSig constructor.
         
-    Returns:
+    Returns:
+        Dict[str, SnipeSig]: A dictionary mapping chromosome names to their corresponding SnipeSig instances.
     """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3082e02 and a46d927.

📒 Files selected for processing (4)
  • src/snipe/api/reference_QC.py (2 hunks)
  • src/snipe/api/snipe_sig.py (3 hunks)
  • src/snipe/cli/cli_qc.py (4 hunks)
  • tests/api/test_reference_qc.py (1 hunks)
🧰 Additional context used
🪛 Ruff
src/snipe/api/snipe_sig.py

6-6: typing.Any imported but unused

Remove unused import

(F401)


6-6: typing.Optional imported but unused

Remove unused import

(F401)

src/snipe/cli/cli_qc.py

17-17: Redefinition of unused concurrent from line 10

Remove definition: concurrent

(F811)

🔇 Additional comments (2)
src/snipe/api/reference_QC.py (2)

1029-1029: Ensure consistent return type in predict_coverage

Explicitly converting predicted_coverage to float before returning ensures that the method consistently returns a float value, which can help prevent unexpected issues where integer division or other operations might result in an integer.


Line range hint 1155-1164: Potential issue with variable initialization in nonref_consume_from_vars

Within the nonref_consume_from_vars method, the variables sample_nonref_unique_hashes and sample_nonref_var_coverage_index may lead to division by zero errors if sample_nonref_unique_hashes is zero.

Please ensure that sample_nonref_unique_hashes is not zero before performing division. Add a check to handle this scenario gracefully.

@@ -14,6 +14,8 @@
from snipe.api.snipe_sig import SnipeSig
from snipe.api.reference_QC import ReferenceQC

import concurrent.futures
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove duplicate import of concurrent.futures.

The module concurrent.futures is already imported at line 10. This duplicate import is unnecessary and can be removed to clean up the imports.

Apply this diff to remove the duplicate import:

- import concurrent.futures
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import concurrent.futures
🧰 Tools
🪛 Ruff

17-17: Redefinition of unused concurrent from line 10

Remove definition: concurrent

(F811)

Comment on lines +501 to +504
def shutdown(signum, frame):
logger.warning("Shutdown signal received. Terminating all worker processes...")
executor.shutdown(wait=False, cancel_futures=True)
sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle potential NameError for executor in the shutdown function.

The shutdown function references executor, which may not be defined if a shutdown signal is received before the ProcessPoolExecutor is initialized. This can lead to a NameError. To prevent this, check if executor exists before attempting to shut it down.

Apply this diff to handle the potential undefined executor:

 def shutdown(signum, frame):
     logger.warning("Shutdown signal received. Terminating all worker processes...")
+    try:
+        executor.shutdown(wait=False, cancel_futures=True)
+    except NameError:
+        logger.warning("Executor not initialized; skipping shutdown.")
     sys.exit(1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def shutdown(signum, frame):
logger.warning("Shutdown signal received. Terminating all worker processes...")
executor.shutdown(wait=False, cancel_futures=True)
sys.exit(1)
def shutdown(signum, frame):
logger.warning("Shutdown signal received. Terminating all worker processes...")
try:
executor.shutdown(wait=False, cancel_futures=True)
except NameError:
logger.warning("Executor not initialized; skipping shutdown.")
sys.exit(1)

Comment on lines +516 to +527
for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"):
sample = futures[future]
try:
result = future.result()
results.append(result)
except Exception as exc:
logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}")
results.append({
"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0],
"file_path": sample['sample_path'],
"QC_Error": str(exc)
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve exception handling during sample processing.

Currently, if an exception occurs in a worker process, it may not capture all relevant information about the failure. To enhance debugging and error reporting, consider capturing the traceback or more detailed exception information.

Apply this diff to include traceback information:

 import traceback

 for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"):
     sample = futures[future]
     try:
         result = future.result()
         results.append(result)
     except Exception as exc:
         logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}")
+        logger.debug(traceback.format_exc())
         results.append({
             "sample": os.path.splitext(os.path.basename(sample['sample_path']))[0],
             "file_path": sample['sample_path'],
             "QC_Error": str(exc)
         })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"):
sample = futures[future]
try:
result = future.result()
results.append(result)
except Exception as exc:
logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}")
results.append({
"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0],
"file_path": sample['sample_path'],
"QC_Error": str(exc)
})
import traceback
for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"):
sample = futures[future]
try:
result = future.result()
results.append(result)
except Exception as exc:
logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}")
logger.debug(traceback.format_exc())
results.append({
"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0],
"file_path": sample['sample_path'],
"QC_Error": str(exc)
})

Comment on lines +545 to +546
command_invocation = ' '.join(sys.argv)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure sensitive information is not logged in the command invocation comment.

Writing the full command-line invocation, including all arguments, to the output file may inadvertently expose sensitive information (e.g., file paths, user data). Consider sanitizing the command or omitting sensitive details before writing it to the output file.

Apply this diff to sanitize the command invocation:

 command_invocation = ' '.join(sys.argv)
+# Sanitize command invocation to remove sensitive information if necessary
+command_invocation = sanitize_command(command_invocation)

Implement a sanitize_command function to handle the sanitization logic as needed.

Committable suggestion was skipped due to low confidence.

Comment on lines +528 to +533
except KeyboardInterrupt:
logger.warning("KeyboardInterrupt received. Shutting down...")
sys.exit(1)
except Exception as e:
logger.error(f"An unexpected error occurred: {e}")
sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove redundant KeyboardInterrupt exception handling.

Since signal handlers for SIGINT (triggered by Ctrl+C) and SIGTERM are already registered, the except KeyboardInterrupt block may be redundant. Additionally, KeyboardInterrupt may not be raised as expected when using ProcessPoolExecutor.

Apply this diff to remove the redundant exception handling:

 except Exception as e:
     logger.error(f"An unexpected error occurred: {e}")
     sys.exit(1)
- except KeyboardInterrupt:
-     logger.warning("KeyboardInterrupt received. Shutting down...")
-     sys.exit(1)

Committable suggestion was skipped due to low confidence.

@@ -1,10 +1,12 @@
import heapq
import logging

import sourmash.save_load
from snipe.api.enums import SigType
from typing import Any, Dict, Iterator, List, Optional, Union
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove unused imports (Any, Optional)

The imports Any and Optional from the typing module are not used in the code. It's best practice to remove unused imports to keep the code clean.

Apply this diff to remove unused imports:

-from typing import Any, Dict, Iterator, List, Optional, Union
+from typing import Dict, Iterator, List, Union
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from typing import Any, Dict, Iterator, List, Optional, Union
from typing import Dict, Iterator, List, Union
🧰 Tools
🪛 Ruff

6-6: typing.Any imported but unused

Remove unused import

(F401)


6-6: typing.Optional imported but unused

Remove unused import

(F401)

with sourmash.save_load.SaveSignatures_ZipFile(path) as save_sigs:
save_sigs.add(self.sourmash_sig)
except Exception as e:
logging.error("Failed to export signatures to zip: %s", e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use self.logger.error instead of logging.error

At line 519, logging.error is used, which logs to the root logger. To maintain consistency and respect the logging configuration of the class, use self.logger.error instead.

Apply this diff:

-        logging.error("Failed to export signatures to zip: %s", e)
+        self.logger.error("Failed to export signatures to zip: %s", e)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logging.error("Failed to export signatures to zip: %s", e)
self.logger.error("Failed to export signatures to zip: %s", e)

from snipe.api.enums import SigType
from typing import Any, Dict, Iterator, List, Optional, Union
import numpy as np
import sourmash

import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove unused import of os module

The os module is imported but not used in the code. Removing unused imports improves code readability and efficiency.

Apply this diff to remove the unused import:

-import os
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import os

Comment on lines +1114 to +1121
def sort_chromosomes(chr_name):
try:
# Try to convert to integer for numeric chromosomes
return (0, int(chr_name))
except ValueError:
# Non-numeric chromosomes (like 'x', 'y', 'z', etc.)
return (1, chr_name)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider simplifying the sort_chromosomes function

The sort_chromosomes function can be simplified for better readability and efficiency.

Consider using a more concise approach:

def sort_chromosomes(chr_name):
    return (0, int(chr_name)) if chr_name.isdigit() else (1, chr_name)

This leverages the str.isdigit() method to check if the chromosome name is numeric.


self.chrs_stats.update(chr_to_mean_abundance)

# chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Typographical errors in the comment

There are typographical errors in the comment. It should read "numbered" instead of "numberd" and "number" instead of "numer".

Apply this diff to correct the typos:

-            # chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr-
+            # Chromosomes are numbered from 1 to ..., sort them by number (might be string for sex chromosomes), then prefix them with 'chr-'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr-
# Chromosomes are numbered from 1 to ..., sort them by number (might be string for sex chromosomes), then prefix them with 'chr-'

@mr-eyes mr-eyes deleted the minor_fixes_qc branch October 15, 2024 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant