Better handling for failures #12

mr-eyes · 2024-10-15T16:25:02Z

No description provided.

coderabbitai · 2024-10-15T16:25:10Z

Walkthrough

The changes across the specified files enhance functionality, logging, error handling, and testing for genomic data processing. Key updates include new methods for calculating metrics related to sex chromosomes, improved logging and error handling across various components, and modifications to existing methods and properties to support better data management and analysis. Additionally, tests have been expanded and refined to ensure comprehensive coverage of the new and updated functionalities.

Changes

File(s)	Change Summary
`src/snipe/api/reference_QC.py`	Updated methods for coverage prediction and chromosome metrics; added `calculate_sex_chrs_metrics`; improved logging and error handling; code cleanup.
`src/snipe/api/snipe_sig.py`	Updated imports; modified export method to include `force` parameter; added statistical properties; enhanced logging for export and statistics calculations.
`src/snipe/cli/cli_qc.py`	Added signal handling for graceful shutdowns; improved error handling in sample processing; modified output TSV generation; updated internal logic of `qc`.
`tests/api/test_reference_qc.py`	Added new tests for `predict_coverage`; adjusted existing tests; expanded assertions; added error handling tests; enhanced tests for advanced statistics.

Poem

In fields of code where rabbits play,
New metrics hop in bright array.
With logs that sing and errors tamed,
Our genomic quest is now proclaimed!
So let us dance in data's light,
For every change brings joy in sight! 🐇✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 13

🧹 Outside diff range and nitpick comments (5)

src/snipe/cli/cli_qc.py (1)
560-566: Add encoding to CSV export to ensure consistent character handling.

When writing the DataFrame to the file, specifying the encoding parameter can help prevent issues with non-ASCII characters.

Apply this diff to specify the encoding:
         # Write the DataFrame to the file
-        df.to_csv(f, sep='\t', index=False)
+        df.to_csv(f, sep='\t', index=False, encoding='utf-8')
tests/api/test_reference_qc.py (1)

614-616: Consider consolidating redundant tests for predict_coverage.

The test method test_predict_coverage_already_full_coverage seems to overlap with test_predict_coverage_does_not_exceed_one. Both tests verify that the predicted coverage does not exceed 1.0 when using larger extra_fold values. To improve maintainability, consider combining these tests or clarifying their distinct purposes.
src/snipe/api/snipe_sig.py (1)
510-510: Remove commented-out code

Line 510 contains a commented-out reference to sourmash.save_load.SaveSignatures_SigFile, which is unnecessary. Removing commented code improves readability.

Apply this diff:
-# sourmash.save_load.SaveSignatures_SigFile
src/snipe/api/reference_QC.py (2)
Line range hint 397-510: Duplicate definition of _calculate_advanced_stats method

The _calculate_advanced_stats method is defined twice (lines 397-454 and lines 455-510). This duplication can lead to confusion and maintenance issues.

Please remove the duplicate method definition. Ensure that the retained method includes all necessary logic.
-        def _calculate_advanced_stats(self):
-            r"""
-            Calculate advanced statistics, such as median-trimmed metrics.
-            """
-            self.logger.debug("Calculating advanced statistics.")
-            # [Rest of the duplicate method code]
Line range hint 1237-1253: Missing return type in docstring for load_genome_sig_to_dict

The docstring for load_genome_sig_to_dict is missing descriptions for the return type and the functionality.

Please update the docstring to include detailed information about the return value.

Apply this diff to improve the docstring:
 def load_genome_sig_to_dict(self, *, zip_file_path: str, **kwargs) -> Dict[str, 'SnipeSig']:
     """
-    Load a genome signature into a dictionary of SnipeSig instances.
+    Load genome and chromosome signatures from a zip file into a dictionary of SnipeSig instances.
     
     Parameters:
         zip_file_path (str): Path to the zip file containing the genome signatures.
         **kwargs: Additional keyword arguments to pass to the SnipeSig constructor.
         
-    Returns:
+    Returns:
+        Dict[str, SnipeSig]: A dictionary mapping chromosome names to their corresponding SnipeSig instances.
     """

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 3082e02 and a46d927.

📒 Files selected for processing (4)

src/snipe/api/reference_QC.py (2 hunks)
src/snipe/api/snipe_sig.py (3 hunks)
src/snipe/cli/cli_qc.py (4 hunks)
tests/api/test_reference_qc.py (1 hunks)

🧰 Additional context used

🪛 Ruff

src/snipe/api/snipe_sig.py

6-6: typing.Any imported but unused

Remove unused import

(F401)

6-6: typing.Optional imported but unused

Remove unused import

(F401)

src/snipe/cli/cli_qc.py

17-17: Redefinition of unused concurrent from line 10

Remove definition: concurrent

(F811)

🔇 Additional comments (2)

src/snipe/api/reference_QC.py (2)

1029-1029: Ensure consistent return type in predict_coverage

Explicitly converting predicted_coverage to float before returning ensures that the method consistently returns a float value, which can help prevent unexpected issues where integer division or other operations might result in an integer.

Line range hint 1155-1164: Potential issue with variable initialization in nonref_consume_from_vars

Within the nonref_consume_from_vars method, the variables sample_nonref_unique_hashes and sample_nonref_var_coverage_index may lead to division by zero errors if sample_nonref_unique_hashes is zero.

Please ensure that sample_nonref_unique_hashes is not zero before performing division. Add a check to handle this scenario gracefully.

coderabbitai · 2024-10-15T16:50:50Z

src/snipe/cli/cli_qc.py

@@ -14,6 +14,8 @@
 from snipe.api.snipe_sig import SnipeSig
 from snipe.api.reference_QC import ReferenceQC

+import concurrent.futures


⚠️ Potential issue

Remove duplicate import of concurrent.futures.

The module concurrent.futures is already imported at line 10. This duplicate import is unnecessary and can be removed to clean up the imports.

Apply this diff to remove the duplicate import:

- import concurrent.futures

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import concurrent.futures

🧰 Tools

🪛 Ruff

17-17: Redefinition of unused concurrent from line 10

Remove definition: concurrent

(F811)

coderabbitai · 2024-10-15T16:50:50Z

src/snipe/cli/cli_qc.py

+    def shutdown(signum, frame):
+        logger.warning("Shutdown signal received. Terminating all worker processes...")
+        executor.shutdown(wait=False, cancel_futures=True)
+        sys.exit(1)


⚠️ Potential issue

Handle potential NameError for executor in the shutdown function.

The shutdown function references executor, which may not be defined if a shutdown signal is received before the ProcessPoolExecutor is initialized. This can lead to a NameError. To prevent this, check if executor exists before attempting to shut it down.

Apply this diff to handle the potential undefined executor:

def shutdown(signum, frame): logger.warning("Shutdown signal received. Terminating all worker processes...") + try: + executor.shutdown(wait=False, cancel_futures=True) + except NameError: + logger.warning("Executor not initialized; skipping shutdown.") sys.exit(1)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def shutdown(signum, frame):

logger.warning("Shutdown signal received. Terminating all worker processes...")

executor.shutdown(wait=False, cancel_futures=True)

sys.exit(1)

def shutdown(signum, frame):

logger.warning("Shutdown signal received. Terminating all worker processes...")

try:

executor.shutdown(wait=False, cancel_futures=True)

except NameError:

logger.warning("Executor not initialized; skipping shutdown.")

sys.exit(1)

coderabbitai · 2024-10-15T16:50:50Z

src/snipe/cli/cli_qc.py

+            for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"):
+                sample = futures[future]
+                try:
+                    result = future.result()
+                    results.append(result)
+                except Exception as exc:
+                    logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}")
+                    results.append({
+                        "sample": os.path.splitext(os.path.basename(sample['sample_path']))[0],
+                        "file_path": sample['sample_path'],
+                        "QC_Error": str(exc)
+                    })


🛠️ Refactor suggestion

Improve exception handling during sample processing.

Currently, if an exception occurs in a worker process, it may not capture all relevant information about the failure. To enhance debugging and error reporting, consider capturing the traceback or more detailed exception information.

Apply this diff to include traceback information:

import traceback for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"): sample = futures[future] try: result = future.result() results.append(result) except Exception as exc: logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}") + logger.debug(traceback.format_exc()) results.append({ "sample": os.path.splitext(os.path.basename(sample['sample_path']))[0], "file_path": sample['sample_path'], "QC_Error": str(exc) })

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"):

sample = futures[future]

try:

result = future.result()

results.append(result)

except Exception as exc:

logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}")

results.append({

"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0],

"file_path": sample['sample_path'],

"QC_Error": str(exc)

})

import traceback

for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"):

sample = futures[future]

try:

result = future.result()

results.append(result)

except Exception as exc:

logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}")

logger.debug(traceback.format_exc())

results.append({

"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0],

"file_path": sample['sample_path'],

"QC_Error": str(exc)

})

coderabbitai · 2024-10-15T16:50:50Z

src/snipe/cli/cli_qc.py

+    command_invocation = ' '.join(sys.argv)
+


⚠️ Potential issue

Ensure sensitive information is not logged in the command invocation comment.

Writing the full command-line invocation, including all arguments, to the output file may inadvertently expose sensitive information (e.g., file paths, user data). Consider sanitizing the command or omitting sensitive details before writing it to the output file.

Apply this diff to sanitize the command invocation:

command_invocation = ' '.join(sys.argv) +# Sanitize command invocation to remove sensitive information if necessary +command_invocation = sanitize_command(command_invocation)

Implement a sanitize_command function to handle the sanitization logic as needed.

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-10-15T16:50:51Z

src/snipe/cli/cli_qc.py

+    except KeyboardInterrupt:
+        logger.warning("KeyboardInterrupt received. Shutting down...")
+        sys.exit(1)
+    except Exception as e:
+        logger.error(f"An unexpected error occurred: {e}")
+        sys.exit(1)


🛠️ Refactor suggestion

Remove redundant KeyboardInterrupt exception handling.

Since signal handlers for SIGINT (triggered by Ctrl+C) and SIGTERM are already registered, the except KeyboardInterrupt block may be redundant. Additionally, KeyboardInterrupt may not be raised as expected when using ProcessPoolExecutor.

Apply this diff to remove the redundant exception handling:

except Exception as e: logger.error(f"An unexpected error occurred: {e}") sys.exit(1) - except KeyboardInterrupt: - logger.warning("KeyboardInterrupt received. Shutting down...") - sys.exit(1)

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-10-15T16:50:51Z

src/snipe/api/snipe_sig.py

@@ -1,10 +1,12 @@
 import heapq
 import logging
+
+import sourmash.save_load
 from snipe.api.enums import SigType
 from typing import Any, Dict, Iterator, List, Optional, Union


⚠️ Potential issue

Remove unused imports (Any, Optional)

The imports Any and Optional from the typing module are not used in the code. It's best practice to remove unused imports to keep the code clean.

Apply this diff to remove unused imports:

-from typing import Any, Dict, Iterator, List, Optional, Union +from typing import Dict, Iterator, List, Union

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from typing import Any, Dict, Iterator, List, Optional, Union

from typing import Dict, Iterator, List, Union

🧰 Tools

🪛 Ruff

6-6: typing.Any imported but unused

Remove unused import

(F401)

6-6: typing.Optional imported but unused

Remove unused import

(F401)

coderabbitai · 2024-10-15T16:50:51Z

src/snipe/api/snipe_sig.py

+                with sourmash.save_load.SaveSignatures_ZipFile(path) as save_sigs:
+                    save_sigs.add(self.sourmash_sig)
+            except Exception as e:
+                logging.error("Failed to export signatures to zip: %s", e)


⚠️ Potential issue

Use self.logger.error instead of logging.error

At line 519, logging.error is used, which logs to the root logger. To maintain consistency and respect the logging configuration of the class, use self.logger.error instead.

Apply this diff:

- logging.error("Failed to export signatures to zip: %s", e) + self.logger.error("Failed to export signatures to zip: %s", e)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

logging.error("Failed to export signatures to zip: %s", e)

self.logger.error("Failed to export signatures to zip: %s", e)

coderabbitai · 2024-10-15T16:50:51Z

src/snipe/api/snipe_sig.py

 from snipe.api.enums import SigType
 from typing import Any, Dict, Iterator, List, Optional, Union
 import numpy as np
 import sourmash
-
+import os


⚠️ Potential issue

Remove unused import of os module

The os module is imported but not used in the code. Removing unused imports improves code readability and efficiency.

Apply this diff to remove the unused import:

-import os

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import os

coderabbitai · 2024-10-15T16:50:51Z

src/snipe/api/reference_QC.py

+        def sort_chromosomes(chr_name):
+            try:
+                # Try to convert to integer for numeric chromosomes
+                return (0, int(chr_name))
+            except ValueError:
+                # Non-numeric chromosomes (like 'x', 'y', 'z', etc.)
+                return (1, chr_name)
+


🛠️ Refactor suggestion

Consider simplifying the sort_chromosomes function

The sort_chromosomes function can be simplified for better readability and efficiency.

Consider using a more concise approach:

def sort_chromosomes(chr_name): return (0, int(chr_name)) if chr_name.isdigit() else (1, chr_name)

This leverages the str.isdigit() method to check if the chromosome name is numeric.

coderabbitai · 2024-10-15T16:50:51Z

src/snipe/api/reference_QC.py

-
-        self.chrs_stats.update(chr_to_mean_abundance)
+
+        # chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr-


⚠️ Potential issue

Typographical errors in the comment

There are typographical errors in the comment. It should read "numbered" instead of "numberd" and "number" instead of "numer".

Apply this diff to correct the typos:

- # chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr- + # Chromosomes are numbered from 1 to ..., sort them by number (might be string for sex chromosomes), then prefix them with 'chr-'

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr-

# Chromosomes are numbered from 1 to ..., sort them by number (might be string for sex chromosomes), then prefix them with 'chr-'

mr-eyes added 2 commits October 15, 2024 09:09

minor fix in an assertion

e7fdb9a

Rerport handling

a46d927

mr-eyes merged commit 2675856 into main Oct 15, 2024
3 of 4 checks passed

coderabbitai bot reviewed Oct 15, 2024

View reviewed changes

mr-eyes deleted the minor_fixes_qc branch October 15, 2024 16:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better handling for failures #12

Better handling for failures #12

mr-eyes commented Oct 15, 2024

coderabbitai bot commented Oct 15, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

coderabbitai bot Oct 15, 2024

	from typing import Any, Dict, Iterator, List, Optional, Union
	from typing import Dict, Iterator, List, Union

	logging.error("Failed to export signatures to zip: %s", e)
	self.logger.error("Failed to export signatures to zip: %s", e)


		self.chrs_stats.update(chr_to_mean_abundance)

		# chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr-

	# chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr-
	# Chromosomes are numbered from 1 to ..., sort them by number (might be string for sex chromosomes), then prefix them with 'chr-'

Better handling for failures #12

Better handling for failures #12

Conversation

mr-eyes commented Oct 15, 2024

coderabbitai bot commented Oct 15, 2024 • edited Loading

Walkthrough

Changes

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 15, 2024

Choose a reason for hiding this comment

coderabbitai bot commented Oct 15, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)