-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better handling for failures #12
Conversation
WalkthroughThe changes across the specified files enhance functionality, logging, error handling, and testing for genomic data processing. Key updates include new methods for calculating metrics related to sex chromosomes, improved logging and error handling across various components, and modifications to existing methods and properties to support better data management and analysis. Additionally, tests have been expanded and refined to ensure comprehensive coverage of the new and updated functionalities. Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 13
🧹 Outside diff range and nitpick comments (5)
src/snipe/cli/cli_qc.py (1)
560-566
: Add encoding to CSV export to ensure consistent character handling.When writing the DataFrame to the file, specifying the
encoding
parameter can help prevent issues with non-ASCII characters.Apply this diff to specify the encoding:
# Write the DataFrame to the file - df.to_csv(f, sep='\t', index=False) + df.to_csv(f, sep='\t', index=False, encoding='utf-8')tests/api/test_reference_qc.py (1)
614-616
: Consider consolidating redundant tests forpredict_coverage
.The test method
test_predict_coverage_already_full_coverage
seems to overlap withtest_predict_coverage_does_not_exceed_one
. Both tests verify that the predicted coverage does not exceed1.0
when using largerextra_fold
values. To improve maintainability, consider combining these tests or clarifying their distinct purposes.src/snipe/api/snipe_sig.py (1)
510-510
: Remove commented-out codeLine 510 contains a commented-out reference to
sourmash.save_load.SaveSignatures_SigFile
, which is unnecessary. Removing commented code improves readability.Apply this diff:
-# sourmash.save_load.SaveSignatures_SigFile
src/snipe/api/reference_QC.py (2)
Line range hint
397-510
: Duplicate definition of_calculate_advanced_stats
methodThe
_calculate_advanced_stats
method is defined twice (lines 397-454 and lines 455-510). This duplication can lead to confusion and maintenance issues.Please remove the duplicate method definition. Ensure that the retained method includes all necessary logic.
- def _calculate_advanced_stats(self): - r""" - Calculate advanced statistics, such as median-trimmed metrics. - """ - self.logger.debug("Calculating advanced statistics.") - # [Rest of the duplicate method code]
Line range hint
1237-1253
: Missing return type in docstring forload_genome_sig_to_dict
The docstring for
load_genome_sig_to_dict
is missing descriptions for the return type and the functionality.Please update the docstring to include detailed information about the return value.
Apply this diff to improve the docstring:
def load_genome_sig_to_dict(self, *, zip_file_path: str, **kwargs) -> Dict[str, 'SnipeSig']: """ - Load a genome signature into a dictionary of SnipeSig instances. + Load genome and chromosome signatures from a zip file into a dictionary of SnipeSig instances. Parameters: zip_file_path (str): Path to the zip file containing the genome signatures. **kwargs: Additional keyword arguments to pass to the SnipeSig constructor. - Returns: + Returns: + Dict[str, SnipeSig]: A dictionary mapping chromosome names to their corresponding SnipeSig instances. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- src/snipe/api/reference_QC.py (2 hunks)
- src/snipe/api/snipe_sig.py (3 hunks)
- src/snipe/cli/cli_qc.py (4 hunks)
- tests/api/test_reference_qc.py (1 hunks)
🧰 Additional context used
🪛 Ruff
src/snipe/api/snipe_sig.py
6-6:
typing.Any
imported but unusedRemove unused import
(F401)
6-6:
typing.Optional
imported but unusedRemove unused import
(F401)
src/snipe/cli/cli_qc.py
17-17: Redefinition of unused
concurrent
from line 10Remove definition:
concurrent
(F811)
🔇 Additional comments (2)
src/snipe/api/reference_QC.py (2)
1029-1029
: Ensure consistent return type inpredict_coverage
Explicitly converting
predicted_coverage
tofloat
before returning ensures that the method consistently returns a float value, which can help prevent unexpected issues where integer division or other operations might result in an integer.
Line range hint
1155-1164
: Potential issue with variable initialization innonref_consume_from_vars
Within the
nonref_consume_from_vars
method, the variablessample_nonref_unique_hashes
andsample_nonref_var_coverage_index
may lead to division by zero errors ifsample_nonref_unique_hashes
is zero.Please ensure that
sample_nonref_unique_hashes
is not zero before performing division. Add a check to handle this scenario gracefully.
@@ -14,6 +14,8 @@ | |||
from snipe.api.snipe_sig import SnipeSig | |||
from snipe.api.reference_QC import ReferenceQC | |||
|
|||
import concurrent.futures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove duplicate import of concurrent.futures
.
The module concurrent.futures
is already imported at line 10. This duplicate import is unnecessary and can be removed to clean up the imports.
Apply this diff to remove the duplicate import:
- import concurrent.futures
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import concurrent.futures |
🧰 Tools
🪛 Ruff
17-17: Redefinition of unused
concurrent
from line 10Remove definition:
concurrent
(F811)
def shutdown(signum, frame): | ||
logger.warning("Shutdown signal received. Terminating all worker processes...") | ||
executor.shutdown(wait=False, cancel_futures=True) | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential NameError
for executor
in the shutdown
function.
The shutdown
function references executor
, which may not be defined if a shutdown signal is received before the ProcessPoolExecutor
is initialized. This can lead to a NameError
. To prevent this, check if executor
exists before attempting to shut it down.
Apply this diff to handle the potential undefined executor
:
def shutdown(signum, frame):
logger.warning("Shutdown signal received. Terminating all worker processes...")
+ try:
+ executor.shutdown(wait=False, cancel_futures=True)
+ except NameError:
+ logger.warning("Executor not initialized; skipping shutdown.")
sys.exit(1)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def shutdown(signum, frame): | |
logger.warning("Shutdown signal received. Terminating all worker processes...") | |
executor.shutdown(wait=False, cancel_futures=True) | |
sys.exit(1) | |
def shutdown(signum, frame): | |
logger.warning("Shutdown signal received. Terminating all worker processes...") | |
try: | |
executor.shutdown(wait=False, cancel_futures=True) | |
except NameError: | |
logger.warning("Executor not initialized; skipping shutdown.") | |
sys.exit(1) |
for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"): | ||
sample = futures[future] | ||
try: | ||
result = future.result() | ||
results.append(result) | ||
except Exception as exc: | ||
logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}") | ||
results.append({ | ||
"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0], | ||
"file_path": sample['sample_path'], | ||
"QC_Error": str(exc) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve exception handling during sample processing.
Currently, if an exception occurs in a worker process, it may not capture all relevant information about the failure. To enhance debugging and error reporting, consider capturing the traceback or more detailed exception information.
Apply this diff to include traceback information:
import traceback
for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"):
sample = futures[future]
try:
result = future.result()
results.append(result)
except Exception as exc:
logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}")
+ logger.debug(traceback.format_exc())
results.append({
"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0],
"file_path": sample['sample_path'],
"QC_Error": str(exc)
})
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"): | |
sample = futures[future] | |
try: | |
result = future.result() | |
results.append(result) | |
except Exception as exc: | |
logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}") | |
results.append({ | |
"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0], | |
"file_path": sample['sample_path'], | |
"QC_Error": str(exc) | |
}) | |
import traceback | |
for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures), desc="Processing samples"): | |
sample = futures[future] | |
try: | |
result = future.result() | |
results.append(result) | |
except Exception as exc: | |
logger.error(f"Sample {sample['sample_path']} generated an exception: {exc}") | |
logger.debug(traceback.format_exc()) | |
results.append({ | |
"sample": os.path.splitext(os.path.basename(sample['sample_path']))[0], | |
"file_path": sample['sample_path'], | |
"QC_Error": str(exc) | |
}) |
command_invocation = ' '.join(sys.argv) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure sensitive information is not logged in the command invocation comment.
Writing the full command-line invocation, including all arguments, to the output file may inadvertently expose sensitive information (e.g., file paths, user data). Consider sanitizing the command or omitting sensitive details before writing it to the output file.
Apply this diff to sanitize the command invocation:
command_invocation = ' '.join(sys.argv)
+# Sanitize command invocation to remove sensitive information if necessary
+command_invocation = sanitize_command(command_invocation)
Implement a sanitize_command
function to handle the sanitization logic as needed.
Committable suggestion was skipped due to low confidence.
except KeyboardInterrupt: | ||
logger.warning("KeyboardInterrupt received. Shutting down...") | ||
sys.exit(1) | ||
except Exception as e: | ||
logger.error(f"An unexpected error occurred: {e}") | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove redundant KeyboardInterrupt
exception handling.
Since signal handlers for SIGINT
(triggered by Ctrl+C
) and SIGTERM
are already registered, the except KeyboardInterrupt
block may be redundant. Additionally, KeyboardInterrupt
may not be raised as expected when using ProcessPoolExecutor
.
Apply this diff to remove the redundant exception handling:
except Exception as e:
logger.error(f"An unexpected error occurred: {e}")
sys.exit(1)
- except KeyboardInterrupt:
- logger.warning("KeyboardInterrupt received. Shutting down...")
- sys.exit(1)
Committable suggestion was skipped due to low confidence.
@@ -1,10 +1,12 @@ | |||
import heapq | |||
import logging | |||
|
|||
import sourmash.save_load | |||
from snipe.api.enums import SigType | |||
from typing import Any, Dict, Iterator, List, Optional, Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused imports (Any
, Optional
)
The imports Any
and Optional
from the typing
module are not used in the code. It's best practice to remove unused imports to keep the code clean.
Apply this diff to remove unused imports:
-from typing import Any, Dict, Iterator, List, Optional, Union
+from typing import Dict, Iterator, List, Union
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
from typing import Any, Dict, Iterator, List, Optional, Union | |
from typing import Dict, Iterator, List, Union |
🧰 Tools
🪛 Ruff
6-6:
typing.Any
imported but unusedRemove unused import
(F401)
6-6:
typing.Optional
imported but unusedRemove unused import
(F401)
with sourmash.save_load.SaveSignatures_ZipFile(path) as save_sigs: | ||
save_sigs.add(self.sourmash_sig) | ||
except Exception as e: | ||
logging.error("Failed to export signatures to zip: %s", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use self.logger.error
instead of logging.error
At line 519, logging.error
is used, which logs to the root logger. To maintain consistency and respect the logging configuration of the class, use self.logger.error
instead.
Apply this diff:
- logging.error("Failed to export signatures to zip: %s", e)
+ self.logger.error("Failed to export signatures to zip: %s", e)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
logging.error("Failed to export signatures to zip: %s", e) | |
self.logger.error("Failed to export signatures to zip: %s", e) |
from snipe.api.enums import SigType | ||
from typing import Any, Dict, Iterator, List, Optional, Union | ||
import numpy as np | ||
import sourmash | ||
|
||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import of os
module
The os
module is imported but not used in the code. Removing unused imports improves code readability and efficiency.
Apply this diff to remove the unused import:
-import os
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import os |
def sort_chromosomes(chr_name): | ||
try: | ||
# Try to convert to integer for numeric chromosomes | ||
return (0, int(chr_name)) | ||
except ValueError: | ||
# Non-numeric chromosomes (like 'x', 'y', 'z', etc.) | ||
return (1, chr_name) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider simplifying the sort_chromosomes
function
The sort_chromosomes
function can be simplified for better readability and efficiency.
Consider using a more concise approach:
def sort_chromosomes(chr_name):
return (0, int(chr_name)) if chr_name.isdigit() else (1, chr_name)
This leverages the str.isdigit()
method to check if the chromosome name is numeric.
|
||
self.chrs_stats.update(chr_to_mean_abundance) | ||
|
||
# chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typographical errors in the comment
There are typographical errors in the comment. It should read "numbered" instead of "numberd" and "number" instead of "numer".
Apply this diff to correct the typos:
- # chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr-
+ # Chromosomes are numbered from 1 to ..., sort them by number (might be string for sex chromosomes), then prefix them with 'chr-'
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# chromosomes are numberd from 1 to ..., sort them by numer (might be string for sex chromosomes) then prefix them with chr- | |
# Chromosomes are numbered from 1 to ..., sort them by number (might be string for sex chromosomes), then prefix them with 'chr-' |
No description provided.