Safe inference and parallel bug fixes #55

asiomchen · 2024-12-28T21:15:24Z

Hi @EBjerrum, thank you and other contributors for making this package possible

I had planned to contribute to the library during the Christmas holidays, but after forking it I realized that a lot of tests were failing. This PR aims to fix that before I make any other contributions.

Overview

This pull request includes various changes aimed at improving the handling of invalid SMILES strings, enhancing the SafeInferenceWrapper functionality, and updating test fixtures for better consistency. The most important changes include adjustments to chunk processing, updates to the SafeInferenceWrapper class, and modifications to test fixtures.

Enhancements to chunk processing:

scikit_mol/descriptors.py: Added a check to ensure n_chunks does not exceed the length of X to avoid empty chunks.
scikit_mol/fingerprints/baseclasses.py: Added a check to ensure n_chunks does not exceed the length of X to avoid empty chunks.
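
A minimal sketch of the idea behind this check, not the code from the PR: when more chunks are requested than there are samples, the chunk count is capped at the number of samples so that no worker receives an empty chunk. The helper name `split_into_chunks` is hypothetical, and using `np.array_split` for the splitting is an assumption made for illustration.

```python
import numpy as np


def split_into_chunks(X, n_chunks):
    """Hypothetical helper: split X into at most n_chunks non-empty chunks."""
    # Cap the chunk count at len(X); without the cap, np.array_split pads
    # the result with empty arrays when n_chunks > len(X).
    n_chunks = max(1, min(n_chunks, len(X)))
    return np.array_split(X, n_chunks)


# Two SMILES but eight requested chunks: only two chunks are produced.
chunks = split_into_chunks(["CCO", "c1ccccc1"], n_chunks=8)
```

With the cap in place, every chunk handed to a worker process holds at least one sample.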

Updates to `SafeInferenceWrapper`:

In the current version, the value of replace_value is not propagated from the SafeInferenceWrapper to filter_invalid_rows, so the fill value is always np.nan instead of the configured value. Also, when the input contains only invalid SMILES in safe inference mode, an empty array is passed to the estimator, causing an error.

scikit_mol/safeinference.py: Added __all__ to export specific classes and functions, updated the filter_invalid_rows function to use replace_value from the class it is applied to, added a check for replace_value in the SafeInferenceWrapper, and added a check for the case where all inputs are invalid in safe inference mode.
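
The two behaviours described above can be illustrated with a small standalone sketch. It does not use scikit-mol's actual API: `safe_predict`, `row_sum`, and the explicit `valid_mask` argument are hypothetical names chosen for illustration. The sketch only shows the pattern of filling invalid rows with a configurable value and skipping the estimator call when no valid rows remain.

```python
import numpy as np


def safe_predict(predict_fn, X, valid_mask, replace_value=np.nan):
    """Hypothetical helper: predict on valid rows, fill the rest.

    Not scikit-mol's API; it only illustrates the pattern.
    """
    X = np.asarray(X, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    # Start from an output filled with the caller's replace_value
    # rather than a hard-coded np.nan.
    out = np.full(len(X), replace_value, dtype=float)
    # Only call the estimator when at least one row is valid, so it
    # never receives an empty array.
    if valid_mask.any():
        out[valid_mask] = predict_fn(X[valid_mask])
    return out


def row_sum(rows):
    """Stand-in for an estimator's predict method (assumption)."""
    return rows.sum(axis=1)


X = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
mixed = safe_predict(row_sum, X, [True, False, True], replace_value=-1.0)
all_invalid = safe_predict(row_sum, X, [False, False, False], replace_value=-1.0)
```

Here `mixed` keeps the predictions for the first and third rows and holds -1.0 for the second, while `all_invalid` is filled entirely with -1.0 without `row_sum` ever being called.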

Modifications to test fixtures:

tests/fixtures.py: Replaced the invalid_smiles_list fixture with smiles_list_with_invalid to include both valid and invalid SMILES strings, and added a new invalid_smiles_list fixture that includes only invalid SMILES strings.
tests/test_safeinferencemode.py: Updated tests to use the new smiles_list_with_invalid fixture and added new tests for handling a single invalid SMILES and for using different fill values.

Other test updates:

tests/test_sanitizer.py: Updated tests to use the smiles_list_with_invalid fixture.
tests/test_smilestomol.py: Updated tests to use the smiles_list_with_invalid fixture.

EBjerrum · 2024-12-31T10:51:40Z

Thanks. Been looking through the code. Looks solid and some good changes! Thanks a lot!

asiomchen added 7 commits December 28, 2024 21:15

Adjust chunk size in multiprocessing to avoid empty chunks

b43494b

Adjust multiprocessing chunk size for MolecularDescriptorTransformer

aa3a90f

Add fixture for trained smiles_pipeline and test for invalid SMILES p…

5849448

…redictions

improve safe inference for single mol inputs

22cac6e

Create separate fixtures for fully and partially invalid smiles

632f87c

Refactor safe inference handling by updating filter_invalid_rows to a…

941a0d6

…ctually use replace_value parameter

Update safe inference tests

87a66c1

EBjerrum merged commit 30c74b3 into EBjerrum:main Dec 31, 2024
8 checks passed
