Safe inference and parallel bug fixes #55

Merged
merged 7 commits into from
Dec 31, 2024

asiomchen
Contributor

@asiomchen asiomchen commented Dec 28, 2024

Hi @EBjerrum, thank you and the other contributors for making this package possible.

I had planned to contribute to the library during the Christmas holidays, but after forking it I realized that many tests were failing. This PR aims to fix that before I make any other contributions.

Overview

This pull request includes various changes aimed at improving the handling of invalid SMILES strings, enhancing the SafeInferenceWrapper functionality, and updating test fixtures for better consistency. The most important changes include adjustments to chunk processing, updates to the SafeInferenceWrapper class, and modifications to test fixtures.

Enhancements to chunk processing:

  • scikit_mol/descriptors.py: Added a check to ensure n_chunks does not exceed the length of X to avoid empty chunks.
  • scikit_mol/fingerprints/baseclasses.py: Added a check to ensure n_chunks does not exceed the length of X to avoid empty chunks.
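
The check described in the two bullets above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `split_into_chunks` is hypothetical, and it assumes the chunks are produced with `np.array_split`, which returns empty arrays when asked for more chunks than there are items.

```python
import numpy as np


def split_into_chunks(X, n_chunks):
    """Split X into at most n_chunks chunks (hypothetical helper).

    Capping n_chunks at len(X) prevents np.array_split from padding the
    result with empty chunks. The lower bound of 1 keeps the call valid
    for an empty X, in which case a single empty chunk is returned.
    """
    n_chunks = max(1, min(n_chunks, len(X)))
    return np.array_split(X, n_chunks)


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]

# Without the cap, asking for 8 chunks of 3 items yields 5 empty arrays.
uncapped = np.array_split(smiles, 8)

# With the cap, every chunk holds at least one item.
chunks = split_into_chunks(smiles, n_chunks=8)
```

Capping the chunk count, rather than filtering out empty chunks afterwards, means no worker is ever handed an empty input.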

Updates to SafeInferenceWrapper:

In the current version, the value of replace_value is not propagated from the SafeInferenceWrapper to filter_invalid_rows, so the fill value is always np.nan instead of the configured value. Also, when the input contains only invalid SMILES in safe inference mode, an empty array is passed to the estimator, which causes an error.

  • scikit_mol/safeinference.py: Added __all__ to export specific classes and functions; updated the filter_invalid_rows function to use replace_value from the class it is applied to; added a check for replace_value in the SafeInferenceWrapper; and added a check for the case where all inputs are invalid in safe inference mode.
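
The two behaviors described above (using the configured fill value, and skipping the estimator when every input is invalid) can be illustrated with a small standalone sketch. It does not reproduce the scikit-mol implementation: the function `safe_predict`, its `valid_mask` argument, and the toy `mean_predict` estimator are all hypothetical names chosen for illustration.

```python
import numpy as np


def safe_predict(predict_fn, X, valid_mask, replace_value=np.nan):
    """Call predict_fn on valid rows only; fill other rows with replace_value.

    Hypothetical helper, not part of scikit-mol.
    """
    valid_mask = np.asarray(valid_mask, dtype=bool)

    # Every row is invalid: return the fill value directly so that the
    # estimator never receives an empty array.
    if not valid_mask.any():
        return np.full(len(valid_mask), replace_value, dtype=float)

    X = np.asarray(X)
    preds = np.asarray(predict_fn(X[valid_mask]))

    # Start from an output filled with the caller's replace_value
    # (rather than a hard-coded np.nan), then write in the predictions.
    out = np.full((len(valid_mask),) + preds.shape[1:], replace_value, dtype=float)
    out[valid_mask] = preds
    return out


def mean_predict(X):
    """Toy stand-in for an estimator's predict method."""
    return X.mean(axis=1)


X = np.array([[1.0, 3.0], [0.0, 0.0], [2.0, 4.0]])

# Mixed input: the invalid middle row receives the custom fill value.
result = safe_predict(mean_predict, X, [True, False, True], replace_value=-1.0)

# All-invalid input: predict_fn is not called at all.
all_invalid = safe_predict(mean_predict, X, [False, False, False], replace_value=-1.0)
```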

Modifications to test fixtures:

  • tests/fixtures.py: Replaced the invalid_smiles_list fixture with smiles_list_with_invalid, which includes both valid and invalid SMILES strings, and added a new invalid_smiles_list fixture that contains only invalid SMILES strings
  • tests/test_safeinferencemode.py: Updated tests to use the new smiles_list_with_invalid fixture and added new tests for handling single invalid SMILES and using different fill values

Other test updates:

  • tests/test_sanitizer.py: Updated tests to use the smiles_list_with_invalid fixture
  • tests/test_smilestomol.py: Updated tests to use the smiles_list_with_invalid fixture.

@EBjerrum
Owner

Thanks. Been looking through the code. Looks solid and some good changes! Thanks a lot!

@EBjerrum EBjerrum merged commit 30c74b3 into EBjerrum:main Dec 31, 2024
8 checks passed