
Migration to joblib + new build system + separate reusable publish workflow #58

Open: wants to merge 24 commits into the main branch

@asiomchen commented on Jan 12, 2025

This pull request introduces several updates to the parallel processing, GitHub workflows, project configuration, and documentation for the scikit-mol project.

Migration to joblib

Solves #59

  • multiprocessing was fully replaced with joblib, and a separate function was added in parallel.py to handle parallel processing. This reduces the test run time on Windows from ~7 minutes to ~1.5 minutes.
  • The parallel parameter was replaced with n_jobs to match the scikit-learn API.
  • ploting.py was added for the parallel benchmarks.
  • Notebook 7 was updated with the new benchmarks.
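A minimal sketch of the batching pattern described above, assuming joblib and NumPy are installed. The helper name parallel_transform and its signature are hypothetical and are not the function actually added in parallel.py; the sketch only shows how an n_jobs argument with scikit-learn semantics (None means one job, -1 means all CPUs) can drive joblib.

```python
import numpy as np
from joblib import Parallel, delayed, effective_n_jobs


def parallel_transform(func, X, n_jobs=None):
    """Hypothetical helper: apply ``func`` to batches of ``X`` using joblib.

    ``func`` must accept a sequence of inputs and return a 2D array
    with one row per input.
    """
    # Resolve sklearn-style n_jobs (None -> 1, -1 -> all CPUs) to a worker count
    n_workers = effective_n_jobs(n_jobs)
    if n_workers == 1:
        # Skip worker start-up overhead when only one job is requested
        return func(X)
    # One batch per worker; drop empty batches when there are more workers than inputs
    batches = [b for b in np.array_split(np.asarray(X, dtype=object), n_workers) if len(b)]
    results = Parallel(n_jobs=n_workers)(delayed(func)(batch) for batch in batches)
    # Batches keep their original order, so rows line up with the inputs
    return np.concatenate(results)
```

Splitting the input into one batch per worker, rather than dispatching one task per molecule, keeps inter-process overhead small relative to the actual work.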

Fix safe inference on parallel runs

When safe inference mode is enabled and the per-batch results are concatenated, np.ma.concatenate simplifies the mask to a single boolean if the inputs are fully valid or fully invalid. This caused filter_invalid_rows to fail, so an additional condition was added to handle this case.
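The snippet below reproduces the mask-shrinking behaviour with plain NumPy for the fully valid case and shows one way to guard against it. filter_valid_rows is a hypothetical stand-in for filter_invalid_rows, and the exact condition added in this PR may differ.

```python
import numpy as np


def filter_valid_rows(X):
    """Hypothetical helper: keep only rows of a masked array with no masked values."""
    mask = np.ma.getmask(X)
    if np.ndim(mask) == 0:
        # np.ma.concatenate may shrink an all-False mask to the scalar np.ma.nomask;
        # broadcast it back to the full shape so row-wise checks still work
        mask = np.full(X.shape, bool(mask))
    valid_rows = ~mask.any(axis=1)
    return np.ma.getdata(X)[valid_rows]


a = np.ma.masked_array([[1.0, 2.0]], mask=[[False, False]])
b = np.ma.masked_array([[3.0, 4.0]], mask=[[False, False]])
combined = np.ma.concatenate([a, b])

# The per-row mask arrays have been collapsed into a single scalar boolean
print(np.ndim(np.ma.getmask(combined)))
print(filter_valid_rows(combined).shape)
```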

Fix fit check for transformers and SafeInferenceWrapper

  • A mixin was added so that scikit-learn treats the transformers as already fitted, since they do not need fitting.
  • A custom method was added to SafeInferenceWrapper so that its fit check is based on the wrapped estimator.

Python Support

  • Dropped support for Python 3.8: it has already reached end of life (October 2024), and keeping it would bring no benefit.

GitHub Workflows:

  • .github/workflows/publish.yaml: Added a new workflow to build and publish Python distributions using uv and sign them with Sigstore. This workflow handles building the distribution, storing artifacts, and creating GitHub releases.
  • .github/workflows/pytest.yaml: Renamed from run_pytests.yaml and updated to include concurrency control, install uv, and use the new publish workflow for tagged commits.
  • .github/workflows/welcome.yaml: Added a new workflow to welcome new contributors when they open issues or pull requests.

Project Configuration:

  • pyproject.toml: Migrated the build system to use hatchling and hatch-vcs, updated project metadata, and added dependencies and optional dependencies for development.

Documentation:

  • CONTRIBUTING.md: Updated the release instructions to reflect the new automated workflow for PyPI releases.
  • README.md: Updated the logo display to use the <picture> element for better support of dark and light modes on PyPI.

@asiomchen marked this pull request as draft on January 25, 2025 at 16:21
@asiomchen changed the title from "New build system + separate reusable publish workflow" to "Migration to joblib + new build system + separate reusable publish workflow" on Jan 25, 2025
@asiomchen marked this pull request as ready for review on January 25, 2025 at 16:42