Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tolerance to enforce strict inequalities in linear tree formulations #163

Merged
merged 9 commits into from
Aug 22, 2024

Conversation

emma58
Copy link
Contributor

@emma58 emma58 commented Aug 21, 2024

This PR adds a tolerance at which to enforce ``strict'' inequalities in linear model trees: That is, the right branch will require that the feature value be greater than or equal to the bound plus this tolerance (epsilon). This means that users can tune epsilon in order to ensure that the MIP solution will match the tree prediction.

Additionally, the PR simplifies the implementation of the hybrid bigm linear tree formulation by using two modern pyomo.gdp transformations. This does mean that the linear tree formulations will rely on pyomo>=6.7.1 though, if that's okay.

Legal Acknowledgement
By contributing to this software project, I agree my contributions are submitted under the BSD license.
I represent I am authorized to make the contributions and grant the license.
If my employer has rights to intellectual property that includes these contributions,
I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

Copy link

codecov bot commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.27%. Comparing base (d43643a) to head (d098622).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #163      +/-   ##
==========================================
- Coverage   90.36%   90.27%   -0.09%     
==========================================
  Files          32       32              
  Lines        2034     2005      -29     
  Branches      471      460      -11     
==========================================
- Hits         1838     1810      -28     
  Misses        123      123              
+ Partials       73       72       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@jalving
Copy link
Collaborator

jalving commented Aug 22, 2024

looks like a few minor ruff errors. you can try using # noqa: PLR0913 for the too many arguments one.

@jalving
Copy link
Collaborator

jalving commented Aug 22, 2024

looks good to me. thanks for contributing @emma58!

@jalving jalving merged commit c6d274f into cog-imperial:main Aug 22, 2024
16 of 17 checks passed
@emma58 emma58 deleted the add-linear-tree-epsilon branch August 22, 2024 23:49
jezsadler added a commit to jezsadler/OMLT that referenced this pull request Sep 27, 2024
commit 321a2e2
Author: Jiří Němeček <[email protected]>
Date:   Sat Aug 24 19:15:47 2024 +0200

    Fixing 404 errors of links to notebooks in the documentation (cog-imperial#143)

    I assume that the notebooks have been moved, but the documentation links
    did not reflect that

    **Legal Acknowledgement**\
    By contributing to this software project, I agree my contributions are
    submitted under the BSD license.
    I represent I am authorized to make the contributions and grant the
    license.
    If my employer has rights to intellectual property that includes these
    contributions,
    I represent that I have received permission to make contributions and
    grant the required license on behalf of that employer.

commit caebfc4
Author: Andrew Lee <[email protected]>
Date:   Thu Aug 22 13:28:24 2024 -0400

    Replace _BlockData with BlockData (cog-imperial#144)

    Pyomo recently made ComponentData classes public
    (Pyomo/pyomo#3221) which will be part of the
    upcoming release. Currently, this causes the following error to occur in
    OMLT:

    ```
    TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
    ```

    The Pyomo team is working to try to address this issue, however OMLT
    should update its code to address this as otherwise deprecation warnings
    will be emitted when using the old class names.

    The fix is to replace all instances of `_BlockData` with `BlockData`
    (just removing the underscore) - this applies to any other instance of
    Pyomo component data objects as well (although I could only find 2
    instances of these in the OMLT code).

    **Legal Acknowledgement**\
    By contributing to this software project, I agree my contributions are
    submitted under the BSD license.
    I represent I am authorized to make the contributions and grant the
    license.
    If my employer has rights to intellectual property that includes these
    contributions,
    I represent that I have received permission to make contributions and
    grant the required license on behalf of that employer.

    Co-authored-by: jalving <[email protected]>

commit c6d274f
Author: Emma Johnson <[email protected]>
Date:   Thu Aug 22 10:56:10 2024 -0400

    Add tolerance to enforce strict inequalities in linear tree formulations (cog-imperial#163)

    This PR adds a tolerance at which to enforce ``strict'' inequalities in
    linear model trees: That is, the right branch will require that the
    feature value be greater than or equal to the bound plus this tolerance
    (epsilon). This means that users can tune epsilon in order to ensure
    that the MIP solution will match the tree prediction.

    Additionally, the PR simplifies the implementation of the hybrid bigm
    linear tree formulation by using two modern pyomo.gdp transformations.
    This does mean that the linear tree formulations will rely on
    pyomo>=6.7.1 though, if that's okay.

    **Legal Acknowledgement**\
    By contributing to this software project, I agree my contributions are
    submitted under the BSD license.
    I represent I am authorized to make the contributions and grant the
    license.
    If my employer has rights to intellectual property that includes these
    contributions,
    I represent that I have received permission to make contributions and
    grant the required license on behalf of that employer.

    ---------

    Co-authored-by: Emma Johnson <[email protected]>

commit d43643a
Author: Lukas Turcani <[email protected]>
Date:   Tue Aug 20 23:53:51 2024 +0100

    Clean up package boilerplate (cog-imperial#149)

    This PR does a couple of things to clean up the boilerplate related to
    packaging OMLT, see sections below for detailed explanations of the
    changes.

    * Remove `setup.cfg` , `setup.py`, `docs/requirements.txt`, `tox.ini` in
    favour of `pyproject.toml`.
    * Place `conda` requirements into `environment.yml`
    * Create new workflows `tests.yml` and `publish_release.yml`
    * Add quality checks using `ruff`, `mypy`, `doctest`
    * Use `just` for developer experience
    * Updated the `Development` section of `README`  to talk about `just`
    * Clean up `conf.py`
    * Move `pull_request_template.md`
    * Allow publishing of package to pypi by pushing a new version tag

    # Other comments

    * consider internal package structure
    * force squash merge of PRs - this keeps git history for the `main`
    branch nice and clean

    # Using `pyproject.toml`

    `pyrpoject.toml` is the simplest way to provide package metadata for a
    Python package. It is easy to read and also provides sections for
    configurating tools such as `pytest`, `ruff` and `mypy` all in one
    place. It works seamlessly with the modern Python ecosystem.

    I set up `pyproject.toml` to automactically detect the version of the
    code from git tags. No need to duplicate version numbers across the
    repo. Just add a new tag and everything will be updated. In addition,
    when a new git tag is pushed to the GitHub repo, the new
    `publish_release` workflow will be triggered and a new PYPI version
    released. (See more on this below).

    I also set it up so that the version is automatically added to a file
    called `src/omlt/_version.py` which holds the `__version__` variable.
    this file is autogenerated and therefore added to `.gitignore`. The
    `__version__` veriable is then re-exported in `src/omlt/__init__.py` so
    that our users have access to it.

    I tried to perserve all the information stored in the `setup.cfg` and
    other deleted files -- let me know if there is something i missed!

    ## Optional dependencies

    The `pyproject.toml` file allows the creation of optional dependencies.
    For example, our users can install

    ```bash
    pip install omlt[keras]
    # or
    pip install omlt[torch]
    # or
    pip install omlt[linear-tree,keras-gpu]
    ```
    Ofc any combination of optional dependencies is valid too. This allows
    our users to install the dependencies specific to their use case. Note
    that:

    * I made `onnx` and `onnxruntime` a required dependency because from my
    understanding it is almost always used
    * I added an optinoal dependency set called `dev` which developers can
    use to install all developer tools and all dependencies -- you need this
    to run all the tests for example
    * There is also `dev-gpu` which installs the GPU version of tensorflow
    in case the developer has a GPU

    The available optional dependencies are:

    * `linear-tree`, installs the linear tree dependency
    * `keras`, installs tensorflow and keras
    * `keras-gpu`, installs tensorflow for the gpu and keras
    * `torch`, installs torch and torch geometric
    * `dev-tools` - this is not to be used directly but allows easy re-use
    of dev tools in other optional dependencies, namely dev and dev-gpu
    * `docs` - installs dependencies required to compile docs
    * `dev` - dependecies needed for developing the project, such tooling
    * `dev-gpu` - same as dev but installed with gpu support

    Our documentation probably needs to be updated to tell users they wanna
    install omlt with some combination of `linear-tree`, `keras`,
    `keras-gpu`, `torch` optional dependencies depending on what features of
    the package they are using

    # Quality checks with `ruff`, `mypy` and `doctest`

    I've enabled `ruff`, `mypy` and `doctest`. Currently there are no
    doctests, but its good to have it set up so that it runs in case any are
    added in the future.

    Both `ruff` and `mypy` are failing because there are a number of things
    which need to fixed. For both `ruff` and `mypy` I have disabled some
    checks which it would be good to enable eventually but are probably a
    fair amount of work to fix -- these have comments in `pyproject.toml`.
    The remaining failing checks are ones which I would reccomend fixing
    ASAP. There's two approaches, merge now and fix these errors later. Or
    keep a separate branch where these are incrementally fixed. Up to you to
    decide what you prefer.

    I told ruff to check for `google` style docstrings. I think these are
    the best because they have good readbility and work the best with type
    hints in my opinion.

    # Using `just` instead of `tox`

    https://github.com/casey/just is a simple command runner. It allows the
    developers to define and re-use common operations, for example I can
    define a `check` recipe and then run

    ```bash
    just check
    ```

    in my command line and it will run all the tests. The beauty of this is
    that `just` is extremely simple. If you read the file its basically a
    sequence of bash instructions for each recipe. This makes the `recipes`
    really transparent, and easy to understand, and works as
    code-as-documentation. Users can just read the recipe and run the
    commands one by one to get the same effect without having `just`
    installed. There is no magic which helps with debugging issues. It's
    also language agnostic. `just` comes as a small stand-alone binary,
    which makes it a very non-intrusive tool to have on your computer that
    does not need any dependencies.

    The downside is that it does not provide automatic management for Python
    environments, which I belive tox does provide. The other side of this is
    that we allow developers to use their favorite tools for managing venvs
    rather than proscribing certain tools for this repo. (the difference
    with `just` being that it is essentially optional tool and also serving
    as documentation)

    I may be overly opinionated on this one, so feel free to push back.

    # Cleaning up `docs/conf.py`

    I removed a bunch of the commented out code. This makes it easier to see
    what the configuration is and also prevents the commented out options
    from becoming out of date when a new release of sphinx is made.

    # Moving `pull_request_template.md`

    I moved this into the `.github` folder because it is GitHub
    configuration. Very optional, but makes more sense to me.

    # `readthedocs` automated action

    this guide
    https://docs.readthedocs.io/en/stable/guides/pull-requests.html shows
    how to set it up. requires admin permissions on readthedocs -- can jump
    on a call to help with this

    # publishing with to `PYPI` with a git tag

    for this an API key for PYPI needs to be created and added to the repos
    secrets -- can jump on a call to help with this

    # consider `_internal` package structure

    One way to make it easier to manage private vs public code in a
    repository is to create an `_internal` folder where all the code goes.
    This way all code can be shared easily and moved between modules and its
    by default private, so changes to internal code does not break users.
    Public modules then just re-export code in the `_internal` submodules.
    You can see an example of this structure here
    https://github.com/lukasturcani/stk. Not a huge issue but I find it very
    helpful for managing what things are actually exposed to users the
    code-base grows.

    **Legal Acknowledgement**\
    By contributing to this software project, I agree my contributions are
    submitted under the BSD license.
    I represent I am authorized to make the contributions and grant the
    license.
    If my employer has rights to intellectual property that includes these
    contributions,
    I represent that I have received permission to make contributions and
    grant the required license on behalf of that employer.

    ---------

    Co-authored-by: Jeremy Sadler <[email protected]>
jezsadler added a commit to jezsadler/OMLT that referenced this pull request Oct 3, 2024
commit 877aac1
Author: Jeremy Sadler <[email protected]>
Date:   Thu Oct 3 01:14:21 2024 +0000

    Cleaning up integrated changes

commit 6dfd2c0
Author: Jeremy Sadler <[email protected]>
Date:   Fri Sep 27 22:32:48 2024 +0000

    Making blocks more generic

commit 164b179
Author: Jeremy Sadler <[email protected]>
Date:   Wed Sep 18 23:54:45 2024 +0000

    Cleaned up some unnecessary methods.

commit d42cae2
Author: Jeremy Sadler <[email protected]>
Date:   Sun Aug 18 00:00:00 2024 +0000

    Fixing indexed variables

commit 6c648ca
Author: Jeremy Sadler <[email protected]>
Date:   Fri Aug 2 19:52:09 2024 +0000

    Removing Julia pieces (for now) and more mypy cleanup

commit 905e528
Author: Jeremy Sadler <[email protected]>
Date:   Thu Aug 1 20:06:31 2024 +0000

    Factory classes for vars and constraints

commit 8b18e97
Author: Jeremy Sadler <[email protected]>
Date:   Mon Jul 22 23:41:25 2024 +0000

    Improving test coverage

commit 069c3ef
Author: Jeremy Sadler <[email protected]>
Date:   Mon Jul 15 21:49:18 2024 +0000

    Adding tests

commit fe433a3
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jul 11 22:40:17 2024 +0000

    Moving JuMP objects into their own file

commit 79af205
Author: Jeremy Sadler <[email protected]>
Date:   Tue Jul 9 21:20:36 2024 +0000

    Fixing an issue with linear trees

commit 8c459f1
Author: Jeremy Sadler <[email protected]>
Date:   Tue Jul 9 19:43:38 2024 +0000

    Making block-level modelling language choices percolate through generated variables and constraints

commit e01a352
Author: Jeremy Sadler <[email protected]>
Date:   Mon Jul 8 23:58:06 2024 +0000

    Including OmltExpr and OmltConstraints, spreading Omlt classes throughout the codebase.

commit c5b866b
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 6 20:56:23 2024 +0000

    linting (2)

commit 98518f8
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 6 20:42:30 2024 +0000

    linting (1)

commit 6e5292d
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 6 13:33:16 2024 -0700

    Delete .github/workflows/python-package.yml

commit 7adf6e4
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 6 20:22:56 2024 +0000

    adding abstract methods to expression interface

commit 1a8c124
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 6 19:06:17 2024 +0000

    further fixing

commit 2764df1
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 6 18:58:51 2024 +0000

    fixing variable initialization

commit dd69394
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 6 18:47:01 2024 +0000

    tidying var.py

commit 6e141d4
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 6 18:22:30 2024 +0000

    cleanup in expression.py

commit ef0885b
Author: Jeremy Sadler <[email protected]>
Date:   Wed Jun 5 20:01:17 2024 +0000

    Including OmltExpr expressions for the OmltVars

commit a64f6d7
Author: Jeremy Sadler <[email protected]>
Date:   Fri May 17 11:33:14 2024 -0700

    Update setup.cfg

commit 83ccaef
Author: Jeremy Sadler <[email protected]>
Date:   Sun Apr 21 18:05:47 2024 -0700

    Update setup.cfg

commit 63f0e5f
Author: Jeremy Sadler <[email protected]>
Date:   Mon Mar 18 22:41:10 2024 -0700

    Create python-package.yml

commit ea1154c
Author: Jeremy Sadler <[email protected]>
Date:   Fri May 17 11:45:42 2024 -0700

    Update main.yml

commit 7eecd26
Author: Jeremy Sadler <[email protected]>
Date:   Fri May 17 11:44:36 2024 -0700

    Update main.yml

commit f844c2d
Author: Jeremy Sadler <[email protected]>
Date:   Fri May 17 11:42:42 2024 -0700

    Update main.yml

commit 9ab7fc3
Author: Jeremy Sadler <[email protected]>
Date:   Fri May 17 11:40:35 2024 -0700

    Update main.yml

commit ab25542
Author: Jeremy Sadler <[email protected]>
Date:   Fri May 17 11:39:03 2024 -0700

    Update setup.cfg for Keras version

commit 61c8daf
Author: Jeremy Sadler <[email protected]>
Date:   Fri May 17 11:38:30 2024 -0700

    Update Python versions in main.yml

commit 0ae5b75
Author: Jeremy Sadler <[email protected]>
Date:   Mon Apr 22 00:35:06 2024 +0000

    Fixing some whitespace linting

commit cbcefcb
Author: Jeremy Sadler <[email protected]>
Date:   Mon Apr 22 00:20:48 2024 +0000

    restoring action workflow file

commit c911bb0
Author: Jeremy Sadler <[email protected]>
Date:   Mon Apr 22 00:16:13 2024 +0000

    removing tweaked action file

commit c929d54
Author: Jeremy Sadler <[email protected]>
Date:   Sat Apr 20 23:21:18 2024 -0700

    Fix Keras version at 2.9

    Keras 3 requires models to have the .keras file format. Going forward we should probably update the test models to use this format, but to unblock I'm holding back the Keras version.

commit 738f7fd
Author: Jeremy Sadler <[email protected]>
Date:   Sat Apr 20 23:01:19 2024 -0700

    Use tensorflow-cpu for testing to save space

commit c4ab257
Author: Jeremy Sadler <[email protected]>
Date:   Fri Apr 19 17:43:43 2024 -0700

    Make test for JuMP variables conditional on presence of JuMP

commit 09c9945
Author: Jeremy Sadler <[email protected]>
Date:   Fri Apr 19 17:35:36 2024 -0700

    Update var.py

commit 991dd37
Author: Jeremy Sadler <[email protected]>
Date:   Fri Apr 19 17:29:08 2024 -0700

    Update var.py

commit b57848a
Author: Jeremy Sadler <[email protected]>
Date:   Fri Apr 19 16:58:01 2024 -0700

    Getting dependencies lined up correctly

commit 1490f42
Author: Jeremy Sadler <[email protected]>
Date:   Fri Apr 19 16:52:08 2024 -0700

    Removing duplicate line

commit ef42ba3
Author: Jeremy Sadler <[email protected]>
Date:   Fri Apr 19 19:19:29 2024 +0000

    Cleaning up variables - MOI dependency

commit fa62661
Author: Jeremy Sadler <[email protected]>
Date:   Fri Apr 19 19:19:29 2024 +0000

    Cleaning up variables - MOI dependency

commit 5dae012
Author: Jeremy Sadler <[email protected]>
Date:   Fri Apr 19 19:19:29 2024 +0000

    Cleaning up variables

commit 29b89bc
Author: Jeremy Sadler <[email protected]>
Date:   Mon Apr 8 18:28:49 2024 +0000

    Implementing JuMP format scalar and indexed
    variables.

commit 3c20611
Author: Jeremy Sadler <[email protected]>
Date:   Tue Mar 19 00:58:12 2024 -0700

    Removing ipopt from CI workflow

commit 6e36c47
Author: Jeremy Sadler <[email protected]>
Date:   Mon Mar 18 22:16:04 2024 -0700

    Create main.yml

    copying CI workflow over

commit 9178a1b
Author: Jeremy Sadler <[email protected]>
Date:   Tue Mar 19 02:05:50 2024 +0000

    OmltVar wrapper class

commit 0e86c9f
Author: Jeremy Sadler <[email protected]>
Date:   Tue Mar 19 02:05:50 2024 +0000

    OmltVar wrapper class

commit 7515f57
Author: Jeremy Sadler <[email protected]>
Date:   Mon Jun 24 05:29:48 2024 +0000

    Fixing mypy typing errors

commit 7bb6f0d
Author: Jeremy Sadler <[email protected]>
Date:   Mon Jun 24 05:29:48 2024 +0000

    Fixing mypy typing errors

commit ce6a944
Author: Jeremy Sadler <[email protected]>
Date:   Sun Jun 23 00:27:31 2024 +0000

    Fixing ruff linting errors.

commit 8a44751
Author: Jeremy Sadler <[email protected]>
Date:   Thu Jun 13 22:08:18 2024 +0000

    Fixing initial batch of ruff errors

commit 0379ec6
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 18:32:52 2024 +0100

    Add back for mypy

commit 2b0f991
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 18:31:55 2024 +0100

    remove unnecessary things

commit 551530d
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 18:21:20 2024 +0100

    wip

commit 4c40b8b
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 18:20:48 2024 +0100

    thing

commit 8042185
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 18:20:00 2024 +0100

    wip

commit fd4ef72
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 18:18:08 2024 +0100

    add link

commit 5492ddb
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 18:03:26 2024 +0100

    wip

commit 3d056e9
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 12:57:50 2024 +0100

    Add thing

commit 0790eb8
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 11:37:53 2024 +0100

    Thing

commit e735707
Author: Lukas Turcani <[email protected]>
Date:   Thu May 30 11:34:35 2024 +0100

    Add conda

commit cc07a30
Author: Lukas Turcani <[email protected]>
Date:   Wed May 29 16:16:38 2024 +0100

    wip

commit c58ec68
Author: Lukas Turcani <[email protected]>
Date:   Tue May 28 22:41:59 2024 +0100

    wip

commit 0a2671f
Author: Lukas Turcani <[email protected]>
Date:   Tue May 28 22:20:10 2024 +0100

    Add workflows

commit 321a2e2
Author: Jiří Němeček <[email protected]>
Date:   Sat Aug 24 19:15:47 2024 +0200

    Fixing 404 errors of links to notebooks in the documentation (cog-imperial#143)

    I assume that the notebooks have been moved, but the documentation links
    did not reflect that

    **Legal Acknowledgement**\
    By contributing to this software project, I agree my contributions are
    submitted under the BSD license.
    I represent I am authorized to make the contributions and grant the
    license.
    If my employer has rights to intellectual property that includes these
    contributions,
    I represent that I have received permission to make contributions and
    grant the required license on behalf of that employer.

commit caebfc4
Author: Andrew Lee <[email protected]>
Date:   Thu Aug 22 13:28:24 2024 -0400

    Replace _BlockData with BlockData (cog-imperial#144)

    Pyomo recently made ComponentData classes public
    (Pyomo/pyomo#3221) which will be part of the
    upcoming release. Currently, this causes the following error to occur in
    OMLT:

    ```
    TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
    ```

    The Pyomo team is working to try to address this issue, however OMLT
    should update its code to address this as otherwise deprecation warnings
    will be emitted when using the old class names.

    The fix is to replace all instances of `_BlockData` with `BlockData`
    (just removing the underscore) - this applies to any other instance of
    Pyomo component data objects as well (although I could only find 2
    instances of these in the OMLT code).

    **Legal Acknowledgement**\
    By contributing to this software project, I agree my contributions are
    submitted under the BSD license.
    I represent I am authorized to make the contributions and grant the
    license.
    If my employer has rights to intellectual property that includes these
    contributions,
    I represent that I have received permission to make contributions and
    grant the required license on behalf of that employer.

    Co-authored-by: jalving <[email protected]>

commit c6d274f
Author: Emma Johnson <[email protected]>
Date:   Thu Aug 22 10:56:10 2024 -0400

    Add tolerance to enforce strict inequalities in linear tree formulations (cog-imperial#163)

    This PR adds a tolerance at which to enforce ``strict'' inequalities in
    linear model trees: That is, the right branch will require that the
    feature value be greater than or equal to the bound plus this tolerance
    (epsilon). This means that users can tune epsilon in order to ensure
    that the MIP solution will match the tree prediction.

    Additionally, the PR simplifies the implementation of the hybrid bigm
    linear tree formulation by using two modern pyomo.gdp transformations.
    This does mean that the linear tree formulations will rely on
    pyomo>=6.7.1 though, if that's okay.

    **Legal Acknowledgement**\
    By contributing to this software project, I agree my contributions are
    submitted under the BSD license.
    I represent I am authorized to make the contributions and grant the
    license.
    If my employer has rights to intellectual property that includes these
    contributions,
    I represent that I have received permission to make contributions and
    grant the required license on behalf of that employer.

    ---------

    Co-authored-by: Emma Johnson <[email protected]>

commit d43643a
Author: Lukas Turcani <[email protected]>
Date:   Tue Aug 20 23:53:51 2024 +0100

    Clean up package boilerplate (cog-imperial#149)

    This PR does a couple of things to clean up the boilerplate related to
    packaging OMLT, see sections below for detailed explanations of the
    changes.

    * Remove `setup.cfg` , `setup.py`, `docs/requirements.txt`, `tox.ini` in
    favour of `pyproject.toml`.
    * Place `conda` requirements into `environment.yml`
    * Create new workflows `tests.yml` and `publish_release.yml`
    * Add quality checks using `ruff`, `mypy`, `doctest`
    * Use `just` for developer experience
    * Updated the `Development` section of `README`  to talk about `just`
    * Clean up `conf.py`
    * Move `pull_request_template.md`
    * Allow publishing of package to pypi by pushing a new version tag

    # Other comments

    * consider internal package structure
    * force squash merge of PRs - this keeps git history for the `main`
    branch nice and clean

    # Using `pyproject.toml`

    `pyrpoject.toml` is the simplest way to provide package metadata for a
    Python package. It is easy to read and also provides sections for
    configurating tools such as `pytest`, `ruff` and `mypy` all in one
    place. It works seamlessly with the modern Python ecosystem.

    I set up `pyproject.toml` to automactically detect the version of the
    code from git tags. No need to duplicate version numbers across the
    repo. Just add a new tag and everything will be updated. In addition,
    when a new git tag is pushed to the GitHub repo, the new
    `publish_release` workflow will be triggered and a new PYPI version
    released. (See more on this below).

    I also set it up so that the version is automatically added to a file
    called `src/omlt/_version.py` which holds the `__version__` variable.
    this file is autogenerated and therefore added to `.gitignore`. The
    `__version__` veriable is then re-exported in `src/omlt/__init__.py` so
    that our users have access to it.

    I tried to perserve all the information stored in the `setup.cfg` and
    other deleted files -- let me know if there is something i missed!

    ## Optional dependencies

    The `pyproject.toml` file allows the creation of optional dependencies.
    For example, our users can install

    ```bash
    pip install omlt[keras]
    # or
    pip install omlt[torch]
    # or
    pip install omlt[linear-tree,keras-gpu]
    ```
    Ofc any combination of optional dependencies is valid too. This allows
    our users to install the dependencies specific to their use case. Note
    that:

    * I made `onnx` and `onnxruntime` a required dependency because from my
    understanding it is almost always used
    * I added an optinoal dependency set called `dev` which developers can
    use to install all developer tools and all dependencies -- you need this
    to run all the tests for example
    * There is also `dev-gpu` which installs the GPU version of tensorflow
    in case the developer has a GPU

    The available optional dependencies are:

    * `linear-tree`, installs the linear tree dependency
    * `keras`, installs tensorflow and keras
    * `keras-gpu`, installs tensorflow for the gpu and keras
    * `torch`, installs torch and torch geometric
    * `dev-tools` - this is not to be used directly but allows easy re-use
    of dev tools in other optional dependencies, namely dev and dev-gpu
    * `docs` - installs dependencies required to compile docs
    * `dev` - dependecies needed for developing the project, such tooling
    * `dev-gpu` - same as dev but installed with gpu support

    Our documentation probably needs to be updated to tell users they wanna
    install omlt with some combination of `linear-tree`, `keras`,
    `keras-gpu`, `torch` optional dependencies depending on what features of
    the package they are using

    # Quality checks with `ruff`, `mypy` and `doctest`

    I've enabled `ruff`, `mypy` and `doctest`. Currently there are no
    doctests, but its good to have it set up so that it runs in case any are
    added in the future.

    Both `ruff` and `mypy` are failing because there are a number of things
    which need to fixed. For both `ruff` and `mypy` I have disabled some
    checks which it would be good to enable eventually but are probably a
    fair amount of work to fix -- these have comments in `pyproject.toml`.
    The remaining failing checks are ones which I would reccomend fixing
    ASAP. There's two approaches, merge now and fix these errors later. Or
    keep a separate branch where these are incrementally fixed. Up to you to
    decide what you prefer.

    I told ruff to check for `google` style docstrings. I think these are
    the best because they have good readbility and work the best with type
    hints in my opinion.

    # Using `just` instead of `tox`

    https://github.com/casey/just is a simple command runner. It allows the
    developers to define and re-use common operations, for example I can
    define a `check` recipe and then run

    ```bash
    just check
    ```

    in my command line and it will run all the tests. The beauty of this is
    that `just` is extremely simple. If you read the file its basically a
    sequence of bash instructions for each recipe. This makes the `recipes`
    really transparent, and easy to understand, and works as
    code-as-documentation. Users can just read the recipe and run the
    commands one by one to get the same effect without having `just`
    installed. There is no magic which helps with debugging issues. It's
    also language agnostic. `just` comes as a small stand-alone binary,
    which makes it a very non-intrusive tool to have on your computer that
    does not need any dependencies.

    The downside is that it does not provide automatic management for Python
    environments, which I belive tox does provide. The other side of this is
    that we allow developers to use their favorite tools for managing venvs
    rather than proscribing certain tools for this repo. (the difference
    with `just` being that it is essentially optional tool and also serving
    as documentation)

    I may be overly opinionated on this one, so feel free to push back.

    # Cleaning up `docs/conf.py`

    I removed a bunch of the commented out code. This makes it easier to see
    what the configuration is and also prevents the commented out options
    from becoming out of date when a new release of sphinx is made.

    # Moving `pull_request_template.md`

    I moved this into the `.github` folder because it is GitHub
    configuration. Very optional, but makes more sense to me.

    # `readthedocs` automated action

    this guide
    https://docs.readthedocs.io/en/stable/guides/pull-requests.html shows
    how to set it up. requires admin permissions on readthedocs -- can jump
    on a call to help with this

    # publishing with to `PYPI` with a git tag

    for this an API key for PYPI needs to be created and added to the repos
    secrets -- can jump on a call to help with this

    # consider `_internal` package structure

    One way to make it easier to manage private vs public code in a
    repository is to create an `_internal` folder where all the code goes.
    This way all code can be shared easily and moved between modules and its
    by default private, so changes to internal code does not break users.
    Public modules then just re-export code in the `_internal` submodules.
    You can see an example of this structure here
    https://github.com/lukasturcani/stk. Not a huge issue but I find it very
    helpful for managing what things are actually exposed to users the
    code-base grows.

    **Legal Acknowledgement**\
    By contributing to this software project, I agree my contributions are
    submitted under the BSD license.
    I represent I am authorized to make the contributions and grant the
    license.
    If my employer has rights to intellectual property that includes these
    contributions,
    I represent that I have received permission to make contributions and
    grant the required license on behalf of that employer.

    ---------

    Co-authored-by: Jeremy Sadler <[email protected]>
jalving added a commit that referenced this pull request Dec 8, 2024
The changes in #163 included changes to the hybrid bigm formulation for
linear tree that, while mathematically equivalent, made for a larger
formulation in terms of number of constraints. This PR corrects that: It
still uses the `gdp.bound_pretransformation` to generate the constraints
bounding the features values for each leaf, but it manually transforms
the constraints setting the output value to the leaf's linear function,
equivalently to @bammari's original implementation. In addition it adds
a test to check that the size of the resulting formulation is what is
expected.

**Legal Acknowledgement**\
By contributing to this software project, I agree my contributions are
submitted under the BSD license.
I represent I am authorized to make the contributions and grant the
license.
If my employer has rights to intellectual property that includes these
contributions,
I represent that I have received permission to make contributions and
grant the required license on behalf of that employer.

---------

Co-authored-by: Emma Johnson <[email protected]>
Co-authored-by: jalving <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants