Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adapting to new rdkit fingerprint generators. #43

Conversation

RiesBen
Copy link
Contributor

@RiesBen RiesBen commented Sep 13, 2024

This is adressing the deprication warnings mentioned in #41 and therefore implements a set of new transformers.

Design concept is a bit verbose, but I wanted to be as close as possible in the old feeling and as efficient as possible ;)
Such it is possible to replace the old rdkit based implementations seemingly;)

We could consider doing this
already ;)

@EBjerrum
Copy link
Owner

Some of the test fails. Seem to be around the nBits renaming. It may be the tests that also needs updating.

@EBjerrum
Copy link
Owner

EBjerrum commented Sep 17, 2024

@RiesBen, Just for clarification, do you have more time to work on this? or should I also look at it?

@RiesBen
Copy link
Contributor Author

RiesBen commented Sep 20, 2024

@RiesBen, Just for clarification, do you have more time to work on this? or should I also look at it?

Hej esben,
I would look further into it, but I'm on holidays for 3 weeks. So feel free to give it a shot and I will look into what happened, when I'm back :)

@EBjerrum
Copy link
Owner

OK, Enjoy your hollidays. Lets reconvene when you get back.

@EBjerrum
Copy link
Owner

@RiesBen Hi Ben, Are you back from vacation and still have time to look at this? Then I would review the pull request.

@EBjerrum
Copy link
Owner

I had some time to read through the suggested changes now. Thanks for the effort, I think it looks generally solid, but there are some things I would suggest to change for likely improvements.

As you may have seen, the Master has moved, so likely there is a need to merge the updated master into the branch, and fix eventual conflicts that may arise. There's a little bit of error handling now, so that nonMols and other falsy objects return np.maskedarrays. We likely need to rework the stack _transform_mol, _fp_to_array and _mol_to_fp in the new generator class somewhat in the new abstract class. Likely the _mol2fp needs to be the abstract one, then _fp2array needs to be overriden in the new subclass, as we don't need to convert the array, but then this method can do the error handling for the conversion. Maybe the names need to be rethought at some point in the future.

I think we should drop the nBits from the API to the new transformers and use fpSize to be consistent with the underlying RDKit object and also simplify the code. But do we have technical debt from the parent class and the other transformers?

I think _generate_fp_generator could be a parameterless method. It can read all its needed settings from the self. That will simplify set_state and other calls a lot.

Its a terrible lot of boilerplate code for the properties and setters. Can't we override the set_attr or set_attribute or something to also call the _generate_fp_generator when we modify any of the parameters in the signature of the class? Alternative simply list as a class property tuple, what parameters that should trigger the update. Its probably a bit tricky in the init as we want to set all of them first, before the first call to _generate_fp_generator. However, if it's doable, it can be done in the abstract class, making it a lot easier for the subclasses to be defined. I don't think its a problem that it's one way setting, we can assume that users will not manipulate _fpgen properties directly.

I thing along this demo example:

class MyGenenerator:
    def __init__(self, **kwargs):
        print(f"Init Generator with {kwargs}")


class MyTransformer:
    _regenerate_on_properties = ("prop1", "prop2")

    def __init__(self, prop1: int, prop2: float):
        self._initializing = True  
        self.prop1 = prop1
        self.prop2 = prop2
        self._gen_object()
        delattr(self, "_initializing")

    def _gen_object(self):
        self._generator = MyGenenerator(prop1=self.prop1, prop2=self.prop2)

    def __setattr__(self, name: str, value):
        super().__setattr__(name, value)
        if (
            not hasattr(self, "_initializing")
            and name in self._regenerate_on_properties
        ):
            self._gen_object()

But Scikit-Learn may also be doing something with properties and settings and stuff, maybe it interferes?

nBits (or rather fpSize), is not set on the object in many of the init functions in the subclasses?, I think it lead to some failures in the pytests.

It's nice to give a deprecation warning for the old classes, but I think we should use the official way with a DeprecationWarning https://docs.python.org/3/library/warnings.html rather than print.

Nice that you also included some new pytests :-)

@RiesBen
Copy link
Contributor Author

RiesBen commented Oct 23, 2024

HI Esben,
I'm back :)
Hope I have time to have a look this Friday again! :)

@EBjerrum
Copy link
Owner

Sounds great. We could have a chat friday afternoon, if you are interested.

@RiesBen
Copy link
Contributor Author

RiesBen commented Oct 24, 2024

Sounds great. We could have a chat friday afternoon, if you are interested.

Interested in à chat :) but is Monday or tuesday possible? I could sent you two options via mail.tomorrow :)

@EBjerrum
Copy link
Owner

Monday or Tuesdat is fine

@RiesBen
Copy link
Contributor Author

RiesBen commented Nov 14, 2024

ToDos:

  • remove properties
  • nBits -> fpSize
  • implement setattr
  • _generate_fp_generator as parameterless function
  • implement deaprecation warning
  • remodel _transform_mol, _fp_to_array and _mol_to_fp -> errorhandling in _fp2array?
  • add potential missing unittest

- nBits->fpSize
- remove properties / overwrite setattr
- adapt tests.
- moving code around for easier oversight
- adding nicer dpecrecation warnings.
- add new generator functions to transformer test
Comment on lines +734 to +738
def _fp2array(self, fp):
raise DeprecationWarning("Generators can directly return fingerprints")

def _mol2fp(self, mol):
raise DeprecationWarning("use _mol2array")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you may have seen, the Master has moved, so likely there is a need to merge the updated master into the branch, and fix eventual conflicts that may arise. There's a little bit of error handling now, so that nonMols and other falsy objects return np.maskedarrays. We likely need to rework the stack _transform_mol, _fp_to_array and _mol_to_fp in the new generator class somewhat in the new abstract class. Likely the _mol2fp needs to be the abstract one, then _fp2array needs to be overriden in the new subclass, as we don't need to convert the array, but then this method can do the error handling for the conversion. Maybe the names need to be rethought at some point in the future.

Hej @EBjerrum ,
I'm a bit confused right now on the quote above, in the current generator approach we killed both _fp2array and _mol2fp.

We only have _transform_mol and _generate_fp_generator as abstracts. Are we missing something? Most likely the error handling, no?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, hmm, maybe I described it wrong. Main thing is that we need to be able to pass back masked arrays if we get any object that is evaluated as False in a boolean context (None, Nan, InvalidMol etc). So there need to be some kind of function in the parent class that guards the abstract function, as the subclasses should not have to deal with that, but can expect the molecules to be valid.

How we organize the private functions used by the public transform method is of course up to us. Maybe the current parent class implementation is not really that smart when it comes to the new generator code and can be simplified.

Have you pushed your current work to the pull request? Then I can have a look at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in the current status everything should be included. :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, I hope to get some time soon to review and merge it :-)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some time to review the pull request. Unfortunately, I think some further changes are needed, but I'm not sure how much more time you have or want to use?

The new classes doesn't support the safe_inference_mode, which was the new feature recently introduced to make inference more safe when deploying models. There might actually be some rethinking/refactoring of the parent class needed in order not to make it unnecessary complicated (as I maybe confusingly touched upon in my previous comment).

Also I think a new file organization could help give a better overview, but thats not directly related to this PR:

I'm also not sure that the deprecation of the current classes is the best way. The change in input parameter names are very similar except for nBits to fpSize and a seemingly a deprecation of dtype. I think it would be less disruptive to keep the current class names, maybe with some sensible way of raising a deprecation warning if old parameters are tried. We could maybe discuss this if you want, I'm not yet fully sure how this could be done without introducing too much temporary clutter, even if you don't have more time for fingers-on-the-keyboard I would appreciate a discussion.

I'm thinking I'll pull this PR for a new branch, and then do a few changes/experiments from there.

- add DeprecationWarnings to not harmonized fpSize bits.
@EBjerrum EBjerrum changed the base branch from main to 41_adjusting_to_rdkit_fp_generator_changes November 22, 2024 12:29
@EBjerrum EBjerrum merged commit bfadd31 into EBjerrum:41_adjusting_to_rdkit_fp_generator_changes Nov 22, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants