adapting to new rdkit fingerprint generators. #43

RiesBen · 2024-09-13T15:13:31Z

This addresses the deprecation warnings mentioned in #41 and therefore implements a set of new transformers.

The design is a bit verbose, but I wanted to stay as close as possible to the old feel and be as efficient as possible ;)
This way it is possible to replace the old RDKit-based implementations seamlessly ;)

We could consider doing this already ;)

…ing and pickling.

EBjerrum · 2024-09-16T08:09:40Z

Some of the tests fail. It seems to be related to the nBits renaming. It may be that the tests also need updating.

EBjerrum · 2024-09-17T08:26:34Z

@RiesBen, just for clarification, do you have more time to work on this, or should I also look at it?

RiesBen · 2024-09-20T14:11:20Z

> @RiesBen, just for clarification, do you have more time to work on this, or should I also look at it?

Hi Esben,
I would look further into it, but I'm on holiday for 3 weeks. So feel free to give it a shot, and I will look into what happened when I'm back :)

EBjerrum · 2024-09-23T06:37:28Z

OK, enjoy your holidays. Let's reconvene when you get back.

EBjerrum · 2024-10-23T14:47:13Z

@RiesBen Hi Ben, are you back from vacation, and do you still have time to look at this? If so, I will review the pull request.

EBjerrum · 2024-10-23T19:02:58Z

I had some time to read through the suggested changes now. Thanks for the effort. I think it looks generally solid, but there are some things I would suggest changing.

As you may have seen, master has moved, so the updated master likely needs to be merged into the branch and any resulting conflicts fixed. There's a little bit of error handling now, so that non-Mols and other falsy objects return NumPy masked arrays. We likely need to rework the stack of _transform_mol, _fp_to_array and _mol_to_fp somewhat in the new abstract generator class. Likely _mol2fp needs to be the abstract one, and _fp2array then needs to be overridden in the new subclass, as we don't need to convert the array, but this method can then do the error handling for the conversion. Maybe the names need to be rethought at some point in the future.

I think we should drop nBits from the API of the new transformers and use fpSize, to be consistent with the underlying RDKit object and to simplify the code. But do we carry technical debt from the parent class and the other transformers?

I think _generate_fp_generator could be a parameterless method. It can read all the settings it needs from self. That will simplify set_state and other calls a lot.
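A minimal sketch of why a parameterless `_generate_fp_generator` could simplify state handling, assuming "set_state" above refers to `__setstate__`. `FakeGenerator` and `SketchTransformer` are hypothetical stand-ins, not the scikit-mol classes or the RDKit generator API:

```python
class FakeGenerator:
    """Hypothetical stand-in for an RDKit fingerprint generator."""

    def __init__(self, fpSize, radius):
        self.fpSize = fpSize
        self.radius = radius


class SketchTransformer:
    def __init__(self, fpSize=2048, radius=2):
        self.fpSize = fpSize
        self.radius = radius
        self._generate_fp_generator()

    def _generate_fp_generator(self):
        # No arguments: every setting is read from self.
        self._fpgen = FakeGenerator(fpSize=self.fpSize, radius=self.radius)

    def __getstate__(self):
        # Drop the generator; it can always be rebuilt from the parameters.
        state = self.__dict__.copy()
        state.pop("_fpgen", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._generate_fp_generator()
```

With this shape, restoring state needs no knowledge of which arguments the generator takes; it only restores the parameters and asks for a rebuild.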

It's an awful lot of boilerplate code for the properties and setters. Can't we override __setattr__ or something similar so that it also calls _generate_fp_generator when we modify any of the parameters in the signature of the class? Alternatively, simply list, as a class-level tuple, which parameters should trigger the update. It's probably a bit tricky in __init__, as we want to set all of them first, before the first call to _generate_fp_generator. However, if it's doable, it can be done in the abstract class, making it a lot easier to define the subclasses. I don't think it's a problem that the setting is one-way; we can assume that users will not manipulate _fpgen properties directly.

I'm thinking along the lines of this demo example:

class MyGenerator:
    """Stand-in for an expensive generator object."""

    def __init__(self, **kwargs):
        print(f"Init Generator with {kwargs}")


class MyTransformer:
    # Parameters whose modification should rebuild the generator
    _regenerate_on_properties = ("prop1", "prop2")

    def __init__(self, prop1: int, prop2: float):
        # Suppress regeneration until all parameters are set
        self._initializing = True
        self.prop1 = prop1
        self.prop2 = prop2
        self._gen_object()
        delattr(self, "_initializing")

    def _gen_object(self):
        # Parameterless: reads everything it needs from self
        self._generator = MyGenerator(prop1=self.prop1, prop2=self.prop2)

    def __setattr__(self, name: str, value):
        super().__setattr__(name, value)
        # Rebuild only after __init__ has finished and only for listed names
        if (
            not hasattr(self, "_initializing")
            and name in self._regenerate_on_properties
        ):
            self._gen_object()

But scikit-learn may also be doing something with properties and settings, so maybe it interferes?
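On the scikit-learn question: `BaseEstimator.set_params` assigns simple parameters with `setattr`, so a `__setattr__` hook would fire there too. The sketch below moves the hook into a base class, as suggested above. `GeneratorBase`, `DemoTransformer`, `_regenerate_on` and the dict used as generator are hypothetical, and instead of an `_initializing` flag it waits until every listed parameter exists:

```python
from abc import ABC, abstractmethod


class GeneratorBase(ABC):
    # Subclasses list the parameter names that should trigger a rebuild.
    _regenerate_on = ()

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        # Rebuild only once every listed parameter exists, so __init__ can
        # assign them one by one without a separate "initializing" flag.
        if name in self._regenerate_on and all(
            hasattr(self, p) for p in self._regenerate_on
        ):
            self._generate_fp_generator()

    @abstractmethod
    def _generate_fp_generator(self):
        """Build self._fpgen from the current parameter values."""


class DemoTransformer(GeneratorBase):
    _regenerate_on = ("fpSize", "radius")

    def __init__(self, fpSize=2048, radius=2):
        self.fpSize = fpSize
        self.radius = radius  # last listed parameter: first rebuild happens here

    def _generate_fp_generator(self):
        # A dict stands in for the real generator object.
        self._fpgen = {"fpSize": self.fpSize, "radius": self.radius}
```

Since `clone` re-creates an estimator through `__init__`, the same hook would presumably cover cloning as well, but that would need to be confirmed against the real classes.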

nBits (or rather fpSize) is not set on the object in many of the __init__ functions of the subclasses. I think this led to some failures in the pytests.
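scikit-learn's `get_params` looks up each `__init__` argument as an attribute of the same name, so a parameter that is never stored will typically break `get_params` or `clone`. A small test helper along these lines could catch that early; `missing_init_attributes` and the two demo classes are hypothetical names:

```python
import inspect


def missing_init_attributes(obj):
    """Return the __init__ parameter names that are not set on obj."""
    params = inspect.signature(type(obj).__init__).parameters
    return [
        name
        for name, p in params.items()
        if name != "self"
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        and not hasattr(obj, name)
    ]


class Good:
    def __init__(self, fpSize=2048):
        self.fpSize = fpSize


class Forgetful:
    def __init__(self, fpSize=2048, radius=2):
        self.radius = radius  # fpSize is accepted but never stored
```

In a pytest, the helper could be asserted to return an empty list for every transformer class.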

It's nice to give a deprecation warning for the old classes, but I think we should use the official way with a DeprecationWarning https://docs.python.org/3/library/warnings.html rather than print.
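A minimal sketch of the `warnings`-based approach; `OldTransformer` is a hypothetical placeholder for one of the old classes, and the message text is made up:

```python
import warnings


class OldTransformer:
    """Hypothetical old-style class kept only for backwards compatibility."""

    def __init__(self, nBits=2048):
        warnings.warn(
            "OldTransformer is deprecated; use the generator-based transformer instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at __init__
        )
        self.nBits = nBits
```

Unlike `print`, the warning can be filtered or escalated to an error with the standard `warnings` filters, and unlike `raise DeprecationWarning(...)`, the call still completes.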

Nice that you also included some new pytests :-)

RiesBen · 2024-10-23T20:24:40Z

Hi Esben,
I'm back :)
I hope I have time to have another look this Friday! :)

EBjerrum · 2024-10-24T09:13:21Z

Sounds great. We could have a chat Friday afternoon, if you are interested.

RiesBen · 2024-10-24T20:48:10Z

> Sounds great. We could have a chat Friday afternoon, if you are interested.

Interested in a chat :) But is Monday or Tuesday possible? I could send you two options via mail tomorrow :)

EBjerrum · 2024-10-25T06:08:56Z

Monday or Tuesday is fine.

RiesBen · 2024-11-14T20:19:20Z

ToDos:

- remove properties
- nBits -> fpSize
- implement setattr
- _generate_fp_generator as parameterless function
- implement deprecation warning
- ~~remodel _transform_mol, _fp_to_array and _mol_to_fp -> error handling in _fp2array?~~
- add potentially missing unit tests

- nBits->fpSize - remove properties / overwrite setattr - adapt tests.

- moving code around for easier oversight - adding nicer deprecation warnings.

- add new generator functions to transformer test

RiesBen · 2024-11-15T06:10:52Z

scikit_mol/fingerprints.py

+    def _fp2array(self, fp):
+        raise DeprecationWarning("Generators can directly return fingerprints")
+
+    def _mol2fp(self, mol):
+        raise DeprecationWarning("use _mol2array")


> As you may have seen, master has moved, so the updated master likely needs to be merged into the branch and any resulting conflicts fixed. There's a little bit of error handling now, so that non-Mols and other falsy objects return NumPy masked arrays. We likely need to rework the stack of _transform_mol, _fp_to_array and _mol_to_fp somewhat in the new abstract generator class. Likely _mol2fp needs to be the abstract one, and _fp2array then needs to be overridden in the new subclass, as we don't need to convert the array, but this method can then do the error handling for the conversion. Maybe the names need to be rethought at some point in the future.

Hi @EBjerrum,
I'm a bit confused by the quote above: in the current generator approach we removed both _fp2array and _mol2fp.

We only have _transform_mol and _generate_fp_generator as abstract methods. Are we missing something? Most likely the error handling, no?

EBjerrum · 2024-11-16

Yeah, hmm, maybe I described it wrong. The main thing is that we need to be able to pass back masked arrays if we get any object that evaluates to False in a boolean context (None, NaN, InvalidMol, etc.). So there needs to be some kind of function in the parent class that guards the abstract function, as the subclasses should not have to deal with that but should be able to expect the molecules to be valid.

How we organize the private functions used by the public transform method is of course up to us. Maybe the current parent class implementation is not really that smart when it comes to the new generator code and can be simplified.
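The guard could be sketched roughly as follows, with a parent-class method handling falsy inputs so that subclasses only ever see valid molecules. All names are hypothetical, plain strings stand in for RDKit molecules, and `None` stands in for the masked-array row that the real code would return:

```python
from abc import ABC, abstractmethod


class GuardedBase(ABC):
    def transform(self, mols):
        # The parent handles invalid input; subclasses never see it.
        return [self._safe_transform_mol(mol) for mol in mols]

    def _safe_transform_mol(self, mol):
        if not mol:  # None, empty or otherwise falsy objects
            return None  # placeholder for a fully masked row
        return self._transform_mol(mol)

    @abstractmethod
    def _transform_mol(self, mol):
        """Compute the fingerprint for a molecule known to be valid."""


class LengthFingerprint(GuardedBase):
    def _transform_mol(self, mol):
        # Toy "fingerprint": the length of the input string.
        return [len(mol)]
```

Here the subclass implements only `_transform_mol`, which is the division of labour described above.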

Have you pushed your current work to the pull request? Then I can have a look at it.

RiesBen · 2024-11-17

Yes, in the current state everything should be included. :)

EBjerrum · 2024-11-20

Thanks for the contribution. I hope to get some time soon to review and merge it :-)

EBjerrum · 2024-11-22

I had some time to review the pull request. Unfortunately, I think some further changes are needed, but I'm not sure how much more time you have or want to spend on it.

The new classes don't support safe_inference_mode, the feature recently introduced to make inference safer when deploying models. Some rethinking/refactoring of the parent class might actually be needed so that this doesn't become unnecessarily complicated (as I maybe confusingly touched upon in my previous comment).

Also, I think a new file organization could help give a better overview, but that's not directly related to this PR.

I'm also not sure that deprecating the current classes is the best way. The input parameter names are very similar, except for nBits becoming fpSize and what seems to be a deprecation of dtype. I think it would be less disruptive to keep the current class names, maybe with some sensible way of raising a deprecation warning if the old parameters are used. We could maybe discuss this if you want. I'm not yet fully sure how this could be done without introducing too much temporary clutter, so even if you don't have more time for fingers-on-the-keyboard work, I would appreciate a discussion.
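One possible shape for this, sketched under the assumption that the old name is kept as an extra keyword with a `None` default. `SketchFingerprintTransformer` is hypothetical, and the handling of dtype is left out:

```python
import warnings


class SketchFingerprintTransformer:
    def __init__(self, fpSize=2048, nBits=None):
        if nBits is not None:
            # Old keyword still works, but announces its removal.
            warnings.warn(
                "nBits is deprecated; use fpSize instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            fpSize = nBits
        self.fpSize = fpSize
```

For scikit-learn compatibility the extra `nBits` keyword would itself need to be stored or special-cased in `get_params`, which is part of the temporary clutter mentioned above.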

I'm thinking I'll pull this PR for a new branch, and then do a few changes/experiments from there.

- add DeprecationWarnings to not harmonized fpSize bits.

RiesBen added 3 commits September 13, 2024 17:12

adapting to new rdkit fingerprint generators.

341171e

Deprecations warnings in transformers: raise->prints

8be105a

minor class property fixes, most tests run now. Need to look into cloning and pickling.

681a493

fixes for bugs

c812214

Merge branch 'main' into 41_adjusting_to_rdkit_fp_generator_changes

895924f

RiesBen added 3 commits November 14, 2024 22:32

Remodelling transformers:

fdd8624

- nBits->fpSize - remove properties / overwrite setattr - adapt tests.

Remodelling transformers:

d420cbd

- moving code around for easier oversight - adding nicer deprecation warnings.

Remodelling transformers:

f7d2958

- add new generator functions to transformer test

RiesBen commented Nov 15, 2024

Remodelling transformers:

5ae6a2b

- add DeprecationWarnings to not harmonized fpSize bits.

EBjerrum changed the base branch from main to 41_adjusting_to_rdkit_fp_generator_changes November 22, 2024 12:29

EBjerrum merged commit bfadd31 into EBjerrum:41_adjusting_to_rdkit_fp_generator_changes Nov 22, 2024
8 checks passed

RiesBen Nov 15, 2024

EBjerrum Nov 16, 2024

RiesBen Nov 17, 2024

EBjerrum Nov 20, 2024

EBjerrum Nov 22, 2024
