-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adapting to new rdkit fingerprint generators. #43
adapting to new rdkit fingerprint generators. #43
Conversation
Some of the test fails. Seem to be around the nBits renaming. It may be the tests that also needs updating. |
@RiesBen, Just for clarification, do you have more time to work on this? or should I also look at it? |
Hej esben, |
OK, Enjoy your hollidays. Lets reconvene when you get back. |
@RiesBen Hi Ben, Are you back from vacation and still have time to look at this? Then I would review the pull request. |
I had some time to read through the suggested changes now. Thanks for the effort, I think it looks generally solid, but there are some things I would suggest to change for likely improvements. As you may have seen, the Master has moved, so likely there is a need to merge the updated master into the branch, and fix eventual conflicts that may arise. There's a little bit of error handling now, so that nonMols and other falsy objects return np.maskedarrays. We likely need to rework the stack _transform_mol, _fp_to_array and _mol_to_fp in the new generator class somewhat in the new abstract class. Likely the _mol2fp needs to be the abstract one, then _fp2array needs to be overriden in the new subclass, as we don't need to convert the array, but then this method can do the error handling for the conversion. Maybe the names need to be rethought at some point in the future. I think we should drop the nBits from the API to the new transformers and use fpSize to be consistent with the underlying RDKit object and also simplify the code. But do we have technical debt from the parent class and the other transformers? I think _generate_fp_generator could be a parameterless method. It can read all its needed settings from the self. That will simplify set_state and other calls a lot. Its a terrible lot of boilerplate code for the properties and setters. Can't we override the set_attr or set_attribute or something to also call the _generate_fp_generator when we modify any of the parameters in the signature of the class? Alternative simply list as a class property tuple, what parameters that should trigger the update. Its probably a bit tricky in the init as we want to set all of them first, before the first call to _generate_fp_generator. However, if it's doable, it can be done in the abstract class, making it a lot easier for the subclasses to be defined. I don't think its a problem that it's one way setting, we can assume that users will not manipulate _fpgen properties directly. I thing along this demo example: class MyGenenerator:
def __init__(self, **kwargs):
print(f"Init Generator with {kwargs}")
class MyTransformer:
_regenerate_on_properties = ("prop1", "prop2")
def __init__(self, prop1: int, prop2: float):
self._initializing = True
self.prop1 = prop1
self.prop2 = prop2
self._gen_object()
delattr(self, "_initializing")
def _gen_object(self):
self._generator = MyGenenerator(prop1=self.prop1, prop2=self.prop2)
def __setattr__(self, name: str, value):
super().__setattr__(name, value)
if (
not hasattr(self, "_initializing")
and name in self._regenerate_on_properties
):
self._gen_object() But Scikit-Learn may also be doing something with properties and settings and stuff, maybe it interferes? nBits (or rather fpSize), is not set on the object in many of the init functions in the subclasses?, I think it lead to some failures in the pytests. It's nice to give a deprecation warning for the old classes, but I think we should use the official way with a DeprecationWarning https://docs.python.org/3/library/warnings.html rather than print. Nice that you also included some new pytests :-) |
HI Esben, |
Sounds great. We could have a chat friday afternoon, if you are interested. |
Interested in à chat :) but is Monday or tuesday possible? I could sent you two options via mail.tomorrow :) |
Monday or Tuesdat is fine |
ToDos:
|
- nBits->fpSize - remove properties / overwrite setattr - adapt tests.
- moving code around for easier oversight - adding nicer dpecrecation warnings.
- add new generator functions to transformer test
def _fp2array(self, fp): | ||
raise DeprecationWarning("Generators can directly return fingerprints") | ||
|
||
def _mol2fp(self, mol): | ||
raise DeprecationWarning("use _mol2array") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you may have seen, the Master has moved, so likely there is a need to merge the updated master into the branch, and fix eventual conflicts that may arise. There's a little bit of error handling now, so that nonMols and other falsy objects return np.maskedarrays. We likely need to rework the stack _transform_mol, _fp_to_array and _mol_to_fp in the new generator class somewhat in the new abstract class. Likely the _mol2fp needs to be the abstract one, then _fp2array needs to be overriden in the new subclass, as we don't need to convert the array, but then this method can do the error handling for the conversion. Maybe the names need to be rethought at some point in the future.
Hej @EBjerrum ,
I'm a bit confused right now on the quote above, in the current generator approach we killed both _fp2array
and _mol2fp
.
We only have _transform_mol
and _generate_fp_generator
as abstracts. Are we missing something? Most likely the error handling, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, hmm, maybe I described it wrong. Main thing is that we need to be able to pass back masked arrays if we get any object that is evaluated as False in a boolean context (None, Nan, InvalidMol etc). So there need to be some kind of function in the parent class that guards the abstract function, as the subclasses should not have to deal with that, but can expect the molecules to be valid.
How we organize the private functions used by the public transform method is of course up to us. Maybe the current parent class implementation is not really that smart when it comes to the new generator code and can be simplified.
Have you pushed your current work to the pull request? Then I can have a look at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in the current status everything should be included. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, I hope to get some time soon to review and merge it :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some time to review the pull request. Unfortunately, I think some further changes are needed, but I'm not sure how much more time you have or want to use?
The new classes doesn't support the safe_inference_mode, which was the new feature recently introduced to make inference more safe when deploying models. There might actually be some rethinking/refactoring of the parent class needed in order not to make it unnecessary complicated (as I maybe confusingly touched upon in my previous comment).
Also I think a new file organization could help give a better overview, but thats not directly related to this PR:
I'm also not sure that the deprecation of the current classes is the best way. The change in input parameter names are very similar except for nBits to fpSize and a seemingly a deprecation of dtype. I think it would be less disruptive to keep the current class names, maybe with some sensible way of raising a deprecation warning if old parameters are tried. We could maybe discuss this if you want, I'm not yet fully sure how this could be done without introducing too much temporary clutter, even if you don't have more time for fingers-on-the-keyboard I would appreciate a discussion.
I'm thinking I'll pull this PR for a new branch, and then do a few changes/experiments from there.
- add DeprecationWarnings to not harmonized fpSize bits.
bfadd31
into
EBjerrum:41_adjusting_to_rdkit_fp_generator_changes
This is adressing the deprication warnings mentioned in #41 and therefore implements a set of new transformers.
Design concept is a bit verbose, but I wanted to be as close as possible in the old feeling and as efficient as possible ;)
Such it is possible to replace the old rdkit based implementations seemingly;)
We could consider doing this
already ;)