-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mixed tanimoto gp surrogate #318
Add mixed tanimoto gp surrogate #318
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Li,
thanks for this nice PR. Looks overally very good! I let some comments.
Best,
Johannes
I have made the changes as suggested. In addition, I have also slightly cleaned up the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Li,
looks overall good, I let only some minor comments. Sorry for being so picky ...
cc: @simonsung06
Best,
Johannes
Another small info: we merged in a PR which has refactored the test suite (#327). You need to merge main again into your branch, but the effects will be small. You just have to move the stuff which creates conflicts to the new positions. In case of problems, I can also help you or do it together with you! |
Hi @jduerholt, I have merged my branch with the main branch and solved the conflicts :) Please help me check whether it is correct. Thanks for all your constructive feedback! |
Hi @xxEthene, many thanks for the updates. I will have a look on them over the christmas days. Sorry for the delay! Best, Johannes |
Hi @jduerholt, I have made some changes to the codes based on the errors previously occurred. I accidentally removed the |
Hi @xxEthene; just ignore the failing tests, this is due to a new version of formulaic released on the 25th of December (https://pypi.org/project/formulaic/#history) which breaks our tests. @Osburg, can you take care for this? Regarding your PR: I will do a final review as soon as I am back in office next Tuesday! I wish you also a happy new year! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks already very good, only some small change requests!
ord_dims = sorted(set(range(d)) - set(cat_dims) - set(mol_dims)) # type: ignore | ||
|
||
if cont_kernel_factory is None: | ||
cont_kernel_factory = kernels.map_MaternKernel( # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this has to be a callable or? Have a look here: https://github.com/pytorch/botorch/blob/cba563761c9a0731c06cec86478e7c2d8cbd34b5/botorch/models/gp_regression_mixed.py#L109
# these are the categorical dimesions after applying the OneHotToNumeric transform | ||
cat_dims = list( | ||
range(len(ord_dims), len(ord_dims) + len(non_numerical_features)) | ||
range(len(ord_dims), len(ord_dims) + len(categorical_feature_keys)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use also here get_feature_indices
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_feature_indices
returns categorical dimensions with OneHot transformation applied, but we need the categorical dimensions without the transformation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, I overlooked it. This is then also the position in the code, where we run into problems, when we have for example CategoricalDescriptorInputs of which we use some as OneHots and some as descriptors. This breaks then the indices, because we just rely here that the categorical features are always the last ones (here also the order_id
comes into the play). This is often the case, but not always. And with the molecular ones coming in, it could be that it less often the case. But this bug, already existed before your PR. So we do not have to fix it here, but maybe you have a smart idea for the problem? ;)
scaler_enum, | ||
input_preprocessing_specs, | ||
expected_scaler, | ||
expected_indices_length, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only the indices legth and not the indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it to check the indices instead :)
I stumbled over this |
In PR #332 the problems regarding the failing tests in the DoE module are fixed. As soon as it is merged into main, you can merge it in from main. |
Hi @xxEthene, I let some comments. Sorry for this mess with the |
Hi @xxEthene, I created a PR in botorch fixing the issue with the |
Hi @xxEthene, the PR was now merged. Just tell me if you find time to finish this PR, if not I will try to finish it ;) |
Hi @jduerholt, sorry for the delay as my computer is under repairment recently. I will be working on the codes this weekend! |
Hi @jduerholt, the
and I have modified related tests in |
Hi @xxEthene, looks good for me, just one thing: you forgot the Can you update this? Best, Johannes |
Hi @jduerholt, Sorry for this and I have updated the order. I have also formatted the files based on the error messages received. However, the error for the file Best regards, |
Ignore it for now, I think there is some other test regarding the sorting of the features still failing. Can you have a look on this one too? Sorry for iterating this for such a long time! |
Hi @jduerholt, The failing tests regarding the sorting of the features are due to the unchanged order_id for outputs. I am so sorry for this. I have updated the order_id for outputs and the order now is:
Hope it can work now! Best regards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost done. Thank you @xxEthene! Just format the one file or revert your change in universal_constraint.py
.
samples = samples.iloc[ | ||
self.num_candidates :, | ||
] | ||
samples = samples.iloc[self.num_candidates :,] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a problem for the formatter, just revert it to the original one and then it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I just committed it!
Hi @jduerholt, I am so confused by the error messages received from the tests....as everything goes well on my computer and also I did not change anything except the a few lines in |
It is strange, that it is not occuring locally for you, but the error comes from my side. I overlooked in one of my last PRs something and this seems to be the reason for this behavior. I will put up a PR today and merge it in, then you can merge main again into your PR. Sorry for this! |
Should be fixed now, just merge |
Okay, I have done it! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @xxEthene,
thank you very much for all your efforts. And sorry for the tedious review process!
Best,
Johannes
This PR adds a new surrogate type,
MixedTanimotoGPSurrogate
, which is designed to be used with datasets that containsMolecularInput
's withFingerprints
,Fragments
,FingerprintsFragments
molecular features, and Continuous and/or Categorical features. Therefore, this surrogate is analogous toMixedSingleTaskGP
except it involvesMolecularInputs
's. In order to provide the flexibility for this surrogate type, some other changes were made which will also be mentioned below:MixedTanimotoGPSurrogate
, the continous, categorical, and molecular kernels are combined where the final covar_module = sum of kernels + product of kernels. This is analogous to how the continuous and categorical kernels are combined in MixedSingleTaskGP.TanimotoGPSurrogate
has been modified to only work withFingerprints
,Fragments
,FingerprintsFragments
molecular features.MolecularInput
's withMordredDescriptors
molecular features can be used with SingleTaskGP and MixedSingleTaskGP now.MolecularInput
's withMordredDescriptors
molecular features do not requireTanimotoGPSurrogate
orMixedTanimotoGPSurrogate
. This change allows scalarization of the mordred descriptors, and allows continuous kernels to be used with them too (analogous to how descriptors inCategoricalDescriptorInput
are treated). For example, this means that when a dataset contains aMolecularInput
withMordredDescriptors
molecular features and anotherMolecularInput
withFingerprintsFragments
molecular features, they will affected by different kernels and scalars in aMixedTanimotoGPSurrogate
.