Add mixed tanimoto gp surrogate #318

xxEthene · 2023-12-12T02:14:54Z

This PR adds a new surrogate type, MixedTanimotoGPSurrogate, designed for datasets that contain MolecularInputs with Fingerprints, Fragments, or FingerprintsFragments molecular features together with Continuous and/or Categorical features. It is therefore analogous to MixedSingleTaskGP, except that it also handles MolecularInputs. To give this surrogate type the flexibility it needs, some other changes were made, which are listed below:

For MixedTanimotoGPSurrogate, the continuous, categorical, and molecular kernels are combined so that the final covar_module = sum of kernels + product of kernels. This is analogous to how the continuous and categorical kernels are combined in MixedSingleTaskGP.
TanimotoGPSurrogate has been modified to work only with Fingerprints, Fragments, and FingerprintsFragments molecular features.
MolecularInputs with MordredDescriptors molecular features can now be used with SingleTaskGP and MixedSingleTaskGP.
Therefore, MolecularInputs with MordredDescriptors molecular features do not require TanimotoGPSurrogate or MixedTanimotoGPSurrogate. This change allows the Mordred descriptors to be scaled and continuous kernels to be used with them (analogous to how descriptors in CategoricalDescriptorInput are treated). For example, when a dataset contains one MolecularInput with MordredDescriptors molecular features and another MolecularInput with FingerprintsFragments molecular features, they will be affected by different kernels and scalers in a MixedTanimotoGPSurrogate.
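
The combination rule described above (sum of kernels plus product of kernels) can be sketched in plain Python. This is only an illustration under simplifying assumptions: the function names, the dictionary layout of a data point, and the particular RBF and Hamming-style kernels are hypothetical stand-ins, and the actual surrogate builds GPyTorch kernels with their own scale and lengthscale hyperparameters.

```python
import math


def tanimoto(x, y):
    """Tanimoto similarity <x,y> / (|x|^2 + |y|^2 - <x,y>) for fingerprint vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Two all-zero fingerprints are treated as identical.
    return 1.0 if denom == 0 else dot / denom


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel, used here as a stand-in continuous kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale**2)


def hamming(x, y):
    """Stand-in categorical kernel: exp(-fraction of mismatching categories)."""
    if not x:
        return 1.0
    mismatch = sum(a != b for a, b in zip(x, y)) / len(x)
    return math.exp(-mismatch)


def mixed_kernel(p, q):
    """Sum of the three kernels plus their product (illustrative, no learned scales)."""
    k_cont = rbf(p["cont"], q["cont"])
    k_cat = hamming(p["cat"], q["cat"])
    k_mol = tanimoto(p["mol"], q["mol"])
    return (k_cont + k_cat + k_mol) + (k_cont * k_cat * k_mol)


# Hypothetical data points: continuous values, category labels, binary fingerprints.
p = {"cont": [0.2, 1.0], "cat": ["a"], "mol": [1, 0, 1, 1]}
q = {"cont": [0.4, 0.5], "cat": ["b"], "mol": [1, 1, 0, 1]}
k_pq = mixed_kernel(p, q)
```

Each individual kernel equals 1 for identical points, so `mixed_kernel(p, p)` evaluates to 3 + 1 = 4 in this unscaled sketch.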

jduerholt

Hi Li,

thanks for this nice PR. Overall it looks very good! I left some comments.

Best,

Johannes

bofire/data_models/surrogates/mixed_tanimoto_gp.py

bofire/surrogates/mixed_tanimoto_gp.py

bofire/surrogates/single_task_gp.py

tests/bofire/surrogates/test_gps.py

xxEthene · 2023-12-14T06:25:22Z

I have made the changes as suggested.

In addition, I have also slightly cleaned up the MixedSingleTaskGPSurrogate _fit function to use the get_continuous_features, get_categorical_features, and get_feature_indices functions.

jduerholt

Hi Li,

looks really good, just a few minor things.

cc: @simonsung06

Best,

Johannes

bofire/data_models/domain/features.py

jduerholt

Hi Li,

looks good overall, I left only some minor comments. Sorry for being so picky ...

cc: @simonsung06

Best,

Johannes

bofire/surrogates/utils.py

tests/bofire/surrogates/test_gps.py

tests/bofire/surrogates/test_tanimoto_related.py

jduerholt · 2023-12-21T11:53:22Z

One more small note: we merged a PR that refactored the test suite (#327). You need to merge main into your branch again, but the effects will be small. You just have to move whatever creates conflicts to its new position.

In case of problems, I can also help you or do it together with you!

xxEthene · 2023-12-21T15:10:52Z

Hi @jduerholt, I have merged my branch with the main branch and solved the conflicts :) Please help me check whether it is correct. Thanks for all your constructive feedback!

jduerholt · 2023-12-22T14:46:24Z

Hi @xxEthene,

many thanks for the updates. I will have a look at them over the Christmas days. Sorry for the delay!

Best,

Johannes

xxEthene · 2023-12-29T05:14:21Z

Hi @jduerholt, I have made some changes to the code based on the errors that occurred previously. When I merged with the main branch, I accidentally removed the priors import in mixed_single_task_gp and kept the test_features.py file. These issues have been fixed, and I have moved some tests to the correct place :) However, I am not sure about the remaining failed tests related to strategies, as I did not modify these files and the tests pass on my computer. I am looking forward to your feedback on this! Wish you a great Christmas holiday and a happy new year~~

jduerholt · 2023-12-29T19:18:06Z

Hi @xxEthene, just ignore the failing tests; this is due to a new version of formulaic released on the 25th of December (https://pypi.org/project/formulaic/#history), which breaks our tests. @Osburg, can you take care of this?

Regarding your PR: I will do a final review as soon as I am back in the office next Tuesday! I wish you a happy new year too!

jduerholt

Looks already very good, only some small change requests!

jduerholt · 2024-01-02T08:36:04Z

bofire/surrogates/mixed_tanimoto_gp.py

+        ord_dims = sorted(set(range(d)) - set(cat_dims) - set(mol_dims))  # type: ignore
+
+        if cont_kernel_factory is None:
+            cont_kernel_factory = kernels.map_MaternKernel(  # type: ignore


Hmm, this has to be a callable, doesn't it? Have a look here: https://github.com/pytorch/botorch/blob/cba563761c9a0731c06cec86478e7c2d8cbd34b5/botorch/models/gp_regression_mixed.py#L109
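
To illustrate the point about the factory being a callable: as far as I understand the linked BoTorch code, `cont_kernel_factory` is called with a batch shape, the number of ARD dimensions, and the active dimensions, and returns a kernel. The sketch below mimics that contract in plain Python. `StubKernel`, `matern_factory`, and `build_cont_kernel` are hypothetical names used only for illustration and are not part of BoFire or BoTorch.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class StubKernel:
    """Hypothetical stand-in for a GPyTorch kernel object."""

    name: str
    ard_num_dims: int
    active_dims: Tuple[int, ...]


# Assumed factory contract: (batch_shape, ard_num_dims, active_dims) -> kernel.
KernelFactory = Callable[[Tuple[int, ...], int, Sequence[int]], StubKernel]


def matern_factory(batch_shape, ard_num_dims, active_dims):
    """A factory: builds the kernel only once the dimensions are known."""
    return StubKernel("matern", ard_num_dims, tuple(active_dims))


def build_cont_kernel(
    ord_dims: Sequence[int],
    cont_kernel_factory: Optional[KernelFactory] = None,
) -> StubKernel:
    """Build the continuous kernel over ord_dims via a factory callable."""
    if cont_kernel_factory is None:
        cont_kernel_factory = matern_factory
    if not callable(cont_kernel_factory):
        # A ready-made kernel instance cannot be re-targeted to new dimensions.
        raise TypeError("cont_kernel_factory must be a callable, not a kernel instance")
    return cont_kernel_factory((), len(ord_dims), ord_dims)


kernel = build_cont_kernel([0, 1, 2])
```

Passing a factory instead of an instance lets the model decide the active dimensions itself, which is why assigning an already constructed kernel to `cont_kernel_factory` does not fit this contract.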

jduerholt · 2024-01-02T09:05:22Z

bofire/surrogates/mixed_single_task_gp.py

        # these are the categorical dimesions after applying the OneHotToNumeric transform
        cat_dims = list(
-            range(len(ord_dims), len(ord_dims) + len(non_numerical_features))
+            range(len(ord_dims), len(ord_dims) + len(categorical_feature_keys))


Why not use get_feature_indices here as well?

get_feature_indices returns categorical dimensions with OneHot transformation applied, but we need the categorical dimensions without the transformation here.

You are correct, I overlooked that. This is also the place in the code where we run into problems when we have, for example, CategoricalDescriptorInputs of which some are used as one-hots and some as descriptors. This breaks the indices, because here we simply rely on the categorical features always being the last ones (this is also where order_id comes into play). This is often the case, but not always, and with the molecular features coming in, it may be the case less often. But this bug already existed before your PR, so we do not have to fix it here. Maybe you have a smart idea for the problem, though? ;)
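
The index problem discussed here can be made concrete with a small sketch. It contrasts the positions obtained under the "categoricals are always last" assumption with the positions read off an explicit column layout. The helper names and the layouts are hypothetical, and the sketch does not reproduce what OneHotToNumeric does internally.

```python
def assumed_cat_dims(n_ordinal, n_categorical):
    """Positions of categorical columns if they are assumed to come last."""
    return list(range(n_ordinal, n_ordinal + n_categorical))


def actual_cat_dims(layout):
    """Positions of categorical columns read from the real column order.

    layout: list of (column_name, is_categorical), one entry per column
    after each one-hot block has been collapsed into a single column.
    """
    return [i for i, (_, is_cat) in enumerate(layout) if is_cat]


# Categorical feature is last: both views agree.
layout_ok = [("x1", False), ("x2", False), ("solvent", True)]
ok_assumed = assumed_cat_dims(2, 1)
ok_actual = actual_cat_dims(layout_ok)

# Descriptor columns sorted after a categorical column: the views disagree.
layout_bad = [
    ("x1", False),
    ("solvent", True),
    ("ligand_desc_0", False),
    ("ligand_desc_1", False),
]
bad_assumed = assumed_cat_dims(3, 1)
bad_actual = actual_cat_dims(layout_bad)
```

In the second layout the assumed index points at a descriptor column, which is the kind of mismatch described above.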

bofire/surrogates/mixed_tanimoto_gp.py

tests/bofire/data_models/domain/test_inputs.py

jduerholt · 2024-01-02T09:58:26Z

tests/bofire/surrogates/test_utils.py

+    scaler_enum,
+    input_preprocessing_specs,
+    expected_scaler,
+    expected_indices_length,


Why only the length of the indices and not the indices themselves?

I have changed it to check the indices instead :)

jduerholt · 2024-01-02T15:58:18Z

I also stumbled over this order vs. order_id issue when working on PR #279. The change to order_id will result in some failing tests in test_inputs.py; I fixed them in the other PR, so you can just copy the fix over. I would not recommend merging the other PR into yours.

jduerholt · 2024-01-03T07:48:57Z

In PR #332 the problems regarding the failing tests in the DoE module are fixed. As soon as it is merged into main, you can merge it in from main.

jduerholt · 2024-01-05T15:05:44Z

Hi @xxEthene, I left some comments. Sorry for this mess with the order_id. I will investigate the issue further on the botorch side. Just try to change it in the way that I proposed above and check whether it works then.

jduerholt · 2024-01-06T22:36:05Z

Hi @xxEthene, I created a PR in botorch fixing the issue with the OneHotToNumeric InputTransform. If you are interested, here is the PR: pytorch/botorch#2166

jduerholt · 2024-01-19T07:43:39Z

Hi @xxEthene, the PR has now been merged. Just tell me if you find time to finish this PR; if not, I will try to finish it ;)

xxEthene · 2024-01-19T15:25:07Z

> Hi @xxEthene, the PR was now merged. Just tell me if you find time to finish this PR, if not I will try to finish it ;)

Hi @jduerholt, sorry for the delay; my computer has been under repair recently. I will be working on the code this weekend!

xxEthene · 2024-01-21T04:37:03Z

Hi @jduerholt, the order_id values are as follows for now:

ContinuousInput: 1
ContinuousDescriptorInput: 2
DiscreteInput: 3
MolecularInput: 4
CategoricalDescriptorInput: 5
CategoricalInput: 6

and I have modified related tests in test_inputs.

jduerholt · 2024-01-22T13:09:33Z

> Hi @jduerholt, the order_id values are as follows for now:
>
> ContinuousInput: 1
> ContinuousDescriptorInput: 2
> DiscreteInput: 3
> MolecularInput: 4
> CategoricalDescriptorInput: 5
> CategoricalInput: 6
>
> and I have modified related tests in test_inputs.

Hi @xxEthene,

looks good to me, just one thing: you forgot the CategoricalMolecularInput, which still has order_id 7. It should sit between MolecularInput and CategoricalDescriptorInput and carry the number 5. The following ones should then be 6 and 7.

Can you update this?

Best,

Johannes

xxEthene · 2024-01-23T13:57:09Z

Hi @jduerholt,

Sorry about that; I have updated the order. I have also formatted the files based on the error messages I received. However, the error for the file bofire/strategies/samplers/universal_constraint.py is not detected on my side... I am not sure about this one.

Best regards,
Yuxin

jduerholt · 2024-01-23T21:56:39Z

> However, the error for the file bofire/strategies/samplers/universal_constraint.py is not detected from my side.

Ignore it for now. I think there is another test regarding the sorting of the features that is still failing. Can you have a look at this one too? Sorry that this has taken so many iterations!

xxEthene · 2024-01-24T05:17:47Z

Hi @jduerholt,

The failing tests regarding the sorting of the features were due to the unchanged order_id for outputs. I am so sorry for this. I have updated the order_id for outputs, and the order is now:

ContinuousInput: 1
ContinuousDescriptorInput: 2
DiscreteInput: 3
MolecularInput: 4
CategoricalMolecularInput: 5
CategoricalDescriptorInput: 6
CategoricalInput: 7
ContinuousOutput: 8
CategoricalOutput: 9

Hope it works now!

Best regards,
Yuxin
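
The effect of the order_id values listed above on feature ordering can be sketched as follows. The mapping is copied from the list in this comment; sorting by order_id first and by key second is an assumption about how ties are broken, and `sort_features` is a hypothetical helper rather than a BoFire function.

```python
# order_id values as listed in the comment above.
ORDER_ID = {
    "ContinuousInput": 1,
    "ContinuousDescriptorInput": 2,
    "DiscreteInput": 3,
    "MolecularInput": 4,
    "CategoricalMolecularInput": 5,
    "CategoricalDescriptorInput": 6,
    "CategoricalInput": 7,
    "ContinuousOutput": 8,
    "CategoricalOutput": 9,
}


def sort_features(features):
    """Sort (key, type_name) pairs by order_id, then by key (assumed tie-break)."""
    return sorted(features, key=lambda f: (ORDER_ID[f[1]], f[0]))


features = [
    ("solvent", "CategoricalInput"),
    ("ligand", "MolecularInput"),
    ("temperature", "ContinuousInput"),
    ("additive", "CategoricalMolecularInput"),
    ("pressure", "ContinuousInput"),
]
ordered = sort_features(features)
ordered_keys = [key for key, _ in ordered]
```

With this ordering, continuous columns come first and plain categorical inputs come last, which is the property the kernel index computations rely on.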

jduerholt

Almost done. Thank you @xxEthene! Just format the one file or revert your change in universal_constraint.py.

jduerholt · 2024-01-24T09:44:30Z

bofire/strategies/samplers/universal_constraint.py

-        samples = samples.iloc[
-            self.num_candidates :,
-        ]
+        samples = samples.iloc[self.num_candidates :,]


I think this is a problem with the formatter; just revert it to the original and then it should be fine.

Sure, I just committed it!

xxEthene · 2024-01-25T05:55:18Z

Hi @jduerholt, I am quite confused by the error messages from the tests... Everything works on my computer, and I did not change anything except a few lines in universal_constraint.py.

jduerholt · 2024-01-25T08:04:27Z

It is strange that it does not occur locally for you, but the error comes from my side. I overlooked something in one of my last PRs, and this seems to be the reason for this behavior. I will put up a PR today and merge it, and then you can merge main into your PR again.

Sorry for this!

jduerholt · 2024-01-25T09:59:35Z

Should be fixed now, just merge main in. Big sorry for this!

xxEthene · 2024-01-26T04:24:26Z

Okay, I have done it! :)

jduerholt

Hi @xxEthene,

thank you very much for all your efforts. And sorry for the tedious review process!

Best,

Johannes

xxEthene and others added 12 commits November 16, 2023 22:54

add MixedTanimotoGP

47ba17b

add MixedTanimotoGP

bdecd6e

modified MixedTanimotoGP and TanimotoGP

48a12ef

tests for MixedTanimotoGP and TanimotoGP

3ca5c44

Modified Tanimoto-related

59db5cf

refine Tanimoto codes

b16b43f

Merge branch 'experimental-design:main' into add-MixedTanimotoGP

363d34d

Fomatting and refinement

6300e10

Add test for mordred with SingleTaskGP

c6d680d

test formatting

164b3f0

pyright

4664512

black

7668a35

simonsung06 requested a review from jduerholt December 12, 2023 03:15

jduerholt requested changes Dec 12, 2023

xxEthene added 3 commits December 13, 2023 21:16

Modified based on comments

20e9806

static method and get_scaler

580c89b

docstring formatting

e7c0009

jduerholt requested changes Dec 15, 2023

get_feature_keys and get_scaler

4cfd846

jduerholt requested changes Dec 21, 2023

xxEthene added 3 commits December 21, 2023 22:15

relocate tests

6ec151f

Merge branch 'main' of https://github.com/xxEthene/bofire into add-Mi…

2edcb28

…xedTanimotoGP

black formatting

3570e1e

xxEthene added 2 commits December 22, 2023 11:08

Merge branch 'main' into add-MixedTanimotoGP

b189e6b

Update mixed_single_task_gp.py

646ebc4

Merge branch 'experimental-design:main' into add-MixedTanimotoGP

f4129a1

xxEthene added 2 commits December 28, 2023 22:15

Merge branch 'add-MixedTanimotoGP' of https://github.com/xxEthene/bofire

0706c7b

into add-MixedTanimotoGP

move test_features

cc1e0bd

jduerholt reviewed Jan 2, 2024

Merge branch 'experimental-design:main' into add-MixedTanimotoGP

ada61f9

xxEthene and others added 2 commits January 21, 2024 10:43

Merge branch 'experimental-design:main' into add-MixedTanimotoGP

490da0c

order_id and related tests

0f61b51

input order and formatting

e5c2f41

order_id and formatting

f813e2e

jduerholt requested changes Jan 24, 2024

format universal_constraint

548e77e

Merge branch 'experimental-design:main' into add-MixedTanimotoGP

01f4be1

jduerholt approved these changes Jan 29, 2024

jduerholt merged commit afae85c into experimental-design:main Jan 29, 2024
10 checks passed
