
[Experimental] Derivative Kernel #1794

Open · wants to merge 7 commits into main

Conversation

jacobrgardner
Member

This leverages some pretty cool functionality the PyTorch team is playing around with in https://github.com/pytorch/functorch to let us replace our bespoke derivative kernel implementations with a single, fully autograd-based one. You can use it like this:

import torch
import gpytorch

x = torch.randn(5, 2)

# Polynomial Kernel
kern = gpytorch.kernels.PolynomialKernel(2)
kern_grad = gpytorch.kernels.PolynomialKernelGrad(2)
kern_autograd = gpytorch.kernels.experimental.DerivativeKernel(kern)
assert torch.norm(kern_grad(x).evaluate() - kern_autograd(x).evaluate()) < 1e-5

# RBF Kernel
kern = gpytorch.kernels.RBFKernel()
kern_grad = gpytorch.kernels.RBFKernelGrad()
kern_autograd = gpytorch.kernels.experimental.DerivativeKernel(kern)
assert torch.norm(kern_grad(x).evaluate() - kern_autograd(x).evaluate()) < 1e-5

The only change necessary outside of actually implementing the kernel comes from the fact that vmap currently can't batch over torch.equal, so I made x1_eq_x2 an argument we can specify / set in __call__ to bypass this comparison:

x1_eq_x2 = torch.equal(x1, x2)
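
A minimal sketch of that bypass (the helper name and signature here are illustrative assumptions; only the x1_eq_x2 flag itself comes from this PR):

import torch

def infer_x1_eq_x2(x1, x2, x1_eq_x2=None):
    # If the caller (e.g. DerivativeKernel running under functorch's vmap)
    # already knows whether the two inputs are identical, trust that flag
    # instead of calling torch.equal, which vmap currently cannot batch over.
    if x1_eq_x2 is not None:
        return x1_eq_x2
    return torch.equal(x1, x2)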

Problems

  • For some reason, the Hessian block of DerivativeKernel(MaternKernel(nu=2.5)) specifically comes out as the negative of what it should be. As far as I can tell, this is the only kernel it happens for. I have no idea why, but it's annoying, since Matern would obviously be a fantastic kernel to get a derivative version of "for free." I suspect this has to do with the non-squared distance computations here? (See the reproduction sketch after this list.)

  • If you wrap a non-differentiable kernel in DerivativeKernel, it will still return a matrix, just a severely non-PD one. I don't think there's a good solution, but it's problematic: if we're not using Cholesky (likely with default settings, since derivative kernel matrices get really big really fast), I'm not sure we'd fail loudly anywhere along the way? (A possible loud check is sketched below.)
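
A hedged reproduction sketch for the Matern sign issue. The row ordering of the DerivativeKernel output (each point contributing a value row followed by one derivative row per input dimension, matching the *KernelGrad kernels) is an assumption here, as is the one-dimensional input:

import torch
import gpytorch

x = torch.randn(5, 1)
base = gpytorch.kernels.MaternKernel(nu=2.5)
kern_autograd = gpytorch.kernels.experimental.DerivativeKernel(base)
full = kern_autograd(x).evaluate()  # (5 * 2, 5 * 2) when d = 1

# Reference entry d^2 k(x_i, x_j) / (dx_i dx_j) for one pair i != j,
# computed directly with torch.autograd on the wrapped kernel.
i, j = 0, 1
xi = x[i:i + 1].clone().requires_grad_(True)
xj = x[j:j + 1].clone().requires_grad_(True)
k_ij = base(xi, xj).evaluate().squeeze()
dk_dxi = torch.autograd.grad(k_ij, xi, create_graph=True)[0]
d2k = torch.autograd.grad(dk_dxi.sum(), xj)[0].squeeze()

# Under the layout assumption, these two numbers should agree; the report
# above is that they differ by a sign for Matern(nu=2.5) only.
print(d2k.item(), full[2 * i + 1, 2 * j + 1].item())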
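
And a small sketch of a loud check for the second problem: wrap a kernel whose covariance is not twice differentiable at zero distance (Matern with nu = 0.5 is used here purely as an illustration) and inspect the smallest eigenvalue of the resulting matrix, which we would otherwise never look at when Cholesky is skipped:

import torch
import gpytorch

x = torch.randn(20, 2)
rough = gpytorch.kernels.MaternKernel(nu=0.5)  # not twice differentiable at 0
kern_autograd = gpytorch.kernels.experimental.DerivativeKernel(rough)

K = kern_autograd(x).evaluate()
K = 0.5 * (K + K.transpose(-1, -2))  # symmetrize before the eigenvalue check
min_eig = torch.linalg.eigvalsh(K).min()
# A clearly negative (or NaN) value flags the silent non-PD output described
# above, even if Cholesky is never attempted downstream.
print(f"smallest eigenvalue: {min_eig.item():.3e}")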

@jacobrgardner
Member Author

Not really familiar with the 3 failing WiskiGP tests -- maybe something related to the x1_eq_x2 argument tracking?

@wjmaddox
Collaborator

wjmaddox commented Oct 28, 2021

Weird that the errors are mainly in the notebooks and not in the unit tests. I also couldn't reproduce the example failures in my setup.

@wjmaddox
Collaborator

After tracking the error down a bit, it seems to be because the SKI kernel evaluated against the grid should not have x1_eq_x2 set to True. This commit should resolve it.
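
Roughly, the idea amounts to something like the following when SKI evaluates the base kernel between the inputs and the inducing grid (the variable names are illustrative; the actual change lives in the commit referenced above):

# x are the data inputs and grid are the inducing grid points; they are never
# the same set of points, so the flag is passed explicitly rather than being
# inherited from the outer call.
k_data_grid = base_kernel(x, grid, x1_eq_x2=False)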
