-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gradients of log densities #3
Comments
@bzinberg @agarret7 This is not super urgent, but it will pop up soon, and I wanted to have a place to start a discussion. If it turns out that gradients with respect to values on these manifolds can be represented by vectors of real numbers (e.g. in the embedding space), and that gradient step on these manifolds can be approximated well by gradient steps in the embedding space followed by a projection onto the manifold, then I think there are good reasons to use real-vectors in the embedding space to represent elements of these manifolds and their gradients. For example, we could use length-9 vectors of floats for 3D rotations (3x3 rotation matrices are a subset of this space), length-3 vectors of floats for directions, and length-2 vectors of floats for plane rotations. Model code that used these would then need to cast the values (e.g. via If elements of these spaces are instead treated as special data types (as is currently the case) then many general-purpose Gen inference procedures that use gradients would need to be generalized to work with these data types, for example, by making sure that all generic Gen inference code that uses gradients is independent of the internal data representation (it current assumes vectors of reals). That would require an increase in code complexity and might also limit performance in a way that committing to vectors of reals would not. Also, automatic differentiation would need to handle these special data types (which is pretty involved, and can be brittle). AD libraries are always going to be best at supporting arrays of floats. But again, this hinges on the question of whether you can approximate the gradient on a manifold by a gradient in the embedding space. That sounds plausible to me, but not 100% obvious, and also something that might not be that hard to find out. But I have yet to do this. |
Note that moving to use vectors of reals instead of custom data types would be a change to the interface for this package. I have a hunch this is what we should do, but not sure, and would like to discuss. |
It's not so much about getting the right gradient vector, but about making sense of what it means to "follow" the gradient vector along the manifold itself, rather than following it along the embedding of a tangent plane into the ambient space. You have previously said that you do not want to enter into a general discussion of how to do gradient descent on manifolds, but TBH, I'm skeptical we can have a meaningful discussion without moving to essentially that level of generality. If you fix a chart that contains the current state, then the function SO(3) -> R being optimized can be viewed in local coordinates as a function R^3 -> R. You can do one or several steps of gradient descent inside this chart, as long as the steps are small enough that the new state is still contained inside the chart. Then, you can switch charts, to some other chart that contains the new state.
Yes. (I think this is true for every choice of atlas on SO(3), since SO(3) is a compact connected manifold, but in any case, we can choose our own atlas and ensure that it has this property.)
Good question. I am interested in the general version of this question, but we can also talk about the specific version once we choose a specific atlas for SO(3). |
BTW, a more complicated but more theoretically elegant approach to defining how to "follow" a tangent vector would be to use the exponential map on SO(3). This method is coordinate-free, whereas the above method depends on the particular choice of chart. But definitely can and should start with the simpler approach. |
@bzinberg I think I said that it's higher priority to implement a modeling and inference library that supports the manifolds we will use in practice (e.g. those in in this package) as opposed to a maximally general library. I don't have a problem with discussing gradient descent on manifolds. But generally speaking, I do like to ground discussions in concrete examples. Anyways, this looks like a pretty good reference with a lot of detail for the types of manifolds that this package cares about, and there is an associated library: https://arxiv.org/pdf/1812.01537.pdf Also, GTSAM supports these manifolds and it would probably be wise to understand how they handle these things first. It might save a lot of reinvention of the wheel: https://gtsam.org/notes/GTSAM-Concepts.html It seems like GTSAM at least made the choice to introduce manfiold concepts into their core interfaces. That's informative, but it's not obvious to me that Gen should make the same choice. I want to understand the underlying operations and their implementation first. I am particularly concerned about how to implement things in a way that smoothly interacts with AD libraries.
Yes, After quickly reading through the PDF above, I think a more specific version of my speculation above is whether it is possible to use 3x3 skew-symmetric matrices (which have three degrees of freedom) to represent gradients of scalar functions with respect to rotation matrices, and whether you can implement gradient descent simply by adding a small multiple of such a matrix (i.e. multiplied by a small step size) to the rotation matrix and get an updated new rotation matrix. Vaguely speaking, it seems like maybe this is correct only when the rotation matrix being added to is the identity rotation, and that you need to explicitly involve the exponential map. Interesting; lots to learn! |
Cool, I'll take a look at those references when I get the chance. It's interesting that you have a strong default to start with the robotics and I have a strong default to start with the math 🙂
I think the right way to "increment" an element |
May also be of interest: JuliaGeometry/Rotations.jl#30. Floating a proof-of-concept for this is still on my list... |
That's possible to do, as long as you're within a specific chart -- that's the "simpler" approach I was describing above. The exponential map approach is fancier and doesn't depend on a specific choice of chart. |
This is very relevant: https://probprog.cc/assets/posters/thu/78.pdf |
No description provided.
The text was updated successfully, but these errors were encountered: