-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multibody] Some speedups for inertia * vector #22287
[multibody] Some speedups for inertia * vector #22287
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+@xuchenhan-tri for both reviews, please
Reviewable status: LGTM missing from assignee xuchenhan-tri(platform)
cc @amcastro-tri (low hanging fruit!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have another pair of eyes look at this if you don't mind.
-(status: single reviewer ok) +@SeanCurtis-TRI for platform if you can squeeze this in for today.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: 3 unresolved discussions, LGTM missing from assignee SeanCurtis-TRI(platform)
multibody/tree/rotational_inertia.h
line 424 at r1 (raw file):
// TODO(Mitiguy) Issue #6145, add direct unit test for this method. Vector3<T> operator*(const Vector3<T>& w_E) const { // Eigen's symmetric multiply can be slow. Do this by hand instead:
BTW, have you considered adding an abstraction for symmetric 3x3 matrix to avoid doing these algebra in the inertia class? Seems like there are a few places in this class where we need to deal with the fact that only the triangular part of the matrix is used.
multibody/tree/rotational_inertia.h
line 428 at r1 (raw file):
// [b d (e)] * [y] = [ bx+dy+ez ] // [c e f ] [z] [ cx+ey+fz ] const T& a = I_SP_E_(0, 0); // Access only lower triangle.
BTW, I looked at the eigen source code for symmetric matrix vector product and I'm not surprised that this is faster, but could you say, for reference, how much faster is this?
multibody/tree/spatial_inertia.cc
line 450 at r1 (raw file):
return SpatialForce<T>( // Rotational mass_ * (G_SP_E_ * alpha_WB_E + p_PScm_E_.cross(a_WBo_E)),
BTW, it's not clear to me what the goal of the change here is. Could you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: 3 unresolved discussions
multibody/tree/spatial_inertia.cc
line 450 at r1 (raw file):
Previously, xuchenhan-tri wrote…
BTW, it's not clear to me what the goal of the change here is. Could you explain?
Are the factors not getting inlined? Or is the factoring of the mass scaling the savings? Or both?
I'd be surprised if it's a question of inlining, because the replaced methods are basically the ideal inlining case. However, this PR does refer to "stupid" compilers. Slightly stupid can't factor, but really stupid can't inline?
Previously, SeanCurtis-TRI (Sean Curtis) wrote…
Also, the change kind of ruins the easy mapping from code to comment (e.g. p_BoBcmx <-> p_PScm_E_.cross). |
32136dd
to
12cd6de
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fast response, gentlemen. Comments addressed below.
Reviewable status: 1 unresolved discussion
multibody/tree/rotational_inertia.h
line 424 at r1 (raw file):
Previously, xuchenhan-tri wrote…
BTW, have you considered adding an abstraction for symmetric 3x3 matrix to avoid doing these algebra in the inertia class? Seems like there are a few places in this class where we need to deal with the fact that only the triangular part of the matrix is used.
That's a good idea if there are other cases like this, but out of scope for me at the moment. This is the only case where I actually measured a performance impact. Added a TODO.
multibody/tree/rotational_inertia.h
line 428 at r1 (raw file):
Previously, xuchenhan-tri wrote…
BTW, I looked at the eigen source code for symmetric matrix vector product and I'm not surprised that this is faster, but could you say, for reference, how much faster is this?
Yeah, it's terrible! Unfortunately I don't have a way to measure the performance of just this little piece of code. I have a good benchmark for its use within ID (cassiebench) so I know its a win but not by how much.
multibody/tree/spatial_inertia.cc
line 450 at r1 (raw file):
not clear to me what the goal of the change here is
The previous code failed to factor out the mass -- it did 3 multiplies to get the weighted com + 6 multiplies to convert the unit inertia to a rotational inertia. Now it waits until we have produced the rotational acceleration vector, where we can scale with just the minimum of 3 multiplies for each of the rotational and translational vectors.
I'd be surprised if it's a question of inlining
Right! This was just suboptimal code that needed a tweak.
The function as a whole is not inlined (defined in cc). I was hoping to squeeze it down to something inlinable
p_BoBcmx <-> p_PScm_E
Good point. I added a comment to clarify. I removed the earlier aliasing to make it easier to see what's happening here numerically. I'm considering reworking this in highway if it is still dominating ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! all discussions resolved, LGTM from assignees SeanCurtis-TRI(platform),xuchenhan-tri(platform)
multibody/tree/spatial_inertia.cc
line 450 at r1 (raw file):
Previously, sherm1 (Michael Sherman) wrote…
not clear to me what the goal of the change here is
The previous code failed to factor out the mass -- it did 3 multiplies to get the weighted com + 6 multiplies to convert the unit inertia to a rotational inertia. Now it waits until we have produced the rotational acceleration vector, where we can scale with just the minimum of 3 multiplies for each of the rotational and translational vectors.
I'd be surprised if it's a question of inlining
Right! This was just suboptimal code that needed a tweak.
The function as a whole is not inlined (defined in cc). I was hoping to squeeze it down to something inlinable
p_BoBcmx <-> p_PScm_E
Good point. I added a comment to clarify. I removed the earlier aliasing to make it easier to see what's happening here numerically. I'm considering reworking this in highway if it is still dominating ID.
Ah ok, I counted wrong! Thanks for the explanation.
Profiling the M-frame Inverse Dynamics in #22253 I found that a substantial fraction of the time was spent in the computation of the inertial force
SpatialInertia*SpatialAcceleration
. This PR makes some minor changes to the functions involved to make life easier on dumb compilers.This tiny change speeds up the isolated (kinematics not included) ID-M calculation by 26% when compiled by gcc. It also speeds up the existing isolated ID-W calculation by 10% so is worth checking in separately. Including kinematics + ID the overall speedup is 7% for k + ID-M and 4% for k + ID-W.
This change is