Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multibody] Some speedups for inertia * vector #22287

Merged

Conversation

sherm1
Copy link
Member

@sherm1 sherm1 commented Dec 9, 2024

Profiling the M-frame Inverse Dynamics in #22253 I found that a substantial fraction of the time was spent in the computation of the inertial force SpatialInertia*SpatialAcceleration. This PR makes some minor changes to the functions involved to make life easier on dumb compilers.

This tiny change speeds up the isolated (kinematics not included) ID-M calculation by 26% when compiled by gcc. It also speeds up the existing isolated ID-W calculation by 10% so is worth checking in separately. Including kinematics + ID the overall speedup is 7% for k + ID-M and 4% for k + ID-W.


This change is Reviewable

@sherm1 sherm1 added priority: medium type: performance release notes: none This pull request should not be mentioned in the release notes labels Dec 9, 2024
@sherm1 sherm1 marked this pull request as ready for review December 9, 2024 22:49
@sherm1 sherm1 changed the title Some speedups for inertia * vector [multibody] Some speedups for inertia * vector Dec 9, 2024
@sherm1 sherm1 added the status: single reviewer ok https://drake.mit.edu/reviewable.html label Dec 9, 2024
Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@xuchenhan-tri for both reviews, please

Reviewable status: LGTM missing from assignee xuchenhan-tri(platform)

@sherm1
Copy link
Member Author

sherm1 commented Dec 9, 2024

cc @amcastro-tri (low hanging fruit!)

@xuchenhan-tri xuchenhan-tri removed the status: single reviewer ok https://drake.mit.edu/reviewable.html label Dec 9, 2024
Copy link
Contributor

@xuchenhan-tri xuchenhan-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Let's have another pair of eyes look at this if you don't mind.
-(status: single reviewer ok) +@SeanCurtis-TRI for platform if you can squeeze this in for today.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: 3 unresolved discussions, LGTM missing from assignee SeanCurtis-TRI(platform)


multibody/tree/rotational_inertia.h line 424 at r1 (raw file):

  // TODO(Mitiguy) Issue #6145, add direct unit test for this method.
  Vector3<T> operator*(const Vector3<T>& w_E) const {
    // Eigen's symmetric multiply can be slow. Do this by hand instead:

BTW, have you considered adding an abstraction for symmetric 3x3 matrix to avoid doing these algebra in the inertia class? Seems like there are a few places in this class where we need to deal with the fact that only the triangular part of the matrix is used.


multibody/tree/rotational_inertia.h line 428 at r1 (raw file):

    //     [b  d  (e)] * [y] = [ bx+dy+ez ]
    //     [c  e   f ]   [z]   [ cx+ey+fz ]
    const T& a = I_SP_E_(0, 0);  // Access only lower triangle.

BTW, I looked at the eigen source code for symmetric matrix vector product and I'm not surprised that this is faster, but could you say, for reference, how much faster is this?


multibody/tree/spatial_inertia.cc line 450 at r1 (raw file):

  return SpatialForce<T>(
      // Rotational
      mass_ * (G_SP_E_ * alpha_WB_E + p_PScm_E_.cross(a_WBo_E)),

BTW, it's not clear to me what the goal of the change here is. Could you explain?

Copy link
Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM:

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: 3 unresolved discussions


multibody/tree/spatial_inertia.cc line 450 at r1 (raw file):

Previously, xuchenhan-tri wrote…

BTW, it's not clear to me what the goal of the change here is. Could you explain?

Are the factors not getting inlined? Or is the factoring of the mass scaling the savings? Or both?

I'd be surprised if it's a question of inlining, because the replaced methods are basically the ideal inlining case. However, this PR does refer to "stupid" compilers. Slightly stupid can't factor, but really stupid can't inline?

@xuchenhan-tri
Copy link
Contributor

multibody/tree/spatial_inertia.cc line 450 at r1 (raw file):

Previously, SeanCurtis-TRI (Sean Curtis) wrote…

Are the factors not getting inlined? Or is the factoring of the mass scaling the savings? Or both?

I'd be surprised if it's a question of inlining, because the replaced methods are basically the ideal inlining case. However, this PR does refer to "stupid" compilers. Slightly stupid can't factor, but really stupid can't inline?

Also, the change kind of ruins the easy mapping from code to comment (e.g. p_BoBcmx <-> p_PScm_E_.cross).

@sherm1 sherm1 force-pushed the spatial_inertia_times_vector_speedup branch from 32136dd to 12cd6de Compare December 10, 2024 00:22
Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fast response, gentlemen. Comments addressed below.

Reviewable status: 1 unresolved discussion


multibody/tree/rotational_inertia.h line 424 at r1 (raw file):

Previously, xuchenhan-tri wrote…

BTW, have you considered adding an abstraction for symmetric 3x3 matrix to avoid doing these algebra in the inertia class? Seems like there are a few places in this class where we need to deal with the fact that only the triangular part of the matrix is used.

That's a good idea if there are other cases like this, but out of scope for me at the moment. This is the only case where I actually measured a performance impact. Added a TODO.


multibody/tree/rotational_inertia.h line 428 at r1 (raw file):

Previously, xuchenhan-tri wrote…

BTW, I looked at the eigen source code for symmetric matrix vector product and I'm not surprised that this is faster, but could you say, for reference, how much faster is this?

Yeah, it's terrible! Unfortunately I don't have a way to measure the performance of just this little piece of code. I have a good benchmark for its use within ID (cassiebench) so I know its a win but not by how much.


multibody/tree/spatial_inertia.cc line 450 at r1 (raw file):

not clear to me what the goal of the change here is

The previous code failed to factor out the mass -- it did 3 multiplies to get the weighted com + 6 multiplies to convert the unit inertia to a rotational inertia. Now it waits until we have produced the rotational acceleration vector, where we can scale with just the minimum of 3 multiplies for each of the rotational and translational vectors.

I'd be surprised if it's a question of inlining

Right! This was just suboptimal code that needed a tweak.

The function as a whole is not inlined (defined in cc). I was hoping to squeeze it down to something inlinable

p_BoBcmx <-> p_PScm_E

Good point. I added a comment to clarify. I removed the earlier aliasing to make it easier to see what's happening here numerically. I'm considering reworking this in highway if it is still dominating ID.

Copy link
Contributor

@xuchenhan-tri xuchenhan-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees SeanCurtis-TRI(platform),xuchenhan-tri(platform)


multibody/tree/spatial_inertia.cc line 450 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

not clear to me what the goal of the change here is

The previous code failed to factor out the mass -- it did 3 multiplies to get the weighted com + 6 multiplies to convert the unit inertia to a rotational inertia. Now it waits until we have produced the rotational acceleration vector, where we can scale with just the minimum of 3 multiplies for each of the rotational and translational vectors.

I'd be surprised if it's a question of inlining

Right! This was just suboptimal code that needed a tweak.

The function as a whole is not inlined (defined in cc). I was hoping to squeeze it down to something inlinable

p_BoBcmx <-> p_PScm_E

Good point. I added a comment to clarify. I removed the earlier aliasing to make it easier to see what's happening here numerically. I'm considering reworking this in highway if it is still dominating ID.

Ah ok, I counted wrong! Thanks for the explanation.

@sherm1 sherm1 merged commit 4ee5832 into RobotLocomotion:master Dec 10, 2024
9 of 10 checks passed
RussTedrake pushed a commit to RussTedrake/drake that referenced this pull request Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: medium release notes: none This pull request should not be mentioned in the release notes type: performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants