State and Line Performance Improvements #269
Hey!
I am out of office. I can give feedback by the end of the month
…On Wed, 6 Nov 2024, 21:18 AlexWKinley, ***@***.***> wrote:
This PR contains changes to a few time integrators and to lines, with the
goal of not changing the simulation in any meaningful way while providing
performance improvements.
Performance Results
Overall, for models of lines, there is roughly a 2x performance
improvement.
Currently, the integrator performance improvements are mostly specific to
RK2 and RK4.
RK2.png (view on web)
<https://github.com/user-attachments/assets/108a05f6-7953-4a8e-944e-a1224a32f096>
RK4.png (view on web)
<https://github.com/user-attachments/assets/9a9b8380-648b-462b-9fe6-27a738f42479>
line_performance_plots.png (view on web)
<https://github.com/user-attachments/assets/eb3841dd-ce01-42f1-a011-5ce9780941b3>
Integrator / State Changes
The primary changes to the integrator and state code are to avoid memory
allocations.
The state code that allows writing
r[1] = r[0] + rd[0] * (0.5 * dt);
is very convenient, but its current implementation means that every
operation on a state allocates memory to create an entire copy of that
state. That basic $y_{n+1} = y_n + dy * \Delta t$ line allocates memory
to store the result of $dy * \Delta t$, and then the result of the
addition, and even an additional allocation for the assignment.
My current solution to this is the new butcher_row function, which can do
these sorts of computations in place, without allocating any additional
memory. It is named as such because it does the math for a single row in a
Butcher tableau
<https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods#Explicit_Runge.E2.80.93Kutta_methods>.
With this function we get
// r[1] = r[0] + rd[0] * (0.5 * dt);
butcher_row<1>(r[1], r[0], { 0.5 * dt }, { &rd[0] });
Definitely not as clear and concise as the more mathematical notation, but
with significant performance improvements.
Because of the additional complexity, I've only switched the RK2 and RK4
integrators to use this function. But I can certainly expand its usage if
desired.
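A minimal sketch of the idea behind butcher_row, assuming a state can be treated as a flat vector of doubles. The names FlatState and butcher_row_sketch are hypothetical and this is not the code in source/State.hpp; it only illustrates computing one tableau row directly into existing storage.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a MoorDyn state: a flat vector of doubles.
using FlatState = std::vector<double>;

// Computes out = start + sum_i scales[i] * (*derivs[i]) element by element,
// writing into storage that `out` already owns. No temporary state is
// created, and `out` may alias `start`.
template <std::size_t N>
void butcher_row_sketch(FlatState& out,
                        const FlatState& start,
                        const std::array<double, N>& scales,
                        const std::array<const FlatState*, N>& derivs)
{
    assert(out.size() == start.size());
    for (std::size_t i = 0; i < N; ++i)
        assert(derivs[i]->size() == start.size());

    for (std::size_t j = 0; j < out.size(); ++j) {
        double acc = start[j];
        for (std::size_t i = 0; i < N; ++i)
            acc += scales[i] * (*derivs[i])[j];
        out[j] = acc;
    }
}
```

With this sketch, the half step from the example above would read butcher_row_sketch<1>(r1, r0, { 0.5 * dt }, { &rd0 }); where r1, r0 and rd0 are preallocated FlatState objects of equal size.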
The other change to states I've made is to remove the empty constructor
and destructor from StateVar and StateVarDeriv. Because of the rule of
three/five <https://en.cppreference.com/w/cpp/language/rule_of_three>,
defining a destructor (even an empty one) stops the compiler from
generating the implicit move operations, so an expression like r[1]
= r[0] + rd[0] allocates twice, even though the result of r[0] + rd[0]
can be directly assigned to r[1]. Removing the empty destructor allows
the compiler to avoid that second allocation.
This should help other integrators even if they're not using butcher_row,
but not as significantly as using butcher_row.
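The language rule behind this can be shown in isolation. The sketch below uses hypothetical types (Tracker, WithEmptyDtor, NoDtor), not the real StateVar, and only counts copies versus moves; how many allocations that saves in MoorDyn depends on the actual members of the state classes.

```cpp
#include <cassert>
#include <utility>

// Counts how often it is copied versus moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
    Tracker& operator=(const Tracker&) { ++copies; return *this; }
    Tracker& operator=(Tracker&&) noexcept { ++moves; return *this; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Hypothetical "before": the user-declared destructor means no implicit
// move constructor or move assignment is generated, so rvalues are copied.
struct WithEmptyDtor {
    Tracker data;
    ~WithEmptyDtor() {}
};

// Hypothetical "after": nothing user-declared, so the implicit move
// operations exist and rvalues are moved.
struct NoDtor {
    Tracker data;
};

// Assigns from an rvalue of the same type; which operator runs depends on T.
template <class T>
void assign_from_rvalue(T& dst, T&& src)
{
    dst = std::move(src);
}
```

Calling assign_from_rvalue on two WithEmptyDtor objects increments Tracker::copies, while the same call on two NoDtor objects increments Tracker::moves. For a member that owns heap memory, that is the difference between duplicating a buffer and handing it over.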
Line Changes
The first type of change to lines is again to avoid memory allocations.
Changing line->getStateDeriv() to take references to the node velocities
and accelerations avoids having to allocate memory for those results every
time.
Also, changing Line::setState(const std::vector<vec>& pos, const
std::vector<vec>& vel) to take vector references avoids allocating
copies of the position and velocity vectors.
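As a rough illustration of both reference changes, here is a sketch with a hypothetical LineSketch class and Vec3 type. The derivative values are placeholders rather than MoorDyn physics; the point is only that inputs are read through const references and outputs are written into buffers the caller already owns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    double x, y, z;
};

// Hypothetical stand-in for a line with n nodes.
class LineSketch {
public:
    explicit LineSketch(std::size_t n)
      : pos_(n, Vec3{ 0.0, 0.0, 0.0 })
      , vel_(n, Vec3{ 0.0, 0.0, 0.0 })
    {
    }

    // Inputs by const reference: the caller's vectors are not copied on the
    // way in, and assign() reuses the capacity pos_ and vel_ already have.
    void setState(const std::vector<Vec3>& pos, const std::vector<Vec3>& vel)
    {
        assert(pos.size() == pos_.size() && vel.size() == vel_.size());
        pos_.assign(pos.begin(), pos.end());
        vel_.assign(vel.begin(), vel.end());
    }

    // Outputs by reference: once the caller's buffers have the right size,
    // repeated calls write into them without allocating.
    void getStateDeriv(std::vector<Vec3>& velOut, std::vector<Vec3>& accOut) const
    {
        velOut.resize(vel_.size());
        accOut.resize(vel_.size());
        for (std::size_t i = 0; i < vel_.size(); ++i) {
            velOut[i] = vel_[i];
            accOut[i] = Vec3{ 0.0, 0.0, -9.81 }; // placeholder acceleration
        }
    }

private:
    std::vector<Vec3> pos_;
    std::vector<Vec3> vel_;
};
```

An integrator would create the two output vectors once and pass the same pair to getStateDeriv on every step.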
The other kind of change to lines is my attempt at simplifying the code
while improving performance.
The biggest thing is getting rid of many of the
if (i == 0)
....else if (i == N)
....else
....
by calculating the values that depend on whether it's an internal or
external node once, allowing for them to be reused, and making the code
nicer to read.
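One way to picture this refactor, using a made-up quantity (the length lumped at each node) rather than the actual expressions in source/Line.cpp. All names below are hypothetical. The first function special-cases the end nodes on every evaluation; the cache computes the endpoint-dependent values once so the per-step loop has no branches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branching form: N segments, nodes 0..N, end nodes handled by if/else.
std::vector<double> nodeLengthsBranching(const std::vector<double>& segLen)
{
    const std::size_t N = segLen.size();
    assert(N >= 1);
    std::vector<double> out(N + 1);
    for (std::size_t i = 0; i <= N; ++i) {
        if (i == 0)
            out[i] = 0.5 * segLen[0];
        else if (i == N)
            out[i] = 0.5 * segLen[N - 1];
        else
            out[i] = 0.5 * segLen[i - 1] + 0.5 * segLen[i];
    }
    return out;
}

// Hoisted form: each segment gives half its length to both of its nodes, so
// end nodes automatically receive a single contribution. Built once.
struct NodeCache {
    std::vector<double> length;

    explicit NodeCache(const std::vector<double>& segLen)
      : length(segLen.size() + 1, 0.0)
    {
        for (std::size_t s = 0; s < segLen.size(); ++s) {
            length[s] += 0.5 * segLen[s];
            length[s + 1] += 0.5 * segLen[s];
        }
    }
};

// Per-step loop: reuses the cached values with no endpoint branches.
void nodeWeights(const NodeCache& cache,
                 double weightPerLength,
                 std::vector<double>& out)
{
    out.resize(cache.length.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = weightPerLength * cache.length[i];
}
```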
Misc
To improve consistency of the benchmarks, and avoid filling up the console
with simulation times, I added MoorDyn::SetDisableOutput to disable some
of the console and file output.
Logistical Notes
Feel free to share any thoughts/questions you have.
I know that those at NREL are doing some work from their side that could
potentially create some conflicts with some of these changes. I'm happy to
delay merging this and rebase on top of whatever those changes may be
myself if that would be easiest.
Commit Summary
- 3252969 setup benchmark
- 7fdc3c7 butcher_row performance optimization
- aaadbc8 Additional line performance improvements
File Changes
(12 files <https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files>)
- *M* bench/CMakeLists.txt
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-6d2f52fe94e40d1afde76a6c2285b4672c966380ebf25271ea83896eb2048c36>
(1)
- *A* bench/LinesBench.cpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-989eb8ce588a04a4607379cab01396fefa1a6be29fcc1c94a767f23062944eb1>
(67)
- *A* bench/LinesBench.hpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-fd6ef2ed65a4e067b86c422cbcd49db66815b4854224abcaa9509114a65bfb9f>
(11)
- *M* bench/MDBench.cpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-566c725804a08449bb7dc481b5f7ba8cf876644129e9d53ddc0ea6eb341bcc38>
(4)
- *A* bench/Mooring/cases/.gitignore
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-a2080354feaa91ddc711826257d21d4b6f03d25794e763da5993fd22d3255933>
(1)
- *A* bench/Mooring/cases/generate_cases.py
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-e68301d9e67caa59968fdb216171252117a3fc35bac705564b3a9a22a672647e>
(92)
- *M* source/Line.cpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-29898bc00510b247ddb019e5cb3b3c646b35ae51e549647eebcf010eb1899fa6>
(207)
- *M* source/Line.hpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-c2fdb05a0e46c53fafbecc1f181e856b352e1df28a102ad0918796ff13114056>
(8)
- *M* source/MoorDyn2.cpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-80a1bfa1212d3b21df426b512779592bdec270a00483d31731941ec8eab55ade>
(14)
- *M* source/MoorDyn2.hpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-01e39b64058932ab8f2c6cb3e898f1cfb304f5e65cad9195064b76f9df0aa248>
(10)
- *M* source/State.hpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-f19aeeae84b6c8854671709e13deac266863437e5c513bfaf930a699926cd947>
(98)
- *M* source/Time.cpp
<https://github.com/FloatingArrayDesign/MoorDyn/pull/269/files#diff-aef669bd79784653c94dc8de36cbdd5725acb1c583c2133e81ca403cbf7ac511>
(31)
Patch Links:
- https://github.com/FloatingArrayDesign/MoorDyn/pull/269.patch
- https://github.com/FloatingArrayDesign/MoorDyn/pull/269.diff
@RyanDavies19 |
…e. Useful for the MATLAB wrapper
docs: add disable output to the docs and options list
From my vantage point, I'm good with this being merged whenever. But I'm also happy to leave this open if you'd prefer to merge it after other work, or if we want to wait for Jose to take a look. Thanks for all the work you've been doing on MoorDyn @RyanDavies19.
This is good to merge on my end! Let's wait to see if @sanguinariojoe has anything to add.
This is good to go with me. As a minor comment, it would be great to have the
|
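I agree it could be nice to better integrate with operator overloading. With how things currently work butcher_row<1>(r[0], r[0], { dt }, { &rd[1] }); would still be more performant than r[0] += rd[1] * dt, since rd[1] * dt would itself allocate a new state to store the scaled derivatives in.
My future ideal would be for MoorDynState and MoorDynDState to subclass or internally wrap Eigen::ArrayXd.
That being said, it's not hard to implement the operator+=.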
Up to you Alex. As I said, this can be merged as it is.
You seem to have a plan, so I am totally ok with your design decisions.
I personally tend to avoid allocations by leaning on += and *=, sometimes
preallocating memory for + and *, and overloading = as well to take
advantage of it. But I did not dive into the Eigen way as deeply as you
did, so I trust you. On top of that, you are usually the winning horse
…On Mon, 2 Dec 2024, 15:08 AlexWKinley, ***@***.***> wrote:
@sanguinariojoe <https://github.com/sanguinariojoe>
I agree it could be nice to better integrate with operator overloading.
With how things currently work butcher_row<1>(r[0], r[0], { dt }, {
&rd[1] }); would still be more performant than r[0] += rd[1] * dt. Since
rd[1] * dt would itself allocate a new state to store the scaled
derivatives in.
My future ideal would be for MoorDynState and MoorDynDState to subclass or
internally wrap Eigen::ArrayXd. Because Eigen uses expression templates,
things like r[0] = r[0] + (rd[0] + rd[3]) * (dt / 6.0) + (rd[1] + rd[2])
* (dt / 3.0); would avoid allocating and basically compile to the
equivalent loop expression. But there's some complexity in terms of then
being able to get individual object states, and the internal state code
would get more complicated overall. So I've held off on attempting that,
especially since there are other changes/additions to states in the
pipeline from Ryan's work.
That being said, it's not hard to implement the operator+=, and if it
would be useful I'm happy to add it. I just don't think any code currently
uses it, since it's not currently defined.
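The allocation in r[0] += rd[1] * dt comes from rd[1] * dt producing a full state. A hand-rolled sketch of the expression-template idea, which is not Eigen's implementation and not part of this PR, with hypothetical names StateSketch and Scaled: the multiplication only records its operands, and operator+= performs the fused loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a state or state derivative.
struct StateSketch {
    std::vector<double> values;
};

// Non-owning record of "deriv * scale". Nothing is computed or allocated
// when it is created; it must not outlive the state it refers to.
struct Scaled {
    const StateSketch& deriv;
    double scale;
};

// Builds the lazy record instead of a scaled copy of the state.
inline Scaled operator*(const StateSketch& d, double s)
{
    return Scaled{ d, s };
}

// Evaluates lhs[i] += scale * deriv[i] in a single pass over existing storage.
inline StateSketch& operator+=(StateSketch& lhs, const Scaled& rhs)
{
    assert(lhs.values.size() == rhs.deriv.values.size());
    for (std::size_t i = 0; i < lhs.values.size(); ++i)
        lhs.values[i] += rhs.scale * rhs.deriv.values[i];
    return lhs;
}
```

With these overloads, r0 += rd1 * dt runs as one loop and leaves r0's buffer in place. Longer expressions such as the RK4 update would need more expression types, which is the part a library like Eigen already provides.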
Let's merge this as is then for now if it works for you @AlexWKinley. One last thing before that, can we change
Oops, I somehow missed that review, my apologies. Should be fixed now. I'm good for this to be merged.