-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds an initialization event to LcmSubscriberSystem #20072
Adds an initialization event to LcmSubscriberSystem #20072
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checkpoint.
Reviewed 6 of 7 files at r1, all commit messages.
Reviewable status: 14 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers
systems/lcm/lcm_subscriber_system.h
line 61 at r1 (raw file):
nit The subscriber API was specifically designed to allow the subscriptions to outlive their underlying interface. So as written here, this new documentation is a breaking change:
Any significant changes to the API documentation should be announced via deprecation with 3 months of advance notice.
However, the fix is easy. The documentation should require that pointer remain valid only when wait_for_..._timeout
is strictly greater than zero.
Since the default is zero, there is no breaking change -- just a new feature with its own local preconditions.
systems/lcm/lcm_subscriber_system.h
line 68 at r1 (raw file):
* processes existing received messages. If the timeout is > 0, then the * initialization event will call lcm->HandleSubscriptions() until at least * one message is received or until the timeout.
BTW We could also imagine reinforcing in the documentation here that the timeout is measured against the wall clock elapsed time, not the context time.
(Perhaps that's an improbable misunderstanding, though, and not worth spending words on.)
systems/lcm/lcm_subscriber_system.h
line 68 at r1 (raw file):
* processes existing received messages. If the timeout is > 0, then the * initialization event will call lcm->HandleSubscriptions() until at least * one message is received or until the timeout.
BTW Should we mention that passing ∞
as the timeout is valid (in order to wait indefinitely)?
systems/lcm/lcm_subscriber_system.h
line 182 at r1 (raw file):
// The lcm interface is (maybe) used to handle subscriptions during // Initialization. drake::lcm::DrakeLcmInterface* lcm_;
nit GSG requires either initialization or a const. Probably best is a const.
Suggestion:
drake::lcm::DrakeLcmInterface* const lcm_;
systems/lcm/lcm_subscriber_system.h
line 185 at r1 (raw file):
// A timeout in seconds. double wait_for_message_on_initialization_timeout_;
nit GSG requires either initialization or a const. Probably best is a const.
Suggestion:
const double wait_for_message_on_initialization_timeout_;
systems/lcm/lcm_subscriber_system.cc
line 31 at r1 (raw file):
serializer_(std::move(serializer)), magic_number_{kMagic}, lcm_{lcm},
nit I think the code would be a little more robust here if we only captured the lcm_
pointer exactly when needed. That would help encode the (new) lifetime requirement from the documentation into the code.
Suggestion:
lcm_{wait_for_message_on_initialization_timeout_ > 0 ? lcm : nullptr},
systems/lcm/lcm_subscriber_system.cc
line 36 at r1 (raw file):
DRAKE_THROW_UNLESS(serializer_ != nullptr); DRAKE_THROW_UNLESS(lcm != nullptr);
nit Missing one error-check. Unless we reject NaNs out of hand, some of the comparisons against 0 within the implementation become very subtly brittle.
DRAKE_THROW_UNLESS(!std::isnan(wait_for_message_on_initialization_timeout));
systems/lcm/lcm_subscriber_system.cc
line 231 at r1 (raw file):
// Since the DrakeLcmInterface will not be handling subscriptions during // this initialization, we must handle them directly here. lcm_->HandleSubscriptions(100 /* timeout_millis*/);
A hard-coded 100
will be too much waiting in some cases (when the remaining wall time balance is smaller).
It would be possible to try to convert the remaining wall clock timeout to a millis
value and pass it that as an argument here, but there are some sneaky overflow and rounding hazards with that approach.
My suggestion is to just hard-code 1 /* timeout_millis */
here and call it a day. Waking up at 1 kHz might be a little be questionable during an inner loop, but during one-time setup code seems perfectly plausible to me.
systems/lcm/lcm_subscriber_system.cc
line 234 at r1 (raw file):
} if (GetInternalMessageCount() > 0) { log()->info("Received");
BTW My own taste is that with paired log messages (waiting .. received), the specific details are congruent. That helps specifically in case there are multiple threads (each running different diagrams).
Suggestion:
log()->info("Received message on {}", channel_);
systems/lcm/lcm_subscriber_system.cc
line 238 at r1 (raw file):
// TODO(russt): Remove this once EventStatus are actually propagated. throw std::runtime_error(fmt::format( "Timed out without receiving any message on {}", channel_));
BTW If you like, we could log the interface URL as part of this message as well: lcm_->get_lcm_url()
. It might help forestall confusion about why nothing was received.
systems/lcm/lcm_subscriber_system.cc
line 239 at r1 (raw file):
throw std::runtime_error(fmt::format( "Timed out without receiving any message on {}", channel_)); log()->info("TIMEOUT");
nit Dead code.
systems/lcm/lcm_subscriber_system.cc
line 246 at r1 (raw file):
return ProcessMessageAndStoreToAbstractState(context, state); } else { return EventStatus::Failed(
In case wait_for_message_...timeout_
was zero, we should not return Failed
here, should we? I would expect DidNothing
in that case.
bindings/pydrake/systems/_lcm_extra.py
line 46 at r1 (raw file):
lcm, use_cpp_serializer=False, wait_for_message_on_initialization_timeout=0.0):
BTW We could probably save me some trouble later if we force this new argument to be kwarg-only right from the outset:
Suggestion:
use_cpp_serializer=False,
*,
wait_for_message_on_initialization_timeout=0.0):
bindings/pydrake/systems/_lcm_extra.py
line 56 at r1 (raw file):
systems that are implemented in C++. LCM types must be registered in C++ via `BindCppSerializer`. """
nit Missing documentation of new argument.
001c74c
to
5ec8aca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers
systems/lcm/lcm_subscriber_system.h
line 61 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit The subscriber API was specifically designed to allow the subscriptions to outlive their underlying interface. So as written here, this new documentation is a breaking change:
Any significant changes to the API documentation should be announced via deprecation with 3 months of advance notice.
However, the fix is easy. The documentation should require that pointer remain valid only when
wait_for_..._timeout
is strictly greater than zero.Since the default is zero, there is no breaking change -- just a new feature with its own local preconditions.
Done.
systems/lcm/lcm_subscriber_system.h
line 64 at r1 (raw file):
* * @param wait_for_message_on_initialization_timeout The number of seconds to * wait for GetInternalMessageCount() to be > 0. If this timeout is <= 0,
btw -- I had to pick between GetInternalMessageCount and the context-dependent message count. Do you agree with the choice?
systems/lcm/lcm_subscriber_system.h
line 68 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
BTW We could also imagine reinforcing in the documentation here that the timeout is measured against the wall clock elapsed time, not the context time.
(Perhaps that's an improbable misunderstanding, though, and not worth spending words on.)
Done.
systems/lcm/lcm_subscriber_system.h
line 68 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
BTW Should we mention that passing
∞
as the timeout is valid (in order to wait indefinitely)?
Done.
systems/lcm/lcm_subscriber_system.h
line 182 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit GSG requires either initialization or a const. Probably best is a const.
Done.
systems/lcm/lcm_subscriber_system.h
line 185 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit GSG requires either initialization or a const. Probably best is a const.
Done.
systems/lcm/lcm_subscriber_system.cc
line 31 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit I think the code would be a little more robust here if we only captured the
lcm_
pointer exactly when needed. That would help encode the (new) lifetime requirement from the documentation into the code.
Done.
systems/lcm/lcm_subscriber_system.cc
line 36 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit Missing one error-check. Unless we reject NaNs out of hand, some of the comparisons against 0 within the implementation become very subtly brittle.
DRAKE_THROW_UNLESS(!std::isnan(wait_for_message_on_initialization_timeout));
Done.
systems/lcm/lcm_subscriber_system.cc
line 231 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
A hard-coded
100
will be too much waiting in some cases (when the remaining wall time balance is smaller).It would be possible to try to convert the remaining wall clock timeout to a
millis
value and pass it that as an argument here, but there are some sneaky overflow and rounding hazards with that approach.My suggestion is to just hard-code
1 /* timeout_millis */
here and call it a day. Waking up at 1 kHz might be a little be questionable during an inner loop, but during one-time setup code seems perfectly plausible to me.
Done. I'm happy you commented. I was going to ask specifically what value you liked here.
systems/lcm/lcm_subscriber_system.cc
line 234 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
BTW My own taste is that with paired log messages (waiting .. received), the specific details are congruent. That helps specifically in case there are multiple threads (each running different diagrams).
Done.
systems/lcm/lcm_subscriber_system.cc
line 238 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
BTW If you like, we could log the interface URL as part of this message as well:
lcm_->get_lcm_url()
. It might help forestall confusion about why nothing was received.
Done.
systems/lcm/lcm_subscriber_system.cc
line 239 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit Dead code.
Done. I had it there intentionally for after the TODO was resolved. But I can zap it.
systems/lcm/lcm_subscriber_system.cc
line 246 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
In case
wait_for_message_...timeout_
was zero, we should not returnFailed
here, should we? I would expectDidNothing
in that case.
Done.
bindings/pydrake/systems/_lcm_extra.py
line 46 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
BTW We could probably save me some trouble later if we force this new argument to be kwarg-only right from the outset:
Done.
bindings/pydrake/systems/_lcm_extra.py
line 56 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit Missing documentation of new argument.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r2, all commit messages.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers
systems/lcm/lcm_subscriber_system.h
line 64 at r1 (raw file):
Previously, RussTedrake (Russ Tedrake) wrote…
btw -- I had to pick between GetInternalMessageCount and the context-dependent message count. Do you agree with the choice?
Aha, that's a great reminder. I'd failed to consider what happens on a reinitialize event (i.e., when restarting a simulation). The current code is probably not correct in that case.
I think the simplest way to provide my opinion on how that should work would be with a pull request instead of trying to write it down here. Stay tuned.
systems/lcm/lcm_subscriber_system.cc
line 239 at r1 (raw file):
Previously, RussTedrake (Russ Tedrake) wrote…
Done. I had it there intentionally for after the TODO was resolved. But I can zap it.
After we have EventStatus working, we would set the EventStatus.message
property to thisfmt::format()
string, and we would not call drake::log()
at all. It's the job of the EventStatus recipient to decide what to do about the status value wrt logging, throwing, and/or ignoring.
systems/lcm/lcm_subscriber_system.cc
line 37 at r2 (raw file):
DRAKE_THROW_UNLESS(serializer_ != nullptr); DRAKE_THROW_UNLESS(wait_for_message_on_initialization_timeout_ <= 0 || lcm_ != nullptr);
nit This was a bridge too far. We need the lcm
argument always to be non-null (so that we can subscribe), no matter what's happening with initialization events.
Suggestion:
DRAKE_THROW_UNLESS(lcm != nullptr);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers
systems/lcm/lcm_subscriber_system.h
line 64 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
Aha, that's a great reminder. I'd failed to consider what happens on a reinitialize event (i.e., when restarting a simulation). The current code is probably not correct in that case.
I think the simplest way to provide my opinion on how that should work would be with a pull request instead of trying to write it down here. Stay tuned.
e3edaf1
to
0b2af9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers
systems/lcm/lcm_subscriber_system.h
line 64 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
Done. Thank you.
systems/lcm/lcm_subscriber_system.cc
line 239 at r1 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
After we have EventStatus working, we would set the
EventStatus.message
property to thisfmt::format()
string, and we would not calldrake::log()
at all. It's the job of the EventStatus recipient to decide what to do about the status value wrt logging, throwing, and/or ignoring.
Fair point.
systems/lcm/lcm_subscriber_system.cc
line 37 at r2 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit This was a bridge too far. We need the
lcm
argument always to be non-null (so that we can subscribe), no matter what's happening with initialization events.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: 1 unresolved discussion, needs at least two assigned reviewers
systems/lcm/lcm_subscriber_system.cc
line 37 at r2 (raw file):
Previously, RussTedrake (Russ Tedrake) wrote…
Done.
Missing push? (I forgot to fix this one in my branch.)
Also supports a new opt-in ability to wait for at least one message to be received during initialization.
0b2af9f
to
db62749
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, needs at least two assigned reviewers
systems/lcm/lcm_subscriber_system.cc
line 37 at r2 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
Missing push? (I forgot to fix this one in my branch.)
Done. for real.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: needs at least two assigned reviewers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+@ggould-tri for platform review, please.
Reviewable status: LGTM missing from assignee ggould-tri(platform)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 7 files at r1, 3 of 4 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! all discussions resolved, LGTM from assignees ggould-tri(platform),jwnimmer-tri(platform)
Also supports a new opt-in ability to wait for at least one message to be received during initialization.
+@jwnimmer-tri for feature review, please.
This change is