Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds an initialization event to LcmSubscriberSystem #20072

Conversation

RussTedrake
Copy link
Contributor

@RussTedrake RussTedrake commented Aug 26, 2023

Also supports a new opt-in ability to wait for at least one message to be received during initialization.

+@jwnimmer-tri for feature review, please.


This change is Reviewable

@RussTedrake RussTedrake added the release notes: feature This pull request contains a new feature label Aug 26, 2023
Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checkpoint.

Reviewed 6 of 7 files at r1, all commit messages.
Reviewable status: 14 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


systems/lcm/lcm_subscriber_system.h line 61 at r1 (raw file):
nit The subscriber API was specifically designed to allow the subscriptions to outlive their underlying interface. So as written here, this new documentation is a breaking change:

Any significant changes to the API documentation should be announced via deprecation with 3 months of advance notice.

-- https://drake.mit.edu/stable.html#behavioral-changes

However, the fix is easy. The documentation should require that pointer remain valid only when wait_for_..._timeout is strictly greater than zero.

Since the default is zero, there is no breaking change -- just a new feature with its own local preconditions.


systems/lcm/lcm_subscriber_system.h line 68 at r1 (raw file):

   * processes existing received messages. If the timeout is > 0, then the
   * initialization event will call lcm->HandleSubscriptions() until at least
   * one message is received or until the timeout.

BTW We could also imagine reinforcing in the documentation here that the timeout is measured against the wall clock elapsed time, not the context time.

(Perhaps that's an improbable misunderstanding, though, and not worth spending words on.)


systems/lcm/lcm_subscriber_system.h line 68 at r1 (raw file):

   * processes existing received messages. If the timeout is > 0, then the
   * initialization event will call lcm->HandleSubscriptions() until at least
   * one message is received or until the timeout.

BTW Should we mention that passing as the timeout is valid (in order to wait indefinitely)?


systems/lcm/lcm_subscriber_system.h line 182 at r1 (raw file):

  // The lcm interface is (maybe) used to handle subscriptions during
  // Initialization.
  drake::lcm::DrakeLcmInterface* lcm_;

nit GSG requires either initialization or a const. Probably best is a const.

Suggestion:

drake::lcm::DrakeLcmInterface* const lcm_;

systems/lcm/lcm_subscriber_system.h line 185 at r1 (raw file):

  // A timeout in seconds.
  double wait_for_message_on_initialization_timeout_;

nit GSG requires either initialization or a const. Probably best is a const.

Suggestion:

const double wait_for_message_on_initialization_timeout_;

systems/lcm/lcm_subscriber_system.cc line 31 at r1 (raw file):

      serializer_(std::move(serializer)),
      magic_number_{kMagic},
      lcm_{lcm},

nit I think the code would be a little more robust here if we only captured the lcm_ pointer exactly when needed. That would help encode the (new) lifetime requirement from the documentation into the code.

Suggestion:

lcm_{wait_for_message_on_initialization_timeout_ > 0 ? lcm : nullptr},

systems/lcm/lcm_subscriber_system.cc line 36 at r1 (raw file):

  DRAKE_THROW_UNLESS(serializer_ != nullptr);
  DRAKE_THROW_UNLESS(lcm != nullptr);

nit Missing one error-check. Unless we reject NaNs out of hand, some of the comparisons against 0 within the implementation become very subtly brittle.

DRAKE_THROW_UNLESS(!std::isnan(wait_for_message_on_initialization_timeout));

systems/lcm/lcm_subscriber_system.cc line 231 at r1 (raw file):

      // Since the DrakeLcmInterface will not be handling subscriptions during
      // this initialization, we must handle them directly here.
      lcm_->HandleSubscriptions(100 /* timeout_millis*/);

A hard-coded 100 will be too much waiting in some cases (when the remaining wall time balance is smaller).

It would be possible to try to convert the remaining wall clock timeout to a millis value and pass it that as an argument here, but there are some sneaky overflow and rounding hazards with that approach.

My suggestion is to just hard-code 1 /* timeout_millis */ here and call it a day. Waking up at 1 kHz might be a little be questionable during an inner loop, but during one-time setup code seems perfectly plausible to me.


systems/lcm/lcm_subscriber_system.cc line 234 at r1 (raw file):

    }
    if (GetInternalMessageCount() > 0) {
      log()->info("Received");

BTW My own taste is that with paired log messages (waiting .. received), the specific details are congruent. That helps specifically in case there are multiple threads (each running different diagrams).

Suggestion:

log()->info("Received message on {}", channel_);

systems/lcm/lcm_subscriber_system.cc line 238 at r1 (raw file):

      // TODO(russt): Remove this once EventStatus are actually propagated.
      throw std::runtime_error(fmt::format(
          "Timed out without receiving any message on {}", channel_));

BTW If you like, we could log the interface URL as part of this message as well: lcm_->get_lcm_url(). It might help forestall confusion about why nothing was received.


systems/lcm/lcm_subscriber_system.cc line 239 at r1 (raw file):

      throw std::runtime_error(fmt::format(
          "Timed out without receiving any message on {}", channel_));
      log()->info("TIMEOUT");

nit Dead code.


systems/lcm/lcm_subscriber_system.cc line 246 at r1 (raw file):

    return ProcessMessageAndStoreToAbstractState(context, state);
  } else {
    return EventStatus::Failed(

In case wait_for_message_...timeout_ was zero, we should not return Failed here, should we? I would expect DidNothing in that case.


bindings/pydrake/systems/_lcm_extra.py line 46 at r1 (raw file):

                         lcm,
                         use_cpp_serializer=False,
                         wait_for_message_on_initialization_timeout=0.0):

BTW We could probably save me some trouble later if we force this new argument to be kwarg-only right from the outset:

Suggestion:

                         use_cpp_serializer=False,
                         *,
                         wait_for_message_on_initialization_timeout=0.0):

bindings/pydrake/systems/_lcm_extra.py line 56 at r1 (raw file):

            systems that are implemented in C++. LCM types must be registered
            in C++ via `BindCppSerializer`.
    """

nit Missing documentation of new argument.

@RussTedrake RussTedrake force-pushed the lcm_subscribe_wait_on_initialization branch from 001c74c to 5ec8aca Compare August 26, 2023 17:05
Copy link
Contributor Author

@RussTedrake RussTedrake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


systems/lcm/lcm_subscriber_system.h line 61 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit The subscriber API was specifically designed to allow the subscriptions to outlive their underlying interface. So as written here, this new documentation is a breaking change:

Any significant changes to the API documentation should be announced via deprecation with 3 months of advance notice.

-- https://drake.mit.edu/stable.html#behavioral-changes

However, the fix is easy. The documentation should require that pointer remain valid only when wait_for_..._timeout is strictly greater than zero.

Since the default is zero, there is no breaking change -- just a new feature with its own local preconditions.

Done.


systems/lcm/lcm_subscriber_system.h line 64 at r1 (raw file):

   *
   * @param wait_for_message_on_initialization_timeout The number of seconds to
   * wait for GetInternalMessageCount() to be > 0. If this timeout is <= 0,

btw -- I had to pick between GetInternalMessageCount and the context-dependent message count. Do you agree with the choice?


systems/lcm/lcm_subscriber_system.h line 68 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

BTW We could also imagine reinforcing in the documentation here that the timeout is measured against the wall clock elapsed time, not the context time.

(Perhaps that's an improbable misunderstanding, though, and not worth spending words on.)

Done.


systems/lcm/lcm_subscriber_system.h line 68 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

BTW Should we mention that passing as the timeout is valid (in order to wait indefinitely)?

Done.


systems/lcm/lcm_subscriber_system.h line 182 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit GSG requires either initialization or a const. Probably best is a const.

Done.


systems/lcm/lcm_subscriber_system.h line 185 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit GSG requires either initialization or a const. Probably best is a const.

Done.


systems/lcm/lcm_subscriber_system.cc line 31 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit I think the code would be a little more robust here if we only captured the lcm_ pointer exactly when needed. That would help encode the (new) lifetime requirement from the documentation into the code.

Done.


systems/lcm/lcm_subscriber_system.cc line 36 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit Missing one error-check. Unless we reject NaNs out of hand, some of the comparisons against 0 within the implementation become very subtly brittle.

DRAKE_THROW_UNLESS(!std::isnan(wait_for_message_on_initialization_timeout));

Done.


systems/lcm/lcm_subscriber_system.cc line 231 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

A hard-coded 100 will be too much waiting in some cases (when the remaining wall time balance is smaller).

It would be possible to try to convert the remaining wall clock timeout to a millis value and pass it that as an argument here, but there are some sneaky overflow and rounding hazards with that approach.

My suggestion is to just hard-code 1 /* timeout_millis */ here and call it a day. Waking up at 1 kHz might be a little be questionable during an inner loop, but during one-time setup code seems perfectly plausible to me.

Done. I'm happy you commented. I was going to ask specifically what value you liked here.


systems/lcm/lcm_subscriber_system.cc line 234 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

BTW My own taste is that with paired log messages (waiting .. received), the specific details are congruent. That helps specifically in case there are multiple threads (each running different diagrams).

Done.


systems/lcm/lcm_subscriber_system.cc line 238 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

BTW If you like, we could log the interface URL as part of this message as well: lcm_->get_lcm_url(). It might help forestall confusion about why nothing was received.

Done.


systems/lcm/lcm_subscriber_system.cc line 239 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit Dead code.

Done. I had it there intentionally for after the TODO was resolved. But I can zap it.


systems/lcm/lcm_subscriber_system.cc line 246 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

In case wait_for_message_...timeout_ was zero, we should not return Failed here, should we? I would expect DidNothing in that case.

Done.


bindings/pydrake/systems/_lcm_extra.py line 46 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

BTW We could probably save me some trouble later if we force this new argument to be kwarg-only right from the outset:

Done.


bindings/pydrake/systems/_lcm_extra.py line 56 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit Missing documentation of new argument.

Done.

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 4 files at r2, all commit messages.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


systems/lcm/lcm_subscriber_system.h line 64 at r1 (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

btw -- I had to pick between GetInternalMessageCount and the context-dependent message count. Do you agree with the choice?

Aha, that's a great reminder. I'd failed to consider what happens on a reinitialize event (i.e., when restarting a simulation). The current code is probably not correct in that case.

I think the simplest way to provide my opinion on how that should work would be with a pull request instead of trying to write it down here. Stay tuned.


systems/lcm/lcm_subscriber_system.cc line 239 at r1 (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

Done. I had it there intentionally for after the TODO was resolved. But I can zap it.

After we have EventStatus working, we would set the EventStatus.message property to thisfmt::format() string, and we would not call drake::log() at all. It's the job of the EventStatus recipient to decide what to do about the status value wrt logging, throwing, and/or ignoring.


systems/lcm/lcm_subscriber_system.cc line 37 at r2 (raw file):

  DRAKE_THROW_UNLESS(serializer_ != nullptr);
  DRAKE_THROW_UNLESS(wait_for_message_on_initialization_timeout_ <= 0 ||
                     lcm_ != nullptr);

nit This was a bridge too far. We need the lcm argument always to be non-null (so that we can subscribe), no matter what's happening with initialization events.

Suggestion:

  DRAKE_THROW_UNLESS(lcm != nullptr);

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


systems/lcm/lcm_subscriber_system.h line 64 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

Aha, that's a great reminder. I'd failed to consider what happens on a reinitialize event (i.e., when restarting a simulation). The current code is probably not correct in that case.

I think the simplest way to provide my opinion on how that should work would be with a pull request instead of trying to write it down here. Stay tuned.

RussTedrake#54

@RussTedrake RussTedrake force-pushed the lcm_subscribe_wait_on_initialization branch from e3edaf1 to 0b2af9f Compare August 26, 2023 19:32
Copy link
Contributor Author

@RussTedrake RussTedrake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


systems/lcm/lcm_subscriber_system.h line 64 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

RussTedrake#54

Done. Thank you.


systems/lcm/lcm_subscriber_system.cc line 239 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

After we have EventStatus working, we would set the EventStatus.message property to thisfmt::format() string, and we would not call drake::log() at all. It's the job of the EventStatus recipient to decide what to do about the status value wrt logging, throwing, and/or ignoring.

Fair point.


systems/lcm/lcm_subscriber_system.cc line 37 at r2 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit This was a bridge too far. We need the lcm argument always to be non-null (so that we can subscribe), no matter what's happening with initialization events.

Done.

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: feature.

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: 1 unresolved discussion, needs at least two assigned reviewers


systems/lcm/lcm_subscriber_system.cc line 37 at r2 (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

Done.

Missing push? (I forgot to fix this one in my branch.)

Also supports a new opt-in ability to wait for at least one message to
be received during initialization.
@RussTedrake RussTedrake force-pushed the lcm_subscribe_wait_on_initialization branch from 0b2af9f to db62749 Compare August 26, 2023 19:49
Copy link
Contributor Author

@RussTedrake RussTedrake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, needs at least two assigned reviewers


systems/lcm/lcm_subscriber_system.cc line 37 at r2 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

Missing push? (I forgot to fix this one in my branch.)

Done. for real.

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: needs at least two assigned reviewers

Copy link
Contributor Author

@RussTedrake RussTedrake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@ggould-tri for platform review, please.

Reviewable status: LGTM missing from assignee ggould-tri(platform)

Copy link
Contributor

@ggould-tri ggould-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 7 files at r1, 3 of 4 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees ggould-tri(platform),jwnimmer-tri(platform)

@jwnimmer-tri jwnimmer-tri merged commit 086e93a into RobotLocomotion:master Aug 28, 2023
@RussTedrake RussTedrake deleted the lcm_subscribe_wait_on_initialization branch September 17, 2023 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high release notes: feature This pull request contains a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants