
No error in the case a QoS 1 message sending is timed out #119

Open · vpetrigo opened this issue Dec 23, 2020 · 8 comments

@vpetrigo (Contributor)

Hello,

I have run into an issue where the library currently does not report an error if sending a QoS 1 message (this likely applies to QoS 2 as well, but I have not checked) hits the command timeout.
As we can see there, in the case of a command timeout __mqtt_send() just marks the message to be resent in the future, with no error indication to the caller.
I would like to describe a case where this may cause problems:

  • send a QoS 1 message to the MQTT broker
  • wait for an acknowledgement from the broker (see note 1)
  • if no acknowledgement arrives, preserve the message's content somewhere so it can be resent after 2 hours

The same effect may be observed with keep-alive pings that have not been confirmed with a PINGRESP from the broker.

Note 1
Right now it seems there is no way to explicitly wait for a message delivery to be ACKed. A potential workaround is to call mqtt_sync() after sending a QoS 1 message and wait for response_timeout to see whether client.number_of_timeouts increments or not.

Current behaviour example

  • connect to the MQTT broker in non-blocking mode
  • block delivery of server responses (e.g. with iptables: iptables -A INPUT -s <broker IP> -j DROP)
  • QoS 1 messages can be sent to the broker, but there will be neither ACKs nor any other messages from the broker
  • keep-alive pings also get no response from the broker
  • no notification from the library about an error due to missed keep-alive responses
  • no notification from the library about whether the QoS 1 messages have actually been delivered to the broker
  • one of the errors that may eventually occur is MQTT_ERROR_SEND_BUFFER_IS_FULL, once the TX buffer is full of undelivered messages; that can be treated as a signal to attempt a reconnection to the broker (see the sketch below)
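
For illustration, a minimal sketch (in C, assuming the mqtt.h header from this library) of that last point, i.e. treating MQTT_ERROR_SEND_BUFFER_IS_FULL returned by mqtt_sync() as a reconnect trigger; reconnect_to_broker() is a hypothetical application helper, not part of MQTT-C:

#include <mqtt.h>

/* Hypothetical application helper that closes the socket and
 * re-establishes the connection. */
extern void reconnect_to_broker(struct mqtt_client *client);

void service_client(struct mqtt_client *client)
{
    enum MQTTErrors rc = mqtt_sync(client);

    if (rc == MQTT_ERROR_SEND_BUFFER_IS_FULL) {
        /* The TX buffer filled up with undelivered/unacknowledged messages:
         * treat that as a broken connection and reconnect. */
        reconnect_to_broker(client);
    } else if (rc != MQTT_OK) {
        /* Handle other errors (log, reconnect, etc.). */
    }
}
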
@LiamBindle (Owner) commented Dec 28, 2020

Hi @vpetrigo, thanks for raising this. I'm not sure if I quite follow though. If I understand correctly, the problem you've identified is MQTT-C won't generate an error code if a QoS>0 message goes unacknowledged. Is that correct?

If so, I believe that's true: no error code will be generated, and unacknowledged QoS>0 messages will just loop through timeout and resending. I'm struggling to see how the client could identify such an event, though. Under this event (an unacknowledged QoS>0 message) the client is supposed to keep resending the message until it hears back from the broker. Therefore, if I've understood correctly, I think this event would have to be handled at the application level. Have I understood correctly? If not, could you clarify what error event is missed and what the expected way to handle it is?

@vpetrigo (Contributor, Author) commented Dec 29, 2020

Hello @LiamBindle,

MQTT-C won't generate an error code if a QoS>0 message goes unacknowledged. Is that correct?

Yes, that is correct. I'd say keep-alive messages have the same issue as well, but we can set that aside for now.

I'll try to provide as many details as I can right now:

  • QoS > 0 messages are considered successfully published as soon as the client receives an ACK from the broker

So, I think it is fine to silently push QoS 0 messages into the client's queue and forget about them, but that should not be the case for QoS > 0 messages. We already have an explicit configuration value for this: response_timeout. It seems to me that if no response arrives within that time, the client should return to the caller with an error instead of silently putting the message back into the client's queue.

Handling this case could be done on the application side, but that seems like duplicating responsibilities: the client already tracks response_timeout, yet only increments the number_of_timeouts counter. The most straightforward way to handle this in the application with the current implementation is:

// Publish, then spin in mqtt_sync() until either response_timeout elapses
// or the client registers a timeout (number_of_timeouts increments).
int rc = mqtt_publish(p_client, ...);
unsigned n_timeouts = p_client->number_of_timeouts;
unsigned start_timestamp = now();

while (now() - start_timestamp < response_timeout && p_client->number_of_timeouts == n_timeouts)
{
   rc = mqtt_sync(p_client);

   if (rc != MQTT_OK)
   {
      // handle error
   }
}

That forces us to wait for the whole response_timeout to be completely sure that a message we published has been delivered, which would increase delays in the application.

Impact on keep-alive pings
This issue also affects keep-alive pings. Right now we have an explicit keep-alive interval that has to be defined during client initialization. The problem is that, as with QoS 1, there is no explicit error code from mqtt_sync about a keep-alive timeout. MQTT services like AWS IoT, Azure IoT Hub, etc. usually recommend setting the client's ping interval to about 80% of the server-side keep-alive setting, to avoid pinging too frequently while still keeping the connection alive.
In the case I described in my first post, where a device can deliver messages to a broker but does not receive anything back, the MQTT client keeps silently putting keep-alive ping messages back into the queue. I have hit exactly this behaviour, and only resetting the internet connection helps. If a keep-alive ping goes unanswered, the broker will most likely disconnect the device, so it seems to me there is no point in queuing new keep-alives.
The currently observed behaviour: when there are no responses from the broker, keep-alive messages are created until the TX buffer is full, and then mqtt_sync raises an MQTT_ERROR_SEND_BUFFER_IS_FULL error.
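
As an aside, until the library reports this condition itself, one application-side mitigation is to watch number_of_timeouts across mqtt_sync() calls and force a reconnect once too many response timeouts pile up. A minimal sketch, assuming the mqtt.h header from this library; reconnect_to_broker() is a hypothetical application helper and the threshold policy is only an example:

#include <mqtt.h>

/* Hypothetical application helper that closes the socket and reconnects. */
extern void reconnect_to_broker(struct mqtt_client *client);

/* Call this periodically instead of a bare mqtt_sync(). Every response
 * timeout bumps client->number_of_timeouts; once `threshold` timeouts have
 * accumulated since the last reconnect, assume the connection is half-dead
 * and re-establish it. */
void sync_with_watchdog(struct mqtt_client *client, unsigned threshold)
{
    static unsigned baseline = 0;

    mqtt_sync(client);

    if (client->number_of_timeouts >= baseline + threshold) {
        reconnect_to_broker(client);
        baseline = client->number_of_timeouts;
    }
}
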

One of the options I see right now:

  • update mqtt_publish to return the message's PID to the caller for tracking:
struct message_context {
    uint16_t pid;
};

enum MQTTErrors mqtt_publish(struct mqtt_client *client,
                             const char *topic_name,
                             const void *application_message,
                             size_t application_message_size,
                             uint8_t publish_flags,
                             struct message_context *msg_ctx);
  • for QoS > 0 messages, call a publish_ack_callback (which may have the same signature as publish_response_callback) when an ACK is received for the published message:
struct message_context on_the_fly_message;
bool ack_arrived = false;
// ...

void publish_ack_callback(..., uint16_t pid)
{
   if (pid == on_the_fly_message.pid)
   {
       // set `ack_arrived` or notify the awaiting task somehow
       ack_arrived = true;
   }
}

int rc = mqtt_publish(p_client, ..., &on_the_fly_message);
unsigned start_timestamp = now();

while (now() - start_timestamp < response_timeout && !ack_arrived)
{
   rc = mqtt_sync(p_client);

   if (rc != MQTT_OK && rc != MQTT_ACK_TIMEOUT_ERROR)
   {
      // handle errors other than the (proposed) ACK-timeout error
   }
}

@LiamBindle (Owner)

@vpetrigo I would be in favor of a PR that added:

  • mqtt_publish2(..., uint16_t* pid): a wrapper of mqtt_publish() which also returns the pid (sketched below)
  • a user callback for handling timed-out messages (that way users can define custom timeout handling, e.g., keep-alive timeouts, publish timeouts, etc.; if this callback is NULL, the client retains its current behaviour)

I think those two would provide the necessary flexibility. What do you think? Would you be able to make a PR for this (or similar)?
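
To make the shape of that proposal concrete, here is a rough sketch of how the two pieces might fit together on the application side. Everything below (mqtt_publish2, the timeout callback, and the names used) is hypothetical and only mirrors the proposal in this thread; none of it exists in MQTT-C today:

#include <stdint.h>
#include <stddef.h>
#include <mqtt.h>

/* Hypothetical wrapper from the proposal: same as mqtt_publish(), but also
 * reports the packet ID so the application can track this message. */
extern enum MQTTErrors mqtt_publish2(struct mqtt_client *client,
                                     const char *topic_name,
                                     const void *application_message,
                                     size_t application_message_size,
                                     uint8_t publish_flags,
                                     uint16_t *pid);

static uint16_t pending_pid;
static volatile int publish_timed_out;

/* Hypothetical user callback from the proposal: invoked when a queued message
 * (keep-alive ping, QoS>0 publish, ...) hits response_timeout, instead of the
 * message being silently re-queued. A NULL callback would keep the current
 * behaviour. */
static void on_message_timeout(struct mqtt_client *client, uint16_t pid)
{
    (void) client;
    if (pid == pending_pid) {
        publish_timed_out = 1;  /* the application decides how to recover */
    }
}
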

@LiamBindle (Owner)

Closing due to inactivity. Please feel free to reopen.

@arthurkomatsu

Bumping this, as it would be very useful to know if a QoS 1 message timed out.

LiamBindle reopened this Aug 12, 2021

@LiamBindle (Owner)

I'll reopen this.

I'm happy to merge a PR that implements this.

@vpetrigo (Contributor, Author) commented Oct 6, 2021

Hello everyone,
Sorry for the huge delay, I have too many things on my plate. I'll get back to this feature as soon as I have some spare time.

@LiamBindle (Owner)

@vpetrigo No worries at all, I totally understand. Same goes for me right now.

When you have something ready, I'm happy to review and merge it.
