
No error in the case a QoS 1 message sending is timed out #119

Open · vpetrigo opened this issue Dec 23, 2020 · 8 comments

@vpetrigo (Contributor)

Hello,

I have run into an issue where the library currently does not report an error if sending a QoS 1 message (this likely applies to QoS 2 as well, but I have not checked) hits the command timeout.
As we can see there, in the case of a command timeout __mqtt_send() just marks the message to be resent in the future, with no error indication to the caller.
I would like to describe a case where this may cause problems:

  • send a QoS 1 message to the MQTT broker
  • wait for an acknowledgement from the broker (see note 1)
  • if no acknowledgement arrives, preserve the message's content somewhere so it can be resent after 2 hours

The same effect may be observed with keep-alive pings that have not been confirmed with a PINGRESP from the broker.

Note 1
Right now it seems there is no way to explicitly wait for a message delivery to be ACKed. A potential workaround is to call mqtt_sync() after sending a QoS 1 message and wait for response_timeout to see whether client.number_of_timeouts increments or not.

Current behaviour example

  • connect to the MQTT broker in non-blocking mode
  • block delivery of server responses (e.g. with iptables: iptables -A INPUT -s <broker IP> -j DROP)
  • QoS 1 messages can be sent to the broker, but there will be neither ACKs nor any other messages from the broker
  • keep-alive pings also get no response from the broker
  • no notification from the library about an error due to missed keep-alive responses
  • no notification from the library about whether the QoS 1 messages have actually been delivered to the broker
  • one of the errors that may eventually occur is MQTT_ERROR_SEND_BUFFER_IS_FULL, once the TX buffer is full of undelivered messages; that can be treated as a signal to attempt a reconnection to the broker (see the sketch below)
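
For illustration, a minimal sketch (in C, assuming the mqtt.h header from this library) of that last point, i.e. treating MQTT_ERROR_SEND_BUFFER_IS_FULL returned by mqtt_sync() as a reconnect trigger; reconnect_to_broker() is a hypothetical application helper, not part of MQTT-C:

#include <mqtt.h>

/* Hypothetical application helper that closes the socket and
 * re-establishes the connection. */
extern void reconnect_to_broker(struct mqtt_client *client);

void service_client(struct mqtt_client *client)
{
    enum MQTTErrors rc = mqtt_sync(client);

    if (rc == MQTT_ERROR_SEND_BUFFER_IS_FULL) {
        /* The TX buffer filled up with undelivered/unacknowledged messages:
         * treat that as a broken connection and reconnect. */
        reconnect_to_broker(client);
    } else if (rc != MQTT_OK) {
        /* Handle other errors (log, reconnect, etc.). */
    }
}
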
@LiamBindle (Owner) commented Dec 28, 2020

Hi @vpetrigo, thanks for raising this. I'm not sure if I quite follow though. If I understand correctly, the problem you've identified is MQTT-C won't generate an error code if a QoS>0 message goes unacknowledged. Is that correct?

If so, I believe that's true: no error code will be generated, and unacknowledged QoS>0 messages will just loop through timeout and resending. I'm struggling to see how the client could identify such an event, though. Under this event (an unacknowledged QoS>0 message) the client is supposed to keep resending the message until it hears back from the broker. Therefore, if I've understood correctly, I think this event would have to be handled at the application level. Have I understood correctly? If not, could you clarify what error event is missed and what the expected way to handle it is?

@vpetrigo (Contributor, Author) commented Dec 29, 2020

Hello @LiamBindle,

MQTT-C won't generate an error code if a QoS>0 message goes unacknowledged. Is that correct?

Yes, that is correct. I'd say keep-alive messages have the same issue as well, but we can set that aside for now.

I'll try to provide as many details as I can right now:

  • QoS > 0 messages are considered successfully published as soon as the client receives an ACK from the broker

So, I think it is fine to silently push QoS 0 messages into the client's queue and forget about them, but that should not be the case for QoS > 0 messages. We already have an explicit configuration value for this: response_timeout. It seems to me that if no response arrives within that time, the client should return to the caller with an error instead of silently putting the message back into the client's queue.

Handling this case could be done on the application side, but that seems like duplicating responsibilities: the client already tracks response_timeout, yet only increments the number_of_timeouts counter. The most straightforward way to handle this in the application with the current implementation is:

// Publish, then spin in mqtt_sync() until either response_timeout elapses
// or the client registers a timeout (number_of_timeouts increments).
int rc = mqtt_publish(p_client, ...);
unsigned n_timeouts = p_client->number_of_timeouts;
unsigned start_timestamp = now();

while (now() - start_timestamp < response_timeout && p_client->number_of_timeouts == n_timeouts)
{
   rc = mqtt_sync(p_client);

   if (rc != MQTT_OK)
   {
      // handle error
   }
}

That forces us to wait for the whole response_timeout to be completely sure that a message we published has been delivered, which would increase delays in the application.

Impact on keep-alive pings
This issue also affects keep-alive pings. Right now we have an explicit keep-alive interval that has to be defined during client initialization. The problem is that, as with QoS 1, there is no explicit error code from mqtt_sync about a keep-alive timeout. MQTT services like AWS IoT, Azure IoT Hub, etc. usually recommend setting the client's ping interval to about 80% of the server-side keep-alive setting, to avoid pinging too frequently while still keeping the connection alive.
In the case I described in my first post, where a device can deliver messages to a broker but does not receive anything back, the MQTT client keeps silently putting keep-alive ping messages back into the queue. I have hit exactly this behaviour, and only resetting the internet connection helps. If a keep-alive ping goes unanswered, the broker will most likely disconnect the device, so it seems to me there is no point in queuing new keep-alives.
The currently observed behaviour: when there are no responses from the broker, keep-alive messages are created until the TX buffer is full, and then mqtt_sync raises an MQTT_ERROR_SEND_BUFFER_IS_FULL error.
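
As an aside, until the library reports this condition itself, one application-side mitigation is to watch number_of_timeouts across mqtt_sync() calls and force a reconnect once too many response timeouts pile up. A minimal sketch, assuming the mqtt.h header from this library; reconnect_to_broker() is a hypothetical application helper and the threshold policy is only an example:

#include <mqtt.h>

/* Hypothetical application helper that closes the socket and reconnects. */
extern void reconnect_to_broker(struct mqtt_client *client);

/* Call this periodically instead of a bare mqtt_sync(). Every response
 * timeout bumps client->number_of_timeouts; once `threshold` timeouts have
 * accumulated since the last reconnect, assume the connection is half-dead
 * and re-establish it. */
void sync_with_watchdog(struct mqtt_client *client, unsigned threshold)
{
    static unsigned baseline = 0;

    mqtt_sync(client);

    if (client->number_of_timeouts >= baseline + threshold) {
        reconnect_to_broker(client);
        baseline = client->number_of_timeouts;
    }
}
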

One of the options I see right now:

  • update mqtt_publish to return the message's PID to the caller for tracking:
struct message_context {
    uint16_t pid;
};

enum MQTTErrors mqtt_publish(struct mqtt_client *client,
                             const char *topic_name,
                             const void *application_message,
                             size_t application_message_size,
                             uint8_t publish_flags,
                             struct message_context *msg_ctx);
  • for QoS > 0 messages, call a publish_ack_callback (which may have the same signature as publish_response_callback) when an ACK is received for the published message:
struct message_context on_the_fly_message;
bool ack_arrived = false;
// ...

void publish_ack_callback(..., uint16_t pid)
{
   if (pid == on_the_fly_message.pid)
   {
       // set `ack_arrived` or notify the awaiting task somehow
       ack_arrived = true;
   }
}

int rc = mqtt_publish(p_client, ..., &on_the_fly_message);
unsigned start_timestamp = now();

while (now() - start_timestamp < response_timeout && !ack_arrived)
{
   rc = mqtt_sync(p_client);

   if (rc != MQTT_OK && rc != MQTT_ACK_TIMEOUT_ERROR)
   {
      // handle errors other than the (proposed) ACK-timeout error
   }
}

@LiamBindle (Owner)

@vpetrigo I would be in favor of a PR that added:

  • mqtt_publish2(..., uint16_t* pid): a wrapper of mqtt_publish() which also returns the pid (sketched below)
  • a user callback for handling timed-out messages (that way users can define custom timeout handling, e.g., keep-alive timeouts, publish timeouts, etc.; if this callback is NULL, the client retains its current behaviour)

I think those two would provide the necessary flexibility. What do you think? Would you be able to make a PR for this (or similar)?
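
To make the shape of that proposal concrete, here is a rough sketch of how the two pieces might fit together on the application side. Everything below (mqtt_publish2, the timeout callback, and the names used) is hypothetical and only mirrors the proposal in this thread; none of it exists in MQTT-C today:

#include <stdint.h>
#include <stddef.h>
#include <mqtt.h>

/* Hypothetical wrapper from the proposal: same as mqtt_publish(), but also
 * reports the packet ID so the application can track this message. */
extern enum MQTTErrors mqtt_publish2(struct mqtt_client *client,
                                     const char *topic_name,
                                     const void *application_message,
                                     size_t application_message_size,
                                     uint8_t publish_flags,
                                     uint16_t *pid);

static uint16_t pending_pid;
static volatile int publish_timed_out;

/* Hypothetical user callback from the proposal: invoked when a queued message
 * (keep-alive ping, QoS>0 publish, ...) hits response_timeout, instead of the
 * message being silently re-queued. A NULL callback would keep the current
 * behaviour. */
static void on_message_timeout(struct mqtt_client *client, uint16_t pid)
{
    (void) client;
    if (pid == pending_pid) {
        publish_timed_out = 1;  /* the application decides how to recover */
    }
}
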

@LiamBindle (Owner)

Closing due to inactivity. Please feel free to reopen.

@arthurkomatsu

Bumping this, as it would be very useful to know if a QoS 1 message timed out.

LiamBindle reopened this Aug 12, 2021

@LiamBindle (Owner)

I'll reopen this.

I'm happy to merge a PR that implements this.

@vpetrigo (Contributor, Author) commented Oct 6, 2021

Hello everyone,
Sorry for the huge delay, I have too many things on my plate. I'll get back to this feature as soon as I have some spare time.

@LiamBindle (Owner)

@vpetrigo No worries at all, I totally understand. Same goes for me right now.

When you have something ready, I'm happy to review and merge it.
