Ok/stingy conn (#1089)
olivakar authored and cartertinney committed Aug 28, 2023
1 parent 92b44b6 commit 0fb74d7
Showing 5 changed files with 379 additions and 2 deletions.
3 changes: 3 additions & 0 deletions samples/how-to-guides/connect_retry_with_telemetry.py
@@ -4,6 +4,7 @@
# license information.
# --------------------------------------------------------------------------


import random
import asyncio

@@ -59,6 +60,7 @@
logger = logging.getLogger(__name__)
logger.addHandler(sample_log_handler)


# The device connection string to authenticate the device with your IoT hub.
# Using the Azure CLI:
# az iot hub device-identity show-connection-string --hub-name {YourIoTHubName} --device-id MyNodeDevice --output table
@@ -165,6 +167,7 @@ async def run_sample(device_client):
if not encountered_no_error:
print("Fatal error encountered. Will exit the application...")
raise Exception

while True:
global connected_event
print("Client is connected {}".format(device_client.connected))
3 changes: 3 additions & 0 deletions samples/solutions/producer_consumer.md
@@ -13,6 +13,7 @@ an initial value of INITIAL_SLEEP_TIME_BETWEEN_CONNS after which the interval be
geometrically. Once the sleep time reaches an upper threshold the application exits. All values are configurable and
customizable as per the scenario needs.


## WORKING APP

The application should work seamlessly and continuously as long as the customer does not exit the application.
@@ -28,6 +29,7 @@ on a timed rotating logging handler. So multiple of DEBUG and INFO files based o
The debug log files will be named like `debug.log.2023-01-04_11-28-49` and info log files will be named as
`info.log.2023-01-04_11-28-49` with the date and timestamp. The next debug and log files will be generated with names
like `debug.log.2023-01-04_12-28-49` and `info.log.2023-01-04_12-28-49` with a rotation interval of 1 hour.

The `sample.log` file will contain logging output only from the solution. The solution also prints similar text to the console for visual purposes.
Customers can modify the current logging and set it to a different level by changing one of the loggers.

@@ -54,6 +56,7 @@ In the event the application has stopped working for any error, it will establis
application whenever the network is back. Such intermittent disruptions are temporary and this is a
correct process of operation.


Any other cause of exception is not retryable. In case the application has stopped and exited,
the cause could be found out from the logs.

7 changes: 5 additions & 2 deletions samples/solutions/producer_consumer.py
@@ -88,7 +88,7 @@ async def create_client(self, conn_str):
try:
# Create a Device Client
self.device_client = IoTHubDeviceClient.create_from_connection_string(
- conn_str, keep_alive=20
+ conn_str
)
# Attach the connection state handler
self.device_client.on_connection_state_change = self.handle_on_connection_state_change
@@ -110,10 +110,10 @@ async def handle_on_connection_state_change(self):
self.log_info_and_print("Connected connected_event is set...")
self.disconnected_event.clear()
self.connected_event.set()

self.retry_increase_factor = 1
self.sleep_time_between_conns = INITIAL_SLEEP_TIME_BETWEEN_CONNS
self.try_number = 1

else:
self.log_info_and_print("Disconnected connected_event is set...")
self.disconnected_event.set()
@@ -122,6 +122,7 @@ async def handle_on_connection_state_change(self):
async def enqueue_message(self):
message_id = 0
while True:

message_id += 1
msg = Message("current wind speed ")
msg.message_id = message_id
@@ -247,7 +248,9 @@ async def main(self):
asyncio.create_task(self.enqueue_message()),
asyncio.create_task(self.if_disconnected_then_connect_with_retry()),
]

pending = []

try:
done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
await asyncio.gather(*done)
89 changes: 89 additions & 0 deletions samples/solutions/stingy_connection.md
@@ -0,0 +1,89 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------

## CUSTOMER PERSONA
This application illustrates a scenario in which connections are expensive, so telemetry is sent only while a connection
is present. Because connections are expensive, it is NOT necessary to keep track of lost messages. If a connection
cannot be established because of an error, the attempt is retried up to NUMBER_OF_TRIES times. The first retry happens
after a wait of INITIAL_SLEEP_TIME_BETWEEN_CONNS, and the interval between subsequent retry attempts increases
geometrically. Meanwhile, telemetry messages are enqueued in a list at random intervals. In the current sample, a
connection is established every TIME_BETWEEN_CONNECTIONS seconds. Once the connection is established, all messages in
the list are sent at once. If sending a message results in an exception, that batch of messages is discarded.
Regardless of whether the messages are transmitted successfully, the client then disconnects and waits for the next
connection to be established.
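
The retry policy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the sample's actual code: `connect_with_retry`, the `connect` callable, `RETRY_INCREASE_FACTOR`, and all numeric values are hypothetical; only the names NUMBER_OF_TRIES and INITIAL_SLEEP_TIME_BETWEEN_CONNS come from the description.

```python
import asyncio

# Illustrative values only; the sample defines its own.
NUMBER_OF_TRIES = 3
INITIAL_SLEEP_TIME_BETWEEN_CONNS = 2
RETRY_INCREASE_FACTOR = 2  # hypothetical geometric growth factor


async def connect_with_retry(connect, sleep=asyncio.sleep):
    """Call `connect` up to NUMBER_OF_TRIES times with geometric backoff.

    `connect` is any coroutine function that raises on failure (a stand-in
    for the device client's connect call). Returns True on success and
    False once all tries are exhausted.
    """
    sleep_time = INITIAL_SLEEP_TIME_BETWEEN_CONNS
    for try_number in range(1, NUMBER_OF_TRIES + 1):
        try:
            await connect()
            return True
        except Exception:
            if try_number == NUMBER_OF_TRIES:
                break  # no point sleeping after the final attempt
            await sleep(sleep_time)
            sleep_time *= RETRY_INCREASE_FACTOR
    return False
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `asyncio.sleep` applies.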

## TESTING
To check that the app works seamlessly, exceptions were raised deliberately in the MQTT transport for random messages,
selected by their id in the following manner. When an exception occurred, the application discarded that batch of
messages and moved on to the next batch.

```python
message_id = random.randrange(3, 1000, 1)
self.log_info_and_print("Id of message is: {}".format(message_id))
if message_id % 6 == 0:
    msg = Message("message that must raise exception")
else:
    msg = Message("current wind speed ")
```
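
The discard-on-failure behaviour can be sketched as below. This is a simplified illustration, not the sample's implementation: `flush_batch` and the `send` callable are hypothetical names, and messages are sent one after another here, which may differ from how the sample dispatches them.

```python
import asyncio


async def flush_batch(pending_messages, send):
    """Send every queued message, then empty the queue.

    `send` is a coroutine function standing in for the client's send call.
    If any send raises, the rest of the batch is dropped rather than
    retried. Returns the number of messages sent successfully.
    """
    sent = 0
    try:
        for msg in pending_messages:
            await send(msg)
            sent += 1
    except Exception:
        pass  # the failed batch is discarded, not retried
    finally:
        # Cleared on both success and failure, so nothing carries over.
        pending_messages.clear()
    return sent
```

Clearing the list in `finally` means the next connection always starts with a fresh batch, whether or not the previous one went through.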

## GARBAGE COLLECTION STATISTICS
This application periodically displays garbage collection statistics.
The statistics look like the following.

```commandline
GC stats are:-
collections -> 124
collected -> 3212
uncollectable -> 0
collections -> 11
collected -> 366
uncollectable -> 0
collections -> 1
collected -> 323
uncollectable -> 0
```
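
Output of this shape can be produced with the standard library's `gc.get_stats()`, which returns one dictionary per collector generation with the keys `collections`, `collected`, and `uncollectable`. The helper below is a sketch: `format_gc_stats` is a hypothetical name, and it is an assumption that the sample gathers its statistics this way.

```python
import gc


def format_gc_stats(stats):
    """Render per-generation GC statistics as indented text lines."""
    lines = ["GC stats are:-"]
    for generation in stats:
        for key in ("collections", "collected", "uncollectable"):
            lines.append("    {} -> {}".format(key, generation[key]))
    return "\n".join(lines)


print(format_gc_stats(gc.get_stats()))
```
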
## WORKING APP

The application should work seamlessly and continuously as long as the customer does not exit the application.
The application can also raise an unrecoverable exception and exit by itself.
In case of a recoverable error, such as a dropped network connection, the application should try to establish the connection again.

The application also has significant logging to help check on progress and troubleshoot issues.

## APP SPECIFIC LOGS

Several log files will be generated as the application runs. The DEBUG and INFO logs are generated
by timed rotating logging handlers, so multiple DEBUG and INFO files, distinguished by timestamp, will be generated.
The `sample.log` file will contain logging output only from the solution. The solution also prints similar text to the console for visual purposes.
Customers can modify the current logging and set it to a different level by changing one of the loggers.
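
A setup along these lines could look like the sketch below. The helper `build_rotating_handler`, the file names, the format string, and the numeric settings are assumptions for illustration; the sample's own configuration may differ.

```python
import logging
import logging.handlers
import os
import tempfile

# Hypothetical settings; the sample defines its own values.
LOG_ROTATION_INTERVAL = 3600  # seconds, i.e. rotate hourly
LOG_BACKUP_COUNT = 6


def build_rotating_handler(directory, filename, level):
    """Create a timed rotating file handler that records `level` and above."""
    handler = logging.handlers.TimedRotatingFileHandler(
        filename=os.path.join(directory, filename),
        when="S",
        interval=LOG_ROTATION_INTERVAL,
        backupCount=LOG_BACKUP_COUNT,
        delay=True,  # open the file only when the first record is written
    )
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    return handler


log_directory = tempfile.mkdtemp()  # stand-in for the sample's log directory
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
root_logger.addHandler(build_rotating_handler(log_directory, "debug.log", logging.DEBUG))
root_logger.addHandler(build_rotating_handler(log_directory, "info.log", logging.INFO))
```

Changing the level passed for one of the handlers, or calling `setLevel` on a specific logger, is how the verbosity would be adjusted in this sketch.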

## ADD LIBRARY SPECIFIC LOGGING

Customers can also add logging for a library, for example the Paho MQTT library, as follows:
```python
import logging.handlers

# LOG_DIRECTORY, LOG_ROTATION_INTERVAL, LOG_BACKUP_COUNT and log_formatter
# are assumed to be defined as in the sample.
paho_log_handler = logging.handlers.TimedRotatingFileHandler(
    filename="{}/paho.log".format(LOG_DIRECTORY),
    when="S",
    interval=LOG_ROTATION_INTERVAL,
    backupCount=LOG_BACKUP_COUNT,
)
paho_log_handler.setLevel(level=logging.DEBUG)
paho_log_handler.setFormatter(log_formatter)
paho_logger = logging.getLogger("paho")
paho_logger.addHandler(paho_log_handler)
```

## TROUBLESHOOTING TIPS
Currently, whenever the connection drops, the failure is considered recoverable, and the connection is retried a fixed number of times.

If the application has stopped working because of such a connection drop, it will re-establish the connection on its own
and resume whenever the network is back. Such intermittent disruptions are temporary, and this is
the correct mode of operation.

If the application has stopped and exited, the cause can be found in the logs.


