Ok/stingy conn (#1089)

Azure · Aug 28, 2023 · 0fb74d7
1 parent 92b44b6
commit 0fb74d7

Showing 5 changed files with 379 additions and 2 deletions.
diff --git a/samples/how-to-guides/connect_retry_with_telemetry.py b/samples/how-to-guides/connect_retry_with_telemetry.py
@@ -4,6 +4,7 @@
 # license information.
 # --------------------------------------------------------------------------
 
+
 import random
 import asyncio
 
@@ -59,6 +60,7 @@
 logger = logging.getLogger(__name__)
 logger.addHandler(sample_log_handler)
 
+
 # The device connection string to authenticate the device with your IoT hub.
 # Using the Azure CLI:
 # az iot hub device-identity show-connection-string --hub-name {YourIoTHubName} --device-id MyNodeDevice --output table
@@ -165,6 +167,7 @@ async def run_sample(device_client):
     if not encountered_no_error:
         print("Fatal error encountered. Will exit the application...")
         raise Exception
+
     while True:
         global connected_event
         print("Client is connected {}".format(device_client.connected))

diff --git a/samples/solutions/producer_consumer.md b/samples/solutions/producer_consumer.md
@@ -13,6 +13,7 @@ an initial value of INITIAL_SLEEP_TIME_BETWEEN_CONNS after which the interval be
 geometrically. Once the sleep time reaches an upper threshold, the application exits. All values are configurable and 
 customizable as per the scenario needs.
 
+
 ## WORKING APP
 
 The application should work seamlessly and continuously as long as the customer does not exit the application. 
@@ -28,6 +29,7 @@ on a timed rotating logging handler. So multiple of DEBUG and INFO files based o
 The debug log files will be named like `debug.log.2023-01-04_11-28-49` and info log files will be named as 
 `info.log.2023-01-04_11-28-49` with the date and timestamp. The next debug and log files will be generated with names 
 like `debug.log.2023-01-04_12-28-49` and `info.log.2023-01-04_12-28-49` with a rotation interval of 1 hour.
+
 The `sample.log` file will contain logging output only from the solution. The solution also prints similar texts onto the console for visual purposes.
 Customer can modify the current logging and set it to a different level by changing one of the loggers.
 
@@ -54,6 +56,7 @@ In the event the application has stopped working for any error, it will establis
 application whenever the network is back. Such intermittent disruptions are temporary and this is a 
 correct process of operation.
 
+
 Any other cause of exception is not retryable. If the application has stopped and exited,
 the cause can be found in the logs. 
 

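The retry schedule described in `producer_consumer.md` (an initial sleep that grows geometrically until an upper threshold is reached) can be sketched roughly as below. This is only an illustration: `INITIAL_SLEEP_TIME_BETWEEN_CONNS` is the name used in the document, while `RETRY_GROWTH_FACTOR`, `THRESHOLD_FOR_RETRY_CONNECTION`, and all numeric values are assumptions, not the sample's actual settings.

```python
# All values are illustrative assumptions; the sample defines its own.
INITIAL_SLEEP_TIME_BETWEEN_CONNS = 2  # seconds (name from the document)
RETRY_GROWTH_FACTOR = 2               # hypothetical name and value
THRESHOLD_FOR_RETRY_CONNECTION = 60   # hypothetical upper bound, in seconds


def backoff_intervals(initial, factor, threshold):
    """Yield geometrically growing sleep times until the threshold is exceeded."""
    sleep_time = initial
    while sleep_time <= threshold:
        yield sleep_time
        sleep_time *= factor


intervals = list(
    backoff_intervals(
        INITIAL_SLEEP_TIME_BETWEEN_CONNS,
        RETRY_GROWTH_FACTOR,
        THRESHOLD_FOR_RETRY_CONNECTION,
    )
)
print(intervals)  # [2, 4, 8, 16, 32]
```

When the generator is exhausted, the caller would stop retrying and exit, which mirrors the "application exits" behavior described in the document.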
diff --git a/samples/solutions/producer_consumer.py b/samples/solutions/producer_consumer.py
@@ -88,7 +88,7 @@ async def create_client(self, conn_str):
         try:
             # Create a Device Client
             self.device_client = IoTHubDeviceClient.create_from_connection_string(
-                conn_str, keep_alive=20
+                conn_str
             )
             # Attach the connection state handler
             self.device_client.on_connection_state_change = self.handle_on_connection_state_change
@@ -110,10 +110,10 @@ async def handle_on_connection_state_change(self):
             self.log_info_and_print("Connected connected_event is set...")
             self.disconnected_event.clear()
             self.connected_event.set()
+
             self.retry_increase_factor = 1
             self.sleep_time_between_conns = INITIAL_SLEEP_TIME_BETWEEN_CONNS
             self.try_number = 1
-
         else:
             self.log_info_and_print("Disconnected connected_event is set...")
             self.disconnected_event.set()
@@ -122,6 +122,7 @@ async def handle_on_connection_state_change(self):
     async def enqueue_message(self):
         message_id = 0
         while True:
+
             message_id += 1
             msg = Message("current wind speed ")
             msg.message_id = message_id
@@ -247,7 +248,9 @@ async def main(self):
             asyncio.create_task(self.enqueue_message()),
             asyncio.create_task(self.if_disconnected_then_connect_with_retry()),
         ]
+
         pending = []
+
         try:
             done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
             await asyncio.gather(*done)
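
The `asyncio.wait(..., return_when=asyncio.FIRST_EXCEPTION)` pattern in this hunk can be tried in isolation with the sketch below. The `worker` coroutine, the task names, and the cleanup in `finally` are assumptions made for illustration; the sample's own handling of `pending` is not shown in this diff. Initializing `pending = []` before the `try` keeps the cleanup safe even if `asyncio.wait` itself is interrupted before it returns.

```python
import asyncio


async def worker(name, fail_after=None):
    """Hypothetical task: loops forever, optionally failing after some ticks."""
    ticks = 0
    while True:
        await asyncio.sleep(0.01)
        ticks += 1
        if fail_after is not None and ticks >= fail_after:
            raise RuntimeError("{} failed".format(name))


async def run_until_first_exception():
    tasks = [
        asyncio.create_task(worker("producer")),
        asyncio.create_task(worker("consumer", fail_after=3)),
    ]
    pending = []  # defined up front so the finally block can always use it
    error = None
    try:
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
        await asyncio.gather(*done)  # re-raises the exception of the failed task
    except RuntimeError as exc:
        error = str(exc)
    finally:
        # Assumed cleanup: cancel whatever is still running and wait for it.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
    return error, len(pending)


result = asyncio.run(run_until_first_exception())
print(result)  # ('consumer failed', 1)
```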

diff --git a/samples/solutions/stingy_connection.md b/samples/solutions/stingy_connection.md
@@ -0,0 +1,89 @@
+# -------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for
+# license information.
+# --------------------------------------------------------------------------
+
+## CUSTOMER PERSONA
+This application illustrates a scenario in which connections are expensive, so telemetry is sent only while a connection is present.
+Because connections are expensive, it is NOT necessary to keep track of lost messages. If a connection cannot be 
+established due to an error, the connection attempt is retried up to NUMBER_OF_TRIES times. These retries 
+start with an interval of INITIAL_SLEEP_TIME_BETWEEN_CONNS, after 
+which the interval between retry attempts increases geometrically. Meanwhile, telemetry messages are enqueued 
+in a list at random intervals. In the current sample, a connection is established every TIME_BETWEEN_CONNECTIONS
+seconds. Once the connection is established, all messages in the list are sent at once. If sending results in an 
+exception, that batch of messages is discarded. Regardless of whether the messages are transmitted successfully, 
+the client disconnects and waits for the next connection to be established.
+
+## TESTING
+Exceptions were raised artificially in the MQTT transport for random messages, selected by message id 
+in the following manner, to check that the app keeps working. The application discarded the affected messages 
+and moved on to the next batch.
+
+```python
+message_id = random.randrange(3, 1000, 1)
+self.log_info_and_print("Id of message is: {}".format(message_id))
+if message_id % 6 == 0:
+    msg = Message("message that must raise exception")
+else:
+    msg = Message("current wind speed ")
+```
+
+## GARBAGE COLLECTION STATISTICS
+This application periodically displays garbage collection statistics. 
+The statistics look like the following.
+
+```commandline
+GC stats are:-
+collections -> 124
+collected -> 3212
+uncollectable -> 0
+collections -> 11
+collected -> 366
+uncollectable -> 0
+collections -> 1
+collected -> 323
+uncollectable -> 0
+```
+## WORKING APP
+
+The application should work seamlessly and continuously as long as the customer does not exit the application. 
+The application can also raise an unrecoverable exception and exit on its own. 
+In case of a recoverable error where the network connection drops, the application should try to establish the connection again.
+
+The application also logs extensively so that progress can be checked and issues troubleshot. 
+
+## APP SPECIFIC LOGS
+
+Several log files will be generated as the application runs. The DEBUG and INFO logs are written 
+by a timed rotating logging handler, so multiple timestamped DEBUG and INFO files will be generated. 
+The `sample.log` file will contain logging output only from the solution. The solution also prints similar text to the console for visibility.
+Customers can change the current logging level by modifying one of the loggers.
+
+## ADD LIBRARY SPECIFIC LOGGING
+
+Customers can also add logging for a library, for example the Paho MQTT library, as follows:
+```python
+paho_log_handler = logging.handlers.TimedRotatingFileHandler(
+    filename="{}/paho.log".format(LOG_DIRECTORY),
+    when="S",
+    interval=LOG_ROTATION_INTERVAL,
+    backupCount=LOG_BACKUP_COUNT,
+)
+paho_log_handler.setLevel(level=logging.DEBUG)
+paho_log_handler.setFormatter(log_formatter)
+paho_logger = logging.getLogger("paho")
+paho_logger.addHandler(paho_log_handler)
+```
+
+## TROUBLESHOOTING TIPS
+Currently, whenever the connection drops, the failure is considered recoverable, and the connection is retried a fixed number of times.
+
+If the application stops working because of any of the above errors, it re-establishes the connection on its own 
+and resumes whenever the network is back. Such intermittent disruptions are temporary, and this is 
+the expected mode of operation.
+
+If the application has stopped and exited, the cause can be found in the logs. 
+
+
+
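
The connect, send, and disconnect cycle described under CUSTOMER PERSONA in `stingy_connection.md` can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the sample: `FakeClient` is a hypothetical stand-in for a device client, `RETRY_GROWTH_FACTOR` and all numeric values are invented, and only `NUMBER_OF_TRIES` and `INITIAL_SLEEP_TIME_BETWEEN_CONNS` are names taken from the document.

```python
import asyncio

NUMBER_OF_TRIES = 3                      # name from the document; value assumed
INITIAL_SLEEP_TIME_BETWEEN_CONNS = 0.01  # name from the document; value assumed
RETRY_GROWTH_FACTOR = 2                  # hypothetical name and value


class FakeClient:
    """Hypothetical stand-in for a device client; not the SDK class."""

    def __init__(self, failing_connects=0):
        self.failing_connects = failing_connects
        self.connected = False
        self.sent = []

    async def connect(self):
        if self.failing_connects > 0:
            self.failing_connects -= 1
            raise ConnectionError("simulated connection failure")
        self.connected = True

    async def send_message(self, msg):
        if "exception" in msg:
            raise RuntimeError("simulated send failure")
        self.sent.append(msg)

    async def disconnect(self):
        self.connected = False


async def connect_with_retry(client):
    """Try to connect up to NUMBER_OF_TRIES times with geometric backoff."""
    sleep_time = INITIAL_SLEEP_TIME_BETWEEN_CONNS
    for attempt in range(1, NUMBER_OF_TRIES + 1):
        try:
            await client.connect()
            return True
        except ConnectionError:
            if attempt < NUMBER_OF_TRIES:
                await asyncio.sleep(sleep_time)
                sleep_time *= RETRY_GROWTH_FACTOR
    return False


async def send_batch_and_disconnect(client, batch):
    """Send everything queued so far; a failed batch is dropped, not retried."""
    if not await connect_with_retry(client):
        return False
    try:
        await asyncio.gather(*(client.send_message(m) for m in batch))
        return True
    except RuntimeError:
        return False
    finally:
        batch.clear()              # the batch is discarded either way
        await client.disconnect()  # the connection is never kept open


async def demo():
    client = FakeClient(failing_connects=1)
    good = ["wind speed 1", "wind speed 2"]
    bad = ["wind speed 3", "message that must raise exception"]
    first = await send_batch_and_disconnect(client, good)
    second = await send_batch_and_disconnect(client, bad)
    return first, second, good, bad, client.connected


outcome = asyncio.run(demo())
print(outcome)  # (True, False, [], [], False)
```

In this sketch the first batch succeeds after one simulated connection failure, the second batch is dropped because one send raises, and the client ends up disconnected in both cases.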