Restart unhealthy containers #124
base: main
Conversation
@@ -103,6 +103,8 @@ async def extract(self, message: Input, extra: bool) -> Output:
            self.logger.info(f"Built WebsiteData object in {t():5.2f}s.")
        except ClientConnectorError as e:
            raise HTTPException(status_code=502, detail=f"Could not get HAR from splash: {e}")
        except asyncio.exceptions.TimeoutError:
Eventually we want to catch an even broader exception here to avoid further crashes?
Which broader exception do you have in mind? Exception itself?
Yeah, that would probably be the only broader option (asyncio doesn't expose a top-level base class for its internal exceptions).
Are you implementing it now or in another PR?
This PR feels half-done and slightly surprising to me.
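For illustration, a minimal sketch of where such a catch-all could sit, assuming the try/except structure from the diff above; the wrapper function, the guarded call, and the status codes and messages are hypothetical, not this PR's actual code:

```python
import asyncio

from aiohttp import ClientConnectorError
from fastapi import HTTPException


async def extract_guarded(build_website_data) -> None:
    """Hypothetical wrapper mirroring the error handling discussed above."""
    try:
        # Stand-in for the awaited WebsiteData construction in extract().
        await build_website_data()
    except ClientConnectorError as e:
        # Splash could not be reached at all.
        raise HTTPException(status_code=502, detail=f"Could not get HAR from splash: {e}")
    except asyncio.exceptions.TimeoutError:
        # Splash was reached but did not answer in time.
        raise HTTPException(status_code=504, detail="Timed out waiting for splash")
    except Exception as e:
        # The broad catch-all discussed here: asyncio exposes no common base
        # class for its exceptions, so Exception is the only broader option.
        raise HTTPException(status_code=500, detail=f"Unexpected error during extraction: {e}")
```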
docker-compose.yml
Outdated
@@ -1,6 +1,20 @@
version: '3.4'

services:

  # As docker-compose does not provide an automatic mechanism for restarting unhealthy containers
What? Are you sure about this? What about the "restart: always" option mentioned in all containers? I had the impression unhealthy containers were indeed restarted due to that option.
Yeah, I'm sure about it: docker(-compose) only checks the health but does nothing in case of an unhealthy container. In fact, I think the restart option can probably be removed (it should be managed via the labels).
I think the restart: always option is meant for another scenario: I assume it controls the behavior if e.g. the host or the Docker daemon is restarted, or the container really terminates for some unknown reason. However, in our case, we want the container to restart if it becomes unhealthy, which is not exactly the same thing :-/
Will it still restart if the service terminates, e.g., due to some internal error?
I will leave the restart: always in place, as it should cover scenarios where the container terminates (because the entrypoint process has exited for whatever reason). However, the autoheal sidecar will now also gracefully handle the scenario where the container has not terminated, e.g. because the process did not cleanly shut down due to a deadlock, dangling non-daemon processes, etc.
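As a reference, a minimal sketch of the sidecar pattern described in this thread, built on the willfarrell/autoheal image; the application service name, image, port, and health check command are assumptions, not the PR's actual compose file:

```yaml
version: '3.4'

services:
  # Sidecar that watches container health via the host's Docker socket and
  # restarts any container carrying the configured label.
  autoheal:
    image: willfarrell/autoheal
    restart: always
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

  # Hypothetical application service; name, image, and endpoint are assumed.
  extractor:
    image: metalookup/extractor
    restart: always  # still covers the case where the entrypoint process exits
    labels:
      - autoheal=true  # opt in to restarts by the sidecar
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/_ping"]
      interval: 30s
      timeout: 5s
      retries: 3
```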
metadata-picker.yml
Outdated
@@ -1,6 +1,7 @@
version: '3.3'

services:
  # fixme: Add auto-heal container similar to dev docker-compose.yml
Why are you not doing this in the current branch? Another suggestion: can you use metadata-picker.yml instead of docker-compose.yml to have one file only?
That's why it's not fixed! I wasn't sure why there are two docker-compose files doing more or less the same thing!
Removed this file (together with a bunch of other no-longer-used Docker-related files).
Please double-check with the production instance of MetaLookup - as far as I know that machine uses e.g. metadata-picker.yml and the restart_from_hook.sh.
restart_from_hook.sh
Outdated
@@ -1,4 +0,0 @@
#!/bin/bash
echo "Restarting from hook"
Why did you delete this file?
Because it wasn't used anywhere!
Please double-check with the production instance of MetaLookup - as far as I know that machine uses e.g. metadata-picker.yml and the restart_from_hook.sh.
Force-pushed from 42b1d32 to 3eee73f
Force-pushed from 3eee73f to a193bf4
I will shelve this PR for now, as it is unclear to me how the production deployment would work (traefik etc.). Once we get to deploying the new changes to prod, we can probably use this PR and adapt it to also suit the production needs.
Sounds good. What needs to be done to deploy to production? Do you need insights into traefik? Is something else missing?
Force-pushed from a193bf4 to c119499
Force-pushed from a5eb358 to 8392c3f
- Use a sidecar container (autoheal) that has access to the host's Docker socket to restart unhealthy containers.
- Remove obsolete restart_from_hook files.
Force-pushed from 8392c3f to 9974377
Force-pushed from 4c8c95c to 5d83f67
Force-pushed from 3da3841 to b8869e0
Resolves #123 via autoheal sidecar.
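For anyone verifying the behavior locally, a hypothetical spot-check; Docker itself reports the health state that the autoheal sidecar acts on (the container name below is assumed):

```bash
# List containers Docker currently considers unhealthy.
docker ps --filter health=unhealthy

# Inspect the health state of a single container (name is hypothetical).
docker inspect --format '{{.State.Health.Status}}' metalookup_extractor_1
```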