Implement policy for managing docker build caches #3797

Open
sxa opened this issue Nov 4, 2024 · 5 comments
@sxa
Member

sxa commented Nov 4, 2024

Seen recently on test-azure-ubuntu2404-x64-1 but this is also a follow-on to #3007

https://docs.docker.com/build/cache/garbage-collection/ has some information on how the cache can be maintained. We should ensure that the amount of space used by docker is managed appropriately. I believe we have implemented some regular cleanups on some machines, and we should deploy this more universally, likely as part of the adoptopenjdk tags in the UNIX playbook. The test jobs are increasingly making use of docker, so a certain amount of build-up will occur and needs to be managed.

@steelhead31
Contributor

I'm currently writing a docker housekeeping script to help maintain the size of the overlay2 directory.

@steelhead31
Contributor

I'm currently writing a script to do some automated housekeeping on machines that use docker, to try to ensure that the builder cache is kept at a manageable volume and that the overlay2 directory is periodically cleaned up, as it has a tendency to grow and isn't managed by the normal "docker system prune" commands. Currently I plan to have a script that behaves as follows:

  1. Looks for any images that haven't been used in the last 6 months
  2. Deletes any containers based on the images identified in 1.
  3. Deletes the images identified in 1.
  4. Prunes the docker builder cache by deleting everything older than 14 days
  5. Prunes any unused images
  6. Prunes any unused containers (older than 7 days)
  7. Prunes any docker volumes that aren't in use
  8. Prunes any docker networks that aren't in use

Does anybody see any issues with this? The initial version of the script will allow it to run in "reporting" mode and highlight what it plans to delete, so we can do some shakedown testing ahead of implementation.
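As a rough illustration of steps 4 to 8 and the "reporting" mode, here is a minimal Python sketch. It is not the script described above: the function names `prune_commands` and `housekeep` are hypothetical, and the 14-day and 7-day limits are expressed as the `336h` and `168h` values of docker's `until` filter. Steps 1 to 3 are left out because they depend on how "not used" is determined.

```python
import shlex
import subprocess
from typing import List


def prune_commands(cache_max_age: str = "336h",
                   container_max_age: str = "168h") -> List[List[str]]:
    """Return the docker CLI calls for steps 4-8, in the order listed above.

    336h is 14 days and 168h is 7 days. --force skips the interactive
    confirmation prompt that the prune subcommands show by default.
    """
    return [
        # Step 4: builder cache entries older than the cut-off.
        ["docker", "builder", "prune", "--force", "--filter", f"until={cache_max_age}"],
        # Step 5: all images not referenced by any container.
        ["docker", "image", "prune", "--all", "--force"],
        # Step 6: stopped containers older than the cut-off.
        ["docker", "container", "prune", "--force", "--filter", f"until={container_max_age}"],
        # Step 7: volumes not used by any container.
        ["docker", "volume", "prune", "--force"],
        # Step 8: networks not used by any container.
        ["docker", "network", "prune", "--force"],
    ]


def housekeep(report_only: bool = True) -> List[str]:
    """Run the prune commands, or only describe them in reporting mode."""
    lines = []
    for argv in prune_commands():
        rendered = shlex.join(argv)
        if report_only:
            # Reporting mode: nothing is executed.
            lines.append(f"Would run: {rendered}")
        else:
            subprocess.run(argv, check=True)
            lines.append(f"Ran: {rendered}")
    return lines
```

Calling `housekeep()` with the default `report_only=True` returns one "Would run: ..." line per command without touching docker, which suits the shakedown testing mentioned above; `housekeep(report_only=False)` would execute them.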

@steelhead31
Contributor

Running my proposed script on test-ibmcloud-rhel7-x64-1 identifies the following 👍 

Cutoff date: 2024-05-01 (Wed May  1 00:00:00 CDT 2024)
Mode: Listing images and containers that would be deleted.
Would delete: alpine:3.14 (Created on: 2023-03-29 13:19:37 -0500 CDT)
Would delete: rabbitmq:3.9-management (Created on: 2023-03-24 13:45:07 -0500 CDT)
Would delete: postgres:14.1 (Created on: 2022-01-26 19:10:09 -0600 CST)
Would delete: apicurio/apicurio-registry-mem:2.1.5.Final (Created on: 2021-12-22 12:00:15 -0600 CST)
Would delete: neo4j:4.0.0 (Created on: 2021-12-22 09:12:51 -0600 CST)
Would delete: quay.io/keycloak/keycloak-x:16.1.0 (Created on: 2021-12-20 10:19:34 -0600 CST)
Would delete: vectorized/redpanda:v21.11.2 (Created on: 2021-12-09 14:00:22 -0600 CST)
Would delete: confluentinc/cp-kafka:5.4.3 (Created on: 2021-10-15 17:38:29 -0500 CDT)
Would delete: testcontainers/ryuk:0.3.3 (Created on: 2021-10-14 03:34:37 -0500 CDT)
Would delete: centos:7 (Created on: 2021-09-15 13:20:23 -0500 CDT)
Would delete: localstack/localstack:0.12.17 (Created on: 2021-08-27 13:15:05 -0500 CDT)
Would delete: consul:1.7 (Created on: 2021-05-07 15:20:31 -0500 CDT)
Would delete: quay.io/artemiscloud/activemq-artemis-broker:0.1.2 (Created on: 2021-03-12 00:20:31 -0600 CST)
Would delete: quay.io/infinispan-test/ryuk:0.3.0 (Created on: 2020-06-06 02:29:06 -0500 CDT)
Would delete: mongo:4.2.6 (Created on: 2020-04-24 17:00:49 -0500 CDT)
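The listing above compares each image's creation timestamp with a cutoff date. A minimal Python sketch of that comparison follows, assuming the image names and timestamps have already been collected as pairs of strings in the format shown in the output. The names `parse_created` and `stale_images` are hypothetical, as is the `example/recent:1.0` image used for contrast. Note that a creation date is not the same thing as a last-used date.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def parse_created(created: str) -> datetime:
    """Parse a timestamp such as '2023-03-29 13:19:37 -0500 CDT'.

    The trailing zone abbreviation is dropped; the numeric offset
    already identifies the instant unambiguously.
    """
    date_part, time_part, offset = created.split()[:3]
    return datetime.strptime(f"{date_part} {time_part} {offset}",
                             "%Y-%m-%d %H:%M:%S %z")


def stale_images(images: Iterable[Tuple[str, str]],
                 cutoff: datetime) -> List[str]:
    """Return the names of images created before the cutoff."""
    return [name for name, created in images
            if parse_created(created) < cutoff]


# Cutoff from the output above: 2024-05-01 00:00:00 CDT (UTC-5).
cutoff = datetime(2024, 5, 1, tzinfo=timezone(timedelta(hours=-5)))
images = [
    ("alpine:3.14", "2023-03-29 13:19:37 -0500 CDT"),
    ("postgres:14.1", "2022-01-26 19:10:09 -0600 CST"),
    ("example/recent:1.0", "2024-10-01 09:00:00 -0500 CDT"),  # hypothetical
]
for name in stale_images(images, cutoff):
    print(f"Would delete: {name}")
```

Both timestamps and the cutoff carry a UTC offset, so the comparison stays correct across the CST/CDT changes visible in the listing.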

@sxa
Member Author

sxa commented Nov 26, 2024

> Does anybody see any issues with this, the initial version of the script will allow it to run in "reporting" mode, and highlight what it plans to delete, so we can do some shakedown testing ahead of implementation.

SGTM. We should definitely wait until Haroon is back before implementing automated cleanups, so we can have the discussion with him too. There may also currently be some issues with doing this on the s390x machines that we need to be careful about.

@steelhead31
Contributor

Will do, we should probably have a discussion on his return; I have a couple of ideas that are probably worth discussing. In the meantime, I'm putting in a PR for a new plugin I've written for Nagios. The initial version is visible in Nagios here:

https://nagios.adoptopenjdk.net/nagios/cgi-bin/extinfo.cgi?type=2&host=test-ibmcloud-rhel7-x64-1&service=Docker+Overlay2+Size

It defaults to warning at 30 GB and critical at 40 GB in size; however, it is parameterised so the thresholds can be tuned on a per-host basis in the Nagios config.
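For readers unfamiliar with how such a check is structured, here is a minimal Python sketch of a threshold check in the Nagios plugin style (exit status 0 for OK, 1 for WARNING, 2 for CRITICAL). It is not the plugin from the PR: the names `directory_size_bytes` and `classify` are hypothetical, sizes are treated as GiB, and the size is summed from file sizes rather than taken from `du`, so the figures may differ from what the real plugin reports.

```python
import os
from typing import Tuple

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL = 0, 1, 2
GIB = 1024 ** 3


def directory_size_bytes(path: str) -> int:
    """Sum the apparent size of every file below path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files can disappear while docker is running; skip them.
                pass
    return total


def classify(size_bytes: int, warn_gib: float = 30.0,
             crit_gib: float = 40.0) -> Tuple[int, str]:
    """Map a size to a Nagios status and a one-line message."""
    size_gib = size_bytes / GIB
    if size_gib >= crit_gib:
        status, label = CRITICAL, "CRITICAL"
    elif size_gib >= warn_gib:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    return status, f"{label} - overlay2 size {size_gib:.1f} GiB"
```

A wrapper script would call `classify(directory_size_bytes("/var/lib/docker/overlay2"))`, print the message, and exit with the returned status. Passing different `warn_gib` and `crit_gib` values corresponds to tuning the thresholds per host.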

Status: In Progress