Note: This is a fork of the original gcs-cacher repository. Here are some notable changes:
- Preserve symbolic links and hard links inside the archive
- Use the archiver library instead of manually creating tar.gz files
- Compress using tar.zst instead of tar.gz for better performance and compression ratio
All usage is the same as the original tool. To use this tool in the Shakr repo's GitHub Actions workflows, refer to the actions runner repository.
GCS Cacher is a small CLI and Docker container that saves and restores caches on Google Cloud Storage. It is intended to be used in CI/CD systems like Cloud Build, but may have applications elsewhere.
- Create a new Cloud Storage bucket, or use an existing one. To automatically clean up caches after a certain period of time, set a lifecycle policy on the bucket.
- Create a cache:
gcs-cacher -bucket "my-bucket" -cache "go-mod" -dir "$GOPATH/pkg/mod"
This will compress and upload the contents of $GOPATH/pkg/mod to Google Cloud Storage at the key "go-mod".

- Restore a cache:
gcs-cacher -bucket "my-bucket" -restore "go-mod" -dir "$GOPATH/pkg/mod"
This will download the Google Cloud Storage object named "go-mod" and decompress it to $GOPATH/pkg/mod. A typical CI flow combining both commands is sketched below.
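The following sketch is only illustrative: restore a cache before building, then save it again afterward. The bucket name, cache key, and build command are placeholders.

```
# Restore a previously saved module cache, if one exists.
gcs-cacher -bucket "my-bucket" -restore "go-mod" -dir "$GOPATH/pkg/mod"

# Run the build; this may download additional modules.
go build ./...

# Save the (possibly updated) module cache for future runs.
gcs-cacher -bucket "my-bucket" -cache "go-mod" -dir "$GOPATH/pkg/mod"
```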
To install gcs-cacher, choose one of the following:
- Download the latest version from the releases.
- Use a pre-built Docker container:

  us-docker.pkg.dev/vargolabs/gcs-cacher/gcs-cacher
  docker.pkg.github.com/sethvargo/gcs-cacher/gcs-cacher
When saving the cache, the provided directory is packed into a tarball and compressed (this fork uses zstd instead of gzip), then uploaded to Google Cloud Storage. When restoring the cache, the reverse happens.
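Conceptually, the save and restore steps are roughly equivalent to the following manual commands. This is only an illustrative sketch, assuming GNU tar with zstd support and the gsutil CLI; the gs://my-bucket/go-mod object and the cache directory are placeholders. Like this fork, tar preserves symbolic and hard links inside the archive.

```
# Save: create a zstd-compressed tarball of the directory and upload it.
tar --use-compress-program=zstd -cf go-mod.tar.zst -C "$GOPATH/pkg/mod" .
gsutil cp go-mod.tar.zst gs://my-bucket/go-mod

# Restore: download the object and unpack it into the target directory.
gsutil cp gs://my-bucket/go-mod go-mod.tar.zst
mkdir -p "$GOPATH/pkg/mod"
tar --use-compress-program=zstd -xf go-mod.tar.zst -C "$GOPATH/pkg/mod"
```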
It is strongly recommended that you use a cache key based on your dependency file and restore up the chain. For example:
gcs-cacher \
  -bucket "my-bucket" \
  -cache "ruby-{{ hashGlob "Gemfile.lock" }}"

gcs-cacher \
  -bucket "my-bucket" \
  -restore "ruby-{{ hashGlob "Gemfile.lock" }}" \
  -restore "ruby-"
This maximizes cache hits: when the lockfile changes, the exact key misses, but the shorter "ruby-" key can still restore an earlier cache saved under that prefix.
It is strongly recommended that you enable a lifecycle rule on your cache bucket! This will automatically purge stale entries and keep costs lower.
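For example, with gsutil you can apply a lifecycle rule that deletes cache objects after a fixed number of days. The 14-day age below is an arbitrary illustration; pick whatever retention fits your build cadence.

```
# lifecycle.json: delete objects once they are older than 14 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 14}}
  ]
}
EOF

# Apply the rule to the cache bucket (bucket name is a placeholder).
gsutil lifecycle set lifecycle.json gs://my-bucket
```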
The primary use case is to cache large and/or expensive dependency trees like a Ruby vendor directory or a Go module cache as part of a CI/CD step. Downloading a compressed, packaged archive is often much faster than a full dependency resolution. It has an unintended benefit of also reducing dependencies on external build systems.
Why not just use gsutil?
That's a great question. In fact, there's already a cloud builder that uses gsutil to accomplish similar things. However, that approach has a few drawbacks:
- It doesn't work with large files because the container doesn't package the crc package. If your cache is larger than 500 MB, it will fail. GCS Cacher does not have this limitation.
- You have to build, publish, and manage the container in your own project. We publish pre-compiled binaries and Docker containers to multiple registries. You're still free to build it yourself, but you don't have to.
- The container image itself is huge, nearly 1 GB in size. The gcs-cacher container is just a few MB. Since we're optimizing for build speed, container size is important.
- It's actually really hard to get the fallback key logic correct in bash. There are some subtle edge cases (like when your filename contains a $) where that approach completely fails.
- Despite supporting parallel uploads, that cacher is still ~3.2x slower than GCS Cacher.