-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(ci): support additional paths in the build cache key #11341
Conversation
8813b4c
to
8324ce1
Compare
@@ -8,6 +8,9 @@ inputs: | |||
description: 'String prefix applied to the build cache key' | |||
required: false | |||
default: 'build' | |||
paths: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taking a step back, as a generic cache key, it's not necessary composed of file hashes. what if we just omit kong/**
from build-cache-key workflow, and in release.yml we will just
key: ${{ steps.cache-key.outputs.cache-key }}-${{ hashFiles('kong/**') }}
?
also it reminds me that we deliberately remove .github/**
from the cache key of release.yml https://github.com/Kong/kong-ee/blob/1e98cb5a3fd8c15a3a8b11d1b16389eff9910787/.github/workflows/release.yml#L209. the cache is only used in PR, not for official releases, and in a PR when we are testing the workflow, it would save us lot of time as those files (generally) won't change the artifact being generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taking a step back, as a generic cache key, it's not necessary composed of file hashes. what if we just omit
kong/**
from build-cache-key workflow, and in release.yml we will justkey: ${{ steps.cache-key.outputs.cache-key }}-${{ hashFiles('kong/**') }}
This is a good point. The gripe I still have with this is that it encourages more copy/paste, and sooner or later somebody will see hashFiles()
in there and take this as their cue to add another file/glob. Eventually we're back to where we started with a super long line that is hard to read and painful to fix when there are merge conflicts:
key: ${{ steps.cache-key.outputs.cache-key }}-${{ hashFiles('kong/**', 'kong-extra/**', 'my-build-file.ext', 'my-other-directory/*/*.txt') }}
Callers of the action could achieve the same by adding hashFiles()
to the prefix
input option:
- name: Generate cache key
id: cache-key
uses: ./.github/actions/build-cache-key
with:
prefix: build::${{ hashFiles('kong/**') }}
It's somewhat better, but we still have the problem where new paths are added, and eventually it becomes a super long line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to just call hashFiles
in Generate cache key
? I'm mostly concerned about the fact that we are implementing our own method of hash and it seems bit complex.
something like sha256sum ( ${{ hashFiles(preset path1) }}-${{ hashFiles(preset path2) }}-${{ hashFiles(custom path1) }} )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly concerned about the fact that we are implementing our own method of hash and it seems bit complex.
The primary reason for not using hashFiles()
is that GHA's expression language doesn't really support non-primitive inputs. If we had array input structures we could do something like:
- name: Generate cache key
id: cache-key
uses: ./.github/actions/build-cache-key
with:
paths:
- kong/**
- foo.txt
...and then call hashFiles(inputs.paths)
somewheres. But, yeah, here we are =/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about something like
- name: Generate cache key
id: cache-key
uses: ./.github/actions/build-cache-key
with:
suffixes:
- ${{ hashFiles('kong/**') }}
- ${{ hashFiles('foo.txt') }}
then we can even have
- name: Generate cache key
id: cache-key
uses: ./.github/actions/build-cache-key
with:
suffixes:
- fips
i know we want to make this easy to use, but the cost of maintaining a new hashFiles implementation is also what we need to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that action inputs cannot be arrays though, only primitives: string, boolean, number.
Check out the latest iteration and let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, in that case, do we still need to compute filehashs and sort them?
FILE_HASHES=(
${{ hashFiles('.bazelignore') }}
${{ hashFiles('.bazelrc') }}
${{ hashFiles('.bazelversion') }}
${{ hashFiles('.github/actions/build-cache-key/**') }}
${{ hashFiles('.github/workflows/build.yml') }}
${{ hashFiles('.requirements') }}
${{ hashFiles('BUILD.bazel') }}
${{ hashFiles('WORKSPACE') }}
${{ hashFiles('bin/kong') }}
${{ hashFiles('bin/kong-health') }}
${{ hashFiles('build/**') }}
${{ hashFiles('kong-*.rockspec') }}
${{ hashFiles('kong.conf.default') }}
${{ hashFiles('kong/**') }}
)
could be a single
${{ hashFiles('.bazelignore', '.bazelrc', ...)) }}
and we don't need all those bash logic, so only
sha256sum(hashFiles("default files") + "." + inputs.extra))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this to be much more readable:
FILE_HASHES=(
${{ hashFiles('.bazelignore') }}
${{ hashFiles('.bazelrc') }}
${{ hashFiles('.bazelversion') }}
${{ hashFiles('.github/actions/build-cache-key/**') }}
${{ hashFiles('.github/workflows/build.yml') }}
${{ hashFiles('.requirements') }}
${{ hashFiles('BUILD.bazel') }}
${{ hashFiles('WORKSPACE') }}
${{ hashFiles('bin/kong') }}
${{ hashFiles('bin/kong-health') }}
${{ hashFiles('build/**') }}
${{ hashFiles('kong-*.rockspec') }}
${{ hashFiles('kong.conf.default') }}
)
...than this:
FILE_HASHES=${{ hashFiles('.bazelignore' , '.bazelrc', '.bazelversion', '.github/actions/build-cache-key/**', '.github/workflows/build.yml', '.requirements', 'BUILD.bazel', 'WORKSPACE', 'bin/kong', 'bin/kong-health', 'build/**', 'kong-*.rockspec', 'kong.conf.default') }}
The extra bash just splits inputs.extra
into lines, appends it to FILE_HASHES, and then normalizes everything before sha256sum
. I don't like having extra bash code sitting around either, but GHA doesn't give us any other good option for invoking hashFiles()
with more than 2-3 inputs and not having unreadable line lengths.
It's also a lot less bash than it might look like at a glance. I'm used to getting code review from folks who don't understand lots of bash-isms, so I tend to write bash that is super verbose with lots of debug output and extensive use of intermediate vars (i.e. OFFSET
and NORMALIZED
) for the sake of the reader.
To illustrate, in 73e9df0 I removed all of the extra verbosity. It condenses down to just 5 actual statements:
# 1 - set initial array of hashes
FILE_HASHES=(
[...]
)
# 2 - check for extra input
if [[ -n ${EXTRA:-} ]]; then
# 3 - split input lines and append to FILE_HASHES
readarray \
-O "${#FILE_HASHES[@]}" \
-t \
FILE_HASHES \
<<< "$EXTRA"
fi
# 4 - normalize (remove empty strings and sort) and hash
HASH=$(printf '%s\n' "${FILE_HASHES[@]}" \
| grep -vE '^$' \
| sort --stable --unique \
| sha256sum - \
| awk '{print $1}'
)
# 5 - export
echo "CACHE_KEY=${PREFIX}::${HASH}" >> $GITHUB_OUTPUT
I could rewrite this in javascript and get more readable array/string-munging code, but I think I'd need to pull in a crypto dependency for the hash bit, so I'm not sure if that would be an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could rewrite this in javascript and get more readable array/string-munging code, but I think I'd need to pull in a crypto dependency for the hash bit
Probably we will just go with perl 😄
I'm still not convinced that we need to use extra logic to handle all the hashes, I do get the readable part though, but then the extra code adds to a new layer of read-ability and maintainability.
anyway, I don't want to block on this anymore as this has been back and forth for long so i think it's now in good shape. is there anything left before transitioning to non-draft?
8324ce1
to
d7e7120
Compare
1cbe683
to
d7e7120
Compare
dd42caa
to
73e9df0
Compare
This updates the
build-cache-key
action to accept additional paths to use when generating the cache key. The impetus for this change is to allow removal ofkong/**
from the default cache key so that it can be used selectively.See also: #11299 (comment)
cc @fffonion lemme know what you think of this approach!