Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(ci): support additional paths in the build cache key #11341

Merged
merged 7 commits into from
Sep 12, 2023

Conversation

flrgh
Copy link
Contributor

@flrgh flrgh commented Aug 1, 2023

This updates the build-cache-key action to accept additional paths to use when generating the cache key. The impetus for this change is to allow removal of kong/** from the default cache key so that it can be used selectively.

See also: #11299 (comment)

cc @fffonion lemme know what you think of this approach!

@github-actions github-actions bot added the chore Not part of the core functionality of kong, but still needed label Aug 1, 2023
@flrgh flrgh force-pushed the build/cache-key-opt-paths branch from 8813b4c to 8324ce1 Compare August 1, 2023 20:46
@@ -8,6 +8,9 @@ inputs:
description: 'String prefix applied to the build cache key'
required: false
default: 'build'
paths:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taking a step back, as a generic cache key, it's not necessary composed of file hashes. what if we just omit kong/** from build-cache-key workflow, and in release.yml we will just

key: ${{ steps.cache-key.outputs.cache-key }}-${{ hashFiles('kong/**') }}

?
also it reminds me that we deliberately remove .github/** from the cache key of release.yml https://github.com/Kong/kong-ee/blob/1e98cb5a3fd8c15a3a8b11d1b16389eff9910787/.github/workflows/release.yml#L209. the cache is only used in PR, not for official releases, and in a PR when we are testing the workflow, it would save us lot of time as those files (generally) won't change the artifact being generated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taking a step back, as a generic cache key, it's not necessary composed of file hashes. what if we just omit kong/** from build-cache-key workflow, and in release.yml we will just

key: ${{ steps.cache-key.outputs.cache-key }}-${{ hashFiles('kong/**') }}

This is a good point. The gripe I still have with this is that it encourages more copy/paste, and sooner or later somebody will see hashFiles() in there and take this as their cue to add another file/glob. Eventually we're back to where we started with a super long line that is hard to read and painful to fix when there are merge conflicts:

key: ${{ steps.cache-key.outputs.cache-key }}-${{ hashFiles('kong/**', 'kong-extra/**', 'my-build-file.ext', 'my-other-directory/*/*.txt') }}

Callers of the action could achieve the same by adding hashFiles() to the prefix input option:

- name: Generate cache key
  id: cache-key
  uses: ./.github/actions/build-cache-key
  with:
    prefix: build::${{ hashFiles('kong/**') }}

It's somewhat better, but we still have the problem where new paths are added, and eventually it becomes a super long line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to just call hashFiles in Generate cache key? I'm mostly concerned about the fact that we are implementing our own method of hash and it seems bit complex.
something like sha256sum ( ${{ hashFiles(preset path1) }}-${{ hashFiles(preset path2) }}-${{ hashFiles(custom path1) }} )

Copy link
Contributor Author

@flrgh flrgh Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly concerned about the fact that we are implementing our own method of hash and it seems bit complex.

The primary reason for not using hashFiles() is that GHA's expression language doesn't really support non-primitive inputs. If we had array input structures we could do something like:

- name: Generate cache key
  id: cache-key
  uses: ./.github/actions/build-cache-key
  with:
    paths:
      - kong/**
      - foo.txt

...and then call hashFiles(inputs.paths) somewheres. But, yeah, here we are =/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about something like

- name: Generate cache key
  id: cache-key
  uses: ./.github/actions/build-cache-key
  with:
    suffixes:
      - ${{ hashFiles('kong/**') }}
      - ${{ hashFiles('foo.txt') }}

then we can even have

- name: Generate cache key
  id: cache-key
  uses: ./.github/actions/build-cache-key
  with:
    suffixes:
    - fips

i know we want to make this easy to use, but the cost of maintaining a new hashFiles implementation is also what we need to consider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that action inputs cannot be arrays though, only primitives: string, boolean, number.

Check out the latest iteration and let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see, in that case, do we still need to compute filehashs and sort them?

        FILE_HASHES=(
          ${{ hashFiles('.bazelignore') }}
          ${{ hashFiles('.bazelrc') }}
          ${{ hashFiles('.bazelversion') }}
          ${{ hashFiles('.github/actions/build-cache-key/**') }}
          ${{ hashFiles('.github/workflows/build.yml') }}
          ${{ hashFiles('.requirements') }}
          ${{ hashFiles('BUILD.bazel') }}
          ${{ hashFiles('WORKSPACE') }}
          ${{ hashFiles('bin/kong') }}
          ${{ hashFiles('bin/kong-health') }}
          ${{ hashFiles('build/**') }}
          ${{ hashFiles('kong-*.rockspec') }}
          ${{ hashFiles('kong.conf.default') }}
          ${{ hashFiles('kong/**') }}
        )

could be a single
${{ hashFiles('.bazelignore', '.bazelrc', ...)) }} and we don't need all those bash logic, so only
sha256sum(hashFiles("default files") + "." + inputs.extra))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this to be much more readable:

FILE_HASHES=(
  ${{ hashFiles('.bazelignore') }}
  ${{ hashFiles('.bazelrc') }}
  ${{ hashFiles('.bazelversion') }}
  ${{ hashFiles('.github/actions/build-cache-key/**') }}
  ${{ hashFiles('.github/workflows/build.yml') }}
  ${{ hashFiles('.requirements') }}
  ${{ hashFiles('BUILD.bazel') }}
  ${{ hashFiles('WORKSPACE') }}
  ${{ hashFiles('bin/kong') }}
  ${{ hashFiles('bin/kong-health') }}
  ${{ hashFiles('build/**') }}
  ${{ hashFiles('kong-*.rockspec') }}
  ${{ hashFiles('kong.conf.default') }}
)

...than this:

FILE_HASHES=${{ hashFiles('.bazelignore' , '.bazelrc', '.bazelversion', '.github/actions/build-cache-key/**', '.github/workflows/build.yml', '.requirements', 'BUILD.bazel', 'WORKSPACE', 'bin/kong', 'bin/kong-health', 'build/**', 'kong-*.rockspec', 'kong.conf.default') }}

The extra bash just splits inputs.extra into lines, appends it to FILE_HASHES, and then normalizes everything before sha256sum. I don't like having extra bash code sitting around either, but GHA doesn't give us any other good option for invoking hashFiles() with more than 2-3 inputs and not having unreadable line lengths.

It's also a lot less bash than it might look like at a glance. I'm used to getting code review from folks who don't understand lots of bash-isms, so I tend to write bash that is super verbose with lots of debug output and extensive use of intermediate vars (i.e. OFFSET and NORMALIZED) for the sake of the reader.

To illustrate, in 73e9df0 I removed all of the extra verbosity. It condenses down to just 5 actual statements:

# 1 - set initial array of hashes
FILE_HASHES=(
  [...]
)

# 2 - check for extra input
if [[ -n ${EXTRA:-} ]]; then
  # 3 - split input lines and append to FILE_HASHES
  readarray \
    -O "${#FILE_HASHES[@]}" \
    -t \
    FILE_HASHES \
  <<< "$EXTRA"
fi

# 4 - normalize (remove empty strings and sort) and hash
HASH=$(printf '%s\n' "${FILE_HASHES[@]}" \
  | grep -vE '^$' \
  | sort --stable --unique \
  | sha256sum - \
  | awk '{print $1}'
)

# 5 - export
echo "CACHE_KEY=${PREFIX}::${HASH}" >> $GITHUB_OUTPUT

I could rewrite this in javascript and get more readable array/string-munging code, but I think I'd need to pull in a crypto dependency for the hash bit, so I'm not sure if that would be an improvement.

Copy link
Contributor

@fffonion fffonion Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could rewrite this in javascript and get more readable array/string-munging code, but I think I'd need to pull in a crypto dependency for the hash bit

Probably we will just go with perl 😄

I'm still not convinced that we need to use extra logic to handle all the hashes, I do get the readable part though, but then the extra code adds to a new layer of read-ability and maintainability.
anyway, I don't want to block on this anymore as this has been back and forth for long so i think it's now in good shape. is there anything left before transitioning to non-draft?

@flrgh flrgh force-pushed the build/cache-key-opt-paths branch from dd42caa to 73e9df0 Compare September 6, 2023 17:39
@pull-request-size pull-request-size bot added size/S and removed size/M labels Sep 6, 2023
@fffonion fffonion marked this pull request as ready for review September 12, 2023 04:58
@fffonion fffonion merged commit ddc1400 into master Sep 12, 2023
21 checks passed
@fffonion fffonion deleted the build/cache-key-opt-paths branch September 12, 2023 05:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore Not part of the core functionality of kong, but still needed size/S
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants