improve caching #1005
Changes from 36 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,21 +15,43 @@ jobs: | |
- name: Check images exist | ||
run: ./tools/bin/check_images_exist.sh | ||
|
||
- name: Cache java deps | ||
- name: Docker Caching | ||
uses: actions/cache@v2 | ||
with: | ||
path: ~/.gradle | ||
key: gradle-${{ hashFiles('**/*.gradle') }} | ||
path: | | ||
/tmp/docker-registry | ||
key: docker-${{ runner.os }}-${{ hashFiles('Dockerfile') }}-${{ github.sha }} | ||
restore-keys: | | ||
Review comment: Docker can (and should) use restore-keys, since some of the layers will match even if the contents change. |
||
gradle- | ||
docker-${{ runner.os }}-${{ hashFiles('Dockerfile') }}- | ||
docker-${{ runner.os }}- | ||
|
||
- name: Cache node deps | ||
- name: Pip Caching | ||
uses: actions/cache@v2 | ||
with: | ||
path: ~/.npm | ||
key: node-${{ hashFiles('**/package-lock.json') }} | ||
path: | | ||
~/.cache/pip | ||
Review comment: Depending on how you use pip, more might be stored here. Right now this is super lightweight. Using restore-keys is also fine here because this cache can be used by all venvs. |
||
key: pip-${{ runner.os }}-${{ hashFiles('**/setup.py') }}-${{ hashFiles('**/requirements.txt') }} | ||
restore-keys: | | ||
node- | ||
pip-${{ runner.os }}- | ||
|
||
- name: Npm Caching | ||
uses: actions/cache@v2 | ||
with: | ||
path: | | ||
~/.npm | ||
**/node_modules | ||
Review comment: It's somewhat debatable that we should have node_modules in a place that uses restore-keys, but it seems like a much smaller issue than Python venvs or Gradle caching.
Reply: isn't
Reply: You're right, it isn't recommended. I'll remove. |
||
key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }} | ||
restore-keys: | | ||
npm-${{ runner.os }}- | ||
|
||
- name: Gradle and Python Caching | ||
uses: actions/cache@v2 | ||
with: | ||
path: | | ||
~/.gradle/caches | ||
Review comment: The GitHub Actions cache examples prefer to cache the specific directories for Gradle instead of the entire
This intentionally does not use restore-key.
Reply: Add a comment saying "This intentionally does not use restore-key."? |
||
~/.gradle/wrapper | ||
**/.venv | ||
key: ${{ runner.os }}-${{ hashFiles('**/*.gradle*') }}-${{ hashFiles('**/package-lock.json') }}-${{ hashFiles('**/setup.py') }}-${{ hashFiles('**/requirements.txt') }} | ||
|
||
- uses: actions/setup-java@v1 | ||
with: | ||
|
@@ -43,6 +65,19 @@ jobs: | |
with: | ||
python-version: '3.7' | ||
|
||
- name: Start local Docker registry | ||
Review comment: How does this interact with the Docker cache? The idea is we get access to the
Reply: This uses |
||
run: docker run -d -p 5000:5000 --restart=always --name registry -v /tmp/docker-registry:/var/lib/registry registry:2 && npx wait-on tcp:5000 | ||
|
||
- name: Build | ||
run: ./gradlew --no-daemon build --scan | ||
|
||
- name: Ensure no file change | ||
run: git status --porcelain && test -z "$(git status --porcelain)" | ||
|
||
- name: Check documentation | ||
if: success() && github.ref == 'refs/heads/master' | ||
run: ./tools/site/link_checker.sh check_docs | ||
|
||
- name: Write Integration Test Credentials | ||
Review comment: I moved this block down to make sure that |
||
run: ./tools/bin/ci_credentials.sh | ||
env: | ||
|
@@ -62,16 +97,6 @@ jobs: | |
AWS_S3_INTEGRATION_TEST_CREDS: ${{ secrets.AWS_S3_INTEGRATION_TEST_CREDS }} | ||
MAILCHIMP_TEST_CREDS: ${{ secrets.MAILCHIMP_TEST_CREDS }} | ||
|
||
- name: Build | ||
run: ./gradlew --no-daemon build --scan | ||
|
||
- name: Ensure no file change | ||
run: git status --porcelain && test -z "$(git status --porcelain)" | ||
|
||
- name: Check documentation | ||
if: success() && github.ref == 'refs/heads/master' | ||
run: ./tools/site/link_checker.sh check_docs | ||
|
||
- name: Run Integration Tests (PR) | ||
if: success() && github.ref != 'refs/heads/master' | ||
run: ./tools/bin/integration_test_pr.sh | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,17 +20,15 @@ dependencies { | |
|
||
// on rc version due to docker pull bug in current stable version. | ||
// issue: https://github.com/airbytehq/airbyte/issues/493 | ||
testCompile "org.testcontainers:mssqlserver:1.15.0-rc2" | ||
testImplementation "org.testcontainers:mssqlserver:1.15.0-rc2" | ||
Review comment: compile is deprecated |
||
|
||
testImplementation 'org.apache.commons:commons-text:1.9' | ||
testImplementation 'org.apache.commons:commons-lang3:3.11' | ||
testImplementation 'org.apache.commons:commons-dbcp2:2.7.0' | ||
testImplementation 'org.testcontainers:postgresql:1.15.0-rc2' | ||
|
||
testImplementation project(':airbyte-test-utils') | ||
|
||
integrationTestImplementation project(':airbyte-integrations:bases:standard-source-test') | ||
testCompile "org.testcontainers:mssqlserver:1.15.0-rc2" | ||
|
||
implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,11 +45,17 @@ class AirbytePythonPlugin implements Plugin<Project> { | |
project.task('installReqs', type: PythonTask) { | ||
module = "pip" | ||
command = "install -r requirements.txt" | ||
inputs.file('requirements.txt') | ||
outputs.file('build/installedreqs.txt') | ||
outputs.cacheIf { true } | ||
Review comment: What's the implication of this? Does it just always cache its output?
Reply: This tells Gradle that the task is cacheable, but it still needs to re-run if the inputs change. |
||
} | ||
|
||
project.task('installTestReqs', type: PythonTask, dependsOn: project.installReqs) { | ||
module = "pip" | ||
command = "install .[tests]" | ||
inputs.file('setup.py') | ||
outputs.file('build/installedtestreqs.txt') | ||
outputs.cacheIf { true } | ||
} | ||
|
||
if(project.file('unit_tests').exists()) { | ||
|
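The exchange above about outputs.cacheIf { true } turns on one idea: marking a task as cacheable does not stop it from re-running when its declared inputs change. The sketch below illustrates that idea with a plain stamp file in shell. It is not how Gradle implements its build cache (Gradle fingerprints the declared inputs and can restore declared outputs from the cache); the function name run_if_inputs_changed and the stamp-file layout are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: a stamp-file version of "re-run when the inputs change".
# run_if_inputs_changed is a hypothetical helper, not part of Gradle or this PR.
# Usage: run_if_inputs_changed STAMP_FILE INPUT_FILE COMMAND [ARGS...]
run_if_inputs_changed() {
  local stamp="$1" input="$2"
  shift 2
  local fingerprint
  # cksum is POSIX; reading from stdin keeps the file name out of the output.
  fingerprint="$(cksum < "$input")"

  # Same fingerprint as last successful run: skip the work.
  if [[ -f "$stamp" && "$(cat "$stamp")" == "$fingerprint" ]]; then
    echo "up-to-date"
    return 0
  fi

  # Input is new or changed: run the command, then record the fingerprint.
  "$@" && printf '%s\n' "$fingerprint" > "$stamp"
}
```

Under these assumptions, something like run_if_inputs_changed build/reqs.stamp requirements.txt pip install -r requirements.txt would reinstall only after requirements.txt changes, which mirrors the inputs.file('requirements.txt') declaration in the diff.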
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,4 +14,21 @@ assert_root | |
|
||
cd "$PROJECT_DIR" | ||
|
||
DOCKER_BUILDKIT=1 docker build -f "$DOCKERFILE" . -t "$TAG" --iidfile "$ID_FILE" | ||
if [[ -z "$CI" ]]; then | ||
# run standard build locally (not on CI) | ||
DOCKER_BUILDKIT=1 docker build \ | ||
-f "$DOCKERFILE" . \ | ||
-t "$TAG" \ | ||
--iidfile "$ID_FILE" | ||
else | ||
# run build with local docker registry for CI | ||
docker pull localhost:5000/"$TAG" || true | ||
Review comment: Storing in a local repository with a custom data dir (which is also started by GitHub Actions) seems like the best balance between build speed and cache speed.
Reply: Perhaps a silly question, but shouldn't this also include the image name?
Reply: Here "$TAG" is the tagged image name.
Reply: Renamed so it's clearer. |
||
DOCKER_BUILDKIT=1 docker build \ | ||
-f "$DOCKERFILE" . \ | ||
-t "$TAG" \ | ||
--iidfile "$ID_FILE" \ | ||
--cache-from localhost:5000/"$TAG" \ | ||
--build-arg BUILDKIT_INLINE_CACHE=1 | ||
docker tag "$TAG" localhost:5000/"$TAG" | ||
docker push localhost:5000/"$TAG" | ||
fi |
Review comment: I'm using Dockerfile hashes even though the contents of the image might change, because this is only used to speed up builds, not to decide whether to skip a Docker build.
Reply: Is github.sha intended? Wouldn't this basically be discarded on every commit?
Reply: Yes, it is intended, so that cache entries are stored under different keys. However, restore-keys allows reads from different keys. This makes it easier to benchmark re-running Docker builds, with almost no downside (except maybe more cache evictions, since keys aren't overwritten as much).
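The github.sha question in this thread comes down to how a cache lookup falls back from the primary key to the restore-keys prefixes. The following is a minimal sketch of that lookup order as a shell function, assuming the documented behaviour of actions/cache: an exact match on the key wins, otherwise each restore key is tried in order as a prefix and the most recent matching entry is used. The function name resolve_cache_key and the convention of passing existing entries oldest to newest are assumptions made for this illustration; the real action resolves keys on GitHub's side.

```shell
#!/usr/bin/env bash
# Simplified model of cache key resolution with restore-keys (illustrative only).
# resolve_cache_key is a hypothetical helper, not part of actions/cache.
# Usage: resolve_cache_key PRIMARY_KEY "RESTORE_KEY ..." ENTRY...
# ENTRY arguments are existing cache keys, ordered oldest to newest.
resolve_cache_key() {
  local primary="$1" restore_keys="$2"
  shift 2
  local entry prefix match

  # 1. An exact hit on the primary key wins.
  for entry in "$@"; do
    if [[ "$entry" == "$primary" ]]; then
      echo "$entry"
      return 0
    fi
  done

  # 2. Otherwise try each restore key in order, as a prefix;
  #    within one prefix the newest entry (last in the list) wins.
  for prefix in $restore_keys; do
    match=""
    for entry in "$@"; do
      if [[ "$entry" == "$prefix"* ]]; then
        match="$entry"
      fi
    done
    if [[ -n "$match" ]]; then
      echo "$match"
      return 0
    fi
  done

  return 1  # cache miss
}
```

With a key ending in the commit SHA, the exact match in step 1 almost never fires on a new commit, so the lookup falls through to the docker-<os>-<Dockerfile hash>- prefix and then to docker-<os>-. That is why the SHA suffix does not discard the cache: each commit writes a new entry but still reads the most recent earlier one.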