buildkube



buildkube uses rules_docker and rules_k8s to build and deploy bazel-buildfarm (Java), bazel-buildbarn (Go) and/or buildgrid (Python) into an existing Kubernetes cluster. These are the three known open-source server-side implementations of the Remote Execution API (REAPI). The other known implementation is Google's closed-source Remote Build Execution (RBE) service (alpha).

Known clients of the REAPI include bazel itself, recc, and possibly pants.

INSTRUCTIONS

1. Clone this repository.
2. Edit the k8s_defaults rule in the WORKSPACE file so that it points to your Kubernetes cluster (it should match the output of $ kubectl config current-context).
3. Build and deploy an implementation, for example: $ (cd farm/ && make install)
4. In a separate terminal, establish port-forwarding to the server implementation: $ (cd farm/ && make port-forward)
5. Clone the abseil repository as a test case: $ make abseil_clone
6. Compile abseil remotely: $ make abseil
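The steps above can be collected into a small shell script. This is a minimal sketch and is not shipped with buildkube. The script name (steps.sh), its function names and the DRY_RUN switch are hypothetical. It assumes it is run from the repository root. It uses make -C farm install as an equivalent of $ (cd farm/ && make install). Port-forwarding is still left to a separate terminal, as in the steps.

```sh
#!/bin/sh
# Hypothetical wrapper around the INSTRUCTIONS; not shipped with buildkube.
# Assumes it is run from the repository root. IMPL names the implementation
# directory (farm/ is the one used in the example above).
set -eu

IMPL="${IMPL:-farm}"

# Print each command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

# Show the active kubectl context, to compare with k8s_defaults in WORKSPACE.
show_context() {
  echo "current context: $(kubectl config current-context 2>/dev/null || echo '<kubectl unavailable>')"
}

# Build and deploy the chosen implementation.
# (make -C DIR TARGET is equivalent to (cd DIR && make TARGET).)
deploy() {
  run make -C "$IMPL" install
}

# Clone and remotely compile abseil. Expects
# (cd "$IMPL" && make port-forward) to be running in another terminal.
build_abseil() {
  run make abseil_clone
  run make abseil
}

# Dispatch: ./steps.sh deploy runs deploy(); no argument does nothing.
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

For example, run $ ./steps.sh show_context and then $ ./steps.sh deploy. Next, start $ (cd farm/ && make port-forward) in another terminal. Finally, run $ ./steps.sh build_abseil. Prefixing a call with DRY_RUN=1 prints each command without running it.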

NOTES

Bazel 0.17.1 or higher is required (primarily tested with 0.17.2 on an Ubuntu laptop).
Run all tests with $ bazel test //...
Each implementation is deployed into its own namespace. Run $ kubectl get pods --all-namespaces to see them all.
Consider adjusting the replica counts in the deploy.yaml files and/or the bazelrc file.

OBSERVATIONS

General

Logging in all three implementations is sparse, which makes debugging difficult. The buildbarn implementation exposes Prometheus metrics (not examined so far).

BuildFarm

The BuildFarm worker does not detect when the server goes down. When reinstalling or updating the server deployment, you must restart the workers manually: $ kubectl delete pod --selector=k8s-app=worker
When a worker registers itself with the server (operation-queue), it provides a dict of key:value pairs that must match the action's execution requirements. In particular, the container-image key in worker.config MUST exactly match the rbe_ubuntu image tag.
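The manual worker restart above can be folded into the reinstall itself. This is a minimal sketch and is not part of buildkube. It assumes the worker pods are managed by a Deployment, which recreates them when they are deleted. It also assumes you pass the namespace BuildFarm was installed into. The script name (reinstall.sh), the reinstall_farm function and the DRY_RUN switch are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper, not shipped with buildkube: reinstall the BuildFarm
# server, then recycle the workers so they register with the new server.
# The NAMESPACE argument is an assumption (each implementation runs in its
# own namespace). DRY_RUN=1 prints the commands without running them.
set -eu

run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

reinstall_farm() {  # usage: reinstall_farm NAMESPACE
  run make -C farm install
  # Workers do not notice the old server going away. Deleting them lets
  # their controller (assumed to be a Deployment) start fresh pods that
  # register with the new server.
  run kubectl delete pod -n "$1" --selector=k8s-app=worker
}

# Dispatch: ./reinstall.sh reinstall_farm NAMESPACE
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

With DRY_RUN=1 set, $ ./reinstall.sh reinstall_farm NAMESPACE only prints the two commands.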
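A container-image mismatch is easy to miss. A pre-flight check can compare the worker config against the image the client requests before anything is deployed. This sketch makes two assumptions. First, the value appears as a double-quoted string in worker.config, as in protobuf text format. Second, image values begin with docker://, and this assumption is used only for the diagnostic listing. The script name (check-image.sh) and its function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical pre-flight check, not shipped with buildkube: does a BuildFarm
# worker config contain exactly the container-image value the client asks for?
# Assumes the value is double-quoted in the config (protobuf text format) and
# that image values start with docker:// (used only for the diagnostic list).
set -eu

# Succeeds if CONFIG contains IMAGE as a complete quoted string.
# -F matches literally; the surrounding quotes stop a shorter tag from
# matching a longer one.
image_matches() {  # usage: image_matches CONFIG IMAGE
  grep -qF -- "\"$2\"" "$1"
}

check_worker_image() {  # usage: check_worker_image CONFIG IMAGE
  if image_matches "$1" "$2"; then
    echo "ok: $1 contains \"$2\""
  else
    echo "mismatch: $1 does not contain \"$2\"; images found:" >&2
    sed -n 's|.*"\(docker://[^"]*\)".*|  \1|p' "$1" >&2
    return 1
  fi
}

# Dispatch: ./check-image.sh check_worker_image CONFIG IMAGE
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

Run it as $ ./check-image.sh check_worker_image WORKER_CONFIG IMAGE. IMAGE should be the full rbe_ubuntu image string the client uses.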

BuildBarn

After spinning up a new install, the service seems flaky at first. Several errors like the following are typical:

/tmp/abseil-cpp/absl/utility/BUILD.bazel:22:1: C++ compilation of rule '//absl/utility:utility_test' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed catastrophically.

NOTE(@EdSchouten): There are three ways to alleviate this issue:

Spawn more workers on your cluster.

Pass an explicit --jobs= value to the build that is on the same order of magnitude as the number of workers.

Tune this flag on the scheduler process: https://github.com/EdSchouten/bazel-buildbarn/blob/master/cmd/bbb_scheduler/main.go#L22

More details
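The first two mitigations can be scripted with kubectl. This is a sketch and is not part of buildkube or buildbarn. It assumes a worker Deployment named worker whose pods carry the k8s-app=worker label, which is the label used in the BuildFarm note. The script name (workers.sh) and its function names are hypothetical. Adjust the names to match your deploy.yaml.

```sh
#!/bin/sh
# Hypothetical helpers for the first two mitigations; not part of buildkube.
# Assumes a Deployment named "worker" whose pods carry k8s-app=worker.
set -eu

# Mitigation 1: run more worker replicas.
scale_workers() {  # usage: scale_workers NAMESPACE REPLICAS
  kubectl scale deployment/worker -n "$1" --replicas="$2"
}

# Number of worker pods currently in the Running phase.
running_workers() {  # usage: running_workers NAMESPACE
  kubectl get pods -n "$1" --selector=k8s-app=worker \
    --field-selector=status.phase=Running -o name | wc -l | tr -d ' '
}

# Mitigation 2: a --jobs value equal to the running worker count.
# Falls back to 1 when no workers are found (including when kubectl fails,
# since only the exit status of the last pipeline command is checked).
jobs_flag() {  # usage: jobs_flag NAMESPACE
  n=$(running_workers "$1")
  [ "$n" -ge 1 ] || n=1
  echo "--jobs=$n"
}

# Dispatch: ./workers.sh jobs_flag NAMESPACE
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

For example, run $ ./workers.sh scale_workers NAMESPACE 8, then pass the output of $ ./workers.sh jobs_flag NAMESPACE to bazel build. kubectl scale changes only the live Deployment. Editing replicas in deploy.yaml keeps the change across reinstalls.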

BuildGrid

As with BuildFarm, the worker does not automatically reconnect to a new server.
The instance name (main) must match across the bazelrc (--remote_instance_name=main), the server args (-scheduler main|ubuntu-scheduler:8981), and the worker args (bot --remote=http://server:8980 --parent=main host-tools).
Overall robustness to changes (increases) in job size and worker count is low, and in some cases the server and workers need to be reset. It seems happiest when the job size matches the number of worker replicas.
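A mismatched instance name is easy to overlook, so the bazelrc and the bot arguments can be checked against each other. This is a minimal sketch. It assumes the bazelrc sets Bazel's --remote_instance_name flag and that the bot is started with a --parent=NAME argument as shown above. The script name (check-instance.sh) and the function names are hypothetical. The server's -scheduler argument is not checked.

```sh
#!/bin/sh
# Hypothetical consistency check, not shipped with buildkube: the instance
# name in the bazelrc must equal the --parent value given to the BuildGrid bot.
set -eu

# Value of the last --remote_instance_name= in a bazelrc. This ignores which
# command or --config a line applies to.
instance_from_bazelrc() {  # usage: instance_from_bazelrc BAZELRC
  sed -n 's/.*--remote_instance_name=\([^[:space:]]*\).*/\1/p' "$1" | tail -n 1
}

# Value of --parent= from a bot command line.
instance_from_bot_args() {  # usage: instance_from_bot_args "ARGS"
  printf '%s\n' "$1" | sed -n 's/.*--parent=\([^[:space:]]*\).*/\1/p'
}

check_instance_names() {  # usage: check_instance_names BAZELRC "BOT_ARGS"
  _rc_name=$(instance_from_bazelrc "$1")
  _bot_name=$(instance_from_bot_args "$2")
  if [ -n "$_rc_name" ] && [ "$_rc_name" = "$_bot_name" ]; then
    echo "ok: instance name '$_rc_name'"
  else
    echo "mismatch: bazelrc='$_rc_name' bot='$_bot_name'" >&2
    return 1
  fi
}

# Dispatch: ./check-instance.sh check_instance_names BAZELRC "BOT_ARGS"
if [ "$#" -gt 0 ]; then
  "$@"
fi
```

Run it as $ ./check-instance.sh check_instance_names BAZELRC 'bot --remote=http://server:8980 --parent=main host-tools'. BAZELRC is the path to your bazelrc file.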
