Bazel | REAPI | Kubernetes |
buildkube uses rules_docker and rules_k8s to build and deploy bazel-buildfarm (Java), bazel-buildbarn (Go), and/or buildgrid (Python) into an existing kubernetes cluster. These are the three known open-source server-side implementations of the remote-execution-api (REAPI); a fourth implementation, Google's closed-source Remote Build Execution (RBE) service, is in alpha.
Known clients of the REAPI include bazel itself, recc, and possibly pants.
- Clone this repository
- Edit the `WORKSPACE` file `k8s_defaults` rule to point to your kubernetes cluster (should match the output of `kubectl config current-context`)
- Build and deploy an implementation, for example:
$ (cd farm/ && make install)
- In a separate terminal, establish port-forwarding to the server implementation:
$ (cd farm/ && make port-forward)
- Clone the abseil repository as a test case:
$ make abseil_clone
- Compile abseil remotely:
$ make abseil
- Bazel 0.17.1 or higher is required (primarily tested on 0.17.2 on an ubuntu laptop).
- Run all tests via `bazel test //...`.
- Each implementation goes in its own namespace. Run `kubectl get pods --all-namespaces` to see all.
- Consider adjusting `replicas` in the `deploy.yaml` files and/or the `bazelrc` file.
- Logging in all 3 implementations is scant and makes debugging difficult. Prometheus metrics are available in the barn impl (not examined thus far).
- The BuildFarm worker does not detect if the server goes down. You must manually run `kubectl delete pod --selector=k8s-app=worker` when re-installing or updating the server deployment.
- When a worker registers itself with the server (operation-queue), it provides a dict of key:value pairs that must match the action execution requirements. In particular, the `worker.config` `container-image` key MUST exactly match the rbe_ubuntu image tag.
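The matching rule above can be pictured with a small shell sketch. This is not buildfarm's actual matching code: the helper name `platform_matches`, the `key=value` line format, and the assumption that every requirement of the action must be advertised by the worker with an identical value are all illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of buildfarm): succeeds only if every
# key=value pair required by the action is advertised, with a byte-identical
# value, by the worker.
platform_matches() {
  local worker_props="$1"   # newline-separated key=value pairs from the worker
  local action_props="$2"   # newline-separated key=value pairs from the action
  local pair
  while IFS= read -r pair; do
    # Skip empty lines in the requirement list.
    [ -z "$pair" ] && continue
    # -x matches the whole line, -F treats the pair as a fixed string,
    # so an image tag that differs by one character does not match.
    if ! grep -qxF -- "$pair" <<<"$worker_props"; then
      echo "no worker property matches: $pair" >&2
      return 1
    fi
  done <<<"$action_props"
  return 0
}
```

Under these assumptions, a worker advertising one `container-image` value is never selected for an action that requests a different tag, which is why the two strings have to be kept in sync.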
- After spinning up a new install, the service seems flaky at first. Builds tend to hit several errors like `/tmp/abseil-cpp/absl/utility/BUILD.bazel:22:1: C++ compilation of rule '//absl/utility:utility_test' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed catastrophically`.
NOTE(@EdSchouten): There are three ways to alleviate this issue:
- Spawn more workers on your cluster.
- Pass in an explicit --jobs= to the build that is the same order of magnitude as the number of workers.
- Tune this flag on the scheduler process: https://github.com/EdSchouten/bazel-buildbarn/blob/master/cmd/bbb_scheduler/main.go#L22
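As a rough sketch of the `--jobs` suggestion, the helper below turns a worker count into a `--jobs` flag. The function name `jobs_flag_for_workers` is hypothetical, and the namespace, label selector, and target pattern in the commented usage are assumptions that may not match your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a --jobs flag sized to the number of workers.
jobs_flag_for_workers() {
  local workers="$1"
  # Strip whitespace (some wc implementations pad their output).
  workers="${workers//[[:space:]]/}"
  # Fall back to a single job if the count is empty or non-numeric.
  case "$workers" in
    ''|*[!0-9]*) workers=1 ;;
  esac
  # A count of zero also falls back to a single job.
  [ "$workers" -ge 1 ] || workers=1
  echo "--jobs=${workers}"
}

# Example usage (commented out; namespace, selector, and target are assumptions):
# workers=$(kubectl get pods --namespace=barn --selector=k8s-app=worker --no-headers | wc -l)
# bazel build "$(jobs_flag_for_workers "$workers")" //absl/...
```

This uses the worker count directly; any value of the same order of magnitude would follow the note equally well.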
- Worker does not auto-reconnect to a new server (the same limitation as buildfarm).
- Instance name (`main`) must match across the `bazelrc` `--instance_name=main`, server args `-scheduler main|ubuntu-scheduler:8981`, and worker args `bot --remote=http://server:8980 --parent=main host-tools`.
- Overall robustness to changes (increases) in job size and worker size is low. Seems to require resetting the server/workers in some cases. Seems happiest when job size matches worker replicas.
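The instance-name rule from the notes above lends itself to a quick consistency check. The sketch below is illustrative only: `check_instance_name` is a hypothetical helper, and it assumes you paste in the three argument strings by hand rather than reading them from the deployed manifests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that one instance name appears in the bazelrc
# flag, the server args, and the worker args.
check_instance_name() {
  local name="$1" bazelrc_line="$2" server_args="$3" worker_args="$4"
  local status=0
  # Pad with spaces so that "main" does not also match "mainframe".
  [[ " $bazelrc_line " == *" --instance_name=${name} "* ]] \
    || { echo "bazelrc does not use instance name '${name}'" >&2; status=1; }
  # The server arg has the form "-scheduler <name>|<host>:<port>".
  [[ " $server_args" == *" -scheduler ${name}|"* ]] \
    || { echo "server args do not use instance name '${name}'" >&2; status=1; }
  [[ " $worker_args " == *" --parent=${name} "* ]] \
    || { echo "worker args do not use instance name '${name}'" >&2; status=1; }
  return "$status"
}

check_instance_name main \
  "--instance_name=main" \
  "-scheduler main|ubuntu-scheduler:8981" \
  "bot --remote=http://server:8980 --parent=main host-tools" \
  && echo "instance name consistent"
```

With the three strings quoted from the note, the check passes; changing any one of them to a different name makes it report which place disagrees.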