APM Server can be built using `make apm-server`, which will build the apm-server binary for the host system.
`make apm-server` will build using the same flags as used in distributed artifacts, which includes stripping
debug symbols. If you wish to run APM Server under a debugger such as delve,
you should build APM Server directly: `go build -o apm-server ./x-pack/apm-server`.
To run the unit tests, you can use `make test` or simply `go test ./...`.
The unit tests do not require any external services.
The APM Server "system tests" run the APM Server in various scenarios, with the Elastic Stack running inside Docker containers.
To run the system tests locally, you can run `go test` inside the systemtest directory.
Some tests make use of the concept of snapshot or approvals testing. If running tests leads to changed snapshots,
you can run the `approvals` tool to update the snapshots. The following workflow is intended:
- Run `make test`, which will create a `*.received.json` file for every newly created or changed snapshot.
- Run `make check-approvals` to review and interactively accept the changes.
To run simple benchmark tests, run:
make bench
A good way to present your results is by using `benchstat`.
With your changes in the current working tree, do:
$ go install golang.org/x/tools/cmd/benchstat@latest
$ make bench > new.txt
$ git checkout main
$ make bench > old.txt
$ benchstat old.txt new.txt
Macro benchmarking focuses on measuring the APM Server's performance (throughput) and how changes in the codebase impact that performance.
Our benchmarking tool, `apmbench`, uses pre-recorded APM Agent events for the benchmarks.
This allows us to generate rich events which can be used to assess the Server's throughput under varying loads.
The APM Integration testing framework was used to generate the events, and `intake-receiver`
captures the events that are sent to the intake API. `apmbench` records various metrics for
understanding where any potential bottlenecks may be, and how APM Server consumes the available resources.
TODO(marclop): convert the dot diagrams from dot to mermaid so they can be read in Markdown documents
The applications that are used to generate the stored traces may not always use `apm-integration-testing`.
Instead, we may want to write specific applications that generate specific types of events, rather than re-use
the existing opbeans applications.
The events are currently committed in the `apm-server` repository (`apm-server/systemtest/benchtest/events`).
This may change in the near future, and instead, we'll download the stored traces on-demand and upload/update them
periodically.
# Navigate to your local copy of 'elastic/apm-integration-testing'.
$ SLEEP=180 STACK_VERSION=8.1.2 RPM=5000; ./scripts/compose.py start $STACK_VERSION --opbeans-go-loadgen-rpm ${RPM} --opbeans-python-loadgen-rpm ${RPM} --opbeans-node-loadgen-rpm $((${RPM} * 2)) --opbeans-ruby-loadgen-rpm ${RPM} --with-opbeans-go --no-apm-server-self-instrument --with-opbeans-python --with-opbeans-ruby --with-opbeans-node --apm-server-record --loadgen-no-ws && sleep $SLEEP && make copy-events; docker-compose down
...
# Copy the generated traces to the location where `apmbench` expects them to be (`apm-server/systemtest/benchtest/events`).
# Assuming that the `apm-server` repository has been checked out at the same level as `apm-integration-testing`.
$ cp -r events ../apm-server/systemtest/benchtest/events
`apmbench` is located in `systemtest/cmd/apmbench` and can target any APM Server. The target must have
`apm-server.expvar.enabled` set to `true` so that basic throughput measurements can be calculated, and
`apm-server.pprof.enabled` should also be set to `true` if any of the `-blockprofile`, `-cpuprofile`,
`-memprofile` or `-mutexprofile` flags are set.
The default behavior of `apmbench` is to send the captured events to the target APM Server as fast as possible
with the configured number of `-agents`. The `-agents` flag determines how many concurrent goroutines will be used
to send the events to the APM Server in parallel. The `-max-rate` flag can be used to specify a rate of events,
as `eps` or `epm`, to send to the APM Server instead of the default behavior. To benchmark the APM Server in a
setup similar to what we'd see in production, the number of agents should be high (`>500`).
By default, `apmbench` warms up the APM Server by sending events to it for a fixed duration before any of the
benchmark scenarios are run. That duration can be configured via `-warmup-time` and defaults to a conservative
1 minute.
The default `-benchtime` is `1s` which, for our purposes, isn't a great default. If you're benchmarking
changes to the APM Server, you'll want to set the duration to at least `30s` to get some quick feedback. Our
periodic benchmarks should aim to benchmark for longer to allow any long-queue effects to be detected.
The rest of the flags configure `apmbench` so it can target an APM Server. These can be configured via the
flags themselves, or via their `ELASTIC_APM_<UPPERCASE FLAG NAME>` alternative. For example, to configure the
server URL, set `ELASTIC_APM_SERVER_URL` to the full URL of the APM Server you'd like to benchmark.
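As a rough illustration of the `ELASTIC_APM_<UPPERCASE FLAG NAME>` naming pattern, the sketch below derives a variable name from a flag name. The `envName` helper is hypothetical, the flag name `server-url` is assumed from the `ELASTIC_APM_SERVER_URL` example, and replacing dashes with underscores is an inference from that same example rather than something taken from the `apmbench` source.

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a flag name to the ELASTIC_APM_<UPPERCASE FLAG NAME> form.
// Replacing "-" with "_" is an assumption inferred from the
// ELASTIC_APM_SERVER_URL example.
func envName(flagName string) string {
	upper := strings.ToUpper(strings.TrimLeft(flagName, "-"))
	return "ELASTIC_APM_" + strings.ReplaceAll(upper, "-", "_")
}

func main() {
	// "server-url" is a hypothetical flag name used for illustration.
	fmt.Println(envName("server-url"))
}
```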
Soak testing involves testing apm-server against a continuous, sustained workload to identify performance
and stability issues that occur over an extended period. The `apmsoak` command can be used to generate a
sustained and continuous load for the purpose of soak testing:
$ cd systemtest/cmd/apmsoak
$ go run main.go -h
Usage of /var/folders/k9/z1yw8fsn0sjbl5yy7z2rsdpr0000gn/T/go-build4164012609/b001/exe/main:
-agents-replicas int
Number of agents replicas to use, each replica launches 4 agents, one for each type (default 1)
-max-rate value
Max event rate as epm or eps with burst size=max(1000, 2*eps), <= 0 values evaluate to Inf (default 0epm)
-secret-token string
secret token for APM Server
-secure
validate the remote server TLS certificates
-server value
apm-server URL (default http://localhost:8200)
A worker with `apmsoak` installed for generating load can be created on GCP using the `soaktest_workers`
Terraform module. The Terraform module can be run as follows:
$ cd testing/apmsoak/
$ make apply
The above command starts the apmsoak process as a systemd unit configured to send load at the specified rate.
The module also installs `elastic-agent` to monitor the worker.
Smoke tests are light end-to-end tests which ensure that the "happy path" of the APM Server works as expected per the asserted scenarios. The idea is to automatically and regularly gauge whether there are any critical problems in the APM Server.
These tests currently use a Terraform module which manages the creation of deployments in ESS (it could also be configured to use an ECE installation), together with some light bash scripting. They are run in a variety of scenarios and upgrades, and ensure there aren't any major problems with APM Server accepting and indexing events where it should.
The smoke tests can be found under `testing/smoke` and the latest CI runs can be found in
the APM CI dashboard.
If the smoke tests are failing on the CI environment and active investigation is needed, it may be useful to
run the tests with the `SKIP_DESTROY` variable set to any value (`true` or `1` work too), which will prevent the
ESS deployment from being destroyed to allow some debugging to be done.
Often, we need to manually test the integration between different features, for example during PR testing or pre-release testing.
Our `docker-compose.yml` contains the basic components that make up the Elastic Stack for the APM Server.
For local development and testing you can use Tilt with a Kubernetes cluster.
We provide Kustomize manifests in `testing/infra/k8s` for setting up the Elastic Stack using ECK,
including Fleet Server and an Elastic Agent for running the APM integration. Tilt will watch
for source code changes and build and inject a customized elastic-agent Docker image; it will
also watch for changes to the APM integration package source, and rebuild and upload the
package to Kibana on changes.
⚠️ The Tilt setup configures ECK with a trial license. By using the Tilt setup, you must agree to the Elastic EULA which can be found at https://www.elastic.co/eula
To use Tilt, first install Tilt, and then create a Kubernetes cluster. On macOS you can use Docker Desktop. On Linux you should use Kind, optionally using ctlptl for managing the cluster.
Once Tilt is installed and you have a Kubernetes cluster running, simply run `tilt up` in the
root of the repository. This will present you with the option of opening a web browser to watch
changes and inspect the pod output.
When you're done, you should run `tilt down` to clean up. This will remove all resources
from the Kubernetes cluster created by Tilt.
APM Server publishes a set of metrics that are either consumed by Metricbeat or sent by the APM Server to an
Elasticsearch cluster. Some of these metrics are used to power the Stack Monitoring UI. The stack monitoring
setup is non-trivial and has been automated in `testing/stack-monitoring.sh`. The script launches the
necessary stack components and modifies the necessary files. Once it has finished, you'll be able to test or
ensure that Stack Monitoring is working as expected.
Note that the `testing/stack-monitoring.sh` script relies on `systemtest/cmd/runapm`, and will use a locally built
version of APM Server (see more information below).
APM Server can be run either standalone or in managed mode by the Elastic Agent. To facilitate manual testing of
APM Server in managed mode, it is possible to inject a locally built `apm-server` binary via `systemtest/cmd/runapm`.
It requires the apm-server docker-compose project to be running, creates the required Fleet policies, and exposes
the APM Server port using a random binding that is printed to the standard output after the container has started.
$ cd systemtest/cmd/runapm
$ go run main.go -h
Usage of /var/folders/35/r4w8sbqj2md1sg944kpnzyth0000gn/T/go-build3644709196/b001/exe/main:
-arch string
The architecture to use for the APM Server and Docker Image (default runtime.GOARCH)
-d If true, runapm will exit after the agent container has been started
-f Force agent policy creation, deleting existing policy if found
-keep
If true, agent policy and agent will not be destroyed on exit
-name string
Docker container name to use, defaults to random
-namespace string
Agent policy namespace (default "default")
-policy string
Agent policy name (default "runapm")
-reinstall
Reinstall APM integration package (default true)
-var value
Define a package var (k=v), with values being YAML-encoded; can be specified more than once
It's possible to run `runapm` (as shown above) and re-use the image that `runapm` builds in `docker-compose`
files or in ECE / ESS. However, it's also possible to build only a Docker image, without requiring the docker-compose
project containers to be up and running, with `systemtest/cmd/buildapm`.
`buildapm` reads the `docker-compose.yml` at the root of the repository and uses that information to build an Elastic Agent
docker image with an APM Server bundled that contains any local changes you might have made.
By default, the `amd64` architecture (or platform in Docker lingo) will be used. This may not be ideal if you run a machine
with an architecture other than `amd64`, but you can specify a different one with the `-arch` flag.
Additionally, if `-cloud` is set, the Elastic Agent cloud image will be used as the base image, so changes can be packaged
and tested in ESS / ECE (see our internal documentation on these for how to use them).
$ cd systemtest/cmd/buildapm
$ go run main.go -arch arm64
2022/05/05 17:50:18 Building elastic-agent-systemtest:8.3.0-e4aa1f83-SNAPSHOT (arm64) from docker.elastic.co/beats/elastic-agent:8.3.0-e4aa1f83-SNAPSHOT...
2022/05/05 17:50:18 Building apm-server...
2022/05/05 17:50:18 Built /Users/marclop/repos/elastic/apm-server/build/apm-server-linux
2022/05/05 17:50:25 Built elastic-agent-systemtest:8.3.0-e4aa1f83-SNAPSHOT (arm64)
$ go run main.go -arch amd64
2022/05/05 17:50:35 Building elastic-agent-systemtest:8.3.0-e4aa1f83-SNAPSHOT (amd64) from docker.elastic.co/beats/elastic-agent:8.3.0-e4aa1f83-SNAPSHOT...
2022/05/05 17:50:35 Building apm-server...
2022/05/05 17:50:43 Built /Users/marclop/repos/elastic/apm-server/build/apm-server-linux
2022/05/05 17:50:49 Built elastic-agent-systemtest:8.3.0-e4aa1f83-SNAPSHOT (amd64)
$ go run main.go -cloud
2022/05/19 11:08:04 Building image elastic-agent-systemtest:8.3.0-e4aa1f83-SNAPSHOT (amd64) from docker.elastic.co/cloud-release/elastic-agent-cloud:8.3.0-e4aa1f83-SNAPSHOT...
2022/05/19 11:08:04 Building apm-server...
2022/05/19 11:08:04 Built /Users/marclop/repos/elastic/apm-server/build/apm-server-linux
2022/05/19 11:09:07 Built image elastic-agent-systemtest:8.3.0-e4aa1f83-SNAPSHOT (amd64)
It is possible for Elastic employees to create an Elastic Cloud deployment with a locally built
APM Server binary and APM integration package. See `testing/cloud` for instructions.