Load testing with Locust

Locust is an open-source, Python-based, easy-to-use load testing tool. It provides an interface to spawn multiple users (a swarm) that behave according to predefined Python code.

In this directory, we provide some example locust configs that send common requests to DataHub GMS (ingest, search, browse, and graph).

Prerequisites

To run the example configs, first install Locust:

pip3 install locust

Locust requires Python 3.6 or later (recent Locust releases need a newer Python version). Refer to this guide for more details.

You also need to install the dependencies that the locustfile scripts use:

pip3 install -r requirements.txt

Locustfiles

Locustfiles define how the users will behave once they are spawned. Refer to this doc on how to write one.

Here, we have defined four common requests:

  • Ingest: ingests a dataset with a random URN with properties, browse paths, and ownership aspects filled out
  • Search: searches datasets with query "test"
  • Browse: browses datasets with path "/perf/test"
  • Graph: gets datasets owned by user "common"

We will continue adding more as more use cases arise, but feel free to play around with the default behavior to create a load test that matches your request pattern.
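As a rough illustration of the "random URN" idea used by the ingest request, the sketch below builds a dataset URN whose name changes on every call. It assumes the standard DataHub dataset URN layout; the `hive` platform, the `perf_test_` prefix, and the function name are hypothetical choices for this example, and the actual ingest locustfile may build its URNs differently.

```python
import uuid


def random_dataset_urn(platform: str = "hive", env: str = "PROD") -> str:
    """Return a dataset URN whose name is unique per call.

    The platform default and the "perf_test_" prefix are illustrative
    assumptions, not values taken from the example locustfiles.
    """
    # uuid4 yields a fresh name each time, so repeated ingests create new
    # datasets instead of rewriting the same one.
    name = f"perf_test_{uuid.uuid4().hex}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


if __name__ == "__main__":
    print(random_dataset_urn())
```

Using a fresh URN per request keeps the ingest test exercising the create path rather than repeatedly updating one entity.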

Load testing

There are two ways to run locust. One is through the web interface, and the other is on the command line.

Web interface

To use the web interface, run the following:

locust -f <<path-to-locustfile>>

For instance, to run the ingest load test, run the following from the root of the repository:

locust -f perf-test/locustfiles/ingest.py

This starts the web interface at http://localhost:8089 (unless the port is already taken). Open that URL in a browser to see the Locust start page.

Enter the number of users you would like to spawn and the spawn rate (users started per second). Point the host to the deployed DataHub GMS (locally, it should be http://localhost:8080). Click the "Start swarming" button to start the load test.

The web interface reports statistics such as the number of requests, latency, and response rate.
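The number of users and the spawn rate together determine roughly how long the ramp-up lasts before the full load is reached. The helper below is a small sketch of that arithmetic; the function name is made up for this example and is not part of Locust.

```python
def ramp_up_seconds(num_users: int, spawn_rate: float) -> float:
    """Approximate time, in seconds, until all users have been spawned.

    spawn_rate is the number of users started per second.
    """
    if num_users < 0:
        raise ValueError("num_users must not be negative")
    if spawn_rate <= 0:
        raise ValueError("spawn_rate must be positive")
    return num_users / spawn_rate


if __name__ == "__main__":
    # 100 users started at 10 users per second take about 10 seconds.
    print(ramp_up_seconds(100, 10))
```

Statistics collected during the ramp-up reflect a partial load, so it can help to wait at least this long before reading latency numbers.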

Command Line

To run from the command line, run the following:

locust -f <<path-to-locustfile>> --headless -H <<host>> -u <<num-users>> -r <<spawn-rate>>

For instance, to run the ingest load test against a local GMS with 100 users spawned at a rate of 100 users per second, run the following:

locust -f perf-test/locustfiles/ingest.py --headless -H http://localhost:8080 -u 100 -r 100

This starts the load test and prints statistics to the terminal.
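When the headless run is launched from a script (for example in CI), it can be convenient to assemble the command programmatically. The sketch below builds the same argument list as the command above; `build_headless_command` is a hypothetical helper, and the optional `--run-time` flag is an addition not used in this README (without it, a headless run continues until it is interrupted).

```python
import shlex
from typing import List, Optional


def build_headless_command(
    locustfile: str,
    host: str,
    users: int,
    spawn_rate: int,
    run_time: Optional[str] = None,
) -> List[str]:
    """Build the argument list for a headless Locust run."""
    cmd = [
        "locust",
        "-f", locustfile,
        "--headless",
        "-H", host,
        "-u", str(users),
        "-r", str(spawn_rate),
    ]
    # Optional time limit such as "5m"; omitted unless requested.
    if run_time is not None:
        cmd += ["--run-time", run_time]
    return cmd


if __name__ == "__main__":
    cmd = build_headless_command(
        "perf-test/locustfiles/ingest.py", "http://localhost:8080", 100, 100
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```

Keeping the arguments in a list avoids shell quoting problems when the host or file path contains special characters.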

Reference

For more details on how to run Locust and its configuration options, refer to this doc.

To customize user behavior by modifying the locustfiles, refer to this doc.