This document highlights the various ways the performance characteristics of a particular branch or changeset can be evaluated.
When submitting changes that may impact the network's performance, some level of comparison between that version and the latest master or release is required. Metrics to compare include the following (a minimal capture sketch follows the list):
- Execution (real) time
- CPU utilization at various intervals and/or percentiles
- Disk I/O at the operating system level (i.e. the unit is typically blocks/s), both reads and writes of the stellar-core process
- SQL: rate of operations (reads and writes), and disk I/O of the SQL process if available (PostgreSQL)
- Memory utilization
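A minimal capture sketch for some of these metrics using GNU time; the stellar-core arguments shown here are illustrative only:

```
# Report wall-clock time, CPU usage and peak memory of a single catchup run
/usr/bin/time -v ./stellar-core catchup 200000/1024
```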
There are three types of networks available for testing.
- Testnet provides a realistic, moderately sized dataset (transactions and ledger) and a few validators.
  - Suitable for load testing simple scenarios.
  - The ledger may not be big enough depending on the test being performed.
  - SCP and flooding related changes typically require all validators to run code compatible with the change.
  - Load testing or stress testing will not yield meaningful results on testnet due to the centralized nature of the deployment and its dependence on a specific compute profile.
- The public network has historical data dating back to the network's launch in 2015; its transaction history is as diverse as it can be.
  - Allows validating the performance characteristics of old protocol versions, which can help quantify potential performance regressions in old versions (these may be acceptable).
  - Has a much more diverse set of validators than testnet, which allows better evaluation of SCP related metrics.
  - SCP and flooding related changes typically require all validators to run code compatible with the change.
  - The ledger may not be large enough when evaluating future trends.
  - Stress testing is expensive on the network and may lead to unnecessary outages.
- Private networks allow:
  - Injecting large amounts of transactions into a network while controlling all parameters (some data from the public network can be reused by using the same network passphrase).
  - Evaluating SCP and overlay behavior for various network and trust topologies, both for load testing (observing baseline metrics) and stress testing (finding breaking points).
  - Evaluating heterogeneous environments (location of validators, connectivity, compute profile). This can be somewhat approximated by spinning up capacity in various datacenters (at a cost).
- Join a network with an empty node and wait until it is fully in sync (a sketch of this scenario follows the list).
  - This measures the overhead of getting overlay up to speed as well as performing the various catchup tasks.
- Catchup complete (replay the entire history) vs recent (only replay around X ledgers from current).
- Observe a node for X (X=5/15/30) minutes in the “Synced!” state.
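A minimal sketch of the join-and-sync scenario, assuming a recent stellar-core CLI and the default admin HTTP port (11626); subcommand names vary between versions:

```
# Start from an empty database; the configuration is assumed to point at
# the target network
stellar-core new-db
stellar-core run &

# Poll the node until it reports the "Synced!" state
watch -n 5 'curl -s http://localhost:11626/info | grep -i state'
```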
Performed using the command line `catchup` command (does not depend on overlay); a command line sketch is shown after the list below.
In order, this tests the following sequence:
- Download data from history
- Apply buckets from a fresh database
- The variable here is the size of each bucket (roughly the number of ledger entries)
- Apply transactions from N ledgers
- The variables to look for are the composition of each ledger (transaction set size; some ledger ranges are busier than others)
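A hedged command line example of this sequence; the `DESTINATION-LEDGER/LEDGER-COUNT` syntax is an assumption that depends on the stellar-core version:

```
# Start from a fresh database
stellar-core new-db

# "complete"-style catchup: replay all transactions up to ledger 200000
stellar-core catchup 200000/200000

# "recent"-style catchup: apply buckets, then only replay the last 1024 ledgers
stellar-core catchup 200000/1024
```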
Inject transactions, wait until they are incorporated into a ledger.
This scenario looks at the overhead of flooding transactions and SCP messages (required for transactions to be included in a ledger).
stellar-core has a built-in load generator that allows injecting transactions on private networks.
See the `generateload` command for more detail.
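A hedged invocation sketch using the admin HTTP interface; the parameter names and the default port (11626) are assumptions that may differ between versions:

```
# Create 10000 test accounts, then submit 100000 payments at ~100 tx/s
curl -s 'http://localhost:11626/generateload?mode=create&accounts=10000&txrate=100'
curl -s 'http://localhost:11626/generateload?mode=pay&accounts=10000&txs=100000&txrate=100'
```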
Some tests (usually hidden, must be run directly) micro-benchmark (test tags contain `bench`, like `bucketbench`) or exercise certain parts of the code (for example `[tx]` runs tests for all transaction related code).
They can be used as a way to demonstrate improvements in a specific subsystem.
In some cases it may make sense to submit changes to those tests (or write new micro-benchmarks) with the pull request.
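A hedged example of running such tests directly, assuming the Catch-style tag selection exposed by `stellar-core test`:

```
# Run the bucket micro-benchmarks by naming their tag explicitly
stellar-core test '[bucketbench]'

# Run all transaction related tests
stellar-core test '[tx]'
```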
Calling the `metrics` command allows gathering metrics at various intervals; an example invocation follows the table below.
name | description |
---|---|
ledger.ledger.close | time to apply a ledger |
overlay.byte.read | number of bytes received |
overlay.byte.write | number of bytes sent |
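A minimal way to pull these metrics from a running node, assuming the default admin HTTP port (11626):

```
# Dump all metrics as JSON
curl -s http://localhost:11626/metrics | less
```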
Tools used to gather these metrics are OS specific.
`uptime`, which reports the load average, can be a really good indicator of how a machine is performing in aggregate.
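Illustrative output (the three numbers are the 1, 5 and 15 minute load averages):

```
$ uptime
 18:16:11 up 42 days,  3:02,  1 user,  load average: 0.52, 0.58, 0.59
```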
`top` is often a good enough tool to give an idea of what is going on in the system.
For a list of performance related tools, see https://en.wikipedia.org/wiki/Load_(computing)
Utilization per processor:

```
$ mpstat -P ALL
06:16:11 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
06:16:11 PM all 9.13 0.01 2.07 0.40 0.00 0.36 0.08 0.00 0.00 87.95
06:16:11 PM 0 10.41 0.01 2.42 0.46 0.00 0.73 0.12 0.00 0.00 85.86
06:16:11 PM 1 7.87 0.00 1.72 0.34 0.00 0.00 0.04 0.00 0.00 90.01
```
`iotop` is the equivalent of `top` for I/O; it also allows aggregating data, which can be useful to identify small but steady utilization of I/O subsystems.
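A hedged example invocation (standard Linux iotop flags; root privileges are typically required):

```
# Show only processes actually doing I/O, with totals accumulated per process
sudo iotop -o -P -a
```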
Basic view:

```
$ iostat -d
Linux 3.13.0-139-generic (core-live-005) 04/10/2018 _x86_64_ (2 CPU)

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 0.82 0.23 8.53 1795356 66176300
xvdb 5.84 0.06 48.81 491097 378539212
xvdh 5.85 2.30 78.16 17845337 606107180
```
Detailed view:

```
$ iostat -d -x
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.79 0.01 0.80 0.23 8.53 21.46 0.00 5.39 7.54 5.35 1.20 0.10
xvdb 0.00 6.49 0.01 5.83 0.06 48.81 16.75 0.00 0.18 0.31 0.18 0.12 0.07
xvdh 0.00 4.56 0.24 5.61 2.30 78.16 27.50 0.11 18.70 11.23 19.02 1.61 0.94
```
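To watch trends rather than the totals accumulated since boot, iostat can sample at a fixed interval:

```
# Print extended device statistics every 5 seconds
iostat -d -x 5
```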
This section contains some examples of how to perform profiling on different platforms.
`perf` is a good alternative to `gprof` for event based profiling. A good tutorial is https://perf.wiki.kernel.org/index.php/Tutorial
Preparing the binary to be “perf friendly” (run before running `make`):

```
export CXXFLAGS="-Og -fno-omit-frame-pointer"
export CFLAGS="$CXXFLAGS"
export LDFLAGS="-ltcmalloc_minimal"
```
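Then rebuild so the new flags are picked up; a sketch assuming the usual autotools flow:

```
# Re-run configure with the exported flags, then build
./configure && make -j$(nproc)
```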
Gather data:

```
perf record -g -o ./perfdata ./stellar-core ...
```

Generate reports:

```
perf report --stdio -g none -i ./perfdata | c++filt | less
perf report --stdio -g graph -i ./perfdata | c++filt | less
```
The main page for the profiler built into Visual Studio Community Edition is located at https://docs.microsoft.com/en-us/visualstudio/profiling/index
Intel VTune (free, unlimited license with 90-day renewal): https://software.intel.com/en-us/system-studio/choose-download