Skip to content

bettermarks/kong_8531

Repository files navigation

docker compose setup to reproduce Kong/kong#8531

setup

We create a dummy plugin in each available PDK language:

Rust PDK is incomplete:

Each plugin injects a specific header, e.g. x-goplugin and forwards the request to a simple reflection service, which the loadtest tests against to check if the plugin was intercepting the request.

We use Kong version 2.7.1.

kong

docker compose up > kong.log

loadtest

(cd loadtest && cargo run --release -- --host http://localhost --no-reset-metrics --no-task-metrics -u 8 -t 20) > metrics.log

The loadtest accepts LOADTEST_TASKS environment var to select specific plugin tests:

LOADTEST_TASKS=lua,python,js,go cargo run --release -- --host http://localhost

kong reload

docker compose exec kong kong reload

assumptions

Kong serves under reasonable load - see goose loadtest.

observations

In every case we were able to DOS kong plugins by reload.

metrics:

All 8 users hatched.

22:18:33 [WARN] "/js": error sending request for url (http://localhost/js): connection closed before message completed
22:18:33 [WARN] "/js": error sending request for url (http://localhost/js): connection closed before message completed
22:18:33 [WARN] "/go": error sending request for url (http://localhost/go): connection closed before message completed
22:18:33 [WARN] "/go": error sending request for url (http://localhost/go): connection closed before message completed
22:18:33 [WARN] "/go": error sending request for url (http://localhost/go): connection closed before message completed
22:18:54 [WARN] "/js": error sending request for url (http://localhost/js): operation timed out
22:18:54 [WARN] "/js": error sending request for url (http://localhost/js): operation timed out
22:18:55 [WARN] "/js": error sending request for url (http://localhost/js): operation timed out
22:18:55 [WARN] "/js": error sending request for url (http://localhost/js): operation timed out
22:18:55 [WARN] "/js": error sending request for url (http://localhost/js): operation timed out
22:18:55 [WARN] "/js": error sending request for url (http://localhost/js): operation timed out
22:18:56 [WARN] "/js": error sending request for url (http://localhost/js): operation timed out
22:18:56 [WARN] "/js": error sending request for url (http://localhost/js): operation timed out

 === PER REQUEST METRICS ===
 ------------------------------------------------------------------------------
 Name                     |        # reqs |        # fails |    req/s |  fail/s
 ------------------------------------------------------------------------------
 GET /go                  |        14,312 |     282 (2.0%) |   715.60 |   14.10
 GET /js                  |        14,320 |     618 (4.3%) |   716.00 |   30.90
 GET /python              |        14,320 |         2 (0%) |   716.00 |    0.10
 -------------------------+---------------+----------------+----------+--------
 Aggregated               |        42,952 |     902 (2.1%) |     2148 |   45.10
 ------------------------------------------------------------------------------
 Name                     |    Avg (ms) |        Min |         Max |     Median
 ------------------------------------------------------------------------------
 GET /go                  |        0.93 |          1 |         202 |          1
 GET /js                  |       13.70 |          1 |      20,001 |          2
 GET /python              |        1.93 |          1 |       2,061 |          1
 -------------------------+-------------+------------+-------------+-----------
 Aggregated               |        5.52 |          1 |      20,001 |          1
 ------------------------------------------------------------------------------
 Slowest page load within specified percentile of requests (in ms):
 ------------------------------------------------------------------------------
 Name                     |    50% |    75% |    98% |    99% |  99.9% | 99.99%
 ------------------------------------------------------------------------------
 GET /go                  |      1 |      1 |      3 |      4 |     11 |    200
 GET /js                  |      2 |      3 |      6 |      8 |     54 | 20,000
 GET /python              |      1 |      2 |      4 |      5 |     19 |    430
 -------------------------+--------+--------+--------+--------+--------+-------
 Aggregated               |      1 |      2 |      5 |      6 |     25 | 20,000

 === ERRORS ===
 ------------------------------------------------------------------------------
 Count       | Error
 ------------------------------------------------------------------------------
 608           GET /js: plugin header missing
 279           GET /go: plugin header missing
 8             GET /js: error sending request /js: operation timed out
 3             GET /go: error sending request /go: connection closed before message completed
 2             GET /python: plugin header missing
 2             GET /js: error sending request /js: connection closed before message completed
 ------------------------------------------------------------------------------
 ------------------------------------------------------------------------------
 Users: 8
 Target host: http://localhost/
 Starting: 2022-04-27 00:18:19 - 2022-04-27 00:18:26 (duration: 00:00:07)
 Running:  2022-04-27 00:18:26 - 2022-04-27 00:18:46 (duration: 00:00:20)
 Stopping: 2022-04-27 00:18:46 - 2022-04-27 00:18:56 (duration: 00:00:10)

 goose v0.15.2
 ------------------------------------------------------------------------------

error log contains further details.

Python plugin

  • is not graceful due to closed connections
  • plugin is nearly stable and seems to "repair" its corruption.

Go plugin

  • is not graceful - there are closed connections
  • plugin is skipped/omitted, but reflection service reached (missing header)
  • reaches a state, where a certain rate of all consecutive requests is corrupt.

JavaScript plugin

  • is not graceful - closed connections and also timeouts
  • plugin is skipped/omitted, but reflection service reached (missing header)
  • reaches a state, where all consecutive requests are corrupt.

Lua plugin

  • is not graceful due to close connections
  • we were unable to trigger any corruption

nginx worker 100% CPU

Sometimes a nginx worker turns crazy by consuming 100% CPU - strace shows:

epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0
epoll_pwait(50, [], 512, 0, NULL, 8)    = 0

summary

  • All plugin types resp. languages suffer from closed connections when reloading kong.
  • Lua and Python plugins seem to stay sane.
  • Go and JavaScript plugins end up in a currupted state, where the only way to fix this is to reload/restart without load (yes also cold restarts under load have their pitfalls).
  • JavaScript plugins seem to fail in 100 % of cases after curruption takes place.

Since Python and JavaScript use mp_rpc.lua and their behavior is dramatically different, we conclude, that this issue is driven by the specific implementation of the plugin server.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published