Make health probe server more general purpose (#1079)
* Make health probe server more general purpose

This removes the health check logic from the ProbeServer and renames the
ProbeServer to UtilityServer, which accepts any Rack-based app.

The health check and catchall logic are moved into simple Rack middleware
that can be composed by users however they like and be used to preserve
existing health check behavior while transitioning to a more general
purpose utility server.

All in all, this pattern will allow users to add whatever functionality
they like to GoodJob's web server by composing Rack apps and using
GoodJob's configuration to pass in users' Rack apps. For example:

```ruby
config.good_job.middleware = Rack::Builder.app do
  use GoodJob::Middleware::MyCustomMiddleware
  use GoodJob::Middleware::PrometheusExporter
  use GoodJob::Middleware::Healthcheck
  run GoodJob::Middleware::CatchAll
end
config.good_job.middleware_port = 7001
```

This could help resolve:

* #750
* #532

* Use new API

* Revert server name change

We decided to keep the original ProbeServer name because it better sets
expectations. See:

#1079 (review)

This also splits out middleware testing into separate specs.

* Restore original naming

This also helps ensure that the existing behavior and API remain intact.

* Appease linters

* Add required message for mock

* Make test description relevant

* Allow for handler to be injected into ProbeServer

* Add WEBrick handler

* Add WEBrick as a development dependency

* Add WEBrick tests and configuration

* Add idle_timeout method to mock

* Namespace server handlers

* Warn and fallback when WEBrick isn't loadable

Since the probe server has the option to use WEBrick as a server
handler, but this library doesn't have WEBrick as a dependency,
we want to log a warning when WEBrick is configured but not in the
load path. This will also gracefully fall back to the built-in HTTP
server.

* inspect load path

* Account for multiple webrick entries in $LOAD_PATH

* try removing load path test

* Force an error on require to initiate test

As opposed to manipulating the load path.

* Handle explicit nils in initialization

* Allow probe_handler to be set in configuration

* Add documentation for probe server customization

* Appease linter

* retrigger CI

* Rename `probe_server_app` to `probe_app`; make handler name a symbol; rename Rack middleware/app for clarity

* Update readme to have relevant app example

* Fix readme grammar

---------

Co-authored-by: Ben Sheldon [he/him] <[email protected]>
jklina and bensheldon authored Jan 23, 2024
1 parent a0e3b8d commit eec1f4b
Showing 16 changed files with 587 additions and 186 deletions.
1 change: 1 addition & 0 deletions Gemfile.lock
@@ -542,6 +542,7 @@ DEPENDENCIES
spoom
stackprof
tapioca
webrick
yard
yard-activesupport-concern

98 changes: 86 additions & 12 deletions README.md
@@ -177,18 +177,19 @@ Usage:
good_job start
Options:
[--queues=QUEUE_LIST] # Queues or pools to work from. (env var: GOOD_JOB_QUEUES, default: *)
[--max-threads=COUNT] # Default number of threads per pool to use for working jobs. (env var: GOOD_JOB_MAX_THREADS, default: 5)
[--poll-interval=SECONDS] # Interval between polls for available jobs in seconds (env var: GOOD_JOB_POLL_INTERVAL, default: 10)
[--max-cache=COUNT] # Maximum number of scheduled jobs to cache in memory (env var: GOOD_JOB_MAX_CACHE, default: 10000)
[--shutdown-timeout=SECONDS] # Number of seconds to wait for jobs to finish when shutting down before stopping the thread. (env var: GOOD_JOB_SHUTDOWN_TIMEOUT, default: -1 (forever))
[--enable-cron] # Whether to run cron process (default: false)
[--enable-listen-notify] # Whether to enqueue and read jobs with Postgres LISTEN/NOTIFY (default: true)
[--idle-timeout=SECONDS] # Exit process when no jobs have been performed for this many seconds (env var: GOOD_JOB_IDLE_TIMEOUT, default: nil)
[--daemonize] # Run as a background daemon (default: false)
[--pidfile=PIDFILE] # Path to write daemonized Process ID (env var: GOOD_JOB_PIDFILE, default: tmp/pids/good_job.pid)
[--probe-port=PORT] # Port for http health check (env var: GOOD_JOB_PROBE_PORT, default: nil)
[--queue-select-limit=COUNT] # The number of queued jobs to select when polling for a job to run. (env var: GOOD_JOB_QUEUE_SELECT_LIMIT, default: nil)"
[--queues=QUEUE_LIST] # Queues or pools to work from. (env var: GOOD_JOB_QUEUES, default: *)
[--max-threads=COUNT] # Default number of threads per pool to use for working jobs. (env var: GOOD_JOB_MAX_THREADS, default: 5)
[--poll-interval=SECONDS] # Interval between polls for available jobs in seconds (env var: GOOD_JOB_POLL_INTERVAL, default: 10)
[--max-cache=COUNT] # Maximum number of scheduled jobs to cache in memory (env var: GOOD_JOB_MAX_CACHE, default: 10000)
[--shutdown-timeout=SECONDS] # Number of seconds to wait for jobs to finish when shutting down before stopping the thread. (env var: GOOD_JOB_SHUTDOWN_TIMEOUT, default: -1 (forever))
[--enable-cron] # Whether to run cron process (default: false)
[--enable-listen-notify] # Whether to enqueue and read jobs with Postgres LISTEN/NOTIFY (default: true)
[--idle-timeout=SECONDS] # Exit process when no jobs have been performed for this many seconds (env var: GOOD_JOB_IDLE_TIMEOUT, default: nil)
[--daemonize] # Run as a background daemon (default: false)
[--pidfile=PIDFILE] # Path to write daemonized Process ID (env var: GOOD_JOB_PIDFILE, default: tmp/pids/good_job.pid)
[--probe-port=PORT] # Port for http health check (env var: GOOD_JOB_PROBE_PORT, default: nil)
[--probe-handler=PROBE_HANDLER] # Use 'webrick' to handle probe server requests with WEBrick, which is Rack compliant; otherwise the default server, which is not fully Rack compliant, is used. (env var: GOOD_JOB_PROBE_HANDLER, default: nil)
[--queue-select-limit=COUNT] # The number of queued jobs to select when polling for a job to run. (env var: GOOD_JOB_QUEUE_SELECT_LIMIT, default: nil)"
Executes queued jobs.
@@ -304,6 +305,18 @@ Available configuration options are:
config.good_job.on_thread_error = -> (exception) { Rails.error.report(exception) }
```
- `probe_app` (Rack application) allows you to specify a Rack application to be used for the probe server. Defaults to `nil`, which uses the default probe app (the health check middleware with a 404 catch-all). Example:
```ruby
config.good_job.probe_app = -> (env) { [200, {}, ["OK"]] }
```
- `probe_handler` (symbol or string) allows you to use WEBrick, a fully Rack-compliant web server, instead of the simple default server. **Note:** You'll need to ensure WEBrick is in your load path, as GoodJob doesn't include WEBrick as a dependency. Example:
```ruby
config.good_job.probe_handler = 'webrick'
```
By default, GoodJob configures the following execution modes per environment:
```ruby
@@ -1321,6 +1334,8 @@ A workaround to this limitation is to make a direct database connection availabl
### CLI HTTP health check probes
#### Default configuration
GoodJob's CLI offers an http health check probe to better manage process lifecycle in containerized environments like Kubernetes:
```bash
@@ -1374,6 +1389,65 @@ spec:
periodSeconds: 10
```
#### Custom configuration
The CLI health check probe server can be customized to serve additional information. Two things to note when customizing the probe server:
- By default, the probe server uses a homespun single-threaded, blocking server, so your custom app should be very simple and lightly used, as heavier use could affect job performance.
- The default probe server is not fully Rack compliant. Rack specifies various mandatory environment keys, and some Rack apps assume those keys exist. If you need to use a Rack app that depends on a fully Rack-compliant server, you can configure GoodJob to [use WEBrick as the server](#using-webrick).
To customize the probe server, set `config.good_job.probe_app` to a Rack app or a Rack builder:
```ruby
# config/initializers/good_job.rb OR config/application.rb OR config/environments/{RAILS_ENV}.rb
Rails.application.configure do
config.good_job.probe_app = Rack::Builder.new do
# Add your custom middleware
use Custom::AuthorizationMiddleware
use Custom::PrometheusExporter
# This is the default middleware
use GoodJob::ProbeServer::HealthcheckMiddleware
run GoodJob::ProbeServer::NotFoundApp # will return 404 for all other requests
end
end
```
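A custom middleware for the stack above only needs the Rack calling convention: an initializer that keeps a reference to the next app, and a `call(env)` that returns `[status, headers, body]`. The following is a minimal sketch, not part of GoodJob: `VersionMiddleware`, the `/version` path, and the version string are hypothetical, and it assumes the probe handler populates `PATH_INFO`, which the health check routing also relies on.

```ruby
# Hypothetical custom middleware for the probe app; not part of GoodJob.
class VersionMiddleware
  def initialize(app, version = "unknown")
    @app = app          # the next app in the Rack::Builder stack
    @version = version
  end

  # Serve /version directly and delegate every other path downstream.
  def call(env)
    return @app.call(env) unless env["PATH_INFO"] == "/version"

    [200, { "content-type" => "text/plain" }, [@version]]
  end
end

# Exercise it with a stand-in inner app that always returns 404.
inner = ->(_env) { [404, {}, ["Not found"]] }
app = VersionMiddleware.new(inner, "1.2.3")
status, _headers, body = app.call({ "PATH_INFO" => "/version" })
puts "#{status} #{body.join}" # prints "200 1.2.3"
```

Inside the `Rack::Builder` block it would be mounted with `use VersionMiddleware, "1.2.3"`. The version is passed positionally to avoid depending on how a given Rack version forwards keyword arguments from `use`.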
##### Using WEBrick
If your custom app requires a fully Rack compliant server, you can configure GoodJob to use WEBrick as the server:
```ruby
# config/initializers/good_job.rb OR config/application.rb OR config/environments/{RAILS_ENV}.rb
Rails.application.configure do
config.good_job.probe_handler = :webrick
end
```
You can also enable WEBrick through the command line:
```bash
good_job start --probe-handler=webrick
```
or via an environment variable:
```bash
GOOD_JOB_PROBE_HANDLER=webrick good_job start
```
Note that GoodJob doesn't include WEBrick as a dependency, so you'll need to add it to your Gemfile:
```ruby
# Gemfile
gem 'webrick'
```
If WEBrick is configured to be used but the dependency is not found, GoodJob will log a warning and fall back to the default probe server.
## Contribute
<!-- Please keep this section in sync with CONTRIBUTING.md -->
1 change: 1 addition & 0 deletions good_job.gemspec
@@ -66,6 +66,7 @@ Gem::Specification.new do |spec|
spec.add_development_dependency "puma", "~> 5.6" # waiting on Capybara support for Puma v6
spec.add_development_dependency "rspec-rails"
spec.add_development_dependency "selenium-webdriver"
spec.add_development_dependency "webrick"
spec.add_development_dependency "yard"
spec.add_development_dependency "yard-activesupport-concern"
end
5 changes: 4 additions & 1 deletion lib/good_job.rb
@@ -32,8 +32,11 @@
require "good_job/multi_scheduler"
require "good_job/notifier"
require "good_job/poller"
require "good_job/http_server"
require "good_job/probe_server"
require "good_job/probe_server/healthcheck_middleware"
require "good_job/probe_server/not_found_app"
require "good_job/probe_server/simple_handler"
require "good_job/probe_server/webrick_handler"
require "good_job/scheduler"
require "good_job/shared_executor"
require "good_job/systemd_service"
5 changes: 4 additions & 1 deletion lib/good_job/cli.rb
@@ -94,6 +94,9 @@ def exit_on_failure?
type: :numeric,
banner: 'PORT',
desc: "Port for http health check (env var: GOOD_JOB_PROBE_PORT, default: nil)"
method_option :probe_handler,
type: :string,
desc: "Use 'webrick' to handle probe server requests with WEBrick, which is Rack compliant; otherwise the default server, which is not fully Rack compliant, is used. (env var: GOOD_JOB_PROBE_HANDLER, default: nil)"
method_option :queue_select_limit,
type: :numeric,
banner: 'COUNT',
@@ -112,7 +115,7 @@ def start
systemd.start

if configuration.probe_port
probe_server = GoodJob::ProbeServer.new(port: configuration.probe_port)
probe_server = GoodJob::ProbeServer.new(app: configuration.probe_app, port: configuration.probe_port, handler: configuration.probe_handler)
probe_server.start
end

20 changes: 18 additions & 2 deletions lib/good_job/configuration.rb
@@ -342,10 +342,26 @@ def pidfile
end

# Port of the probe server
# @return [nil,Integer]
# @return [nil, Integer]
def probe_port
options[:probe_port] ||
(options[:probe_port] ||
env['GOOD_JOB_PROBE_PORT']
)&.to_i
end

# Probe server handler
# @return [nil, Symbol]
def probe_handler
(options[:probe_handler] ||
rails_config[:probe_handler] ||
env['GOOD_JOB_PROBE_HANDLER']
)&.to_sym
end

# Rack compliant application to be run on the ProbeServer
# @return [nil, Class]
def probe_app
rails_config[:probe_app]
end

def enable_listen_notify
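The new `probe_handler` reader follows the same pattern as `probe_port`: the first source that is set wins (CLI option, then Rails config, then the `GOOD_JOB_PROBE_HANDLER` environment variable), and the result is normalized, here with `to_sym`. The sketch below reproduces that lookup order outside of GoodJob; `resolve_probe_handler` and its plain-hash arguments are hypothetical stand-ins for the `options`, `rails_config`, and `env` used by `Configuration`.

```ruby
# Hypothetical stand-in for Configuration#probe_handler's lookup order:
# CLI option first, then Rails config, then the environment variable.
def resolve_probe_handler(options, rails_config, env)
  value = options[:probe_handler] ||
          rails_config[:probe_handler] ||
          env["GOOD_JOB_PROBE_HANDLER"]
  # "webrick" and :webrick both become :webrick; an unset value stays nil.
  value&.to_sym
end

resolve_probe_handler({}, {}, { "GOOD_JOB_PROBE_HANDLER" => "webrick" })         # => :webrick
resolve_probe_handler({ probe_handler: "webrick" }, { probe_handler: :other }, {}) # => :webrick
resolve_probe_handler({}, {}, {})                                                  # => nil
```

Because of the trailing `&.to_sym`, the string form (`'webrick'`) and the symbol form (`:webrick`) used in the README examples resolve to the same value, which is what `ProbeServer#build_handler` compares against.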
77 changes: 0 additions & 77 deletions lib/good_job/http_server.rb

This file was deleted.

37 changes: 21 additions & 16 deletions lib/good_job/probe_server.rb
@@ -8,13 +8,20 @@ def self.task_observer(time, output, thread_error) # rubocop:disable Lint/Unused
GoodJob._on_thread_error(thread_error) if thread_error
end

def initialize(port:)
@port = port
def self.default_app
::Rack::Builder.new do
use GoodJob::ProbeServer::HealthcheckMiddleware
run GoodJob::ProbeServer::NotFoundApp
end
end

def initialize(port:, handler: nil, app: nil)
app ||= self.class.default_app
@handler = build_handler(port: port, handler: handler, app: app)
end

def start
@handler = HttpServer.new(self, port: @port, logger: GoodJob.logger)
@future = Concurrent::Future.new { @handler.run }
@future = @handler.build_future
@future.add_observer(self.class, :task_observer)
@future.execute
end
@@ -28,19 +35,17 @@ def stop
@future&.value # wait for Future to exit
end

def call(env)
case Rack::Request.new(env).path
when '/', '/status'
[200, {}, ["OK"]]
when '/status/started'
started = GoodJob::Scheduler.instances.any? && GoodJob::Scheduler.instances.all?(&:running?)
started ? [200, {}, ["Started"]] : [503, {}, ["Not started"]]
when '/status/connected'
connected = GoodJob::Scheduler.instances.any? && GoodJob::Scheduler.instances.all?(&:running?) &&
GoodJob::Notifier.instances.any? && GoodJob::Notifier.instances.all?(&:connected?)
connected ? [200, {}, ["Connected"]] : [503, {}, ["Not connected"]]
def build_handler(port:, handler:, app:)
if handler == :webrick
begin
require 'webrick'
WebrickHandler.new(app, port: port, logger: GoodJob.logger)
rescue LoadError
GoodJob.logger.warn("WEBrick was requested as the probe server handler, but it's not in the load path. GoodJob doesn't keep WEBrick as a dependency, so you'll have to make sure it's added to your Gemfile to make use of it. GoodJob will fall back to its own webserver in the meantime.")
SimpleHandler.new(app, port: port, logger: GoodJob.logger)
end
else
[404, {}, ["Not found"]]
SimpleHandler.new(app, port: port, logger: GoodJob.logger)
end
end
end
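`build_handler` only tries to load WEBrick when it was explicitly requested, and rescues `LoadError` so that a missing gem degrades to `SimpleHandler` with a warning instead of crashing the CLI. The sketch below isolates that pattern. `SimpleStub`, `WebrickStub`, `pick_handler`, and the injectable `loader` argument are hypothetical; the loader exists only so the missing-gem path can be exercised without uninstalling anything.

```ruby
# Hypothetical placeholders for GoodJob's SimpleHandler and WebrickHandler.
SimpleStub = Struct.new(:port)
WebrickStub = Struct.new(:port)

# `loader` defaults to Kernel#require; it is injectable for testing.
def pick_handler(requested, port:, loader: ->(lib) { require lib })
  # Only attempt the optional dependency when it was asked for.
  return SimpleStub.new(port) unless requested == :webrick

  begin
    loader.call("webrick")
    WebrickStub.new(port)
  rescue LoadError
    # Mirror the behavior above: warn, then fall back to the simple handler.
    warn "WEBrick was requested but could not be loaded; falling back to the simple handler"
    SimpleStub.new(port)
  end
end

missing = ->(_lib) { raise LoadError, "simulated missing gem" }
pick_handler(:webrick, port: 7001, loader: missing).class # => SimpleStub (after a warning)
pick_handler(nil, port: 7001).class                       # => SimpleStub
```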
27 changes: 27 additions & 0 deletions lib/good_job/probe_server/healthcheck_middleware.rb
@@ -0,0 +1,27 @@
# frozen_string_literal: true

module GoodJob
class ProbeServer
class HealthcheckMiddleware
def initialize(app)
@app = app
end

def call(env)
case Rack::Request.new(env).path
when '/', '/status'
[200, {}, ["OK"]]
when '/status/started'
started = GoodJob::Scheduler.instances.any? && GoodJob::Scheduler.instances.all?(&:running?)
started ? [200, {}, ["Started"]] : [503, {}, ["Not started"]]
when '/status/connected'
connected = GoodJob::Scheduler.instances.any? && GoodJob::Scheduler.instances.all?(&:running?) &&
GoodJob::Notifier.instances.any? && GoodJob::Notifier.instances.all?(&:connected?)
connected ? [200, {}, ["Connected"]] : [503, {}, ["Not connected"]]
else
@app.call(env)
end
end
end
end
end
11 changes: 11 additions & 0 deletions lib/good_job/probe_server/not_found_app.rb
@@ -0,0 +1,11 @@
# frozen_string_literal: true

module GoodJob
class ProbeServer
module NotFoundApp
def self.call(_env)
[404, {}, ["Not found"]]
end
end
end
end
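`ProbeServer.default_app` composes these two pieces with `Rack::Builder`, but the same shape can be written by hand: the middleware receives the inner app in its constructor and forwards any path it does not recognize. The sketch below mirrors that routing with hypothetical stand-ins (`MiniHealthcheck`, `NOT_FOUND`, and a `started` callable in place of the `GoodJob::Scheduler` checks). It reads `env["PATH_INFO"]` directly instead of going through `Rack::Request#path`, so it runs without the `rack` gem, and it omits `/status/connected` for brevity.

```ruby
# Hypothetical stand-in for GoodJob::ProbeServer::NotFoundApp: the innermost app.
NOT_FOUND = ->(_env) { [404, {}, ["Not found"]] }

# Hypothetical stand-in for HealthcheckMiddleware. `started` replaces the
# GoodJob::Scheduler.instances checks so the sketch has no dependencies.
class MiniHealthcheck
  def initialize(app, started: -> { false })
    @app = app          # the next app in the stack
    @started = started
  end

  def call(env)
    case env["PATH_INFO"]
    when "/", "/status"
      [200, {}, ["OK"]]
    when "/status/started"
      @started.call ? [200, {}, ["Started"]] : [503, {}, ["Not started"]]
    else
      @app.call(env)    # unknown paths fall through to the inner app
    end
  end
end

# Hand-written equivalent of `use MiniHealthcheck` followed by `run NOT_FOUND`.
app = MiniHealthcheck.new(NOT_FOUND, started: -> { true })
app.call({ "PATH_INFO" => "/status/started" }) # => [200, {}, ["Started"]]
app.call({ "PATH_INFO" => "/metrics" })        # => [404, {}, ["Not found"]]
```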