Alert whomever you want when your apps are in bad shape. The Nurse uses Sickbay for app monitoring.
- Register the apps you want monitored (with a name, the URL to be checked and the HTTP statuses that indicate your app is fine).
- Every X minutes (completely up to you), The Nurse checks your apps.
- If N of the last M requests (again, completely up to you) return a status code other than the ones you expect, The Nurse will warn the Doctor about it.
- This warning is a POST request containing the name of the service, its URL and the last M HTTP codes received. The POST is sent to whichever URL you choose.
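
The "N of the last M" rule above can be sketched in plain Ruby. This is only an illustration of the rule as described here, not The Nurse's actual code; the method name `alert_needed?` and its parameters are hypothetical.

```ruby
# Hypothetical helper illustrating the "N of the last M" rule.
# codes             - the last M HTTP status codes received
# allowed_codes     - the statuses that mean the service is fine
# failure_threshold - N: how many unexpected codes trigger a warning
def alert_needed?(codes, allowed_codes, failure_threshold)
  failures = codes.count { |code| !allowed_codes.include?(code) }
  failures >= failure_threshold
end

alert_needed?([200, 500, 500], [200], 2) # => true
alert_needed?([200, 200, 500], [200], 2) # => false
```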
Notice: The app also records an entry in your DB for each health check. This way you can easily go back in time and check how your app was doing at any given moment.
The Nurse can be used to trigger a Kill Switch mechanism in your app: when your app receives The Nurse's request at some endpoint, it stops a critical automated procedure from continuing.
This can be extremely useful when dealing with a microservice architecture or when your app depends on external services.
The Nurse can also be used as a way to monitor your apps and warn the right people when something is wrong.
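
As a rough sketch of the Kill Switch idea, the receiving app could keep a flag that is flipped when The Nurse's request arrives and consulted before each run of the critical procedure. Everything here is hypothetical: the `KillSwitch` class, its method names and the sample service are not part of The Nurse, and the sketch assumes the request body is JSON shaped like the payload this README describes.

```ruby
require 'json'

# Hypothetical in-process kill switch for the app that receives the alert.
class KillSwitch
  attr_reader :failing_service

  def initialize
    @mutex = Mutex.new
    @tripped = false
    @failing_service = nil
  end

  # Call this from whatever endpoint receives The Nurse's POST body.
  def handle_alert(raw_body)
    payload = JSON.parse(raw_body)
    @mutex.synchronize do
      @tripped = true
      @failing_service = payload['service_name']
    end
  end

  def tripped?
    @mutex.synchronize { @tripped }
  end

  # Runs the block only while no alert has been received.
  def guard
    return :skipped if tripped?
    yield
  end
end

switch = KillSwitch.new
switch.guard { :procedure_ran } # => :procedure_ran
switch.handle_alert('{"service_name":"PaymentGateway","service_url":"www.example.com/health","codes":["200","500","500"]}')
switch.guard { :procedure_ran } # => :skipped
```

In a real app the flag would likely live somewhere shared between processes (a DB row or Redis key, for instance) rather than in memory.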
This setup assumes you have a proper Ruby workspace set up with:
- Ruby 2.3.1
- Rails 5.0.0.1
- PostgreSQL
- Redis
Just run:
$ git clone http://github.com/IgorMarques/The-Nurse
$ cd The-Nurse
$ bundle install
$ rake db:create db:migrate
The app runs fine as a demo right out of the box (you just need to register some apps). But before putting your instance of The Nurse into production, remember to configure it for your own needs.
Using the Rails console (don't worry, we have plans to add a proper web interface in the near future), create the apps/services you want to monitor. To open the console, run:
$ bundle exec rails console
And to create the apps, run this inside the console:
Service.create(name: 'ExampleService', url: 'www.example-service.com/health', allowed_codes: [200])
NOTICE: The allowed_codes field is an array.
Your service will now be monitored once you run The Nurse.
By default, The Nurse uses my instance of Sickbay on Heroku (on a free tier plan) to run the checks. If you plan on using this app for real, please set up your own Sickbay instance. Deploying it to Heroku is pretty straightforward (you literally just need to push the code there).
After the setup, remember to set the ENV variable SICKBAY_URL to the proper URL.
By default, The Nurse will check the Sickbay instance every minute. You can change this by setting the ENV variable HEALTH_CHECK_RATE to the interval you want, in minutes.
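
For illustration, a setting like this could be read and validated as below. This is a minimal sketch and not how The Nurse itself parses the variable; the helper name `health_check_rate` is hypothetical, and the fallback of 1 mirrors the "every minute" default stated above.

```ruby
# Hypothetical helper: reads HEALTH_CHECK_RATE (in minutes) from ENV or any
# hash-like object, falling back to 1 when the variable is not set.
def health_check_rate(env = ENV)
  rate = Integer(env.fetch('HEALTH_CHECK_RATE', '1'), 10)
  raise ArgumentError, 'HEALTH_CHECK_RATE must be a positive integer' unless rate.positive?
  rate
end

health_check_rate('HEALTH_CHECK_RATE' => '5') # => 5
health_check_rate({})                         # => 1
```

Failing loudly on a non-numeric or non-positive value is a design choice of this sketch; it surfaces a misconfigured environment at boot instead of silently checking at the wrong rate.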
By default, if 2 of the last 3 checks to the service's endpoint return a value that is not present in the allowed_codes list, The Nurse will notify your Doctor endpoint. You can customize both values by setting the ENV variables ENTRIES_FETCHED and ENTRIES_OK.
You can disable monitoring for a specific app by setting its active attribute to false. Only apps with the value true are checked.
Just set the variable DOCTOR_URL to the URL of whichever app should be notified when an outage happens. This URL should be able to receive a POST HTTP request with params like:
{
"service_name": "TheFailingService",
"service_url": "www.this_service_failed.com/health",
"codes": ["200", "500", "500"]
}
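
To exercise your Doctor endpoint before pointing The Nurse at it, you could send it a request of the same shape yourself. The sketch below only builds the request. It assumes a JSON body with a `Content-Type: application/json` header, which this README does not confirm as The Nurse's actual encoding; the helper name `build_alert_request` and the localhost URL are hypothetical.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a POST shaped like the
# payload above. The JSON encoding is an assumption of this sketch.
def build_alert_request(doctor_url, service_name, service_url, codes)
  uri = URI.parse(doctor_url)
  request = Net::HTTP::Post.new(uri.request_uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(
    service_name: service_name,
    service_url: service_url,
    codes: codes.map(&:to_s)
  )
  [uri, request]
end

uri, request = build_alert_request(
  'http://localhost:4567/alerts', # hypothetical Doctor URL
  'TheFailingService',
  'www.this_service_failed.com/health',
  [200, 500, 500]
)

# To actually send it:
# Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
#   http.request(request)
# end
```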
Once everything is set up, this will run your health checks :)
$ foreman start
This will start all the components of the app (the server, Sidekiq and Clockwork).
You can also start each component alone. Check the Procfile for more info.
As mentioned earlier, you can use The Nurse to check the health of your app at any point in time during which The Nurse was monitoring it.
All health checks are stored as Status entries. Feel free to run whatever SQL or Active Record queries you like to fetch the data you want.
Example:
2.3.1 :001 > Service.first.statuses
Service Load (28.0ms) SELECT "services".* FROM "services" ORDER BY "services"."id" ASC LIMIT $1 [["LIMIT", 1]]
Status Load (55.4ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."service_id" = $1 [["service_id", 1]]
=> #<ActiveRecord::Associations::CollectionProxy [#<Status id: 1, code: 200, service_id: 1, created_at: "2016-11-29 19:25:31", updated_at: "2016-11-29 19:25:31">, #<Status id: 3, code: 200, service_id: 1, created_at: "2016-11-30 17:04:08", updated_at: "2016-11-30 17:04:08">, #<Status id: 6, code: 200, service_id: 1, created_at: "2016-11-30 17:04:59", updated_at: "2016-11-30 17:04:59">, #<Status id: 9, code: 200, service_id: 1, created_at: "2016-11-30 17:05:58", updated_at: "2016-11-30 17:05:58">, #<Status id: 12, code: 200, service_id: 1, created_at: "2016-11-30 17:06:59", updated_at: "2016-11-30 17:06:59">]>
You can do the same for outages:
2.3.1 :013 > Service.first.statuses
Service Load (0.3ms) SELECT "services".* FROM "services" ORDER BY "services"."id" ASC LIMIT $1 [["LIMIT", 1]]
Status Load (40.5ms) SELECT "statuses".* FROM "statuses" WHERE "statuses"."service_id" = $1 [["service_id", 1]]
=> #<ActiveRecord::Associations::CollectionProxy [#<Status id: 1, code: 200, service_id: 1, created_at: "2016-11-29 19:25:31", updated_at: "2016-11-29 19:25:31">, #<Status id: 3, code: 200, service_id: 1, created_at: "2016-11-30 17:04:08", updated_at: "2016-11-30 17:04:08">, #<Status id: 6, code: 200, service_id: 1, created_at: "2016-11-30 17:04:59", updated_at: "2016-11-30 17:04:59">, #<Status id: 9, code: 200, service_id: 1, created_at: "2016-11-30 17:05:58", updated_at: "2016-11-30 17:05:58">, #<Status id: 12, code: 200, service_id: 1, created_at: "2016-11-30 17:06:59", updated_at: "2016-11-30 17:06:59">]>
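
Once fetched, the entries can also be filtered in plain Ruby, for example to list the unexpected codes within a time window. The sketch below uses a `StatusRow` struct as a stand-in for Status records, with only the `code` and `created_at` columns seen in the output above. The helper name and the sample rows (including the 500) are invented for illustration.

```ruby
# Stand-in for a Status record; only the two columns used here.
StatusRow = Struct.new(:code, :created_at)

# Hypothetical helper: returns the rows created between from and to
# (inclusive) whose code is not one of the allowed codes.
def unexpected_statuses(rows, allowed_codes, from, to)
  rows.select do |row|
    row.created_at.between?(from, to) && !allowed_codes.include?(row.code)
  end
end

# Sample data made up for this example.
rows = [
  StatusRow.new(200, Time.utc(2016, 11, 30, 17, 4, 8)),
  StatusRow.new(500, Time.utc(2016, 11, 30, 17, 4, 59)),
  StatusRow.new(200, Time.utc(2016, 11, 30, 17, 5, 58))
]

window_start = Time.utc(2016, 11, 30, 17, 0, 0)
window_end   = Time.utc(2016, 11, 30, 18, 0, 0)
unexpected_statuses(rows, [200], window_start, window_end).map(&:code) # => [500]
```

In a console session, passing `Service.first.statuses.to_a` as `rows` should work the same way, since those records respond to `code` and `created_at`.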
This app uses RSpec for testing. To run the test suite:
$ rspec
This project is compatible with Heroku. Following their tutorial should be enough. You'll need at least one paid dyno, since the free plans only support up to two (and we have three components: the server, Sidekiq and Clockwork). Also remember to properly configure a Redis To Go add-on.
There's still a lot to be done. Here are some planned features:
- Web interface with the live status of each registered service
- Web interface for managing (creating, editing, deleting, etc.) services
- Support for reading data from multiple Sickbay instances at once
Feel free to contribute with a PR :)