Devopsdays Day 1 (part two) - Intelligent Monitoring

If you've had anything to do with Ruby, or Ruby-based web development, you're certain to have heard of Cucumber. If you haven't, here's a brief overview.

Cucumber

Behaviour-driven development (BDD) is an approach to test-first programming that emerged in 2003 as an attempt to ensure developers could extract the maximum value from the technique. Most developers rapidly see the benefits of writing unit tests to increase confidence in their code. This then develops (sometimes with some coaching) into a recognition that writing these tests first helps them focus on writing only the code that is explicitly needed. Another benefit is that the tests give the developer a map to the code - some lightweight documentation. The concern was that few developers would then go on to real expertise and discover that writing code test-first is actually a design method which helps to discover the API - in short, to define the behaviour of their code. By creating tools which make it easy to define how the code should behave, in as close to everyday language as possible, we make this process easier.

Cucumber represents the latest evolution along the path to BDD enlightenment. It allows the developer to write at a very high level, in a plain text document. This makes it very easy to work with non-technical stakeholders to capture requirements and write acceptance tests. Here's an example:

Feature: Transfer money between accounts
  As a savings account holder
  I want to be able to transfer money to my current account
  So that I can make a large purchase

Scenario:
  Given my savings account balance is £10000
  And my current account balance is £500
  When I transfer £5000 from my savings account to my current account
  Then my savings account should contain £5000
  And my current account should contain £5500

Note that this test focuses entirely on the behaviour and not on the implementation.
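To give a feel for how plain text like this gets tied to code, here is a toy sketch in plain Ruby. It is not Cucumber's real API: the `step` and `run_step` helpers and the `ACCOUNTS` hash are hypothetical stand-ins I've made up so the example runs without any gems. The idea is simply that each line of the scenario is matched against a pattern, and the captured values are handed to a block.

```ruby
# Hypothetical stand-in for a step registry, so this runs with plain Ruby.
STEPS = {}
ACCOUNTS = {}

# Register a block to run when a scenario line matches the pattern.
def step(pattern, &block)
  STEPS[pattern] = block
end

# Strip the leading keyword, find the first matching pattern, run its block.
def run_step(line)
  text = line.strip.sub(/\A(Given|When|Then|And)\s+/, "")
  STEPS.each do |pattern, block|
    match = pattern.match(text)
    return block.call(*match.captures) if match
  end
  raise "Undefined step: #{text}"
end

step(/\Amy (\w+) account balance is £(\d+)\z/) do |name, amount|
  ACCOUNTS[name] = amount.to_i
end

step(/\AI transfer £(\d+) from my (\w+) account to my (\w+) account\z/) do |amount, from, to|
  ACCOUNTS[from] -= amount.to_i
  ACCOUNTS[to] += amount.to_i
end

step(/\Amy (\w+) account should contain £(\d+)\z/) do |name, amount|
  unless ACCOUNTS[name] == amount.to_i
    raise "#{name} holds £#{ACCOUNTS[name]}, expected £#{amount}"
  end
end

scenario = <<~GHERKIN
  Given my savings account balance is £10000
  And my current account balance is £500
  When I transfer £5000 from my savings account to my current account
  Then my savings account should contain £5000
  And my current account should contain £5500
GHERKIN

# Raises on the first step that fails; silence means the scenario passed.
scenario.each_line { |line| run_step(line) }
```

The scenario text never mentions hashes or integers, so the implementation behind the steps could change completely without the feature file changing at all.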

Nagios

I think it highly unlikely that anyone reading this blog won't know what [Nagios](http://www.nagios.org) is, but just in case: Nagios is one of the most mature and popular monitoring tools in the open source world. It is configured to monitor critical components of a system, including applications, services and network infrastructure. Nagios raises an alert and notifies support staff when there is a problem with any of the services it is monitoring.

The problem is that Nagios is often configured to ask the wrong questions. You might be carrying out a ping test, or an HTTP GET, to verify that the service is up. This can lead to a false sense of security. The machine could be pingable, but not responsive. The application may be returning a blank or otherwise broken page, but a check that only looks at the status code will still see a 200 and report that the site is up.
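To make the difference concrete, here is a minimal Ruby sketch of a check that judges a response by its content rather than by its status code alone. The `check_page` helper and the `PageCheck` module are hypothetical names of my own; the only thing taken from Nagios is the plugin convention that exit code 0 means OK and 2 means critical. The response is passed in as plain values, so no network access is needed to try it.

```ruby
# Exit codes follow the usual Nagios plugin convention.
module PageCheck
  OK = 0
  CRITICAL = 2
end

# Hypothetical helper: returns [exit_code, message] for a fetched page.
def check_page(status, body, expected_text)
  unless status == 200
    return [PageCheck::CRITICAL, "CRITICAL: HTTP #{status}"]
  end
  if body.nil? || body.strip.empty?
    return [PageCheck::CRITICAL, "CRITICAL: empty page"]
  end
  unless body.include?(expected_text)
    return [PageCheck::CRITICAL, "CRITICAL: '#{expected_text}' not found"]
  end
  [PageCheck::OK, "OK: found '#{expected_text}'"]
end

# A blank page served with a 200 is still reported as a failure.
code, message = check_page(200, "", "Welcome")
puts "#{message} (exit #{code})"
```

A status-only check would have passed that blank page; asking whether the expected text is actually there is the "right question".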

Now, of course there are ways around this, and an experienced Nagios administrator will use more intelligent monitoring, but many don't, and one of the main reasons for this is that it isn't easy. Simple fact - if a tool isn't easy to use, people won't use it. cucumber-nagios solves this problem.

Cucumber-Nagios

Lindsay Holmwood is, in my view, a visionary. He, self-deprecatingly, describes himself as simply mad, but he's not - he's quite brilliant. He has the ability to identify a problem, to challenge the status quo, and then build something awesome to address the problem. Cucumber-nagios is an example of this. In what he describes as a thought experiment, he put forward the following question:

What happens if we combine Cucumber's ability to define the behavior of a system with Webrat's ability to interact with webpages, and the industry standard for system/network/application monitoring?

Well here's what it looks like:

Feature: google.co.uk
  It should be up
  And I should be able to search for things

  Scenario: Searching for things
    When I visit "http://www.google.com"
    And I fill in "q" with "wikipedia"
    And I press "Google Search"
    Then I should see "www.wikipedia.org"

    $ cucumber-nagios features/google.co.uk/search.feature
    Critical: 0, Warning: 0, 4 okay | value=4.000000;;;;

We've created a website monitor, in plain English, which genuinely checks the behaviour of a website, and which returns data in the form of a standard Nagios plugin. Suddenly the barrier to doing truly intelligent website monitoring has been reduced, very significantly.
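For anyone who hasn't written a Nagios plugin, the contract is small: print one line of status text, optionally followed by `|` and some performance data, and exit with 0 for OK or 2 for critical. The Ruby sketch below shows how step counts could be turned into a line shaped like the one above. It is my own illustration, not cucumber-nagios's actual code, and `nagios_result` is a hypothetical name.

```ruby
# Hypothetical formatter: turns step counts into a plugin line and exit code.
def nagios_result(passed, failed)
  line = format(
    "Critical: %d, Warning: 0, %d okay | value=%.6f;;;;",
    failed, passed, passed
  )
  exit_code = failed.zero? ? 0 : 2 # 0 = OK, 2 = CRITICAL
  [line, exit_code]
end

line, exit_code = nagios_result(4, 0)
puts line
# A real plugin would finish with: exit(exit_code)
```

Nagios itself only cares about the exit code; the text before the `|` is what a human sees in the alert.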

Giving it a go

You'll need to have RubyGems installed. If you want to use the latest version of cucumber-nagios (0.5.1), which makes use of 'bundler', you'll need a RubyGems newer than 1.3.5. Also, Lindsay is now hosting his projects on gemcutter, so you'll need to enable this, if you haven't already:

$ gem install gemcutter
$ gem sources
$ gem tumble # If you don't see gemcutter.org as your first gem source

Now let's install cucumber-nagios:

$ gem install cucumber-nagios

OK, so next we create a monitoring project:

$ cucumber-nagios-gen project atalanta-systems

Now we go into the project and make it reusable elsewhere:

$ cd atalanta-systems
$ gem bundle

This will take about five minutes, as bundler downloads all the gems required to run cucumber-nagios. Bundling them with the project makes it possible for us to push our code to a monitoring machine very quickly and easily.

$ git init
Initialized empty Git repository in /home/stephen/atalanta-systems/.git/
$ git add .
$ git commit -m "Initial commit"
[master (root-commit) 2c4aa14] Initial commit
 12 files changed, 409 insertions(+), 0 deletions(-)
 create mode 100644 .gitignore
 create mode 100644 Gemfile
 create mode 100644 README
 create mode 100755 bin/cucumber-nagios
 create mode 100755 bin/cucumber-nagios-gen
 create mode 100644 features/steps/benchmark_steps.rb
 create mode 100644 features/steps/result_steps.rb
 create mode 100644 features/steps/webrat_steps.rb
 create mode 100644 features/support/env.rb
 create mode 100644 features/support/nagios.rb
 create mode 100644 lib/generators/feature/%feature_name%.feature
 create mode 100644 lib/generators/feature/%feature_name%_steps.rb

Now generate a feature:

$ bin/cucumber-nagios-gen feature atalanta-systems.com browse
Generating with feature generator:
     [ADDED]  features/atalanta-systems.com/browse.feature
     [ADDED]  features/atalanta-systems.com/steps/browse_steps.rb

    $ cat features/atalanta-systems.com/browse.feature
    Feature: atalanta-systems.com
      It should be up

      Scenario: Visiting home page
        When I go to http://atalanta-systems.com
        Then the request should succeed

Now we can test our feature:

$ bin/cucumber-nagios features/atalanta-systems.com/browse.feature 
Critical: 1, Warning: 0, 0 okay | passed=0, failed=1, nosteps=0, total=1

This failed because I had unplugged my laptop. Let's see what happens when I plug it in:

$ bin/cucumber-nagios features/atalanta-systems.com/browse.feature 
Critical: 0, Warning: 0, 2 okay | passed=2, failed=0, nosteps=0, total=2

Our plugin has reported that the site is fine, and the plugin is ready to be used in our Nagios configuration. Hopefully you're beginning to see how powerful this tool is. It's easy to extend a feature - there are a number of built-in things you can test for. It's also possible to extend the DSL, if you're familiar with Ruby. I'll cover how to do this in a different article.

Conclusion

cucumber-nagios brings the paradigm of behaviour-driven testing into the sysadmin world. To quote James Turnbull:

"systems testing should be about behaviour not about metrics. Who cares if the host is up and reports normal load if it doesn’t do the ACTUAL job it’s designed to: server web pages, send and receive email, resolve hosts, authenticate, backup and restore, etc, etc."

Up next...

As if cucumber-nagios wasn't cool enough, Lindsay then went on to tell us about his 'submarine project' - Flapjack - a complete rethink of the way monitoring works, with a distributed design suitable for today's cloud computing requirements. Stay tuned to find out more!