Command-line cookbook dependency solving with knife exec

Imagine you have a fairly complicated infrastructre with a large number of nodes and roles. Suppose you have a requirement to take one of the nodes and rebuild it in an entirely new network, perhaps even for a completely different organization. This should be easy, right? We have our infrastructure in the form of code. However, our current infrastructure has hundreds of uploaded cookbooks - how do we know the minimum ones to download and move over? We need to find out from a node exactly what cookbooks are needed for that node to be built.

The obvious place to start is with the node itself:

$ knife node show controller
Node Name:   controller
Environment: _default
FQDN:        controller
IP:          182.13.194.41
Run List:    role[base], recipe[apt::cacher], role[pxe_server]
Roles:       pxe_server, base
Recipes      apt::cacher, pxe_dust::server, dhcp, dhcp::config
Platform:    ubuntu 10.04

OK, this tells us we need the apt, pxe_dust and dhcp cookbooks. But what about them - do they have any dependencies? How could we find out? Well, dependencies are specified in two places - in the cookbook metadata, and in the individual recipes. Here’s a primitive way to illustrate this:

bash-3.2$ for c in apt pxe_dust dhcp
> do
> grep -iER 'include_recipe|^depends' $c/* | cut -d '"' -f 2 | sort | uniq
> done
apt::cacher-client
apache2
pxe_dust::server
tftp
tftp::server
utils

As I said - primitive. However the problem doesn’t end here. In order to be sure, we now need to repeat this for each dependency, recursively. And of course it would be nice to present them more attractively. Thinking about it, it would be rather useful to know what cookbook versions are in use too. This is definitely not a job for a shell one liner - is there a better way?

As it happens, there is. Think about it - the Chef server already needs to solve these dependencies to know what cookbooks to push to API clients. Can we access this logic? Of course we can - clients carry out all their interactions with the Chef server via the API. This means we can let the server solve the dependencies and query it via the API ourselves.

Chef provides two powerful ways to access the API without having to write a RESTful client. The first, Shef, is an interactive REPL based on IRB, which when launched gives access to the Chef server. This isn’t trivial to use. The second, much simpler way is the knife exec subcommand. This allows you to write Ruby scripts or simple one-liners that are executed in the context of a fully configured Chef API Client using the knife configuration file.

knife exec -E '(api.get "nodes/controller/cookbooks").each { |cb| pp cb[0] => cb[1].version }'

The /nodes/NODE_NAME/cookbooks endpoint returns the cookbook attributes, definitions, libraries and recipes that are required for this node. The response is a hash of cookbook name and Chef::CookbookVersion object. We simply iterate over each one, and pretty print the cookbook name and the version.

Let’s give it a try:

$ knife exec -E '(api.get "nodes/controller/cookbooks").each { |cb| pp cb[0] => cb[1].version }'
{"apt"=>"1.1.1"}
{"tftp"=>"0.1.0"}
{"apache2"=>"0.99.3"}
{"dhcp"=>"0.1.0"}
{"utils"=>"0.9.5"}
{"pxe_dust"=>"1.1.0"}

Nifty! :)