Manage Chef Cookbooks in an Organization repo

While more and more services are starting to adopt Chef and move towards an automated way of building, managing and deploying infrastructure, things tend to be developed individually for each service instead of building reusable cookbooks and focusing on common environments or roles. Why, you may ask? It looks easier to manage at first, but it’s just sweeping the dust under the carpet. Soon you’ll probably realize that it’s much harder to take advantage of what Chef was intended to provide, and you’ll need many hours or days to refactor and re-architect the entire system.

So what are the options then? What best practices should you take advantage of when you have to manage the Chef cookbooks in your organization?

The first thing to notice is that we use cookbooks for multiple purposes. On one hand we have the Chef cookbooks we use to manage application-specific things, and on the other hand we have cookbooks that serve a specific purpose and can easily be shared with others, since they are not application dependent (or shouldn’t be), for example the community cookbooks. We have cookbooks we use to manage an entire system, and we have cookbooks that wrap the functionality of other cookbooks. If we try to define these types of cookbooks by their purpose, we have:

Application Cookbooks

  • Represent a single piece of software to be installed/managed on a node
  • Generally useful, highly reusable due to its focus on managing a single piece of software
  • May depend on other library cookbooks
  • Uses lax / optimistic versioning of dependencies (e.g. “>=”, “~>”, etc or no version constraint at all) so that it can be easily combined with other library cookbooks

Environment Cookbooks

  • Represent a whole node to be installed/managed
  • Very specific, not likely to be reused due to its focus on managing a complete system
  • Combines a set of library cookbooks into a coherent whole
  • Uses strict versioning of dependencies (e.g. “= 1.0.0”), including transitive ones (!), so that the exact same set of library cookbooks is used on this node, even years later
  • Very similar to an application cookbook
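
The difference between the lax and the strict versioning style can be sketched in plain Ruby. This is only an illustration: it uses RubyGems’ Gem::Requirement as a stand-in for Chef’s own constraint handling (an assumption; the ~> and = operators follow the same pessimistic and exact-match idea), and the version numbers are made up.

```ruby
require 'rubygems'

# Hypothetical list of released versions of one library cookbook.
AVAILABLE = %w[2.6.0 2.7.3 2.7.9 2.8.1 3.0.0].freeze

# Return the versions from AVAILABLE that satisfy a constraint string.
def matching_versions(constraint)
  requirement = Gem::Requirement.new(constraint)
  AVAILABLE.select { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
end

# Application cookbook style: optimistic, any 2.x release from 2.7 onwards.
LAX = matching_versions('~> 2.7')

# Environment cookbook style: pinned, only one version can ever be picked.
STRICT = matching_versions('= 2.7.3')

puts "lax:    #{LAX.join(', ')}"
puts "strict: #{STRICT.join(', ')}"
```

With the lax constraint, a later dependency resolve may pick up a newer 2.x release without anyone asking for it; the strict pin keeps the node on one version until someone changes it deliberately.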

Library Cookbooks

  • Add LWRPs that abstract common functionality
  • Include libraries that add Ruby modules/classes for any depending cookbooks
  • Abstract common things into reusable building blocks

Wrapper Cookbooks

  • The lightest Cookbook out of all the known Cookbook patterns
  • Depends on a single application cookbook and exposes its functionality with some small changes
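
As a rough sketch of the wrapper pattern, assume a hypothetical myorg-nginx cookbook that wraps the community nginx cookbook. The cookbook names and the attribute key are illustrative assumptions, not taken from a real setup, and the two files are shown in one block only for brevity (they are evaluated by Chef, not run as a standalone script):

```ruby
# --- metadata.rb (hypothetical wrapper cookbook) ---
name    'myorg-nginx'
version '0.1.0'
depends 'nginx' # the single cookbook being wrapped

# --- recipes/default.rb ---
# Adjust one attribute of the wrapped cookbook (key assumed for illustration),
# then hand over to its default recipe unchanged.
node.default['nginx']['worker_processes'] = 4
include_recipe 'nginx::default'
```

The wrapped cookbook itself stays untouched, so pulling in a newer upstream release does not require merging local edits.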

So by this definition all of the community cookbooks are application cookbooks. It doesn’t really matter whether they carry only libraries / LWRPs or recipes as well. In fact, any cookbook which installs something is an application cookbook. The important distinction is that they are created to be reused, and usually one such cookbook never makes up a whole system. If you need to alter the behavior of a library cookbook or override some of its attributes, it’s best to do that through the cookbook patterns above (a wrapper cookbook, for instance) rather than by editing the cookbook itself, so that when you need to merge the community cookbook with your local fork you won’t have to struggle with conflicts.

On the other hand, you have the application/environment cookbooks, which set up a very specific system by reusing as much as possible from the library cookbooks; this system is so specific that it is unlikely to be reusable in other projects / customers / companies.

Git Versioning the Chef Cookbooks

First of all, I strongly recommend against managing all your cookbooks in a single Git repository. It may look cleaner to have a single Git repository containing all the Chef resources, laid out the way a Chef repository is supposed to look, but it’s not. Just think about it: if you want to work on a cookbook and rewrite it from scratch, you obviously need a branch to work on. But this means that you’re going to branch the entire chef-repo along with your other cookbooks, environments, roles and data bags. It’s messy and pretty hard to reintegrate in the master branch.

Each cookbook should have its own Git repository, build process, and test suite. Cookbooks should be treated as software projects of their own.

Even if it’s harder to start with, this practice allows you to easily branch, tag and control the merges for a single cookbook, just like for any other software project.

You may opt to exclude the library cookbooks from versioning, and add them as directories in your chef-repo. This is just a business-specific decision, depending on what you really need.

Supposing your Chef repository from Git looks something like this one:

[Screenshot: example chef-repo layout in Git]

You can still take advantage of a unified view of all of your cookbooks and have a separate repository for each of them by using Git submodules.

So, you’re going to have two places where you store all of your Chef Resources:

  • A big chef-repo, like the one shown above, containing everything besides the cookbooks you want versioned separately.
  • An organization, similar to opscode-cookbooks, which holds the rest of the cookbooks, each separately versioned in its own repository.

If you want to add a new submodule, run git submodule add from your chef-repo directory, passing the URL of the cookbook’s repository and a target path under cookbooks. A new folder will be created in the cookbooks directory with a clone of your cookbook’s repository.

The results should look similar to this:

[Screenshot: chef-repo with cookbooks added as submodules]

In the end, you should find yourself in one of three scenarios:

  • A monolithic chef-repo, containing all the cookbooks without any submodules. This is just fine if you’re working on a small project, use just a couple of cookbooks and, more importantly, you’re the only one working with them. Managing them shouldn’t be that hard in this case.
  • Cookbook per repo, where every cookbook is placed in its own separate repository and treated like a separate project. It doesn’t matter what kind of cookbook it is (application/library/wrapper/environment).
  • A hybrid of cookbook per repo and cookbook per folder, which is exactly what I’ve used in the example above. The library cookbooks (or at least some of them) are left un-versioned and placed as regular folders, and all the other cookbooks are added as submodules for better tracking.

Matching Git versioning with Chef versioning

If you’re not using Chef Solo, you probably want to push all the cookbooks to the Chef Server once they are modified. Again, how you do this really depends on whether you’re using a cookbook bundler (Berkshelf, Librarian) and on your exact requirements. However, I’ll try to define some guidelines on how to match your cookbooks’ Git versions with the Chef versioning system (if you’re not using Chef Solo).

Having two places where you store your cookbooks, without a single source of truth, can get pretty messy. Some people might use Git to push their cookbook changes, others might push the cookbooks directly to Chef. Since Git is more evolved when it comes to triggers and APIs, my personal choice is having Git as the source of truth. Yeah, and I get versioning, and history, and everything that comes with Git :)

 

Assuming that you want this to happen automatically, so that whatever you push to Git is also pushed to the Chef Server, your flow should look something like this:

  • A developer clones (recursively!) chef-repo in order to modify a cookbook, myservice-backend for example.
  • All the required changes for myservice-backend are done in a separate branch, and the developer bumps the cookbook’s version.
  • The code is merged into master and a new tag is created with the exact version specified in the cookbook’s metadata. Please note that merging into master is not really required if you’re working on a different version of the cookbook (like rewriting it from scratch, and you need it solely for tests that can’t be done in Vagrant). In this specific case, you can just create a tag from that branch without merging it into master, and whenever you want to continue with the changes, fetch the tag directly and work from there.
  • The tag and all the changes are pushed to origin.

This all happens on the client side. The changes are now in the Git repository, but we still need to push them to the Chef server. We can take advantage of Git webhooks to trigger a Jenkins job that handles the push, so the server side will look similar to this:

  • The Jenkins job iterates through each of the cookbooks. If the cookbook is versioned, it should check that the version found in the metadata is the same as the tag’s name (otherwise you may want to automatically create a new tag with the metadata’s version, just to be on the safe side).
  • Run all the tests for your cookbooks (kitchen ftw!)
  • After all the required sanity checks pass, the cookbook is uploaded to the Chef server with knife cookbook upload and its --freeze option, so that developers who forget to bump the version can’t break systems that are already using that version.
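
The version check in the first step could be sketched like this in Ruby. It is a minimal illustration, not a full Jenkins job: the helper names are made up, the metadata is parsed with a simple regular expression rather than loaded through Chef, and accepting an optional “v” prefix on the tag is an assumption (the flow above uses the bare version as the tag name).

```ruby
# Extract the version string declared in a cookbook's metadata.rb source.
# Returns nil when no version line is found.
def metadata_version(metadata_source)
  match = metadata_source.match(/^\s*version\s+['"]([^'"]+)['"]/)
  match && match[1]
end

# True when the Git tag names the same version as the metadata.
# Tags may be written as "1.4.2" or "v1.4.2" (the prefix is an assumption).
def tag_matches_metadata?(tag, metadata_source)
  version = metadata_version(metadata_source)
  !version.nil? && tag.sub(/\Av/, '') == version
end

# Hypothetical metadata.rb content for the cookbook used in the example flow.
SAMPLE_METADATA = <<~METADATA
  name    'myservice-backend'
  version '1.4.2'
METADATA

puts tag_matches_metadata?('1.4.2', SAMPLE_METADATA)
puts tag_matches_metadata?('1.4.1', SAMPLE_METADATA)
```

A job built around this would stop before the upload step whenever the check returns false, instead of freezing a version that does not match its tag.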

As tknerr states, if you are using Chef Solo, the trick is that within the infrastructure repository you must resolve each application cookbook in isolation, because they might have conflicting library cookbook dependencies (which is okay, because they run on different nodes anyway). The infrastructure repo contains the data bags, environments, etc. as well.

Here is an example infrastructure repo for Chef Solo. It uses the vagrant-application-cookbooks plugin to resolve the application cookbooks for each VM in isolation during vagrant up. It uses the Berksfile of the referenced application cookbook to resolve the cookbooks, so you don’t have to repeat yourself here.

You can also check out his approach using Vagrant/Chef Solo/Berkshelf:
http://lists.opscode.com/sympa/arc/chef/2013-10/msg00307.html

Chef Bundlers

Berkshelf and Librarian-Chef are cookbook bundlers (in the spirit of Bundler), but they are philosophically very different from each other in how they work.

I think the choice of which one to adopt depends on how you prefer to view and manage your cookbook infrastructure:

If you’re using the single-repo-per-cookbook approach, then Berkshelf is likely a great fit. Make sure to check out the nifty built-in Vagrant integration that makes the local build/test/inspect cycle really easy.

If you’re using the single-chef-repo (monolithic, not recommended) approach, then Librarian-Chef could improve your workflow: just move your existing privately maintained cookbooks into a site-cookbooks directory within your chef-repo. Berkshelf could still work with this approach, but it’s definitely not the recommended way.

As a Berkshelf user, you might wonder whether we still need a chef-repo and what goes in it. The scenarios described above are still viable even if you’re using Berkshelf to manage your cookbooks, and I’m happy to share something from @reset’s Chef Conf talk:

Yes, most developers using Chef to manage live infrastructure will still choose to use a chef-repo to do so. That chef-repo will contain all the usual things—environments, roles (if you use them), and data bags—with one important exception: cookbooks.

We built Berkshelf in large part because of our experience managing cookbooks in a chef-repo, which did not work well for us. Most cookbooks grow to become sufficiently complex software products to merit their own testing and continuous integration pipelines, and even simple ones can benefit from being tracked in their own repositories with their own branch structures and the like.

In short, extract all your cookbooks from your chef-repo if you have one. When you’re done, delete the cookbooks directory from your repo and don’t look back. You can continue to reference all your cookbooks from your roles and environments, and you can continue to manage the data bags that those cookbooks will use when they run on your infrastructure.

If you’re used to using your chef-repo as a “launchpad” to push your cookbooks onto a Chef Server, or package them up into an archive for use by Chef Solo, you can still do that. Just put a Berksfile in the root of your chef-repo that lists all of the cookbooks your infrastructure requires. Use berks upload to upload all of your cookbooks to your Chef Server, or berks package to package them. What you shouldn’t use your chef-repo for anymore is developing cookbooks. For that, you should work in the repo of the cookbook you are hacking on.

Once you fully embrace this approach, you might find that pulling down all your cookbooks onto your development machine just to deploy them starts to feel backward. We agree! Instead, consider setting up each of your cookbooks with continuous integration, so that every revision is automatically tested, and if it passes, is automatically pushed to your Chef Server. If you don’t use Chef Server, consider setting one up anyway just to receive those “ready for production” cookbooks. You can then pull them down and package them for use with Chef Solo.
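
A Berksfile at the root of the chef-repo, as described in the quote above, might look roughly like the following sketch. The cookbook names, Git URL and tag are placeholders, and the source line assumes a Berkshelf release that resolves against the public Supermarket.

```ruby
# Berksfile (sketch; names and URLs below are placeholders)
source 'https://supermarket.chef.io'

# A community cookbook resolved from the source above.
cookbook 'nginx', '~> 2.7'

# An organization cookbook kept in its own repository, pinned to a tag.
cookbook 'myservice-backend',
         git: 'git@github.com:myorg/myservice-backend.git',
         tag: '1.4.2'
```

Running berks upload or berks package from that directory then works on exactly the set of cookbooks listed here.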

I’ve worked with Librarian just to get a taste of it, but Berkshelf is awesome, and even if you’re not using it you should still check it out!

Final thoughts

First of all, I’d like to apologize if the terms used to define cookbook types were a bit misleading. I liked the separation and adopted it, but you can really name them however you want. I find it easier to go with the naming I’ve chosen because it fits better with my business model.

Also, this is my personal opinion, based on some materials I’ve read in the past weeks. These pages are worth mentioning as sources:

http://www.prashantrajan.com/posts/2013/06/leveling-up-chef-best-practices/

https://github.com/berkshelf/berkshelf/issues/535

http://blog.vialstudios.com/the-environment-cookbook-pattern/

I don’t want to convince anyone that one definition / view is better than the other, but I think it’s good to start by trying to define something, update it based on our new findings as a community, and then create a different model if something comes up that makes our lives easier, and so on. So please, don’t hesitate to leave a comment with your personal choice when it comes to managing Chef cookbook repositories, and let’s try building something together.

 

 

  • wjimenez5271

    Can you elaborate on this statement?: “But this means that you’re going to branch the entire chef-repo along with your other cookbooks, environments, roles and data bags. It’s messy and pretty hard to reintegrate in the master branch.”

    I’m not understanding the issue, since Git will only have to reintegrate what you’ve changed in the branch, so it’s only as complex as the nature of your change/refactor.

    • Sorry, I’ve been on a pretty long holiday. Well… yes, you’re right, Git does that. But if I want to work on CookbookX where I need to do some refactoring and mess up the code a bit, I’ll obviously create a new branch from the Git repo. This is entirely acceptable if you’re working alone or there are just a couple of people doing this. But if you apply this on a larger scale, with 10+ developers branching the entire repo, making changes to their cookbooks (which most of the time implies changing more than one cookbook), roles and environments, you’ll end up having ~10 branches that are functioning on their own. When you start merging everything together things start getting pretty interesting, because you might end up with conflicts in some common cookbooks, roles and environments, and it’s really a pain in the ass. Even more, if you have an automatic method that reflects the Git changes in the Chef server, having multiple versions of environments and roles being concurrently updated is not really the definition of consistency.

      I’m seeing this from a perspective where the developers don’t have a proper view of the big picture and the chef-repo is a common playground. If the chef-repo starts growing, you can’t stay in sync with what everyone else is doing and you’ll spend most of the time dealing with conflicts.

  • Petar Koraca

    Can you please explain where the Role cookbook fits in? Thanks