6 practices for a super smooth Ansible experience

I started porting my setup from Chef to Ansible a few weeks ago. Having had plenty of experience with Chef gave me a pretty good idea of what I wanted to achieve. One of the main advantages I see in Ansible is the ability to drive your server setup via ssh from your own machine. If you don’t have 100s of servers (update: actually more like tens of thousands, see the comment by mpdehaan), this agentless “push” approach is very powerful. You get to simplify things tremendously in ways like

deterministic order of operations across hosts
centralized configuration (no immediate need for the likes of etcd/consul)
agent forwarding
better control over host resources (no unnecessary periodic runs)

In essence, you have an entity that can see and orchestrate all the pieces in the system rather than having each piece trying to maintain itself by catching up to its surroundings.

Given the above points, this article is about running Ansible from your local machine. It assumes that the target hosts are only accessible via ssh, and helps you set up Vagrant in the same way, as if it were a VPS.

Nevertheless, during my venture into Ansible I immediately ran into some sticking points, which I knew had to have elegant solutions, yet they were hard to search for online, or easy to miss in the docs. Naturally, they ate away my time and now I’d like to help you save yours.

1. Build a convenient local playground

Just a few servers that can talk to each other is all you want. Your multiple production machines, their interactions, their firewalls and DNS config should all just be reproduced on a smaller scale. Is that really so hard? If your hosting provider is kind of like Digital Ocean it’s especially useful to get it all thoroughly mimicked, since you don’t get any security groups or virtual private clouds there, so all your IP and DNS config has to be set up by hand.

Well, turns out it’s easy, after some screwing around.

Path to failure

You start hooking up the Ansible provisioner in Vagrant. Don’t. It’s not even a good approximation of how you will run Ansible in production.

Path to success

There are 4 quick steps to having a very convenient setup.

Make it easy to sync your hosts file with your VMs
Automate adding your pub key to VMs
Configure your ssh client
Write your Vagrantfile

1. Make it easy to sync your hosts file with your VMs

This assumes you have vagrant installed. A very convenient vagrant plugin can automatically add and remove hosts every time you add or destroy VMs. Install as follows.

$ vagrant plugin install vagrant-hostsupdater

Now every time you boot or destroy a VM your /etc/hosts will have the hostname added/removed automatically. You will notice it asking you for your sudo password every time it tries to do that.

2. Automate adding your pub key to VMs

I wrote a small ruby script for Vagrant which lets you conveniently put your pub key into a VM, akin to how Digital Ocean would bootstrap your machine with your key. Assuming you’ve made a root dir for your Ansible project (I called mine stack), do this while in it.

$ mkdir vagrant
$ cd vagrant
$ curl -O https://gist.githubusercontent.com/maxim/dafc3b6da5754419babb/raw/7789793ed7e799dc22e6222c30c6130f34a055e7/key_authorization.rb
$ cd ..

Now you have a vagrant/key_authorization.rb file in there. I’ll show you how to use it in just a bit.

3. Configure your ssh client

SECURITY NOTICE: Absolutely do not do this for your production servers. This is only safe on a private vagrant network with your own VMs.

We will set up our machines on a certain IP range, and I’d like them to be accessible just like Digital Ocean machines, directly as root. So this ~/.ssh/config makes it much more convenient.

# For vagrant virtual machines
Host 192.168.33.* *.myapp.dev
  StrictHostKeyChecking no
  UserKnownHostsFile=/dev/null
  User root
  LogLevel ERROR

With this one config you murdered a whole bunch of birds. Specifically,

SSH won’t complain about non-matching keys for your ever-changing vagrant VMs
SSH won’t try to remember and manage those keys via known_hosts
You won’t have to specify root@… every time
SSH will shut up about how you’re making it do such awful things

Just make sure you replace myapp with whatever local hostname you’d like for your app, and the IP pattern with your desired vagrant IP range.

4. Write your Vagrantfile

Now that you have everything else in place, let’s add the Vagrantfile into your ansible dir.

require_relative './vagrant/key_authorization'

Vagrant.configure('2') do |config|
  config.vm.box = 'ubuntu/trusty64'
  authorize_key_for_root config, '~/.ssh/id_dsa.pub', '~/.ssh/id_rsa.pub'

  {
    'db1'    => '192.168.33.10',
    'app1'   => '192.168.33.11',
    'redis1' => '192.168.33.12',
  }.each do |short_name, ip|
    config.vm.define short_name do |host|
      host.vm.network 'private_network', ip: ip
      host.vm.hostname = "#{short_name}.myapp.dev"
    end
  end
end

This makes it super easy to add more machines into the ruby hash, specify their exact ips, and bring your whole stack up and down with vagrant up and vagrant suspend.

Also notice the require line on top, and the authorize_key_for_root command. This is a reference to my script you downloaded earlier. With this in place the first key it finds among the ones listed will go into the VM as one of root user’s authorized_keys. This way you can ssh as root without a password.

Also thanks to our ssh config, you now get to run the following, and it’ll just work.

$ vagrant up db1
$ ssh db1.myapp.dev
root@db1:~#

This might make you wonder: why not simply let Ansible set up a non-root user for you, and do everything via sudo? Based on my conversations with friendly neighborhood sysadmins, passwordless sudo gives you no more security than bootstrapping via root does. All it does is add an extra useless step to every operation. As for using Ansible as a Vagrant provisioner: as I mentioned in the intro, my goal is a very production-like environment. Vagrant shouldn’t play any role in it except to leave me with a few blank machines similar to the ones my VPS provider would build for me. In essence, I want my starting point on Vagrant to be almost exactly as if I used an actual live VPS, and I like to keep it simple by making good use of the default config. In my case it means a machine with a hostname, an IP, and a root user with my key authorized. That’s exactly what we’re doing here.

2. Teach Ansible to talk to Github on your behalf

In an effort to keep things simple, I avoid having to create extra ssh keys on my servers and add them to Github. Instead there is a way to let servers access Github on your behalf without creating any extra identities. Ansible takes the identity of the user who initiated the playbook run and forwards it to the host, which in turn uses it to talk to Github.

This mechanism is called agent forwarding. You might not want this if you have a complex deploy pipeline, where a deploy server acts autonomously and has its own identity, but Ansible makes it so easy to orchestrate various processes, that I decided not to build one for my setup.

So there is a setting for this. Create a file right here in the root dir called ansible.cfg with the following contents, and it will be automatically picked up when you run Ansible.

[ssh_connection]
ssh_args = -o ForwardAgent=yes

That’s it. No need to add new keys to github.
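If you want to confirm that forwarding actually works before a git task depends on it, a quick check along these lines might help. It is only a sketch: it assumes ssh-add is installed on the target host, and the registered variable name is made up for illustration.

```yaml
# Sketch: fails if the host cannot reach a forwarded agent or the agent holds no keys.
- name: verify that the forwarded ssh agent is reachable
  command: ssh-add -l
  register: agent_identities   # hypothetical variable name
  changed_when: false          # a read-only check should never report "changed"
```

If this task fails, the usual suspects are an empty local agent (run ssh-add on your own machine first) or an ansible.cfg that isn't being picked up.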

3. Add Github to known_hosts properly and securely

For those who are not sure what this is: a server like github presents a host key, which your ssh client uses to verify that it is really talking to that server and not to an impostor. That key is easily obtained by using the following command.

ssh-keyscan -t rsa github.com

Path to failure

People out there suggest that you should run that command on your remote hosts in your Ansible playbooks to set the key dynamically. Don’t. That defeats the purpose of having the key. A man-in-the-middle attack could compromise the result you get, leaving you in the exact situation this measure was meant to prevent.

Path to success

Use an Ansible feature called lookup. Here’s an example Ansible task that will set the key in a secure way.

- name: ensure github.com is a known host
  lineinfile:
    dest: /root/.ssh/known_hosts
    create: yes
    state: present
    line: "{{ lookup('pipe', 'ssh-keyscan -t rsa github.com') }}"
    regexp: "^github\\.com"

Careful: If you do this while having a large number of target servers, you’re gonna have a bad time. This might cause some serious bombardment of your control machine. In that case use accept_hostkey=yes in your git task. I only have about 10-20 machines, so this isn’t a problem for me. (from the comment by mpdehaan)
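For reference, the git-task alternative mentioned in that note could look roughly like this. The repository URL and checkout path are placeholders I made up, and keep in mind that this option makes the remote host trust whatever key it sees on first contact, which is the trade-off discussed above.

```yaml
# Sketch: let the git module add github.com to known_hosts on first connect.
- name: ensure the app repository is cloned
  git:
    repo: "git@github.com:example/myapp.git"   # hypothetical repository
    dest: /srv/myapp                            # hypothetical checkout path
    accept_hostkey: yes
    update: no
```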

You might wonder how that is different from the fail path above. First of all, this doesn’t run on a remote host, it runs on your control machine. Second of all, it only sets this key once per host. If github decides to change it you would have to write another play to update it, or modify this one. This is good because we don’t want a MITM attack to trigger a change of the real key.

Another piece of advice out there is to actually hardcode this key in a variable. That’s also a good way to do it, but I don’t like having ugly strings pollute my var files.
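If you do prefer the hardcoded route, a minimal sketch might look like the following. The variable name is my own invention, and the value is a placeholder: paste in the key you have verified yourself rather than copying one from a blog post.

```yaml
# In a vars file such as group_vars/all (hypothetical variable name, placeholder value)
github_host_key: "github.com ssh-rsa <paste the key you verified here>"
---
# In your tasks: same task shape as before, but the line comes from the variable
- name: ensure github.com is a known host
  lineinfile:
    dest: /root/.ssh/known_hosts
    create: yes
    state: present
    line: "{{ github_host_key }}"
    regexp: "^github\\.com"
```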

4. Keep your secret vars separate

I’m personally not a fan of shit-work involved in placing variables in many different files. It’s more convenient to see the whole picture in one place. However, I do believe secret variables should be either git-ignored or encrypted, and for that you need to put them into their own file.

In my setup I use group_vars/all to keep all non-secret things. So now that this file is taken, how can you share secrets among all your hosts?

Path to failure

I spent a long time trying to figure this one out. I was recommended things like using lookups to fetch each individual variable from their own files elsewhere on my machine. I was also recommended to place these variables into vars file for each individual host, repeatedly. Both are fail. When I discovered the way, I admit I was kind of kicking myself.

Path to success

I simply didn’t know one little fact. Your group_vars/all can be a directory. All files in there can contain variables for all hosts. So I created 2 files in there, config and secrets. I also added group_vars/all/secrets to .gitignore and solved all my issues. Another approach would be to encrypt that file with ansible-vault and let it stay in your repo. I didn’t need that.
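To make that concrete, here is roughly what the two files could contain. The variable names and values are invented for illustration; only the file locations come from the setup described above.

```yaml
# group_vars/all/config (committed to the repo)
app_name: myapp
db_user: myapp
---
# group_vars/all/secrets (git-ignored, or encrypted with ansible-vault)
db_password: change-me
```

Both files are loaded for every host, so a template or task can reference db_user and db_password without caring which file each one lives in.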

5. Avoid perpetually “changed” and “skipping” tasks

As a slightly obsessive-compulsive person, I didn’t like the fact that some tasks kept showing me “changed” or “skipping” status. Besides the fact that it feels wrong, various notification tools might end up bothering you about things changing while they actually aren’t. One such offender was the way to create a postgres extension in your database.

Path to failure

This is the way a typical postgres create extension task looks.

- name: ensure postgresql hstore extension is created
  become: yes
  become_user: postgres
  shell: "psql my_database -c 'CREATE EXTENSION IF NOT EXISTS hstore;'"

Every time you run it, it will be detected as “changed” even though nothing actually changes.

Path to success

Instead we can leverage Ansible’s register, changed_when and failed_when to make this task report ok, as it should. Take a look at this version.

- name: ensure postgresql hstore extension is created
  become: yes
  become_user: postgres
  shell: "psql my_database -c 'CREATE EXTENSION hstore;'"
  register: psql_result
  failed_when: >
    psql_result.rc != 0 and ("already exists" not in psql_result.stderr)
  changed_when: "psql_result.rc == 0"

This clever trick takes advantage of psql exit codes and stderr output. Notice also that we removed the IF NOT EXISTS part from the SQL to make sure we get an error if the extension is already there. This is done on purpose, because we only consider the task failed if the exit code is not zero and the error is something other than “already exists”. If the error is actually “already exists”, then it’s not really a failure, it’s exactly what we want. The changed_when piece says that if psql exited successfully, it actually added the extension, so the task is reported as changed. All neat now.

It’s worth noting that while it hurts an obsessive person like me, sometimes it’s hard to achieve an ok report on some tasks. For example, if you use the shell module with the creates option, it might generate skipping instead of ok, and you should let it go. Instead focus on getting rid of changed reports, and leave skipping alone.
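For anyone who hasn’t used creates yet, a task of that kind might look like this. The script and the marker file are hypothetical; the point is only that Ansible skips the command once the named file exists.

```yaml
# Sketch: runs the script once, then is skipped on later runs because the marker file exists.
- name: ensure the database is initialized
  shell: /usr/local/bin/init-db.sh          # hypothetical script
  args:
    creates: /var/lib/myapp/.initialized    # hypothetical marker file the script leaves behind
```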

6. Separate your setup and deploy playbooks

Every time you use a package module in Ansible (like apt or npm) you have a choice between state=present and state=latest. The former will simply ensure that a desired package is installed, while the latter will, in addition to that, go ahead and update it if it’s not the latest available version. When you are building your stack, my advice is to always prefer present. This also means that when using VCS modules like git, set update: no.

This is important because you need to be able to converge your server configuration without actually deploying and changing your software. A software update, whether it’s your app’s deploy or a dependency version bump, has nothing to do with your server configuration, and could really break your production.

Your updates have to be strict, purposeful, and well thought out, which is why I suggest writing separate playbooks for them. In those playbooks it would be acceptable to use state=latest, since you’d only run them when you’re ready to deal with the consequences. Chances are you would need to choreograph some data and configuration to get all the updated pieces working anyway, so having a different “convergence vector” for it is a much simpler approach.
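As a rough illustration of that split, the two playbooks could be laid out like this. The file names, the app host group, the repository, the paths and the app_version variable are all assumptions on my part, not something prescribed by Ansible.

```yaml
# setup.yml (hypothetical name): converge server configuration, never update software
- hosts: app
  tasks:
    - name: ensure nginx is installed
      apt:
        name: nginx
        state: present
    - name: ensure the app is checked out
      git:
        repo: "git@github.com:example/myapp.git"   # hypothetical repository
        dest: /srv/myapp                            # hypothetical checkout path
        update: no
---
# deploy.yml (hypothetical name): run only when you mean to change the software
- hosts: app
  tasks:
    - name: update the app to the requested version
      git:
        repo: "git@github.com:example/myapp.git"
        dest: /srv/myapp
        version: "{{ app_version }}"                # hypothetical variable, e.g. passed with -e
        update: yes
```

Running setup.yml repeatedly should then settle on ok everywhere, while deploy.yml is the only place where a run is expected to change your software.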

Well, time to grab some coffee and dive back into building an awesome stack.