A Cosmic Dance in a Little Box

It’s Hack Week again. This time around I decided to look at running TripleO on openSUSE. If you’re not familiar with TripleO, it’s short for OpenStack on OpenStack, i.e. it’s a project to deploy OpenStack clouds on bare metal, using the components of OpenStack itself to do the work. I take some delight in bootstrapping of this nature – I think there’s a nice symmetry to it. Or, possibly, I’m just perverse.

Anyway, onwards. I had a chat to Robert Collins about TripleO while at PyCon AU 2013. He introduced me to diskimage-builder and suggested that making it capable of building openSUSE images would be a good first step. It turned out that making diskimage-builder actually run on openSUSE was probably a better first step, but I managed to get most of that out of the way in a random fit of hackery a couple of months ago. Further testing this week uncovered a few more minor kinks, two of which I’ve fixed here and here. It’s always the cross-distro work that seems to bring out the edge cases.

Then I figured there’s not much point making diskimage-builder create openSUSE images without knowing I can set up some sort of environment to validate them. So I’ve spent large parts of the last couple of days working my way through the TripleO Dev/Test instructions, deploying the default Ubuntu images with my openSUSE 12.3 desktop as VM host. For those following along at home the install-dependencies script doesn’t work on openSUSE (some manual intervention required, which I’ll try to either fix, document, or both, later). Anyway, at some point last night, I had what appeared to be a working seed VM, and a broken undercloud VM which was choking during cloud-init:

Calling http://169.254.169.254/2009-04-04/meta-data/instance-id' failed
Request timed out

Figuring that out, well…  There I was with a seed VM deployed from an image built with some scripts from several git repositories, automatically configured to run even more pieces of OpenStack than I’ve spoken about before, which in turn had attempted to deploy a second VM, which wanted to connect back to the first over a virtual bridge and via the magic of some iptables rules and I was running tcpdump and tailing logs and all the moving parts were just suddenly this GIANT COSMIC DANCE in a tiny little box on my desk on a hill on an island at the bottom of the world.

It was at this point I realised I had probably been sitting at my computer for too long.

It turns out the problem above was due to my_ip being set to an empty string in /etc/nova/nova.conf on the seed VM. Somehow I didn’t have the fix in my local source repo. An additional problem is that libvirt on openSUSE, like Fedora, doesn’t set uri_default="qemu:///system". This causes nova baremetal calls from the seed VM to the host to fail as mentioned in bug #1226310. This bug is apparently fixed, but apparently the fix doesn’t work for me (another thing to investigate), so I went with the workaround of putting uri_default="qemu:///system" in ~/.config/libvirt/libvirt.conf.

So now (after a rather spectacular amount of disk and CPU thrashing) there are three OpenStack clouds running on my desktop PC. No smoke has come out.

  • The seed VM has successfully spun up the “baremetal_0” undercloud VM and deployed OpenStack to it.
  • The undercloud VM has successfully spun up the “baremetal_1” and “baremetal_2” VMs and deployed them as the overcloud control and compute nodes.
  • I have apparently booted a demo VM in the overcloud, i.e. I’ve got a VM running inside a VM, although I haven’t quite managed to ssh into the latter yet (I suspect I’m missing a route or a firewall rule somewhere).

I think I had it right last night. There is a giant cosmic dance being performed in a tiny little box on my desk on a hill on an island at the bottom of the world.

Or, I’ve been sitting at my computer for too long again.