Hackweek 9: Ceph Appliance Odyssey

This week is SUSE Hack Week 9. I wanted to spend some time working on a Ceph appliance image to make it easy to play with Ceph on openSUSE and/or SLES.

I tried making a SLES 11 SP2 appliance with SUSE Studio. I had to add the filesystems and devel:libraries:c_c++ repos from OBS to get reasonably up-to-date Ceph 0.56 and libboost_thread.so.1.49.0, but on boot, when the appliance tried to expand its root filesystem, it died claiming it couldn’t load libe2p.so.2. Studio claims to be pulling in e2fsprogs from both the SP2 Updates and filesystems repos, so maybe that’s the problem. It seems impossible to choose one or the other, as they are the same version. (Update: it was just pointed out to me that you can click the little box next to the version number to choose which one is installed – must try again.)

So I left that alone and tried an openSUSE 12.3 appliance. The filesystems/ceph build for 12.3 is disabled, so I branched it and kicked off a build which failed with an exciting OOM error:

[ 3831s] [ 3803.167109] Out of memory: Kill process 16364 (cc1plus) score 254 or sacrifice child
[ 3831s] [ 3803.167959] Killed process 16364 (cc1plus) total-vm:825128kB, anon-rss:168760kB, file-rss:4kB
[ 3831s] g++: internal compiler error: Killed (program cc1plus)
[ 3831s] Please submit a full bug report,
[ 3831s] with preprocessed source if appropriate.
[ 3831s] See  for instructions.

Guess I should do what it says and file a bug. But I really did want something to play with immediately, so I added http://ceph.com/rpm/opensuse12/x86_64/ as a repo, and pulled in the upstream Ceph 0.56 RPMs. This seems to have worked and given me an openSUSE 12.3 image I can use to run through the Ceph 5-Minute Quick Start, Block Device Quick Start and CephFS Quick Start. So, here’s my extremely terse openSUSEified version of those quick start documents:
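For the record, if you wanted to pull the same upstream packages onto an existing openSUSE 12.3 box rather than baking them into a Studio image, something along these lines should do it (the repo alias is just my own choice, and this assumes that directory has metadata zypper can consume):

# zypper ar http://ceph.com/rpm/opensuse12/x86_64/ ceph-upstream
# zypper refresh
# zypper install ceph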

5-Minute Quick Start

Deploy the Appliance Image

I’m doing this with a couple of VMs, so in my case I make a couple of copies of the image:

# cp ~/openSUSE_12.3_Ceph_0.56.x86_64-0.0.3.qcow2 \
    /var/lib/libvirt/images/ceph-quickstart-server.qcow2
# cp ~/openSUSE_12.3_Ceph_0.56.x86_64-0.0.3.qcow2 \
    /var/lib/libvirt/images/ceph-quickstart-client.qcow2

Then I use virt-manager to create two VMs, backed by those images. Boot ’em up, log in (root password is “linux”), run yast network and set sensible hostnames (“ceph-client” and “ceph-server” instead of “linux-kjqd”, although admittedly those names wouldn’t be very sensible in a real deployment with more than one node).
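If you’d rather script the VM creation than click through virt-manager, something like the following should be roughly equivalent (the RAM size and network are guesses on my part, adjust to taste):

# virt-install --name ceph-quickstart-server --ram 1024 --vcpus 1 \
    --disk path=/var/lib/libvirt/images/ceph-quickstart-server.qcow2,format=qcow2 \
    --import --network network=default --noautoconsole
# virt-install --name ceph-quickstart-client --ram 1024 --vcpus 1 \
    --disk path=/var/lib/libvirt/images/ceph-quickstart-client.qcow2,format=qcow2 \
    --import --network network=default --noautoconsole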

Edit the Configuration File

The appliance image includes the /etc/ceph/ceph.conf file from the original 5-minute quick start, so log in to ceph-server, edit that file and replace {hostname} and {ip-address} with their real values, then copy the configuration file to ceph-client:

# scp /etc/ceph/ceph.conf ceph-client:/etc/ceph/
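(If you don’t feel like editing the file by hand, a couple of sed substitutions before the scp would do the same job; the hostname and IP here are just my values, substitute whatever ceph-server actually has:)

# sed -i -e 's/{hostname}/ceph-server/g' \
    -e 's/{ip-address}/192.168.100.10/g' /etc/ceph/ceph.conf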

Deploy the Configuration

On ceph-server, create directories for each daemon:

# mkdir -p /var/lib/ceph/osd/ceph-0
# mkdir -p /var/lib/ceph/osd/ceph-1
# mkdir -p /var/lib/ceph/mon/ceph-a
# mkdir -p /var/lib/ceph/mds/ceph-a

Still on ceph-server, run the following:

# cd /etc/ceph
# mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
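If you want to sanity-check what mkcephfs generated, ceph-authtool can list the keys in the resulting keyring:

# ceph-authtool -l /etc/ceph/ceph.keyring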

Start Ceph

On ceph-server:

# chkconfig ceph on
# rcceph start
# ceph health

This will initially show something like:

HEALTH_ERR 576 pgs stuck inactive; 576 pgs stuck unclean; no osds

Eventually it will say HEALTH_OK and you’re good to go.
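If you’re impatient, a little loop like this (purely a convenience sketch of mine) will just sit there until the cluster reports healthy:

# while ! ceph health | grep -q HEALTH_OK; do sleep 5; done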

Copy the Keyring to the Client

This is necessary for authentication:

# scp /etc/ceph/ceph.keyring ceph-client:/etc/ceph/

Block Device Quick Start

On ceph-client:

# rbd create foo --size 4096
# modprobe rbd
# rbd map foo --pool rbd --name client.admin
# mkfs.ext4 -m0 /dev/rbd1
# mkdir /mnt/myrbd
# mount /dev/rbd1 /mnt/myrbd

(Why is this /dev/rbd1, not /dev/rbd/rbd/foo as in the original quick start?)
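Either way, rbd showmapped will confirm which device the image actually ended up mapped to:

# rbd showmapped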

CephFS Quick Start

On ceph-client (kernel driver, not FUSE):

# mkdir /mnt/mycephfs
# mount -t ceph -o name=admin,secret=$(ceph-authtool \
    --name client.admin /etc/ceph/ceph.keyring --print-key) \
    ceph-server:/ /mnt/mycephfs

Interestingly, this gives “mount: error writing /etc/mtab: Invalid argument”, but still seems to actually mount the filesystem.
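For completeness, if you wanted to try the FUSE route from the original quick start instead (after unmounting the kernel client, and assuming ceph-fuse is installed), it would be something like:

# ceph-fuse -m ceph-server:6789 /mnt/mycephfs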

Also note that it appears I have 32GB of space for Ceph to use, even though ceph-server only has a 16GB root partition. I rather think that’s because there are two OSDs, but both are just running off the root filesystem rather than separate disks/filesystems. I assume this is one of those Don’t Try This At Home things.
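A quick way to confirm that suspicion is df. On ceph-client:

# df -h /mnt/mycephfs

And on ceph-server, against the two OSD directories (both of which are really just the root filesystem):

# df -h /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-1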