Anyone who’s ever deployed Ceph presumably knows about ceph-deploy. It’s right there in the Deployment chapter of the upstream docs, and it’s pretty easy to use to get a toy test cluster up and running. For any decent-sized cluster though, ceph-deploy rapidly becomes cumbersome… As just one example, do you really want to have to `ceph-deploy osd prepare` every disk? For larger production clusters it’s almost certainly better to use a fully-fledged configuration management tool, such as Salt, which is what this post is about.
For those not familiar with Salt, the quickest way I can think to describe it is as follows:
- One host is the Salt master.
- All the hosts you’re managing are Salt minions.
- From the master, you can do things to the minions; for example, you can run arbitrary commands on them. More interestingly though, you can create state files which will ensure minions are configured in a certain way – maybe a specific set of packages is installed, or some configuration files are created or modified.
- Salt has a means of storing per-minion configuration data, in what’s called a pillar. All pillar data is stored on the Salt master.
- Salt has various runner modules, which are convenience applications that execute on the Salt master. One such is the orchestrate runner, which can be used to apply configuration to specific minions in a certain order.
There’s a bit more to Salt than that, but the above should hopefully provide enough background that what follows below makes sense.
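To make the idea of a state file a little more concrete, here’s a deliberately tiny example, nothing to do with DeepSea yet (the /srv/salt/ntp path, the state name and the managed config file are all made up purely for illustration): a state that ensures the ntp package is installed on a minion and that its config file matches a copy served from the master, applied to all minions from the master.
ses4-0:~ # cat /srv/salt/ntp/init.sls
# make sure the ntp package is installed on the minion
ntp:
  pkg.installed: []

# ...and that its config file matches the copy served from the master
/etc/ntp.conf:
  file.managed:
    - source: salt://ntp/ntp.conf
    - require:
      - pkg: ntp
ses4-0:~ # salt '*' state.apply ntp
DeepSea is, at its heart, a much larger and much smarter collection of exactly this sort of thing.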
To use Salt to deploy Ceph, you obviously need to install Salt, but you also need a bunch of Salt state files, runners and modules that know how to do all the little nitty-gritty pieces of Ceph deployment – installing the software, bootstrapping the MONs, creating the OSDs and so forth. Thankfully, some of my esteemed colleagues on the SUSE Enterprise Storage team have created DeepSea, which provides exactly that. To really understand DeepSea, you should probably review the Intro, Management and Policy docs, but for this blog post, I’m going to take the classic approach of walking through the setup of (you guessed it) a toy test cluster.
I have six hosts here, imaginatively named:
- ses4-0.example.com
- ses4-1.example.com
- ses4-2.example.com
- ses4-3.example.com
- ses4-4.example.com
- ses4-5.example.com
They’re all running SLES 12 SP2, and each has one extra disk; most of those disks will end up being used for Ceph OSDs.
ses4-0 will be my Salt master. Every host will also be a Salt minion, including ses4-0. This is because DeepSea needs to perform certain operations on the Salt master, notably automatically creating initial pillar configuration data.
So, first we have to install Salt. You can do this however you see fit, but in my case it goes something like this:
ssh ses4-0 'zypper --non-interactive install salt-master ;
    systemctl enable salt-master ;
    systemctl start salt-master'

for n in $(seq 0 5) ; do
    ssh ses4-$n 'zypper --non-interactive install salt-minion ;
        echo "master: ses4-0.example.com" > /etc/salt/minion.d/master.conf ;
        systemctl enable salt-minion ;
        systemctl start salt-minion'
done
Then, on ses4-0, accept all the minion keys, and do a test.ping if you like:
ses4-0:~ # salt-key -A
The following keys are going to be accepted:
Unaccepted Keys:
ses4-0.example.com
ses4-1.example.com
ses4-2.example.com
ses4-3.example.com
ses4-4.example.com
ses4-5.example.com
Proceed? [n/Y] y
Key for minion ses4-0.example.com accepted.
Key for minion ses4-1.example.com accepted.
Key for minion ses4-2.example.com accepted.
Key for minion ses4-3.example.com accepted.
Key for minion ses4-4.example.com accepted.
Key for minion ses4-5.example.com accepted.
ses4-0:~ # salt '*' test.ping
ses4-0.example.com:
True
ses4-5.example.com:
True
ses4-3.example.com:
True
ses4-2.example.com:
True
ses4-1.example.com:
True
ses4-4.example.com:
True
That works, so now DeepSea needs to be installed on the salt-master host (ses4-0). There are RPMs available for various SUSE distros, so I just ran `zypper in deepsea`, but for other distros you can run `make install` from a clone of the source tree to get everything in the right place. If you’re doing the latter, you’ll need to `systemctl restart salt-master` so that it picks up /etc/salt/master.d/modules.conf and /etc/salt/master.d/reactor.conf included with DeepSea. Note: we’ll happily accept patches from anyone who’s interested in helping with packaging for other distros 😉
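If you’re going the source route instead, it looks roughly like this (a sketch only, assuming the upstream repository at github.com/SUSE/DeepSea and that you’re working as root on the Salt master):
ses4-0:~ # git clone https://github.com/SUSE/DeepSea.git
ses4-0:~ # cd DeepSea
ses4-0:~/DeepSea # make install
ses4-0:~/DeepSea # systemctl restart salt-master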
There’s one other tweak required. /srv/pillar/ceph/master_minion.sls specifies the hostname of the salt master. When installing the RPM, this is automatically set to $(hostname -f), so in my case I have:
ses4-0:~ # cat /srv/pillar/ceph/master_minion.sls
master_minion: ses4-0.example.com
If you’re not using the RPM, you’ll need to tweak that file appropriately by hand.
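Something like the following would do the trick if run on the Salt master itself (just a sketch; editing the file with your favourite editor works just as well):
ses4-0:~ # echo "master_minion: $(hostname -f)" > /srv/pillar/ceph/master_minion.sls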
Now comes the interesting part. DeepSea splits Ceph deployment into several stages, as follows:
- Stage 0: Provisioning
- Ensures latest updates are installed on all minions (technically this is optional).
- Stage 1: Discovery
- Interrogates all the minions and creates pillar configuration fragments in /srv/pillar/ceph/proposals.
- Stage 2: Configure
- Before running this stage, you have to create a /srv/pillar/ceph/proposals/policy.cfg file, specifying which nodes are to have which roles (MON, OSD, etc.). This stage then merges the pillar data into its final form.
- Stage 3: Deploy
- Validates the pillar data to ensure the configuration is correct, then deploys the MONs and OSDs.
- Stage 4: Services
- Deploys non-core services (iSCSI gateway, CephFS, RadosGW, openATTIC).
- Stage 5: Removal
- Used to decommission hosts.
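For reference, each stage is kicked off from the Salt master with `salt-run state.orch`. You’ll see stages 0 through 3 run for real below; stages 4 and 5 follow the same naming pattern, although I’m not running them in this post:
ses4-0:~ # salt-run state.orch ceph.stage.0   # provisioning
ses4-0:~ # salt-run state.orch ceph.stage.1   # discovery
ses4-0:~ # salt-run state.orch ceph.stage.2   # configure (needs policy.cfg first)
ses4-0:~ # salt-run state.orch ceph.stage.3   # deploy MONs and OSDs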
Let’s give it a try. Note that there are presently a couple of SUSE-isms in DeepSea (notably some invocations of `zypper` to install software), so if you’re following along at home on a different distro and run into any kinks, please either let us know what’s broken or open a PR if you’ve got a fix.
ses4-0:~ # salt-run state.orch ceph.stage.0
master_minion : valid
ceph_version : valid
None
###########################################################
The salt-run command reports when all minions complete.
The command may appear to hang. Interrupting (e.g. Ctrl-C)
does not stop the command.
In another terminal, try 'salt-run jobs.active' or
'salt-run state.event pretty=True' to see progress.
###########################################################
False
True
[WARNING ] All minions are ready
True
ses4-0.example.com_master:
----------
ID: sync master
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-0.example.com.
Started: 23:04:06.166492
Duration: 459.509 ms
Changes:
ses4-0.example.com:
----------
ID: load modules
Function: module.run
Name: saltutil.sync_all
Result: True
Comment: Module function saltutil.sync_all executed
Started: 23:04:06.312110
Duration: 166.785 ms
Changes:
----------
ret:
----------
beacons:
grains:
log_handlers:
modules:
output:
proxymodules:
renderers:
returners:
sdb:
states:
utils:
Summary for ses4-0.example.com
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
[...]
There’s a whole lot more output than I’ve quoted above, because that’s what happens with Salt when you apply a whole lot of state to a bunch of minions, but it finished up with:
Summary for ses4-0.example.com_master
-------------
Succeeded: 15 (changed=9)
Failed: 0
-------------
Total states run: 15
Stage 1 (discovery) is next, to interrogate all the minions and create configuration fragments:
ses4-0:~ # salt-run state.orch ceph.stage.1
[WARNING ] All minions are ready
True
- True
ses4-0.example.com_master:
----------
ID: ready
Function: salt.runner
Name: minions.ready
Result: True
Comment: Runner function 'minions.ready' executed.
Started: 23:05:02.991371
Duration: 588.429 ms
Changes: Invalid Changes data: True
----------
ID: discover
Function: salt.runner
Name: populate.proposals
Result: True
Comment: Runner function 'populate.proposals' executed.
Started: 23:05:03.580038
Duration: 1627.563 ms
Changes: Invalid Changes data: [True]
Summary for ses4-0.example.com_master
------------
Succeeded: 2 (changed=2)
Failed: 0
------------
Total states run: 2
Now there’s a bunch of interesting data from all the minions stored as SLS and YAML files in subdirectories of /srv/pillar/ceph/proposals. This is the point at which you get to decide exactly how your cluster will be deployed – what roles will be assigned to each node, and how the OSDs will be configured. You do this by creating a file called /srv/pillar/ceph/proposals/policy.cfg, which in turn includes the relevant configuration fragments generated during the discovery stage.
Creating /srv/pillar/ceph/proposals/policy.cfg is probably the one part of using DeepSea that’s easy to screw up. We’re working on making it easier, and the policy docs and examples will help, but in the meantime the approach I’ve taken personally is to generate a policy.cfg including every possible option, then get rid of the ones I don’t want. Here’s a dump of every single configuration fragment generated by the discovery stage on my toy test cluster:
ses4-0:~ # cd /srv/pillar/ceph/proposals/
ses4-0:/srv/pillar/ceph/proposals # find * -name \*.sls -o -name \*.yml | sort > policy.cfg
ses4-0:/srv/pillar/ceph/proposals # cat policy.cfg
cluster-ceph/cluster/ses4-0.example.com.sls
cluster-ceph/cluster/ses4-1.example.com.sls
cluster-ceph/cluster/ses4-2.example.com.sls
cluster-ceph/cluster/ses4-3.example.com.sls
cluster-ceph/cluster/ses4-4.example.com.sls
cluster-ceph/cluster/ses4-5.example.com.sls
cluster-unassigned/cluster/ses4-0.example.com.sls
cluster-unassigned/cluster/ses4-1.example.com.sls
cluster-unassigned/cluster/ses4-2.example.com.sls
cluster-unassigned/cluster/ses4-3.example.com.sls
cluster-unassigned/cluster/ses4-4.example.com.sls
cluster-unassigned/cluster/ses4-5.example.com.sls
config/stack/default/ceph/cluster.yml
config/stack/default/global.yml
profile-1QEMU24GB-1/cluster/ses4-0.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-1.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-2.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-3.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-4.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-5.example.com.sls
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-0.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-1.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-2.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-3.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-4.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-5.example.com.yml
role-admin/cluster/ses4-0.example.com.sls
role-admin/cluster/ses4-1.example.com.sls
role-admin/cluster/ses4-2.example.com.sls
role-admin/cluster/ses4-3.example.com.sls
role-admin/cluster/ses4-4.example.com.sls
role-admin/cluster/ses4-5.example.com.sls
role-client-cephfs/cluster/ses4-0.example.com.sls
role-client-cephfs/cluster/ses4-1.example.com.sls
role-client-cephfs/cluster/ses4-2.example.com.sls
role-client-cephfs/cluster/ses4-3.example.com.sls
role-client-cephfs/cluster/ses4-4.example.com.sls
role-client-cephfs/cluster/ses4-5.example.com.sls
role-client-iscsi/cluster/ses4-0.example.com.sls
role-client-iscsi/cluster/ses4-1.example.com.sls
role-client-iscsi/cluster/ses4-2.example.com.sls
role-client-iscsi/cluster/ses4-3.example.com.sls
role-client-iscsi/cluster/ses4-4.example.com.sls
role-client-iscsi/cluster/ses4-5.example.com.sls
role-client-radosgw/cluster/ses4-0.example.com.sls
role-client-radosgw/cluster/ses4-1.example.com.sls
role-client-radosgw/cluster/ses4-2.example.com.sls
role-client-radosgw/cluster/ses4-3.example.com.sls
role-client-radosgw/cluster/ses4-4.example.com.sls
role-client-radosgw/cluster/ses4-5.example.com.sls
role-igw/cluster/ses4-0.example.com.sls
role-igw/cluster/ses4-1.example.com.sls
role-igw/cluster/ses4-2.example.com.sls
role-igw/cluster/ses4-3.example.com.sls
role-igw/cluster/ses4-4.example.com.sls
role-igw/cluster/ses4-5.example.com.sls
role-igw/stack/default/ceph/minions/ses4-0.example.com.yml
role-igw/stack/default/ceph/minions/ses4-1.example.com.yml
role-igw/stack/default/ceph/minions/ses4-2.example.com.yml
role-igw/stack/default/ceph/minions/ses4-3.example.com.yml
role-igw/stack/default/ceph/minions/ses4-4.example.com.yml
role-igw/stack/default/ceph/minions/ses4-5.example.com.yml
role-master/cluster/ses4-0.example.com.sls
role-master/cluster/ses4-1.example.com.sls
role-master/cluster/ses4-2.example.com.sls
role-master/cluster/ses4-3.example.com.sls
role-master/cluster/ses4-4.example.com.sls
role-master/cluster/ses4-5.example.com.sls
role-mds/cluster/ses4-0.example.com.sls
role-mds/cluster/ses4-1.example.com.sls
role-mds/cluster/ses4-2.example.com.sls
role-mds/cluster/ses4-3.example.com.sls
role-mds/cluster/ses4-4.example.com.sls
role-mds/cluster/ses4-5.example.com.sls
role-mds-nfs/cluster/ses4-0.example.com.sls
role-mds-nfs/cluster/ses4-1.example.com.sls
role-mds-nfs/cluster/ses4-2.example.com.sls
role-mds-nfs/cluster/ses4-3.example.com.sls
role-mds-nfs/cluster/ses4-4.example.com.sls
role-mds-nfs/cluster/ses4-5.example.com.sls
role-mon/cluster/ses4-0.example.com.sls
role-mon/cluster/ses4-1.example.com.sls
role-mon/cluster/ses4-2.example.com.sls
role-mon/cluster/ses4-3.example.com.sls
role-mon/cluster/ses4-4.example.com.sls
role-mon/cluster/ses4-5.example.com.sls
role-mon/stack/default/ceph/minions/ses4-0.example.com.yml
role-mon/stack/default/ceph/minions/ses4-1.example.com.yml
role-mon/stack/default/ceph/minions/ses4-2.example.com.yml
role-mon/stack/default/ceph/minions/ses4-3.example.com.yml
role-mon/stack/default/ceph/minions/ses4-4.example.com.yml
role-mon/stack/default/ceph/minions/ses4-5.example.com.yml
role-rgw/cluster/ses4-0.example.com.sls
role-rgw/cluster/ses4-1.example.com.sls
role-rgw/cluster/ses4-2.example.com.sls
role-rgw/cluster/ses4-3.example.com.sls
role-rgw/cluster/ses4-4.example.com.sls
role-rgw/cluster/ses4-5.example.com.sls
role-rgw-nfs/cluster/ses4-0.example.com.sls
role-rgw-nfs/cluster/ses4-1.example.com.sls
role-rgw-nfs/cluster/ses4-2.example.com.sls
role-rgw-nfs/cluster/ses4-3.example.com.sls
role-rgw-nfs/cluster/ses4-4.example.com.sls
role-rgw-nfs/cluster/ses4-5.example.com.sls
What I actually wanted to deploy was a Ceph cluster with MONs on ses4-1, ses4-2 and ses4-3, and OSDs on ses4-1, ses4-2, ses4-3 and ses4-4. I didn’t want DeepSea to do anything with ses4-5, and I’ve elected not to deploy CephFS, RadosGW, openATTIC or any other services, because this blog post is going to be long enough as it is. So here’s what I pared my policy.cfg back to:
ses4-0:/srv/pillar/ceph/proposals # cat policy.cfg
cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/ses4-[0-4].example.com.sls
config/stack/default/ceph/cluster.yml
config/stack/default/global.yml
profile-1QEMU24GB-1/cluster/ses4-[1-4].example.com.sls
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-[1-4].example.com.yml
role-master/cluster/ses4-0.example.com.sls
role-admin/cluster/ses4-[1-3].example.com.sls
role-mon/cluster/ses4-[1-3].example.com.sls
role-mon/stack/default/ceph/minions/ses4-[1-3].example.com.yml
The cluster-unassigned line defaults all nodes to not be part of the Ceph cluster. The following cluster-ceph line adds only those nodes I want DeepSea to manage (this is how I’m excluding ses4-5.example.com). Ordering is important here as later lines will override earlier lines.
The role-* lines determine which nodes are going to be MONs; role-admin is needed on the MON nodes so that the Ceph admin keyring gets installed there.
The profile-* lines determine how my OSDs will be deployed. In my case, because this is a ridiculous toy cluster, I have only one disk configuration on all my nodes (a single 24GB volume). On a real cluster there may be several profiles available to choose from, potentially mixing drive types and using SSDs for journals. Again, this is covered in more detail in the policy docs.
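If you want to see exactly what a profile proposes for a particular node before committing to it, just cat the relevant fragment (I won’t reproduce the contents here):
ses4-0:/srv/pillar/ceph/proposals # cat profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-1.example.com.yml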
Now that policy.cfg is set up correctly, it’s time to run stages 2 and 3:
ses4-0:~ # salt-run state.orch ceph.stage.2
True
True
ses4-0.example.com_master:
----------
ID: push proposals
Function: salt.runner
Name: push.proposal
Result: True
Comment: Runner function 'push.proposal' executed.
Started: 23:13:43.092320
Duration: 209.321 ms
Changes: Invalid Changes data: True
----------
ID: refresh_pillar1
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-1.example.com, ses4-5.example.com, ses4-0.example.com, ses4-2.example.com, ses4-4.example.com, ses4-3.example.com.
Started: 23:13:43.302018
Duration: 705.173 ms
[...]
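Before kicking off stage 3, it doesn’t hurt to sanity-check that the merged pillar data looks the way you expect. Salt’s built-in pillar.items does the job (output omitted; you should see things like the roles and storage configuration assigned to that minion):
ses4-0:~ # salt 'ses4-1*' pillar.items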
Again, I’ve elided quite a bit of Salt output above. Stage 3 (deployment) can take a while, so if you’re looking for something to do while it runs, you can either play with `salt-run jobs.active` or `salt-run state.event pretty=True` in another terminal, or go watch a video. Here’s stage 3:
ses4-0:~ # salt-run state.orch ceph.stage.3
firewall : disabled
True
fsid : valid
public_network : valid
public_interface : valid
cluster_network : valid
cluster_interface : valid
monitors : valid
storage : valid
master_role : valid
mon_role : valid
mon_host : valid
mon_initial_members : valid
time_server : valid
fqdn : valid
True
ses4-1.example.com
[ERROR ] Run failed on minions: ses4-1.example.com, ses4-4.example.com, ses4-3.example.com, ses4-0.example.com, ses4-2.example.com
Failures:
ses4-1.example.com:
----------
ID: ntp
Function: pkg.installed
Result: True
Comment: All specified packages are already installed
Started: 23:16:18.251858
Duration: 367.172 ms
Changes:
----------
ID: sync time
Function: cmd.run
Name: sntp -S -c ses4-0.example.com
Result: False
Comment: Command "sntp -S -c ses4-0.example.com" run
Started: 23:16:18.620013
Duration: 37.438 ms
Changes:
----------
pid:
11002
retcode:
1
stderr:
sock_cb: 192.168.12.225 not in sync, skipping this server
stdout:
sntp 4.2.8p8@1.3265-o Mon Jun 6 08:12:56 UTC 2016 (1)
[...]
----------
ID: packages
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-1.example.com, ses4-4.example.com, ses4-3.example.com, ses4-0.example.com, ses4-2.example.com.
Started: 23:16:19.035412
Duration: 15967.272 ms
Changes:
ses4-1.example.com:
----------
ID: ceph
Function: pkg.installed
Result: True
Comment: The following packages were installed/updated: ceph
Started: 23:16:19.666218
Duration: 15134.487 ms
[...]
----------
ID: monitors
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-1.example.com, ses4-3.example.com, ses4-2.example.com.
Started: 23:16:36.622000
Duration: 891.694 ms
[...]
----------
ID: osd auth
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-0.example.com.
Started: 23:16:37.513840
Duration: 540.991 ms
[...]
----------
ID: storage
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-1.example.com, ses4-4.example.com, ses4-3.example.com, ses4-2.example.com.
Started: 23:16:38.054970
Duration: 10854.171 ms
[...]
The only failure above is a minor complaint from the time sync step (per the sntp output, the master I pointed it at wasn’t itself in sync at that moment). Everything else (installing the packages, deploying the MONs, creating the OSDs etc.) ran through just fine.
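If you do hit a transient failure like that, my understanding is that the stages are designed to be safe to re-run: states that already completed are simply reported as unchanged, and the failed steps get another go. In this case, re-running stage 3 would have retried the time sync:
ses4-0:~ # salt-run state.orch ceph.stage.3
Anyway, check it out: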
ses4-1:~ # ceph status
    cluster 9b259825-0af1-36a9-863a-e058e4b0706b
     health HEALTH_OK
     monmap e1: 3 mons at {ses4-1=192.168.12.170:6789/0,ses4-2=192.168.12.167:6789/0,ses4-3=192.168.12.148:6789/0}
            election epoch 4, quorum 0,1,2 ses4-3,ses4-2,ses4-1
     osdmap e9: 4 osds: 4 up, 4 in
            flags sortbitwise
      pgmap v18: 64 pgs, 1 pools, 16 bytes data, 3 objects
            133 MB used, 77646 MB / 77779 MB avail
                  64 active+clean
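If you want to confirm where those OSDs actually ended up, `ceph osd tree` from any node with the admin keyring will show the four OSDs spread across ses4-1 through ses4-4 (output omitted here):
ses4-1:~ # ceph osd tree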
We now have a running Ceph cluster. Like I said, this one is a toy, and I haven’t demonstrated stage 4 (services), but hopefully this has demonstrated the scalability possible when using Salt with DeepSea to deploy Ceph. The above process is the same whether you have four nodes or four hundred; it’s just the creation of a policy.cfg file plus a few `salt-run` invocations.
Finally, if you’re wondering about the title of this post, it’s what Cordelia Chase said the first time she saw Angel in Buffy the Vampire Slayer (time index 0:22 in this video). I was going to lead with that, but after watching the episode again, there’s teenage angst, a trip to the morgue and all sorts of other stuff, none of which really makes a good analogy for the technology I’m talking about here. The clickbait in my “Salt and Pepper Squid with Fresh Greens” post was much better.
Update: there’s now a DeepSea mailing list: http://lists.suse.com/mailman/listinfo/deepsea-users