Anyone who’s ever deployed Ceph presumably knows about ceph-deploy. It’s right there in the Deployment chapter of the upstream docs, and it’s pretty easy to use to get a toy test cluster up and running. For any decent-sized cluster though, ceph-deploy rapidly becomes cumbersome… As just one example, do you really want to have to `ceph-deploy osd prepare` every disk? For larger production clusters it’s almost certainly better to use a fully-fledged configuration management tool, such as Salt, which is what this post is about.
For those not familiar with Salt, the quickest way I can think to describe it is as follows:
- One host is the Salt master.
- All the hosts you’re managing are Salt minions.
- From the master, you can do things to the minions; for example, you can run arbitrary commands on them. More interestingly though, you can create state files which will ensure minions are configured in a certain way – maybe a specific set of packages is installed, or some configuration files are created or modified.
- Salt has a means of storing per-minion configuration data, in what’s called a pillar. All pillar data is stored on the Salt master.
- Salt has various runner modules, which are convenience applications that execute on the Salt master. One such is the orchestrate runner, which can be used to apply configuration to specific minions in a certain order.
There’s a bit more to Salt than that, but the above should hopefully provide enough background that what follows below makes sense.
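To make the idea of a state file a little more concrete, here’s a deliberately tiny example, nothing to do with DeepSea yet (the /srv/salt/ntp path, the state name and the managed config file are all made up purely for illustration): a state that ensures the ntp package is installed on a minion and that its config file matches a copy served from the master, applied to all minions from the master.
ses4-0:~ # cat /srv/salt/ntp/init.sls
# make sure the ntp package is installed on the minion
ntp:
  pkg.installed: []

# ...and that its config file matches the copy served from the master
/etc/ntp.conf:
  file.managed:
    - source: salt://ntp/ntp.conf
    - require:
      - pkg: ntp
ses4-0:~ # salt '*' state.apply ntp
DeepSea is, at its heart, a much larger and much smarter collection of exactly this sort of thing.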
To use Salt to deploy Ceph, you obviously need to install Salt, but you also need a bunch of Salt state files, runners and modules that know how to do all the little nitty-gritty pieces of Ceph deployment – installing the software, bootstrapping the MONs, creating the OSDs and so forth. Thankfully, some of my esteemed colleagues on the SUSE Enterprise Storage team have created DeepSea, which provides exactly that. To really understand DeepSea, you should probably review the Intro, Management and Policy docs, but for this blog post, I’m going to take the classic approach of walking through the setup of (you guessed it) a toy test cluster.
I have six hosts here, imaginatively named:
- ses4-0.example.com
- ses4-1.example.com
- ses4-2.example.com
- ses4-3.example.com
- ses4-4.example.com
- ses4-5.example.com
They’re all running SLES 12 SP2, and each has one extra disk; most of those disks will end up being used for Ceph OSDs.
ses4-0 will be my Salt master. Every host will also be a Salt minion, including ses4-0. This is because DeepSea needs to perform certain operations on the Salt master, notably automatically creating initial pillar configuration data.
So, first we have to install Salt. You can do this however you see fit, but in my case it goes something like this:
ssh ses4-0 'zypper --non-interactive install salt-master ;
    systemctl enable salt-master ;
    systemctl start salt-master'

for n in $(seq 0 5) ; do
    ssh ses4-$n 'zypper --non-interactive install salt-minion ;
        echo "master: ses4-0.example.com" > /etc/salt/minion.d/master.conf ;
        systemctl enable salt-minion ;
        systemctl start salt-minion'
done
Then, on ses4-0, accept all the minion keys, and do a test.ping if you like:
ses4-0:~ # salt-key -A
The following keys are going to be accepted:
Unaccepted Keys:
ses4-0.example.com
ses4-1.example.com
ses4-2.example.com
ses4-3.example.com
ses4-4.example.com
ses4-5.example.com
Proceed? [n/Y] y
Key for minion ses4-0.example.com accepted.
Key for minion ses4-1.example.com accepted.
Key for minion ses4-2.example.com accepted.
Key for minion ses4-3.example.com accepted.
Key for minion ses4-4.example.com accepted.
Key for minion ses4-5.example.com accepted.
ses4-0:~ # salt '*' test.ping
ses4-0.example.com:
True
ses4-5.example.com:
True
ses4-3.example.com:
True
ses4-2.example.com:
True
ses4-1.example.com:
True
ses4-4.example.com:
True
That works, so now DeepSea needs to be installed on the salt-master host (ses4-0). There are RPMs available for various SUSE distros, so I just ran `zypper in deepsea`, but for other distros you can run `make install` from a clone of the source tree to get everything in the right place. If you’re doing the latter, you’ll need to `systemctl restart salt-master` so that it picks up /etc/salt/master.d/modules.conf and /etc/salt/master.d/reactor.conf included with DeepSea. Note: we’ll happily accept patches from anyone who’s interested in helping with packaging for other distros 😉
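If you’re going the source route instead, it looks roughly like this (a sketch only, assuming the upstream repository at github.com/SUSE/DeepSea and that you’re working as root on the Salt master):
ses4-0:~ # git clone https://github.com/SUSE/DeepSea.git
ses4-0:~ # cd DeepSea
ses4-0:~/DeepSea # make install
ses4-0:~/DeepSea # systemctl restart salt-master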
There’s one other tweak required. /srv/pillar/ceph/master_minion.sls specifies the hostname of the salt master. When installing the RPM, this is automatically set to $(hostname -f), so in my case I have:
ses4-0:~ # cat /srv/pillar/ceph/master_minion.sls
master_minion: ses4-0.example.com
If you’re not using the RPM, you’ll need to tweak that file appropriately by hand.
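Something like the following would do the trick if run on the Salt master itself (just a sketch; editing the file with your favourite editor works just as well):
ses4-0:~ # echo "master_minion: $(hostname -f)" > /srv/pillar/ceph/master_minion.sls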
Now comes the interesting part. DeepSea splits Ceph deployment into several stages, as follows:
- Stage 0: Provisioning
- Ensures latest updates are installed on all minions (technically this is optional).
- Stage 1: Discovery
- Interrogates all the minions and creates pillar configuration fragments in /srv/pillar/ceph/proposals.
- Stage 2: Configure
- Before running this stage, you have to create a /srv/pillar/ceph/proposals/policy.cfg file, specifying which nodes are to have which roles (MON, OSD, etc.). This stage then merges the pillar data into its final form.
- Stage 3: Deploy
- Validates the pillar data to ensure the configuration is correct, then deploys the MONs and OSDs.
- Stage 4: Services
- Deploys non-core services (iSCSI gateway, CephFS, RadosGW, openATTIC).
- Stage 5: Removal
- Used to decommission hosts.
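For reference, each stage is kicked off from the Salt master with `salt-run state.orch`. You’ll see stages 0 through 3 run for real below; stages 4 and 5 follow the same naming pattern, although I’m not running them in this post:
ses4-0:~ # salt-run state.orch ceph.stage.0   # provisioning
ses4-0:~ # salt-run state.orch ceph.stage.1   # discovery
ses4-0:~ # salt-run state.orch ceph.stage.2   # configure (needs policy.cfg first)
ses4-0:~ # salt-run state.orch ceph.stage.3   # deploy MONs and OSDs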
Let’s give it a try. Note that there are presently a couple of SUSE-isms in DeepSea (notably some invocations of `zypper` to install software), so if you’re following along at home on a different distro and run into any kinks, please either let us know what’s broken or open a PR if you’ve got a fix.
ses4-0:~ # salt-run state.orch ceph.stage.0
master_minion : valid
ceph_version : valid
None
###########################################################
The salt-run command reports when all minions complete.
The command may appear to hang. Interrupting (e.g. Ctrl-C)
does not stop the command.
In another terminal, try 'salt-run jobs.active' or
'salt-run state.event pretty=True' to see progress.
###########################################################
False
True
[WARNING ] All minions are ready
True
ses4-0.example.com_master:
----------
ID: sync master
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-0.example.com.
Started: 23:04:06.166492
Duration: 459.509 ms
Changes:
ses4-0.example.com:
----------
ID: load modules
Function: module.run
Name: saltutil.sync_all
Result: True
Comment: Module function saltutil.sync_all executed
Started: 23:04:06.312110
Duration: 166.785 ms
Changes:
----------
ret:
----------
beacons:
grains:
log_handlers:
modules:
output:
proxymodules:
renderers:
returners:
sdb:
states:
utils:
Summary for ses4-0.example.com
------------
Succeeded: 1 (changed=1)
Failed: 0
------------
Total states run: 1
[...]
There’s a whole lot more output than I’ve quoted above, because that’s what happens with Salt when you apply a whole lot of state to a bunch of minions, but it finished up with:
Summary for ses4-0.example.com_master
-------------
Succeeded: 15 (changed=9)
Failed: 0
-------------
Total states run: 15
Stage 1 (discovery) is next, to interrogate all the minions and create configuration fragments:
ses4-0:~ # salt-run state.orch ceph.stage.1
[WARNING ] All minions are ready
True
- True
ses4-0.example.com_master:
----------
ID: ready
Function: salt.runner
Name: minions.ready
Result: True
Comment: Runner function 'minions.ready' executed.
Started: 23:05:02.991371
Duration: 588.429 ms
Changes: Invalid Changes data: True
----------
ID: discover
Function: salt.runner
Name: populate.proposals
Result: True
Comment: Runner function 'populate.proposals' executed.
Started: 23:05:03.580038
Duration: 1627.563 ms
Changes: Invalid Changes data: [True]
Summary for ses4-0.example.com_master
------------
Succeeded: 2 (changed=2)
Failed: 0
------------
Total states run: 2
Now there’s a bunch of interesting data from all the minions stored as SLS and YAML files in subdirectories of /srv/pillar/ceph/proposals. This is the point at which you get to decide exactly how your cluster will be deployed – what roles will be assigned to each node, and how the OSDs will be configured. You do this by creating a file called /srv/pillar/ceph/proposals/policy.cfg, which in turn includes the relevant configuration fragments generated during the discovery stage.
Creating /srv/pillar/ceph/proposals/policy.cfg is probably the one part of using DeepSea that’s easy to screw up. We’re working on making it easier, and the policy docs and examples will help, but in the meantime the approach I’ve taken personally is to generate a policy.cfg including every possible option, then get rid of the ones I don’t want. Here’s a dump of every single configuration fragment generated by the discovery stage on my toy test cluster:
ses4-0:~ # cd /srv/pillar/ceph/proposals/
ses4-0:/srv/pillar/ceph/proposals # find * -name \*.sls -o -name \*.yml | sort > policy.cfg
ses4-0:/srv/pillar/ceph/proposals # cat policy.cfg
cluster-ceph/cluster/ses4-0.example.com.sls
cluster-ceph/cluster/ses4-1.example.com.sls
cluster-ceph/cluster/ses4-2.example.com.sls
cluster-ceph/cluster/ses4-3.example.com.sls
cluster-ceph/cluster/ses4-4.example.com.sls
cluster-ceph/cluster/ses4-5.example.com.sls
cluster-unassigned/cluster/ses4-0.example.com.sls
cluster-unassigned/cluster/ses4-1.example.com.sls
cluster-unassigned/cluster/ses4-2.example.com.sls
cluster-unassigned/cluster/ses4-3.example.com.sls
cluster-unassigned/cluster/ses4-4.example.com.sls
cluster-unassigned/cluster/ses4-5.example.com.sls
config/stack/default/ceph/cluster.yml
config/stack/default/global.yml
profile-1QEMU24GB-1/cluster/ses4-0.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-1.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-2.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-3.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-4.example.com.sls
profile-1QEMU24GB-1/cluster/ses4-5.example.com.sls
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-0.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-1.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-2.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-3.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-4.example.com.yml
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-5.example.com.yml
role-admin/cluster/ses4-0.example.com.sls
role-admin/cluster/ses4-1.example.com.sls
role-admin/cluster/ses4-2.example.com.sls
role-admin/cluster/ses4-3.example.com.sls
role-admin/cluster/ses4-4.example.com.sls
role-admin/cluster/ses4-5.example.com.sls
role-client-cephfs/cluster/ses4-0.example.com.sls
role-client-cephfs/cluster/ses4-1.example.com.sls
role-client-cephfs/cluster/ses4-2.example.com.sls
role-client-cephfs/cluster/ses4-3.example.com.sls
role-client-cephfs/cluster/ses4-4.example.com.sls
role-client-cephfs/cluster/ses4-5.example.com.sls
role-client-iscsi/cluster/ses4-0.example.com.sls
role-client-iscsi/cluster/ses4-1.example.com.sls
role-client-iscsi/cluster/ses4-2.example.com.sls
role-client-iscsi/cluster/ses4-3.example.com.sls
role-client-iscsi/cluster/ses4-4.example.com.sls
role-client-iscsi/cluster/ses4-5.example.com.sls
role-client-radosgw/cluster/ses4-0.example.com.sls
role-client-radosgw/cluster/ses4-1.example.com.sls
role-client-radosgw/cluster/ses4-2.example.com.sls
role-client-radosgw/cluster/ses4-3.example.com.sls
role-client-radosgw/cluster/ses4-4.example.com.sls
role-client-radosgw/cluster/ses4-5.example.com.sls
role-igw/cluster/ses4-0.example.com.sls
role-igw/cluster/ses4-1.example.com.sls
role-igw/cluster/ses4-2.example.com.sls
role-igw/cluster/ses4-3.example.com.sls
role-igw/cluster/ses4-4.example.com.sls
role-igw/cluster/ses4-5.example.com.sls
role-igw/stack/default/ceph/minions/ses4-0.example.com.yml
role-igw/stack/default/ceph/minions/ses4-1.example.com.yml
role-igw/stack/default/ceph/minions/ses4-2.example.com.yml
role-igw/stack/default/ceph/minions/ses4-3.example.com.yml
role-igw/stack/default/ceph/minions/ses4-4.example.com.yml
role-igw/stack/default/ceph/minions/ses4-5.example.com.yml
role-master/cluster/ses4-0.example.com.sls
role-master/cluster/ses4-1.example.com.sls
role-master/cluster/ses4-2.example.com.sls
role-master/cluster/ses4-3.example.com.sls
role-master/cluster/ses4-4.example.com.sls
role-master/cluster/ses4-5.example.com.sls
role-mds/cluster/ses4-0.example.com.sls
role-mds/cluster/ses4-1.example.com.sls
role-mds/cluster/ses4-2.example.com.sls
role-mds/cluster/ses4-3.example.com.sls
role-mds/cluster/ses4-4.example.com.sls
role-mds/cluster/ses4-5.example.com.sls
role-mds-nfs/cluster/ses4-0.example.com.sls
role-mds-nfs/cluster/ses4-1.example.com.sls
role-mds-nfs/cluster/ses4-2.example.com.sls
role-mds-nfs/cluster/ses4-3.example.com.sls
role-mds-nfs/cluster/ses4-4.example.com.sls
role-mds-nfs/cluster/ses4-5.example.com.sls
role-mon/cluster/ses4-0.example.com.sls
role-mon/cluster/ses4-1.example.com.sls
role-mon/cluster/ses4-2.example.com.sls
role-mon/cluster/ses4-3.example.com.sls
role-mon/cluster/ses4-4.example.com.sls
role-mon/cluster/ses4-5.example.com.sls
role-mon/stack/default/ceph/minions/ses4-0.example.com.yml
role-mon/stack/default/ceph/minions/ses4-1.example.com.yml
role-mon/stack/default/ceph/minions/ses4-2.example.com.yml
role-mon/stack/default/ceph/minions/ses4-3.example.com.yml
role-mon/stack/default/ceph/minions/ses4-4.example.com.yml
role-mon/stack/default/ceph/minions/ses4-5.example.com.yml
role-rgw/cluster/ses4-0.example.com.sls
role-rgw/cluster/ses4-1.example.com.sls
role-rgw/cluster/ses4-2.example.com.sls
role-rgw/cluster/ses4-3.example.com.sls
role-rgw/cluster/ses4-4.example.com.sls
role-rgw/cluster/ses4-5.example.com.sls
role-rgw-nfs/cluster/ses4-0.example.com.sls
role-rgw-nfs/cluster/ses4-1.example.com.sls
role-rgw-nfs/cluster/ses4-2.example.com.sls
role-rgw-nfs/cluster/ses4-3.example.com.sls
role-rgw-nfs/cluster/ses4-4.example.com.sls
role-rgw-nfs/cluster/ses4-5.example.com.sls
What I actually wanted to deploy was a Ceph cluster with MONs on ses4-1, ses4-2 and ses4-3, and OSDs on ses4-1, ses4-2, ses4-3 and ses4-4. I didn’t want DeepSea to do anything with ses4-5, and I’ve elected not to deploy CephFS, RadosGW, openATTIC or any other services, because this blog post is going to be long enough as it is. So here’s what I pared my policy.cfg back to:
ses4-0:/srv/pillar/ceph/proposals # cat policy.cfg
cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/ses4-[0-4].example.com.sls
config/stack/default/ceph/cluster.yml
config/stack/default/global.yml
profile-1QEMU24GB-1/cluster/ses4-[1-4].example.com.sls
profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-[1-4].example.com.yml
role-master/cluster/ses4-0.example.com.sls
role-admin/cluster/ses4-[1-3].example.com.sls
role-mon/cluster/ses4-[1-3].example.com.sls
role-mon/stack/default/ceph/minions/ses4-[1-3].example.com.yml
The cluster-unassigned line defaults all nodes to not be part of the Ceph cluster. The following cluster-ceph line adds only those nodes I want DeepSea to manage (this is how I’m excluding ses4-5.example.com). Ordering is important here as later lines will override earlier lines.
The role-* lines determine which nodes are going to be MONs; role-admin is needed on the MON nodes so that the Ceph admin keyring gets installed there.
The profile-* lines determine how my OSDs will be deployed. In my case, because this is a ridiculous toy cluster, I have only one disk configuration on all my nodes (a single 24GB volume). On a real cluster there may be several profiles available to choose from, potentially mixing drive types and using SSDs for journals. Again, this is covered in more detail in the policy docs.
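If you want to see exactly what a profile proposes for a particular node before committing to it, just cat the relevant fragment (I won’t reproduce the contents here):
ses4-0:/srv/pillar/ceph/proposals # cat profile-1QEMU24GB-1/stack/default/ceph/minions/ses4-1.example.com.yml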
Now that policy.cfg is set up correctly, it’s time to run stages 2 and 3:
ses4-0:~ # salt-run state.orch ceph.stage.2
True
True
ses4-0.example.com_master:
----------
ID: push proposals
Function: salt.runner
Name: push.proposal
Result: True
Comment: Runner function 'push.proposal' executed.
Started: 23:13:43.092320
Duration: 209.321 ms
Changes: Invalid Changes data: True
----------
ID: refresh_pillar1
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-1.example.com, ses4-5.example.com, ses4-0.example.com, ses4-2.example.com, ses4-4.example.com, ses4-3.example.com.
Started: 23:13:43.302018
Duration: 705.173 ms
[...]
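Before kicking off stage 3, it doesn’t hurt to sanity-check that the merged pillar data looks the way you expect. Salt’s built-in pillar.items does the job (output omitted; you should see things like the roles and storage configuration assigned to that minion):
ses4-0:~ # salt 'ses4-1*' pillar.items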
Again, I’ve elided quite a bit of Salt output above. Stage 3 (deployment) can take a while, so if you’re looking for something to do while it runs, you can either play with `salt-run jobs.active` or `salt-run state.event pretty=True` in another terminal, or go watch a video. Here’s stage 3:
ses4-0:~ # salt-run state.orch ceph.stage.3
firewall : disabled
True
fsid : valid
public_network : valid
public_interface : valid
cluster_network : valid
cluster_interface : valid
monitors : valid
storage : valid
master_role : valid
mon_role : valid
mon_host : valid
mon_initial_members : valid
time_server : valid
fqdn : valid
True
ses4-1.example.com
[ERROR ] Run failed on minions: ses4-1.example.com, ses4-4.example.com, ses4-3.example.com, ses4-0.example.com, ses4-2.example.com
Failures:
ses4-1.example.com:
----------
ID: ntp
Function: pkg.installed
Result: True
Comment: All specified packages are already installed
Started: 23:16:18.251858
Duration: 367.172 ms
Changes:
----------
ID: sync time
Function: cmd.run
Name: sntp -S -c ses4-0.example.com
Result: False
Comment: Command "sntp -S -c ses4-0.example.com" run
Started: 23:16:18.620013
Duration: 37.438 ms
Changes:
----------
pid:
11002
retcode:
1
stderr:
sock_cb: 192.168.12.225 not in sync, skipping this server
stdout:
sntp 4.2.8p8@1.3265-o Mon Jun 6 08:12:56 UTC 2016 (1)
[...]
----------
ID: packages
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-1.example.com, ses4-4.example.com, ses4-3.example.com, ses4-0.example.com, ses4-2.example.com.
Started: 23:16:19.035412
Duration: 15967.272 ms
Changes:
ses4-1.example.com:
----------
ID: ceph
Function: pkg.installed
Result: True
Comment: The following packages were installed/updated: ceph
Started: 23:16:19.666218
Duration: 15134.487 ms
[...]
----------
ID: monitors
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-1.example.com, ses4-3.example.com, ses4-2.example.com.
Started: 23:16:36.622000
Duration: 891.694 ms
[...]
----------
ID: osd auth
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-0.example.com.
Started: 23:16:37.513840
Duration: 540.991 ms
[...]
----------
ID: storage
Function: salt.state
Result: True
Comment: States ran successfully. Updating ses4-1.example.com, ses4-4.example.com, ses4-3.example.com, ses4-2.example.com.
Started: 23:16:38.054970
Duration: 10854.171 ms
[...]
The only failure above is a minor complaint from the time sync step (per the sntp output, the master I pointed it at wasn’t itself in sync at that moment). Everything else (installing the packages, deploying the MONs, creating the OSDs etc.) ran through just fine.
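If you do hit a transient failure like that, my understanding is that the stages are designed to be safe to re-run: states that already completed are simply reported as unchanged, and the failed steps get another go. In this case, re-running stage 3 would have retried the time sync:
ses4-0:~ # salt-run state.orch ceph.stage.3
Anyway, check it out: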
ses4-1:~ # ceph status
    cluster 9b259825-0af1-36a9-863a-e058e4b0706b
     health HEALTH_OK
     monmap e1: 3 mons at {ses4-1=192.168.12.170:6789/0,ses4-2=192.168.12.167:6789/0,ses4-3=192.168.12.148:6789/0}
            election epoch 4, quorum 0,1,2 ses4-3,ses4-2,ses4-1
     osdmap e9: 4 osds: 4 up, 4 in
            flags sortbitwise
      pgmap v18: 64 pgs, 1 pools, 16 bytes data, 3 objects
            133 MB used, 77646 MB / 77779 MB avail
                  64 active+clean
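If you want to confirm where those OSDs actually ended up, `ceph osd tree` from any node with the admin keyring will show the four OSDs spread across ses4-1 through ses4-4 (output omitted here):
ses4-1:~ # ceph osd tree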
We now have a running Ceph cluster. Like I said, this one is a toy, and I haven’t demonstrated stage 4 (services), but hopefully this has demonstrated the scalability possible when using Salt with DeepSea to deploy Ceph. The above process is the same whether you have four nodes or four hundred; it’s just the creation of a policy.cfg file plus a few `salt-run` invocations.
Finally, if you’re wondering about the title of this post, it’s what Cordelia Chase said the first time she saw Angel in Buffy the Vampire Slayer (time index 0:22 in this video). I was going to lead with that, but after watching the episode again, there’s teenage angst, a trip to the morgue and all sorts of other stuff, none of which really makes a good analogy for the technology I’m talking about here. The clickbait in my “Salt and Pepper Squid with Fresh Greens” post was much better.
Update: there’s now a DeepSea mailing list: http://lists.suse.com/mailman/listinfo/deepsea-users