Hackweek 9: Ceph Appliance Odyssey

This week is SUSE Hack Week 9. I wanted to spend some time working on a Ceph appliance image to make it easy to play with Ceph on openSUSE and/or SLES.

I tried making a SLES 11 SP2 appliance with SUSE Studio. I had to add the filesystems and devel:libraries:c_c++ repos from OBS to get reasonably up-to-date Ceph 0.56 and libboost_thread.so.1.49.0, but on boot when the appliance tried to expand its root filesystem, it died claiming it couldn’t load libe2p.so.2. Studio claims to be pulling in e2fsprogs from both the SP2 Updates and filesystems repo, so maybe that’s the problem. It seems impossible to choose one or the other, as they are the same version. (Update: it was just pointed out to me that you can click the little box next to the version number to choose which one is installed – must try again.)

So I left that alone and tried an openSUSE 12.3 appliance. The filesystems/ceph build for 12.3 is disabled, so I branched it and kicked off a build which failed with an exciting OOM error:

[ 3831s] [ 3803.167109] Out of memory: Kill process 16364 (cc1plus) score 254 or sacrifice child
[ 3831s] [ 3803.167959] Killed process 16364 (cc1plus) total-vm:825128kB, anon-rss:168760kB, file-rss:4kB
[ 3831s] g++: internal compiler error: Killed (program cc1plus)
[ 3831s] Please submit a full bug report,
[ 3831s] with preprocessed source if appropriate.
[ 3831s] See  for instructions.

Guess I should do what it says and file a bug. But I really did want something to play with immediately, so I added http://ceph.com/rpm/opensuse12/x86_64/ as a repo, and pulled in the upstream Ceph 0.56 RPMs. This seems to have worked and given me an openSUSE 12.3 image I can use to run through the Ceph 5-Minute Quick Start, Block Device Quick Start and CephFS Quick Start. So, here’s my extremely terse openSUSEified version of those quick start documents:

5-Minute Quick Start

Deploy the Appliance Image

I’m doing this with a couple of VMs, so in my case I make a couple of copies of the image:

# cp ~/openSUSE_12.3_Ceph_0.56.x86_64-0.0.3.qcow2 \
    /var/lib/libvirt/images/ceph-quickstart-server.qcow2
# cp ~/openSUSE_12.3_Ceph_0.56.x86_64-0.0.3.qcow2 \
    /var/lib/libvirt/images/ceph-quickstart-client.qcow2

Then I use virt-manager to create two VMs, backed by those images. Boot ’em up, log in (root password is “linux”), run yast network and set sensible hostnames (“ceph-client” and “ceph-server” instead of “linux-kjqd”, although admittedly those names wouldn’t be very sensible in a real deployment with more than one node).

Edit the Configuration File

The appliance image includes the /etc/ceph/ceph.conf file from the original 5-minute quick start, so log in to ceph-server, edit that file and replace {hostname} and {ip-address} with their real values, then copy the configuration file to ceph-client:

# scp /etc/ceph/ceph.conf ceph-client:/etc/ceph/
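
If you’d rather not edit the file by hand, the placeholder substitution can be scripted. This is only a minimal sketch: the function name fill_ceph_conf is made up for illustration, the IP address in the usage comment is a stand-in for whatever your ceph-server really has, and it assumes GNU sed (for -i) and that the file contains the literal {hostname} and {ip-address} markers. Values containing "/" or "&" would need escaping first.

```shell
# Hypothetical helper: replace the {hostname} and {ip-address} placeholders
# in a ceph.conf-style file, in place. Assumes GNU sed.
# Usage: fill_ceph_conf FILE HOSTNAME IP
fill_ceph_conf() {
    local conf=$1
    local host=$2
    local ip=$3
    # In a basic regular expression a bare "{" is a literal brace.
    sed -i \
        -e "s/{hostname}/$host/g" \
        -e "s/{ip-address}/$ip/g" \
        "$conf"
}

# Example (192.168.122.10 is an assumed address, not one from this post):
#   fill_ceph_conf /etc/ceph/ceph.conf ceph-server 192.168.122.10
```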

Deploy the Configuration

On ceph-server, create directories for each daemon:

# mkdir -p /var/lib/ceph/osd/ceph-0
# mkdir -p /var/lib/ceph/osd/ceph-1
# mkdir -p /var/lib/ceph/mon/ceph-a
# mkdir -p /var/lib/ceph/mds/ceph-a

Still on ceph-server, run the following:

# cd /etc/ceph
# mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring

Start Ceph

On ceph-server:

# chkconfig ceph on
# rcceph start
# ceph health

This will initially show something like:

HEALTH_ERR 576 pgs stuck inactive; 576 pgs stuck unclean; no osds

Eventually it will say HEALTH_OK and you’re good to go.
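
Rather than re-running ceph health by hand, you could poll it. A rough sketch follows; wait_for_health_ok and the HEALTH_CMD override are names I’ve invented here (the override just makes the loop testable without a cluster), and the default retry count and interval are arbitrary.

```shell
# Hypothetical helper: poll a health command until its output starts with
# HEALTH_OK. Usage: wait_for_health_ok [MAX_TRIES] [SLEEP_SECONDS]
# HEALTH_CMD (an assumption, for testing) overrides the default "ceph health".
wait_for_health_ok() {
    local max_tries=${1:-60}
    local interval=${2:-5}
    local health_cmd=${HEALTH_CMD:-ceph health}
    local i=1
    local status
    while [ "$i" -le "$max_tries" ]; do
        # Deliberately unquoted so "ceph health" splits into command + argument.
        status=$($health_cmd 2>/dev/null) || true
        case $status in
            HEALTH_OK*)
                echo "HEALTH_OK after $i check(s)"
                return 0
                ;;
        esac
        echo "check $i: ${status:-no response}" >&2
        sleep "$interval"
        i=$((i + 1))
    done
    # Gave up without ever seeing HEALTH_OK.
    return 1
}

# Example: wait up to 60 x 5 seconds for the cluster to settle.
#   wait_for_health_ok 60 5
```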

Copy the Keyring to the Client

This is necessary for authentication:

# scp /etc/ceph/ceph.keyring ceph-client:/etc/ceph/

Block Device Quick Start

On ceph-client:

# rbd create foo --size 4096
# modprobe rbd
# rbd map foo --pool rbd --name client.admin
# mkfs.ext4 -m0 /dev/rbd1
# mkdir /mnt/myrbd
# mount /dev/rbd1 /mnt/myrbd

(Why is this /dev/rbd1, not /dev/rbd/rbd/foo as in the original quick start?)

CephFS Quick Start

On ceph-client (kernel driver, not FUSE):

# mkdir /mnt/mycephfs
# mount -t ceph -o name=admin,secret=$(ceph-authtool \
    --name client.admin /etc/ceph/ceph.keyring --print-key) \
    ceph-server:/ /mnt/mycephfs

Interestingly, this gives “mount: error writing /etc/mtab: Invalid argument”, but still seems to actually mount the filesystem.

Also note that it appears I have 32GB of space for Ceph to use, even though ceph-server only has a 16GB root partition. I rather think that’s because there are two OSDs, and both are just running off the root filesystem rather than separate disks/filesystems, so the same 16GB gets counted twice. I assume this is one of those Don’t Try This At Home things.
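
To make that guess concrete: as I understand it, each OSD reports the capacity of whatever filesystem holds its data directory, so two OSDs sitting on one 16GB filesystem would add up to 32GB. Here’s a small sketch that checks whether two directories share a filesystem and does the naive sum. Both function names are invented for illustration, and stat -c is the GNU coreutils form.

```shell
# Hypothetical helper: succeed if both paths live on the same filesystem.
# Compares device numbers as reported by GNU stat.
same_filesystem() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Hypothetical helper: the naive total when every OSD reports the full size
# of the filesystem it sits on. Usage: naive_total_gb OSD_COUNT FS_SIZE_GB
naive_total_gb() {
    echo $(( $1 * $2 ))
}

# Example, using the OSD directories created during deployment:
#   same_filesystem /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-1 \
#       && echo "shared: $(naive_total_gb 2 16)GB reported for 16GB of disk"
```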

 

openSUSE 12.3 / Lenovo T430

My new Lenovo T430 arrived last week. After delighting in that satisfying new laptop smell, I made recovery DVDs I will presumably never need, then blew away Windows 7 and installed openSUSE 12.3 (full disclosure: I work for SUSE, so my choice of distro may not be entirely unbiased).

Some niceties:

  • The textured touchpad is lovely. Much better feel than a pure flat surface.
  • As I’d expect, the keyboard is excellent (even if PGUP/PGDN aren’t where I’m used to).
  • The openSUSE installer is quick and easy. I’m pretty sure there are fewer steps than the last time I did a regular openSUSE install from scratch a couple of years ago.
  • No problem setting up encrypted LVM, although on my ~500GB drive it defaults to a 20GB root and 25GB /home, with a whole lotta free space left over in the encrypted partition, so that might want some tweaking.
  • Entering the passphrase on boot happens on a pretty graphical screen; you don’t get thrown back to a terminal window where random junk is appearing over the passphrase entry prompt.
  • Moving my mail over from my old laptop was pretty much just an rsync of the Thunderbird profile directory (and maybe a tweak to ~/.thunderbird/profiles.ini).

Some oddities:

  • The Novell GroupWise 8.0.2 client had a couple of problems:
    • It claims to need libXm.so.3 (listed in RPM Requires), but works fine without it. This is fortunate, because openSUSE 12.3 doesn’t ship openmotif22-libs-32bit anymore.
    • Unless you’ve installed libpangox-1_0-0-32bit, the GroupWise client will segfault somewhere in libwebrenderer.so. This is less than obvious.
  • The YaST disk partitioner seems slightly confused adding new LVs inside my encrypted VG later on (it either locked up or crashed). I haven’t had time to investigate this properly, so I’ve ignored it for the moment and used lvcreate and mkfs in a terminal instead.
  • You do need to reboot at least once after initial install for NetworkManager to work properly (this is mentioned in the release notes).
  • I’m running GNOME 3.6, and I tried using the tweak tool to have it just blank the screen – not suspend – when closing the laptop lid. Turns out systemd is being too clever for me, so I had to fiddle with that a bit (set HandleLidSwitch=ignore in /etc/systemd/logind.conf, then run sudo systemctl restart systemd-logind).
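
The logind.conf change from the last item could be scripted along these lines. Treat it as a sketch: set_lid_switch_ignore is a made-up name, it assumes GNU sed and grep, and when no HandleLidSwitch line exists it simply appends one, which is only right if [Login] is the last (or only) section in the file.

```shell
# Hypothetical helper: force HandleLidSwitch=ignore in a logind.conf-style file.
# Usage: set_lid_switch_ignore FILE
set_lid_switch_ignore() {
    local conf=$1
    if grep -Eq '^#?HandleLidSwitch=' "$conf"; then
        # Rewrite the existing (possibly commented-out) setting in place.
        sed -i -E 's/^#?HandleLidSwitch=.*/HandleLidSwitch=ignore/' "$conf"
    else
        # Assumption: appending at the end lands inside the [Login] section.
        printf 'HandleLidSwitch=ignore\n' >> "$conf"
    fi
}

# Example (as root), followed by the restart:
#   set_lid_switch_ignore /etc/systemd/logind.conf
#   systemctl restart systemd-logind
```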

Very little else to report so far. Aside from the oddities above everything else seems to Just Work™. OTOH, all I’ve really done is web browsing, email and assorted fiddling around in terminals. Maybe listened to a bit of music (the inbuilt speakers are well and truly loud enough, but a bit tinnier than real speakers – can’t say I’m terribly surprised by that though).

Cloud Infrastructure, Distributed Storage and High Availability at LCA 2013

I’m pleased to announce that we will be holding a one day Cloud Infrastructure, Distributed Storage and High Availability mini conference on Monday 28 January 2013 as part of linux.conf.au 2013 in Canberra, Australia.

This miniconf is about building reliable infrastructure, from two-node HA failover pairs to multi-thousand-core cloud systems. You might like to think of it as a sequel to the LCA 2012 High Availability and Distributed Storage miniconf (videos here).

Do any of the following describe you?

  • You’re building cloud infrastructure for others to use (openstack, cloudstack, eucalyptus, …)
  • Your data needs to be reliably available everywhere (ceph, glusterfs, drbd, …)
  • Your system absolutely must be up all the time (pacemaker, corosync, …)

If so, this is the miniconf for you! Please consider submitting a presentation at http://tinyurl.com/cidsha-lca2013

We’re expecting most talk slots to be 25 minutes (including questions and changeover), but there will be openings for shorter lightning talks and maybe a couple of longer talks. CFP closes on Sunday November 4, 2012. Notifications of acceptance will be emailed out after this date.

Note that there is also an OpenStack-specific miniconf running on Tuesday 29 January. We’re hoping this will give us a pretty awesome two-day LCA 2013 CloudFest. As a rough rule of thumb, more generic or infrastructure-related talks should go to Cloud, Distributed Storage & HA, while deeper OpenStack-specific talks should probably go to the OpenStack miniconf. If in doubt, or if you have any other questions, please contact me directly.

Thanks!

A Real openSUSE 12.2 Overo Image

Carrying on from my last post, with advice from Alexander Graf on #opensuse-arm, I was able to put together a reasonable looking pre-built openSUSE 12.2 image for the Gumstix Overo.  This is currently in home:tserong:branches:openSUSE:12.2:ARM on OBS, and will presumably remain there until next time I’m able to hack on this.

The two things that needed doing were:

  1. u-boot-omap3overo package, which is u-boot-omap4panda with an extra spec file and a small patch to add some load addresses and make the default environment boot off ext2 instead of FAT:
Index: u-boot-2012.04.01/include/configs/omap3_overo.h
===================================================================
--- u-boot-2012.04.01.orig/include/configs/omap3_overo.h
+++ u-boot-2012.04.01/include/configs/omap3_overo.h
@@ -148,6 +148,8 @@

 #define CONFIG_EXTRA_ENV_SETTINGS \
 	"loadaddr=0x82000000\0" \
+	"kerneladdr=0x80200000\0" \
+	"ramdiskaddr=0x81000000\0" \
 	"console=ttyO2,115200n8\0" \
 	"mpurate=500\0" \
 	"optargs=\0" \
@@ -175,10 +177,10 @@
 		"omapdss.def_disp=${defaultdisplay} " \
 		"root=${nandroot} " \
 		"rootfstype=${nandrootfstype}\0" \
-	"loadbootscript=fatload mmc ${mmcdev} ${loadaddr} boot.scr\0" \
+	"loadbootscript=ext2load mmc ${mmcdev} ${loadaddr} boot.scr\0" \
 	"bootscript=echo Running bootscript from mmc ...; " \
 		"source ${loadaddr}\0" \
-	"loaduimage=fatload mmc ${mmcdev} ${loadaddr} uImage\0" \
+	"loaduimage=ext2load mmc ${mmcdev} ${loadaddr} uImage\0" \
 	"mmcboot=echo Booting from mmc ...; " \
 		"run mmcargs; " \
 		"bootm ${loadaddr}\0" \
  2. A JeOS-overo image, which is just another KIWI file for the base ARM JeOS image with a few tweaks to make it use my u-boot-omap3overo package and the right kernel.  To give an idea, the diff between JeOS-panda.kiwi and JeOS-overo.kiwi is:
--- JeOS-panda.kiwi	2012-09-24 22:47:28.448247822 +1000
+++ JeOS-overo.kiwi	2012-09-24 22:49:16.844588795 +1000
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="utf-8"?>
-<image schemaversion="5.3" name="openSUSE-12.2-ARM-panda">
+<image schemaversion="5.3" name="openSUSE-12.2-ARM-overo">
   <!--
  *****************************************************************************
  *****************************************************************************
@@ -13,11 +13,11 @@
     <author>Marcus Schäfer</author>
     <contact>ms@novell.com</contact>
     <specification>
-   openSUSE 12.2 image for ARM (panda) boards
+   openSUSE 12.2 image for ARM (overo) boards
   </specification>
   </description>
   <preferences>
-    <type image="oem" filesystem="ext3" boot="oemboot/suse-12.2" bootloader="uboot" bootkernel="omap4panda" kernelcmdline="console=ttyO2 vram=16M">
+    <type image="oem" filesystem="ext3" boot="oemboot/suse-12.2" bootloader="uboot" kernelcmdline="console=ttyO2 vram=12M">
       <oemconfig>
         <oem-swapsize>500</oem-swapsize>
       </oemconfig>
@@ -35,6 +35,9 @@
     <user pwd="$1$wYJUgpM5$RXMMeASDc035eX.NbYWFl0" home="/root" name="root"/>
   </users>
   <repository type="rpm-md">
+    <source path="obs://home:tserong:branches:openSUSE:12.2:ARM/standard"/>
+  </repository>
+  <repository type="rpm-md">
     <source path="obs://openSUSE:12.2:ARM/standard"/>
   </repository>
   <!-- dont remove qemu binfmt helpers from initrd -->
@@ -44,7 +47,7 @@
   </strip>
   <packages type="bootstrap">
     <package name="kernel-omap2plus" bootinclude="true"/>
-    <package name="u-boot-omap4panda"/>
+    <package name="u-boot-omap3overo"/>
     <package name="aaa_base"/>
     <package name="aaa_base-extras"/>
     <package name="branding-openSUSE"/>

I couldn’t specify bootkernel="omap3overo" because that profile doesn’t exist in kiwi. Leaving this out, combined with the reference to my repository and <package name="kernel-omap2plus" bootinclude="true"/> miraculously gave me the right kernel.

To actually get it running, first the image went onto a MicroSD card:

# xzcat openSUSE-12.2-ARM-overo.armv7l-1.12.1-Build1.17.3.raw.xz | dd bs=4M of=/dev/mmcblk0
# sync
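
Since dd will happily overwrite whatever it’s pointed at, a small guard doesn’t hurt. This is a hedged sketch rather than what I actually ran: write_image is an invented name, and all it adds is a check that the target is a block device before piping the decompressed image into dd.

```shell
# Hypothetical helper: decompress an .xz image onto a block device, refusing
# anything that isn't one. Usage: write_image IMAGE.raw.xz /dev/DEVICE
write_image() {
    local image=$1
    local target=$2
    if [ ! -b "$target" ]; then
        echo "refusing to write: $target is not a block device" >&2
        return 1
    fi
    # The pipeline's exit status is dd's; sync only runs if dd succeeded.
    xzcat "$image" | dd bs=4M of="$target" && sync
}

# Example, with the image and device from above:
#   write_image openSUSE-12.2-ARM-overo.armv7l-1.12.1-Build1.17.3.raw.xz /dev/mmcblk0
```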

Then the card went into the Overo, power was applied and it immediately failed to boot, because X-Loader couldn’t read the boot sector on the MicroSD.

Texas Instruments X-Loader 1.4.4ss (Oct 20 2010 - 10:10:28)
OMAP3530-GP ES3.1
Board revision: 0
Reading boot sector
Error: reading boot sector
Loading u-boot.bin from nand

U-Boot 2010.09 (Oct 20 2010 - 10:11:49)

OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz
Gumstix Overo board + LPDDR/NAND
I2C:   ready
DRAM:  256 MiB
NAND:  256 MiB
In:    serial
Out:   serial
Err:   serial
Board revision: 0
Tranceiver detected on mmc2
timed out in wait_for_bb: I2C_STAT=1000
timed out in wait_for_bb: I2C_STAT=1000
timed out in wait_for_pin: I2C_STAT=1000
I2C read: I/O error
Unrecognized expansion board
Die ID #529e0004000000000403a1f303025013
Net:   smc911x-0
Hit any key to stop autoboot:  0 
Overo #

But, the old u-boot in my Overo’s NAND is sufficiently advanced that it can read from an ext2 partition, so I was able to do this:

Overo # mmc init
mmc1 is available
Overo # setenv kerneladdr 0x80200000
Overo # setenv ramdiskaddr 0x81000000
Overo # ext2load mmc1 0:1 ${loadaddr} boot.scr
Loading file "boot.scr" from mmc1 device 0:1 (xxa1)
541 bytes read
Overo # run bootscript
Running bootscript from mmc ...
## Executing script at 82000000
kerneladdr=0x80200000
ramdiskaddr=0x81000000
Loading file "boot/linux.vmx" from mmc device 0:1 (xxa1)
4063704 bytes read
Loading file "boot/initrd.uboot" from mmc device 0:1 (xxa1)
38289754 bytes read
## Booting kernel from Legacy Image at 80200000 ...
   Image Name:   Linux-3.4.6-2.10-omap2plus
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    4063640 Bytes = 3.9 MiB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... OK
## Loading init Ramdisk from Legacy Image at 81000000 ...
   Image Name:   Initrd
   Image Type:   ARM Linux RAMDisk Image (uncompressed)
   Data Size:    38289690 Bytes = 36.5 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
OK

Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.4.6-2.10-omap2plus (abuild@build14) (gcc version 4.7.1 20120723 [gcc-4_7-branch revision 189773] (SUSE Linux) ) #2 SMP Sat Sep 8 06:38:16 UTC 2012
[    0.000000] CPU: ARMv7 Processor [411fc083] revision 3 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[    0.000000] Machine: Gumstix Overo
...
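
As a sanity check on those load addresses, the sizes from the "bytes read" lines can be compared against the gaps between kerneladdr, ramdiskaddr and loadaddr. This is just arithmetic on numbers from the log above; fits_below is a name invented for the sketch. It suggests the 38289754-byte initrd loaded on this first boot runs past loadaddr (where boot.scr had been loaded), which evidently did no harm here, since the boot carried on.

```shell
# Addresses from the u-boot environment; sizes from the "bytes read" lines.
kerneladdr=0x80200000
ramdiskaddr=0x81000000
loadaddr=0x82000000

# Hypothetical helper: report whether SIZE bytes loaded at START stay below LIMIT.
# Usage: fits_below START LIMIT SIZE
fits_below() {
    local end=$(( $1 + $3 ))
    if [ "$end" -le $(( $2 )) ]; then
        printf 'fits: ends at 0x%x, below 0x%x\n' "$end" $(( $2 ))
    else
        printf 'overruns: ends at 0x%x, past 0x%x\n' "$end" $(( $2 ))
    fi
}

fits_below "$kerneladdr" "$ramdiskaddr" 4063704   # boot/linux.vmx
fits_below "$ramdiskaddr" "$loadaddr" 38289754    # boot/initrd.uboot
```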

And away we go! Interestingly, while KIWI was doing its bootstrap thing, I got a bunch of I/O errors:

...
[    9.048309] mmc0: new high speed SDHC card at address 0001
[    9.061431] mmcblk0: mmc0:0001 00000 7.41 GiB 
[    9.075683]  mmcblk0: p1 p2
[    9.162902] mmc1: new SDIO card at address 0001
setterm: cannot (un)set powersave mode: Invalid argument
Loading KIWI OEM Boot-System...
-------------------------------
Creating device nodes with udev
[   11.118896] udevd[168]: starting version 182
udevd[172]: ctx=0x4ae4d8 path=/lib/modules/3.4.6-2.10-omap2plus/kernel/drivers/net/wireless/libertas/libertas.ko error=No such file or directory
^M
[   11.648712] twl4030_usb twl4030_usb: Initialized TWL4030 USB module
[    4.950012] Including oem partition info file
[    5.145965] Searching for boot device...
[   16.582397] end_request: I/O error, dev mtdblock0, sector 0
[   16.588317] Buffer I/O error on device mtdblock0, logical block 0
[   16.595031] uncorrectable error : 
[   16.598480] end_request: I/O error, dev mtdblock0, sector 8
[   16.604522] Buffer I/O error on device mtdblock0, logical block 1
[   16.611145] uncorrectable error : 
[   16.614562] end_request: I/O error, dev mtdblock0, sector 16
[   16.620727] Buffer I/O error on device mtdblock0, logical block 2
[   16.627349] end_request: I/O error, dev mtdblock0, sector 24
[   16.633300] Buffer I/O error on device mtdblock0, logical block 3
[   16.640106] end_request: I/O error, dev mtdblock0, sector 0
[   16.645996] Buffer I/O error on device mtdblock0, logical block 0
[   17.727172] kjournald starting.  Commit interval 5 seconds
[   17.733154] EXT3-fs (mmcblk0p1): mounted filesystem with ordered data mode
modprobe: FATAL: Could not read '/lib/modules/3.4.6-2.10-omap2plus/kernel/fs/fat/vfat.ko': No such file or directory
(...modprobe error repeated several times...)
[   20.200988] end_request: I/O error, dev mtdblock0, sector 0
[   20.206939] Buffer I/O error on device mtdblock0, logical block 0
[   20.213684] uncorrectable error : 
[   20.217102] end_request: I/O error, dev mtdblock0, sector 8
[   20.223175] Buffer I/O error on device mtdblock0, logical block 1
[   20.229827] uncorrectable error : 
[   20.233245] end_request: I/O error, dev mtdblock0, sector 16
[   20.239410] Buffer I/O error on device mtdblock0, logical block 2
[   20.246032] end_request: I/O error, dev mtdblock0, sector 24
[   20.251983] Buffer I/O error on device mtdblock0, logical block 3
[   20.258880] end_request: I/O error, dev mtdblock0, sector 0
[   20.264770] Buffer I/O error on device mtdblock0, logical block 0
[    9.848419] Found boot device: /dev/mmcblk0
[   12.598419] Repartition the disk according to real geometry [ parted ]
[   16.442352] Repartition the disk according to real geometry [ parted ]
[   30.380310]  mmcblk0: p1 p2 p3
[   20.821899] Activating swap space on /dev/mmcblk0p3
[   21.109192] Filesystem of OEM system is: ext3 -> /dev/mmcblk0p2
[   21.243927] Resize EXT3 filesystem to full partition space...
/dev/mmcblk0p2: clean, 17708/49056 files, 119867/195839 blocks
Resizing the filesystem on /dev/mmcblk0p2 to 1776688 (4k) blocks.
Begin pass 1 (max = 49)
Extending the inode table     XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The filesystem on /dev/mmcblk0p2 is now 1776688 blocks long.

[   99.565460] kjournald starting.  Commit interval 5 seconds
[   99.578399] EXT3-fs (mmcblk0p2): using internal journal
[   99.584350] EXT3-fs (mmcblk0p2): mounted filesystem with ordered data mode
[  100.448394] kjournald starting.  Commit interval 5 seconds
[  100.458648] EXT3-fs (mmcblk0p1): using internal journal
[  100.464263] EXT3-fs (mmcblk0p1): mounted filesystem with ordered data mode
[  108.013458] kjournald starting.  Commit interval 5 seconds
[  108.025207] EXT3-fs (mmcblk0p1): using internal journal
[  108.030761] EXT3-fs (mmcblk0p1): mounted filesystem with ordered data mode
/dev/mmcblk0p1: LABEL="BOOT" UUID="6505f4d4-6000-4609-899f-9b6634916922" SEC_TYPE="ext2" TYPE="ext3"
[   98.791229] Creating boot loader configuration
[  101.126831] Activating Image: [/dev/mmcblk0p2]

Anyway, it did boot successfully after that and I was able to log in, poke around a bit, and reboot. During reboot I got a couple of kernel oopses, same as (or presumably the same as) with my previous manually constructed image:

[  353.409118] Restarting system.
[  353.412963] Internal error: Oops: 80000007 [#1] SMP ARM
[  353.418640] Modules linked in: af_packet autofs4 dm_mod omapdrm(C) drm_kms_helper snd_soc_twl4030 snd_soc_core drm regmap_spi libertas_sdio snd_pcm fb_sys_fops sysimgblt sysfillrect libertas syscopyarea snd_timer snd cfg80211 soundcore snd_page_alloc rfkill twl4030_wdt lib80211 twl4030_usb
[  353.446960] CPU: 0    Tainted: G         C    (3.4.6-2.10-omap2plus #2)
[  353.454162] PC is at 0x0
[  353.456970] LR is at smp_send_stop+0x50/0xe4
[  353.461639] pc : [<00000000>]    lr : [<c0019378>]    psr: 600f0013
[  353.461669] sp : cc82be60  ip : 00000000  fp : 00012fc0
[  353.474060] r10: c07dd930  r9 : cc82a000  r8 : 4321fedc
[  353.479766] r7 : 45584543  r6 : cc82be64  r5 : c07a8ee0  r4 : 000f4241
[  353.486846] r3 : 00000000  r2 : 00000000  r1 : 00000006  r0 : cc82be64
[  353.493927] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  353.501647] Control: 10c5387d  Table: 838b4019  DAC: 00000015
[  353.507904] Process systemd-shutdow (pid: 1, stack limit = 0xcc82a2f8)
[  353.514984] Stack: (0xcc82be60 to 0xcc82c000)
[  353.519744] be60: 00000000 00000000 00000000 00000000 cc82a000 c00152f4 00000000 00000000
[  353.528656] be80: 01234567 c0051dd4 cc82bea8 c86a9dd8 0000002b 00000005 002dc6c0 00000000
[  353.537536] bea0: 0112a880 00000000 c0a868c8 c0071e48 c07a8f58 00000000 c2c56078 cc81ed4c
[  353.546417] bec0: c0a868c8 cc8bbaf8 c000e788 fffffffe 00000000 c050db8c cc8ca01c cc81eac0
[  353.555328] bee0: cc82a000 c07a9ce8 00000000 c0788880 c8539580 cc82a000 cc82bfac c050ad08
[  353.564208] bf00: cc82bf24 c0069e08 00000001 c07a9ce8 c07a9ce8 00000000 c0013e60 c0788880
[  353.573089] bf20: 42cb86f0 00000052 c0788880 c0788880 00000000 00000000 c0786398 c0788880
[  353.582000] bf40: c53df680 c5377284 c080c5bc c0013fc8 cc82a000 00000000 00012fc0 c538e400
[  353.590881] bf60: c538e400 c538e800 c07e44d8 c011c050 00000001 00000000 00000000 00000000
[  353.599761] bf80: 00000024 c0013fc8 cc82a000 00000000 00000000 00000000 00000058 c0013fc8
[  353.608673] bfa0: 00000000 c0013e00 00000000 00000000 fee1dead 28121969 01234567 45584543
[  353.617553] bfc0: 00000000 00000000 00000000 00000058 00000000 00000000 00000000 00012fc0
[  353.626434] bfe0: b6eda2b0 be862944 0000b7dc b6eda2d0 600f0010 fee1dead 00fbff04 107ffe00
[  353.635437] [<c0019378>] (smp_send_stop+0x50/0xe4) from [<c00152f4>] (machine_restart+0xc/0x4c)
[  353.644958] [<c00152f4>] (machine_restart+0xc/0x4c) from [<c0051dd4>] (sys_reboot+0x174/0x1f4)
[  353.654357] [<c0051dd4>] (sys_reboot+0x174/0x1f4) from [<c0013e00>] (ret_fast_syscall+0x0/0x30)
[  353.663818] Code: bad PC value
[  353.667480] ---[ end trace a8f4050048b60a4e ]---
[  353.690063] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[  353.690093] 
[  353.700286] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  353.709259] pgd = c0004000
[  353.712341] [00000000] *pgd=00000000
[  353.716308] Internal error: Oops: 80000007 [#2] SMP ARM
[  353.721984] Modules linked in: af_packet autofs4 dm_mod omapdrm(C) drm_kms_helper snd_soc_twl4030 snd_soc_core drm regmap_spi libertas_sdio snd_pcm fb_sys_fops sysimgblt sysfillrect libertas syscopyarea snd_timer snd cfg80211 soundcore snd_page_alloc rfkill twl4030_wdt lib80211 twl4030_usb
[  353.750274] CPU: 0    Tainted: G      D  C    (3.4.6-2.10-omap2plus #2)
[  353.757476] PC is at 0x0
[  353.760253] LR is at smp_send_stop+0x50/0xe4
[  353.764923] pc : [<00000000>]    lr : [<c0019378>]    psr: 600f0113
[  353.764953] sp : cc82bbd0  ip : 00000000  fp : cc82a000
[  353.777374] r10: fffffffc  r9 : cc81eac0  r8 : cc82bc83
[  353.783050] r7 : c078c040  r6 : cc82bbd4  r5 : c07a8ee0  r4 : 000f4241
[  353.790130] r3 : 00000000  r2 : 00000000  r1 : 00000006  r0 : cc82bbd4
[  353.797210] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  353.804962] Control: 10c5387d  Table: 838b4019  DAC: 00000015
[  353.811187] Process systemd-shutdow (pid: 1, stack limit = 0xcc82a2f8)
[  353.818267] Stack: (0xcc82bbd0 to 0xcc82c000)
[  353.823028] bbc0:                                     c078c040 00000000 c0819230 c07dde04
[  353.831939] bbe0: cc81eac0 c05025b8 fffffffc cc82bc14 cc81eac0 c07dde04 c07dde04 cc81eac0
[  353.840820] bc00: c078c040 cc82bc83 fffffffc c0042358 c0651090 0000000b 00000000 cc82dc00
[  353.849700] bc20: c0817a58 cc82bc38 cc82a000 00000001 fffffff8 cc81ed18 cc82bc38 cc82bc38
[  353.858612] bc40: cc82bc83 cc82be18 c0817a58 cc82a000 00000001 cc82bc83 fffffff8 fffffffc
[  353.867492] bc60: cc82a000 c0017e3c cc82a2f8 0000000b 5d383536 00000000 00000008 bf000000
[  353.876373] bc80: 627c3694 50206461 61762043 0065756c 00000000 c3a11000 cc82be18 00000000
[  353.885284] bca0: c8539580 80000007 cc82be18 00000000 00000000 00000028 cc82a000 00000000
[  353.894165] bcc0: c8539580 80000007 cc82be18 00000000 00000000 00000028 cc82a000 c05023e4
[  353.903045] bce0: cc81eac0 c050d8c4 00000000 00000000 00000000 cc892c08 c0313000 c85395b8
[  353.911956] bd00: 00010000 00000000 c0820022 600f0093 00000022 c030bf84 cc892c00 00000004
[  353.920837] bd20: 00000005 00000010 c07a891c c03138c0 cc82bd5c c00659a8 00050000 00000001
[  353.929718] bd40: 00000000 c07f5acc c0819680 000064bc c07dbcec 00000007 c050d6fc c07ae574
[  353.938629] bd60: 00000000 cc82be18 cc82a000 c07dd930 00012fc0 c0008408 000064de 000064de
[  353.947509] bd80: c0819680 600f0093 ffffff96 00000000 000064de c003e93c 00000019 000064de
[  353.956390] bda0: 00000052 0000000f cc82be14 cc82be14 00000028 c0819732 00000000 c0819747
[  353.965301] bdc0: c07dbcec c003eeb0 00000000 c0819732 2d177fff 0000002c 48d331bb 00000052
[  353.974182] bde0: 00000003 c0819680 00000000 3b9aca00 10624dd3 600f0013 00000010 0088de08
[  353.983062] be00: 00000000 600f0013 ffffffff cc82be4c 4321fedc c050c1b8 cc82be64 00000006
[  353.991973] be20: 00000000 00000000 000f4241 c07a8ee0 cc82be64 45584543 4321fedc cc82a000
[  354.000854] be40: c07dd930 00012fc0 00000000 cc82be60 c0019378 00000000 600f0013 ffffffff
[  354.009735] be60: 00000000 00000000 00000000 00000000 cc82a000 c00152f4 00000000 00000000
[  354.018646] be80: 01234567 c0051dd4 cc82bea8 c86a9dd8 0000002b 00000005 002dc6c0 00000000
[  354.027526] bea0: 0112a880 00000000 c0a868c8 c0071e48 c07a8f58 00000000 c2c56078 cc81ed4c
[  354.036437] bec0: c0a868c8 cc8bbaf8 c000e788 fffffffe 00000000 c050db8c cc8ca01c cc81eac0
[  354.045318] bee0: cc82a000 c07a9ce8 00000000 c0788880 c8539580 cc82a000 cc82bfac c050ad08
[  354.054199] bf00: cc82bf24 c0069e08 00000001 c07a9ce8 c07a9ce8 00000000 c0013e60 c0788880
[  354.063110] bf20: 42cb86f0 00000052 c0788880 c0788880 00000000 00000000 c0786398 c0788880
[  354.071990] bf40: c53df680 c5377284 c080c5bc c0013fc8 cc82a000 00000000 00012fc0 c538e400
[  354.080871] bf60: c538e400 c538e800 c07e44d8 c011c050 00000001 00000000 00000000 00000000
[  354.089752] bf80: 00000024 c0013fc8 cc82a000 00000000 00000000 00000000 00000058 c0013fc8
[  354.098663] bfa0: 00000000 c0013e00 00000000 00000000 fee1dead 28121969 01234567 45584543
[  354.107543] bfc0: 00000000 00000000 00000000 00000058 00000000 00000000 00000000 00012fc0
[  354.116424] bfe0: b6eda2b0 be862944 0000b7dc b6eda2d0 600f0010 fee1dead 00fbff04 107ffe00
[  354.125396] [<c0019378>] (smp_send_stop+0x50/0xe4) from [<c05025b8>] (panic+0x98/0x1cc)
[  354.134124] [<c05025b8>] (panic+0x98/0x1cc) from [<c0042358>] (do_exit+0x6f4/0x7f8)
[  354.142486] [<c0042358>] (do_exit+0x6f4/0x7f8) from [<c0017e3c>] (die+0x294/0x320)
[  354.150756] [<c0017e3c>] (die+0x294/0x320) from [<c05023e4>] (__do_kernel_fault.part.8+0x54/0x74)
[  354.160430] [<c05023e4>] (__do_kernel_fault.part.8+0x54/0x74) from [<c050d8c4>] (do_page_fault+0x1c8/0x3ac)
[  354.171020] [<c050d8c4>] (do_page_fault+0x1c8/0x3ac) from [<c0008408>] (do_PrefetchAbort+0x34/0x9c)
[  354.180877] [<c0008408>] (do_PrefetchAbort+0x34/0x9c) from [<c050c1b8>] (__pabt_svc+0x38/0x80)
[  354.190185] Exception stack(0xcc82be18 to 0xcc82be60)
[  354.195678] be00:                                                       cc82be64 00000006
[  354.204589] be20: 00000000 00000000 000f4241 c07a8ee0 cc82be64 45584543 4321fedc cc82a000
[  354.213470] be40: c07dd930 00012fc0 00000000 cc82be60 c0019378 00000000 600f0013 ffffffff
[  354.222381] [<c050c1b8>] (__pabt_svc+0x38/0x80) from [<c0019378>] (smp_send_stop+0x50/0xe4)
[  354.231506] [<c0019378>] (smp_send_stop+0x50/0xe4) from [<c00152f4>] (machine_restart+0xc/0x4c)
[  354.241027] [<c00152f4>] (machine_restart+0xc/0x4c) from [<c0051dd4>] (sys_reboot+0x174/0x1f4)
[  354.250427] [<c0051dd4>] (sys_reboot+0x174/0x1f4) from [<c0013e00>] (ret_fast_syscall+0x0/0x30)
[  354.259857] Code: bad PC value
[  354.263519] ---[ end trace a8f4050048b60a4f ]---
[  354.268707] Fixing recursive fault but reboot is needed!
[  354.383544] omap_i2c omap_i2c.3: timeout waiting for bus ready

So ignoring that for the moment, on the second boot (after KIWI had done its magic to finish setting up the MMC card), my new u-boot loaded. Note though that it’s still using the old environment variables at this point (which try to “fatload” in “loadbootscript”, which isn’t going to work), so that needed resetting:

U-Boot SPL 2012.04.01 (Sep 24 2012 - 12:48:02)
OMAP SD/MMC: 0
mkimage signature not found - ih_magic = ea000014

U-Boot 2012.04.01 (Sep 24 2012 - 12:48:02)

OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz
Gumstix Overo board + LPDDR/NAND
I2C:   ready
DRAM:  256 MiB
NAND:  256 MiB
MMC:   OMAP SD/MMC: 0
In:    serial
Out:   serial
Err:   serial
Board revision: 0
Tranceiver detected on mmc2
No EEPROM on expansion board
Die ID #529e0004000000000403a1f303025013
Net:   smc911x-0
Hit any key to stop autoboot:  0 
Overo # nand erase 240000 20000

NAND erase: device 0 offset 0x240000, size 0x20000
Erasing at 0x240000 -- 100% complete.
OK
Overo # reset
resetting ...

Third time, we can see the new default environment:

U-Boot SPL 2012.04.01 (Sep 24 2012 - 12:48:02)
OMAP SD/MMC: 0
mkimage signature not found - ih_magic = ea000014


U-Boot 2012.04.01 (Sep 24 2012 - 12:48:02)

OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz
Gumstix Overo board + LPDDR/NAND
I2C:   ready
DRAM:  256 MiB
NAND:  256 MiB
MMC:   OMAP SD/MMC: 0
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Board revision: 0
Tranceiver detected on mmc2
No EEPROM on expansion board
Die ID #529e0004000000000403a1f303025013
Net:   smc911x-0
Hit any key to stop autoboot:  0 
Overo # printenv
baudrate=115200
bootcmd=if mmc rescan ${mmcdev}; then if run loadbootscript; then run bootscript; else if run loaduimage; then run mmcboot; else run nandboot; fi; fi; else run nandboot; fi
bootdelay=5
bootscript=echo Running bootscript from mmc ...; source ${loadaddr}
console=ttyO2,115200n8
defaultdisplay=dvi
dieid#=529e0004000000000403a1f303025013
dvimode=1024x768MR-16@60
ethact=smc911x-0
kerneladdr=0x80200000
loadaddr=0x82000000
loadbootscript=ext2load mmc ${mmcdev} ${loadaddr} boot.scr
loaduimage=ext2load mmc ${mmcdev} ${loadaddr} uImage
mmcargs=setenv bootargs console=${console} ${optargs} mpurate=${mpurate} vram=${vram} omapfb.mode=dvi:${dvimode} omapdss.def_disp=${defaultdisplay} root=${mmcroot} rootfstype=${mmcrootfstype}
mmcboot=echo Booting from mmc ...; run mmcargs; bootm ${loadaddr}
mmcdev=0
mmcroot=/dev/mmcblk0p2 rw
mmcrootfstype=ext3 rootwait
mpurate=500
nandargs=setenv bootargs console=${console} ${optargs} mpurate=${mpurate} vram=${vram} omapfb.mode=dvi:${dvimode} omapdss.def_disp=${defaultdisplay} root=${nandroot} rootfstype=${nandrootfstype}
nandboot=echo Booting from nand ...; run nandargs; nand read ${loadaddr} 280000 400000; bootm ${loadaddr}
nandroot=ubi0:rootfs ubi.mtd=4
nandrootfstype=ubifs
ramdiskaddr=0x81000000
stderr=serial
stdin=serial
stdout=serial
vram=12M

Environment size: 1363/131068 bytes
Overo # saveenv
Saving Environment to NAND...
Erasing Nand...
Erasing at 0x240000 -- 100% complete.
Writing to Nand... done
Overo # reset
resetting ...
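
The numbers in the erase and printenv output hang together: the erased region is 0x20000 (131072) bytes starting at 0x240000, and printenv reports 131068 usable bytes, four fewer. As far as I know those four bytes are the CRC32 that u-boot stores in front of the environment data, which would also explain the "bad CRC" warning right after the erase. A quick sketch of the arithmetic, with invented function names:

```shell
# Hypothetical helpers; offsets and sizes are the ones given to "nand erase".
# Usage: env_region OFFSET SIZE  -> prints the first and last byte of the region
env_region() {
    printf '0x%x-0x%x\n' $(( $1 )) $(( $1 + $2 - 1 ))
}

# Usage: usable_env_bytes SIZE  -> SIZE minus an assumed 4-byte CRC32 header
usable_env_bytes() {
    echo $(( $1 - 4 ))
}

env_region 0x240000 0x20000
usable_env_bytes 0x20000
```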

Fourth time’s the charm. Straight into loading the kernel and initrd without any manual intervention required:

U-Boot SPL 2012.04.01 (Sep 24 2012 - 12:48:02)
OMAP SD/MMC: 0
mkimage signature not found - ih_magic = ea000014


U-Boot 2012.04.01 (Sep 24 2012 - 12:48:02)

OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz
Gumstix Overo board + LPDDR/NAND
I2C:   ready
DRAM:  256 MiB
NAND:  256 MiB
MMC:   OMAP SD/MMC: 0
In:    serial
Out:   serial
Err:   serial
Board revision: 0
Tranceiver detected on mmc2
No EEPROM on expansion board
Die ID #529e0004000000000403a1f303025013
Net:   smc911x-0
Hit any key to stop autoboot:  0 
Loading file "boot.scr" from mmc device 0:1 (xxa1)
625 bytes read
Running bootscript from mmc ...
## Executing script at 82000000
kerneladdr=0x80200000
ramdiskaddr=0x81000000
Loading file "uImage" from mmc device 0:1 (xxa1)
4063704 bytes read
Loading file "initrd" from mmc device 0:1 (xxa1)
9087228 bytes read
## Booting kernel from Legacy Image at 80200000 ...
   Image Name:   Linux-3.4.6-2.10-omap2plus
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    4063640 Bytes = 3.9 MiB
   Load Address: 80008000
   Entry Point:  80008000
   Verifying Checksum ... OK
## Loading init Ramdisk from Legacy Image at 81000000 ...
   Image Name:   Initrd
   Image Type:   ARM Linux RAMDisk Image (uncompressed)
   Data Size:    9087164 Bytes = 8.7 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.4.6-2.10-omap2plus (abuild@build14) (gcc version 4.7.1 20120723 [gcc-4_7-branch revision 189773] (SUSE Linux) ) #2 SMP Sat Sep 8 06:38:16 UTC 2012
[    0.000000] CPU: ARMv7 Processor [411fc083] revision 3 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[    0.000000] Machine: Gumstix Overo
...

Then after a little while…

Welcome to openSUSE 12.2 "Mantis" - Kernel 3.4.6-2.10-omap2plus (ttyO2).

linux login: root
Password: *****
Last login: Sat Jan  1 01:07:59 on ttyO2
This is the Lime-JeOS 12.2 SuSE Linux System.
To upgrade your system call:

    zypper refresh
    zypper install -t product openSUSE-12.2

Have a lot of fun...
linux:~ # uname -a
Linux linux 3.4.6-2.10-omap2plus #2 SMP Sat Sep 8 06:38:16 UTC 2012 armv7l armv7l armv7l GNU/Linux
linux:~ # uptime
 02:01am  up   0:01,  1 user,  load average: 2.23, 0.77, 0.27

So I call that reasonable success. At least now I have a documented point to start from next time. Oh, and that little red LED? It’s a heartbeat indicator. So long as it’s blinking, the kernel hasn’t crashed.

openSUSE 12.2 on a Gumstix Overo Fire

I’m having a late Hackweek (or hackish week, or at least a few hackdays), which seems like the ideal time to try running openSUSE on the Gumstix Overo Fire board that’s been sitting in my hackdrawer for the last year or so.  Thanks to the people working on the openSUSE ARM port, there are already suitable rootfs images and a kernel that’s close enough, so mostly this has been an exercise in getting the right bits onto a MicroSD card and learning how to interact with u-boot.

Current status: It boots and I can log in on the console.  The kernel seems to panic during shutdown. I haven’t tried networking, video, sound or anything else yet.

If anyone else wants to replicate my setup, you will need:

To prepare the MicroSD card (based on these docs):

  1. Plug it into your laptop or desktop system somehow (my laptop has an SD card reader).
  2. Figure out what device it is (in my case, it’s /dev/mmcblk0).
  3. Figure out how big it is, then partition it into a 64MB bootable FAT partition, and an ext3 Linux partition.  Apparently this involves some slightly weird geometry. Frankly I’m a bit dubious about this, given a couple of warnings, but it’s what the Gumstix docs said to do…
# fdisk -l /dev/mmcblk0
Disk /dev/mmcblk0: 7958 MB, 7958691840 bytes
...
# echo 7958691840/255/63/512 | bc
967

Now we know how many cylinders to tell it to use (the -C parameter of sfdisk), so being careful not to break anything, we do this:

# dd if=/dev/zero of=/dev/mmcblk0 bs=1024 count=1024
...
# sfdisk --force -D -uS -H 255 -S 63 -C 967 /dev/mmcblk0
Checking that no-one is using this disk right now ...
OK
...
No partitions found
Input in the following format; absent fields get a default value.
<start> <size> <type [E,S,L,X,hex]> <bootable [-,*]> <c,h,s> <c,h,s>
...
/dev/mmcblk0p1 :128,130944,0x0C,*
/dev/mmcblk0p1   *       128    131071     130944   c  W95 FAT32 (LBA)
/dev/mmcblk0p2 :131072,,,-
/dev/mmcblk0p2        131072  15544319   15413248  83  Linux
/dev/mmcblk0p3 :0,0
/dev/mmcblk0p3             0         -          0   0  Empty
/dev/mmcblk0p4 :0,0
/dev/mmcblk0p4             0         -          0   0  Empty
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/mmcblk0p1   *       128    131071     130944   c  W95 FAT32 (LBA)
/dev/mmcblk0p2        131072  15544319   15413248  83  Linux
/dev/mmcblk0p3             0         -          0   0  Empty
/dev/mmcblk0p4             0         -          0   0  Empty
Warning: partition 1 does not end at a cylinder boundary
Warning: partition 2 does not start at a cylinder boundary
Warning: partition 2 does not end at a cylinder boundary
end of partition 2 has impossible value for cylinders: 967 (should be in 0-966)
Do you want to write this to disk? [ynq] y
Successfully wrote the new partition table
...
  4. Format and mount both partitions:
# mkfs.vfat -F 32 /dev/mmcblk0p1 -n boot
mkfs.vfat 3.0.10 (12 Sep 2010)
# mke2fs -j -L rootfs /dev/mmcblk0p2
mke2fs 1.41.14 (22-Dec-2010)
...
# mkdir /media/{boot,rootfs}
# mount /dev/mmcblk0p1 /media/boot
# mount /dev/mmcblk0p2 /media/rootfs
  5. Extract the JeOS image and the kernel RPM into the rootfs partition (this may take a while):
# tar -xjvf openSUSE-12.2-ARM-JeOS.armv7l-1.12.1-Build1.15.3.tbz -C /media/rootfs
...
# cd /media/rootfs
# rpm2cpio kernel-omap2plus-3.4.6-2.10.1.armv7hl.rpm | cpio -idmv
...
# cd
  6. Put MLO, u-boot.bin and the kernel uImage on the boot partition (MLO needs to go on first, and make sure you get the omap2plus uImage, not the default one, which won’t work):
# cp MLO /media/boot/
# cp u-boot.bin /media/boot/
# cp /media/rootfs/boot/uImage-3.4.6-2.10-omap2plus /media/boot/uImage
  7. Sync, unmount both partitions and remove the card:
# sync ; umount /media/boot ; umount /media/rootfs
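
The arithmetic in the partitioning step can be wrapped in a couple of tiny helpers, which is handy if your card is a different size. This is a minimal sketch, assuming the same geometry used above (255 heads, 63 sectors per track, 512-byte sectors); the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: cylinder count for sfdisk's -C option, given the
# card size in bytes. Assumes 255 heads, 63 sectors/track, 512-byte sectors.
# Integer division truncates, just like the chained bc divisions above.
cylinders() {
    bytes=$1
    echo $(( bytes / 255 / 63 / 512 ))
}

# Hypothetical helper: convert a count of 512-byte sectors to MiB.
sectors_to_mib() {
    sectors=$1
    echo $(( sectors * 512 / 1024 / 1024 ))
}

cylinders 7958691840     # the 7958 MB card from the fdisk output
sectors_to_mib 131072    # start sector of the second partition
```

For the card above this gives 967 cylinders, and shows the rootfs partition starting at the 64 MiB mark. If you would rather not copy the byte count by hand, `blockdev --getsize64 /dev/mmcblk0` prints it directly (substitute your own device).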

To boot the Gumstix board:

  1. Insert the MicroSD card
  2. Plug the USB console in to your desktop/laptop and use screen (or kermit, or whatever) to connect to it:
# screen /dev/ttyUSB0 115200
  3. Apply power. You’ll almost certainly want to stop the autoboot and reset the environment:
Texas Instruments X-Loader 1.5.0 (Aug 29 2011 - 12:52:49)
OMAP3530-GP ES3.1
Board revision: 0
Reading boot sector
Loading u-boot.bin from mmc

U-Boot 2010.12 (Aug 22 2011 - 09:49:35)

OMAP3530-GP ES3.1, CPU-OPP2, L3-165MHz, Max CPU Clock 720 mHz
Gumstix Overo board + LPDDR/NAND
I2C:   ready
DRAM:  256 MiB
NAND:  256 MiB
MMC:   OMAP SD/MMC: 0
In:    serial
Out:   serial
Err:   serial
Board revision: 0
Tranceiver detected on mmc2
No EEPROM on expansion board
Die ID #529e0004000000000403a1f303025013
Net:   smc911x-0
Hit any key to stop autoboot:  0 [ENTER]
Overo # nand erase 240000 20000
...
Overo # saveenv
...
Overo # reset
  4. If all has gone well, the system will boot, a few services will fail to start, and you’ll be able to log in as root (password “linux”):
Welcome to openSUSE 12.2 "Mantis" - Kernel 3.4.6-2.10-omap2plus (ttyO2).

linux login: root
Password: *****
This is the Lime-JeOS 12.2 SuSE Linux System.
To upgrade your system call:

    zypper refresh
    zypper install -t product openSUSE-12.2

Have a lot of fun...
linux:~ # uname -a
Linux linux 3.4.6-2.10-omap2plus #2 SMP Sat Sep 8 06:38:16 UTC 2012 armv7l armv7l armv7l GNU/Linux

Here’s some (remarkably poor quality) video proof:

Now I just need to figure out if the blinking red LED next to the ethernet port is blinking in merry happiness, or trying to warn me of impending doom.

Shameless High Availability Googlebait

I’m sure newcomers to high availability on Linux are still being bewildered by reams of readily googlable semi-ancient information floating out there in the ether.  So I’m going to try to help remedy this by saying:

This has been a public service announcement.  Thank you for reading.

That UEFI Secure Boot Thing

Yesterday Matthew Garrett posted “Implementing Secure Boot in Fedora”, which was subsequently covered by Cory Doctorow in “Lockdown: free/open OS maker pays Microsoft ransom for the right to boot on users’ computers”.  I find myself somewhat torn by the whole affair.  I understand how the choice by Fedora to cough up $99 to have their shim bootloader signed by Microsoft can be seen as a sellout.  But at the same time, if your goal is to ensure your distro is bootable without forcing the user to screw around with their firmware settings, I think Fedora has probably made the least-worst choice, and I think other distros should also consider evaluating this approach.

Immediately, speaking purely practically, a single $99 payment by a distro to cover a (presumably) infrequently updated shim bootloader, and thus have Linux work with UEFI secure boot, is not terribly onerous.  Even if many distros did this, I’m not seeing it amounting to much of a revenue stream for Microsoft.  And it meets the stated goal (make Linux run on new hardware with minimum user effort or even awareness).  So that’s fine as far as it goes.

I’m far less happy about it from a political perspective, where this amounts to supporting another instance of what I’d call The Certificate Cartel, a term I used to apply to SSL CAs.

So, like I said, I find myself somewhat torn by the whole affair.