08 September 2012

initramfs pivot_root solution

In the previous post I discussed initramfs and its pros and cons. The worst problem was that it doesn't support pivot_root, which seems to be essential for a clean shutdown of Slax (or of a live distro in general), since it is used to switch between the ramfs root and the union root: once during boot, and then back again during shutdown.

I wasn't able to pivot_root back to initramfs during shutdown of the live OS, and so couldn't unmount the media Slax is running from. Fortunately I found a workaround, which completely eliminates the need for the hack mentioned in the previous post.

It's relatively simple, yet not obvious. Immediately after the kernel has booted and init has started in the initramfs root, we make a directory /m (or something similar) and copy the current initramfs filesystem to it. This doubles RAM usage for a moment, but don't worry, because the very next step is to switch root to /m. This does two things: first, it frees all the memory used by initramfs, and second, it simply restarts the init process from a mounted filesystem. As soon as root is a mounted filesystem, the real startup procedures of LiveKit can begin (mounting the union, and so on), and pivot_root is magically available again. Voila!
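
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual Linux Live script. It assumes that /m is a freshly mounted tmpfs (switch_root expects the new root to be a mount point), that the init script lives at /init, and that it runs before /proc and /sys are mounted. The marker file lib/initramfs_escaped and the helper names needs_escape, copy_root and escape_initramfs are made up for this example.

```shell
#!/bin/sh
# Sketch only: the marker file and all function names are assumptions.

SWITCH=/m                       # directory name taken from the post
MARKER=lib/initramfs_escaped    # hypothetical flag file inside the copied root

# Succeeds when the root at $1 does not carry the marker yet,
# i.e. we are still running directly from initramfs.
needs_escape() {
    [ ! -e "$1/$MARKER" ]
}

# Copy every top-level entry of root $1 into $2, skipping $2 itself.
copy_root() {
    src=${1%/}
    dst=${2%/}
    for entry in "$src"/* "$src"/.[!.]*; do
        # Unmatched glob patterns stay literal; skip them.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        [ "$entry" = "$dst" ] && continue
        cp -a "$entry" "$dst"/
    done
}

# Mount a tmpfs on $SWITCH, copy the initramfs content there, drop the
# marker and re-execute /init from the new root. switch_root deletes the
# old initramfs content, which frees its memory.
escape_initramfs() {
    mkdir -p "$SWITCH"
    mount -t tmpfs tmpfs "$SWITCH" || return 1
    copy_root / "$SWITCH"
    mkdir -p "$SWITCH/lib"
    : > "$SWITCH/$MARKER"
    exec switch_root "$SWITCH" /init
}

# Intended use at the very top of /init (left commented out here):
# needs_escape "" && escape_initramfs
```

On the second run of /init the marker file exists, so the escape step is skipped and the script can carry on with the normal startup.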

User comments
linuxpr 2012-09-08 04:55

check this out
#!/bin/sh

launch_init()
{
    [ -d /proc/sys ] && umount /proc
    echo -e "\\033[70G[ \\033[1;32mOK\\033[0;39m ]"
    exec /sbin/switch_root mnt /sbin/init
}

launch_init_modular()
{
    [ -d /proc/sys ] && umount /proc
    echo -e "\\033[70G[ \\033[1;32mOK\\033[0;39m ]"
    [ -d /mnt/initramfs ] || mkdir -p /mnt/initramfs
    SYS_DIR="dev bin etc boot lib sbin home root run usr var"
    for dir in $SYS_DIR; do
        cp -a /$dir /mnt/initramfs
    done
    [ -b /mnt/dev/sdc ] || mknod /mnt/dev/sdc b 8 32
    [ -b /mnt/dev/sdc1 ] || mknod /mnt/dev/sdc1 b 8 33
    MK_DIR="sys proc tmp media mnt"
    for dir1 in $MK_DIR; do
        mkdir -p /mnt/initramfs/$dir1
    done
    exec /sbin/switch_root mnt /linuxrc
}

failed()
{
    [ -d /proc/sys ] && umount /proc
    echo -e "\\033[70G[ \\033[1;31mFailed\\033[0;39m ]"
    conspy -d 1 > /init.log
}

try_init()
{
    if [ ! -d /mnt/etc ] && grep -q cryptoroot= /proc/cmdline; then
        modprobe dm-mod
        modprobe dm-crypt
        modprobe aes-i586
        root="$(sed 's/.*cryptoroot=\([^ ]*\).*/\1/' < /proc/cmdline)"
        dev=${root#/dev/}
        dmlabel=crypto-$dev
        if cryptsetup isLuks $root 2> /dev/null; then
            cryptsetup luksOpen $root $dmlabel
        else
            read -s -t 60 -p "Pass phrase : " passphrase
            key=$(echo $passphrase | hashalot -x -n 32 sha512)
            blocks=$(cat $(find /sys/block | grep /$dev/size))
            echo 0 $blocks crypt aes-plain $key 0 $root 0 | \
                dmsetup create $dmlabel
        fi
        mount /dev/mapper/$dmlabel /mnt
    fi
    if [ -d /mnt/etc ]; then
        umount /sys
        [ -n "$1" ] && for i in $@ ; do
            cp -a $i /mnt$(dirname $i)
        done
        mount /mnt -o remount,ro
        launch_init
    fi
    failed
}

mount_mapper()
{
    mount $root /mnt
    try_init /dev/mapper $@
}

lvmsetup()
{
    grep -q lvmroot= /proc/cmdline || return 1
    modprobe dm-mod
    vgscan --ignorelockingfailure
    vgchange -ay --ignorelockingfailure
    root="/dev/mapper/$(sed 's/.*lvmroot=\([^ ]*\).*/\1/' < /proc/cmdline)"
    return 0
}

load_raid()
{
    while read line; do
        case "$line" in
        *raid10*) modprobe raid10 ;;
        *raid0*) modprobe raid0 ;;
        *raid1*) modprobe raid1 ;;
        *raid[456]*) modprobe raid456 ;;
        esac
    done
}

mount -t proc proc /proc
mount -t sysfs sysfs /sys
echo -n "Switching / to "
if grep -q dmraid= /proc/cmdline; then
    root="$(sed 's/.*dmraid=\([^ ]*\).*/\1/' < /proc/cmdline)"
    echo -n "dmraid $root..."
    dmraid -s | grep ^type | awk '{ print $3 }' | load_raid
    case "$root" in
    /dev/*);;
    *) root=/dev/mapper/$(dmraid -s|grep ^name|awk '{print $3}')p${root#p};;
    esac
    dmraid -ay
    lvmsetup
    mount_mapper
fi
if grep -q softraid= /proc/cmdline; then
    root="$(sed 's/.*softraid=\([^ ]*\).*/\1/' < /proc/cmdline)"
    echo -n "softraid $root..."
    mdadm --examine --scan --config=partitions > /etc/mdadm.conf
    grep -qs " $root " /etc/mdadm.conf ||
        root=$(awk '/dev.md/ { print $2; exit }' < /etc/mdadm.conf)
    grep level=raid /etc/mdadm.conf | load_raid
    for i in 1 2 3 4 5 6 7 8 9; do
        sleep $i
        mdadm --assemble --scan
        grep -qs ': active' /proc/mdstat && break
    done
    lvmsetup
    mount_mapper /etc/mdadm.conf
fi
if lvmsetup; then
    echo -n "lvm $root..."
    mount_mapper
fi
if grep -q mount= /proc/cmdline; then
    root="$(sed 's/.*mount=\([^ ]*\).*/\1/' < /proc/cmdline)"
    dev=$(blkid | grep $root | sed 's/:.*//;q')
    echo -n "Mounting $dev ($root) ..."
    if ! mount $dev /mnt; then
        if echo $dev | grep -q "/dev/sd"; then
            delay=`cat /sys/module/usb_storage/parameters/delay_use`
            delay=$((1+$delay))
            echo -n "sleep for $delay seconds..."
            sleep $delay
        fi
        mount $dev /mnt
    fi
    grep -q posixovl /proc/cmdline && mount.posixovl /mnt
fi
if grep -q loopfs= /proc/cmdline; then
    loopfs="$(sed 's/.*loopfs=\([^ ]*\).*/\1/' < /proc/cmdline)"
    echo -n "loop $loopfs..."
    losetup /dev/loop0 /mnt/$loopfs
    mount /dev/loop0 /mnt 2> /dev/null
fi
if grep -q bindfs= /proc/cmdline; then
    bind="$(sed 's/.*bindfs=\([^ ]*\).*/\1/' < /proc/cmdline)"
    mount --bind /mnt/${bind%,*} /mnt/${bind%,*}/${bind#*,}
fi
grep -q cryptoroot= /proc/cmdline && try_init
umount /sys
if grep -q subroot= /proc/cmdline; then
    subroot="/$(sed 's/.*subroot=\([^ ]*\).*/\1/' < /proc/cmdline)" &&
    if [ -s /usr/share/boot/busybox-static ]; then
        mv /usr/share/boot/busybox-static .
        /busybox-static rm -rf /etc /lib /*bin /usr /var
        exec /busybox-static chroot /mnt$subroot /sbin/init
    else
        exec chroot /mnt$subroot /sbin/init
    fi
fi
echo -n "tmpfs..."
size="$(grep rootfssize= < /proc/cmdline | \
    sed 's/.*rootfssize=\([0-9]*[kmg%]\).*/-o size=\1/')"
free=$(busybox free | busybox awk '/Mem:/ { print int(($4*100)/$3) }')
umount /proc
[ -n "$size" ] || size="-o size=90%"
if [ $free -lt 100 ] || ! mount -t tmpfs $size tmpfs /mnt; then
    echo -e "\\033[70G[ \\033[1;33mSkipped\\033[0;39m]"
    exec /sbin/init
fi
for i in $(ls -a /); do
    case "$i" in
    .|..) ;;
    mnt) mkdir /mnt/mnt;;
    *) if ! cp -a /$i /mnt 2> /dev/null; then
            failed
            umount /mnt
            exec /sbin/init
        fi;;
    esac
done
launch_init_modular

Manfred 2012-09-08 04:58

Hi Tomas,

you will have your reasons to switch to it, but I still don't get the advantage(s) of initramfs.

To get a variable size of my initrd for FluxFlux I am using

count=$(($(du -s -b $INITRD_TREE | awk '{print $1}') * 3/2048))

in the relevant part of initrd_create. That way the initrd size is always roughly 1.5 times the extracted size of the files in the initrd.

The count variable gets used to inject the size into the several cfg scripts in /boot, too.

linuxpr 2012-09-08 04:59

This guy is doing what you want with SliTaz:
http://godane.wordpress.com/2011/03/30/slitaz-core-20110329-release/

Tomas M 2012-09-08 10:51

The biggest advantage of initramfs is that it's not a filesystem image, it's a cpio archive. I've pushed the latest updates to the Linux Live repository on GitHub; I think I've finished both the init script and the cleanup script (regarding the cleanup procedure).

fanthom 2012-09-19 13:57

that makes sense now (fooling initramfs into behaving like initrd)
and reminds me of the same hack posted on the slax forum a long time ago:
www.slax.org/forum.php?action=view&parentID=25297

Ahau used it in our unofficial ARM port which forces initramfs (combined with the kernel) due to the bootloader limitation.

your solution is a bit simpler and i like it.

stupid me that i have forgotten about it... will try to use it in the standard edition now :)

thanks

Jakub Neburka 2013-01-30 05:33

Hello Tomas

I would like to know what the problem with pivot_root under initramfs is. Is it that the initramfs / is not a real mount point? Have you tried to make it a mount point with something like: mount --bind / / ?