Solaris: UFS to ZFS, LiveUpgrade and Patching

This article gives a detailed overview of how we migrate our servers from UFS to ZFS boot 2-way mirrors, how they are upgraded to Solaris™ 10u6 aka 10/08 with /var on a separate ZFS, and finally how to accomplish "day-to-day" patching. The main stages are: moving from UFS to ZFS boot, cleaning up, Live Upgrade to S10u6, a second cleanup phase and day-to-day patching.

Make sure that at each stage all zones are running or at least bootable, and that the environment variables shown below are properly set. Also give your brain a chance and think before you blindly copy-and-paste any commands mentioned in this article! There is no guarantee that the commands shown here exactly match your system; they may damage it or cause data loss if you do not adjust them to your needs!

The procedures shown here have been successfully tested on several Sun Fire V240, 280R, 420R, V440, V490, T1000, X4500, X4600 and Sun Ultra 40 machines with zero or more running sparse zones.

setenv CD /net/install/pool1/install/sparc/Solaris_10_u6-ga1
setenv JUMPDIR /net/install/pool1/install/jumpstart
mount /local/misc
set path = ( /usr/bin /usr/sbin /local/misc/sbin )

Moving from UFS to ZFS boot

  1. update to S10u6 aka 10/08 via recommended/feature patching

    on pre-U4 systems SUNWlucfg is probably missing:

    pkgadd -d $CD/Solaris_10/Product SUNWlucfg
    

    make sure that all required patches are installed.
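
    E.g. for sparc - the patch IDs below are just the Live Upgrade minimum-patch examples reused from further down in this article, so adjust them to the current requirements:

    checkpatches.sh -p 119081-25 124628-05 ...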

    Also apply the following patches to avoid a lot of trouble (see LiveUpgrade Troubleshooting for more information):

    # Solaris
    gpatch -p0 -d / -b -z .orig < /local/misc/etc/lu-5.10.patch
    # Nevada
    gpatch -p0 -d / -b -z .orig < /local/misc/etc/lu-5.11.patch
    

    If anything is unclear, consult the following docs:

  2. determine the HDD for the new root pool aka rpool

    echo | format

    In this example we use: c0t1d0

  3. format the disk so that the whole disk can be used by ZFS

    # on x86 first
    fdisk -B /dev/rdsk/c0t1d0p0
    
    # on sparc and x86 delete all slices and assign all blocks to s0
    format -d c0t1d0
    

    If you want to use mirroring, make sure that s0 of HDD0 and s0 of HDD1 end up with the same size (use the number of blocks when specifying the size) - a quick way to compare them is shown after the partition table below.

    Part      Tag    Flag     Cylinders         Size            Blocks
      0       root    wm       0 - 29772       34.41GB    (29773/0/0) 72169752
      1 unassigned    wm       0                0         (0/0/0)            0
      2     backup    wu       0 - 29772       34.41GB    (29773/0/0) 72169752
      3 unassigned    wm       0                0         (0/0/0)            0
      4 unassigned    wm       0                0         (0/0/0)            0
      5 unassigned    wm       0                0         (0/0/0)            0
      6 unassigned    wm       0                0         (0/0/0)            0
      7 unassigned    wm       0                0         (0/0/0)            0
    
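
    A quick way to compare the sizes of s0 on both disks is prtvtoc, as also used in the mirroring step later on (device names as in this example):

    # compare the 'Sector Count' of partition 0 on both disks
    prtvtoc /dev/rdsk/c0t0d0s2 | /usr/xpg4/bin/grep -E 'Count|^[[:space:]]*0'
    prtvtoc /dev/rdsk/c0t1d0s2 | /usr/xpg4/bin/grep -E 'Count|^[[:space:]]*0'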
  4. move all UFS zones to ZFS mirror 'pool1' on HDD2 and HDD3

    This allows LU to use zone snapshots instead of copying the data - which is orders of magnitude faster. In our example, the UFS zones are in /export/scratch/zones/ ; the pool1 mountpoint is /pool1 .

    One may use the following ksh snippet (requires GNU sed!):

    ksh
    zfs create pool1/zones
    # adjust this and the following variable
    UFSZONES="zone1 zone2 ..."
    UFSZPATH="/export/scratch/zones"
    for ZNAME in $UFSZONES ; do
    	zlogin $ZNAME 'init 5'
    done
    echo 'verify
    commit
    ' >/tmp/zone.cmd
    for ZNAME in $UFSZONES ; do
    	# and wait, 'til $ZNAME is down
    	while true; do
    		zoneadm list | /usr/xpg4/bin/grep -q "^$ZNAME"'$'
    		[ $? -ne 0 ] && break
    		sleep 1
    	done
    	zfs create pool1/zones/$ZNAME
    	mv $UFSZPATH/$ZNAME/* /pool1/zones/$ZNAME/
    	chmod 700 /pool1/zones/$ZNAME
    	gsed -i \
    		-e "/zonepath=/ s,$UFSZPATH/$ZNAME,/pool1/zones/$ZNAME," \
    		/etc/zones/$ZNAME.xml
    	zonecfg -z $ZNAME -f /tmp/zone.cmd
    	zoneadm -z $ZNAME boot
    done
    exit
    
  5. create the rpool

    zpool create -f -o failmode=continue rpool c0t1d0s0
    # some bugs? require us to do the following manually
    zfs set mountpoint=/rpool rpool
    zfs create -o mountpoint=legacy rpool/ROOT
    zfs create -o canmount=noauto rpool/ROOT/zfs1008BE
    zfs create rpool/ROOT/zfs1008BE/var
    zpool set bootfs=rpool/ROOT/zfs1008BE rpool
    zfs set mountpoint=/ rpool/ROOT/zfs1008BE
    
  6. create the ZFS based Boot Environment (BE)

    lucreate -c ufs1008BE -n zfs1008BE -p rpool
    

    ~25min on V240

    At this point one probably asks oneself why we do not use pool1 for boot, form a mirror of HDD0 and HDD1 and put another BE on that mirror. The answer is pretty simple: some machines like the Thumper aka X4500 can boot from 2 special disks only (c5t0d0 and c5t4d0).

  7. move BE's /var to a separate ZFS within the BE

    zfs set mountpoint=/mnt rpool/ROOT/zfs1008BE
    zfs mount rpool/ROOT/zfs1008BE
    zfs create rpool/ROOT/zfs1008BE/mnt
    cd /mnt/var
    find . -depth -print | cpio -puvmdP@ /mnt/mnt/
    rm -rf /mnt/mnt/lost+found
    cd /mnt; rm -rf /mnt/var
    zfs rename rpool/ROOT/zfs1008BE/mnt rpool/ROOT/zfs1008BE/var
    zfs umount rpool/ROOT/zfs1008BE
    zfs set mountpoint=/ rpool/ROOT/zfs1008BE
    zfs set canmount=noauto rpool/ROOT/zfs1008BE/var
    

    ~7 min on V240

  8. activate the new ZFS based BE

    luactivate zfs1008BE
    

    Copy the output of the command to a safe place, e.g. a USB stick - it contains the instructions needed to fall back to the old BE if booting the new one fails.
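
    One simple way to capture it is to run the command through tee in the first place (the file name is just an example):

    luactivate zfs1008BE | tee /var/tmp/luactivate.zfs1008BE.txt
    # then copy /var/tmp/luactivate.zfs1008BE.txt to the USB stick or another host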

  9. restart the machine

    init 6
  10. after reboot, check that everything is ok

    E.g.:

    df -h
    dmesg
    # PATH should be /pool1/zones/$zname-zfs1008BE for non-global zones
    zoneadm list -iv
    lustatus
    

Cleaning up

  1. destroy old UFS BE

    ludelete ufs1008BE
    

    One will get warnings about not being able to delete ZFSs of the old boot environment like /.alt.tmp.b-LN.mnt/pool1/zones/$zname - that's ok. One can promote their clones (e.g. /pool1/zones/$zname-zfs1008BE) later and then remove the old ones including their snapshots if desired.

  2. make sure everything is still ok

    init 6
    
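
    After the reboot, the same quick checks as above apply, e.g.:

    df -h
    zoneadm list -iv
    lustatus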
  3. move all remaining filesystems from HDD0 to the root pool

    Depending on the mount hierarchy, the following recipe needs to be adapted!

    a) check
    	df -h | grep c0t0d0
    b) stop all zones and processes which use those UFS slices (remember to
       unshare those slices if they are exported via NFS)
    	zlogin $ZNAME 'init 5'
    c) create appropriate ZFSs and transfer the data
    	for USLICE in $UFS_SLICES_FROM_a ; do
    		zfs create rpool/mnt
    		# just to be sure
    		mount -o ro -o remount $USLICE.mntpoint
    		cd $USLICE.mntpoint 
    		find . -depth -print | cpio -puvmdP@ /rpool/mnt/
    		rm -rf /rpool/mnt/lost+found
    		umount $USLICE.mntpoint
    		# comment out the appropriate entry in /etc/vfstab
    		gsed -i -e "/^$USLICE/ s,^,#," /etc/vfstab
    		zfs rename -p  rpool/mnt rpool/$USLICE.mntpoint
    		# in case of NFS export, comment out entries in /etc/dfs/dfstab
    		# and apply to the ZFS itself
    		zfs set sharenfs='rw=sol:bsd:lnx,root=admhosts' rpool/mnt
    		# if the parent mountpoint is not appropriate
    		zfs set mountpoint=$USLICE.mntpoint rpool/$USLICE.mntpoint
    	done
    
  4. adjust /etc/lu/ICF.$NUM

    Deduce $NUM from /etc/lutab (e.g. with grep :`lucurr`: /etc/lutab | cut -f1 -d:), replace the c0t0d0* entries with their ZFS counterparts and add all parents not yet part of that file (a small ksh sketch follows the diff below). The order of these entries is important! lumount tries to mount the filesystems in the same order they appear in the file and thus may hide required directories (mountpoints)! E.g. our diff would be:

    zfs1008BE:-:/dev/zvol/dsk/rpool/swap:swap:4196352
    zfs1008BE:/:rpool/ROOT/zfs1008BE:zfs:0
    zfs1008BE:/var:rpool/ROOT/zfs1008BE/var:zfs:0
    zfs1008BE:/pool1:pool1:zfs:0
    zfs1008BE:/rpool:rpool:zfs:0
    zfs1008BE:/pool1/zones:pool1/zones:zfs:0
    -zfs1008BE:/var/log/web:/dev/dsk/c0t0d0s6:ufs:16780016
    +zfs1008BE:/rpool/var:rpool/var:zfs:0
    +zfs1008BE:/rpool/var/log:rpool/var/log:zfs:0
    +zfs1008BE:/var/log/web:rpool/var/log/web:zfs:0
    -zfs1008BE:/export/scratch:/dev/dsk/c0t0d0s7:ufs:16703368
    +zfs1008BE:/export:rpool/export:zfs:0
    +zfs1008BE:/export/scratch:rpool/export/scratch:zfs:0
    
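
    A minimal ksh sketch for locating and backing up the ICF file before editing it by hand:

    ksh
    # determine the number of the current BE and back up its ICF file
    NUM=$(grep ":$(lucurr):" /etc/lutab | cut -f1 -d:)
    cp -p /etc/lu/ICF.$NUM /etc/lu/ICF.$NUM.bak
    vi /etc/lu/ICF.$NUM
    exit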
  5. init 6
    
    # after reboot
    df -h | grep c0t0d0
    zpool status
    swap -l
    

    Destroy any zpool/ZFS/volume which is still assigned to c0t0d0 - if one is still in use, zfs/zpool will warn you.
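
    For example (the swap slice and pool name below are just placeholders - check first what is actually left over):

    # find leftovers on the old disk
    zpool status | grep c0t0d0
    swap -l
    # then e.g. remove an old swap slice and destroy a no longer needed pool
    swap -d /dev/dsk/c0t0d0s1
    zpool destroy oldpool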

  6. repartition HDD0 as one slice as described above

  7. attach HDD0 to HDD1 - form a ZFS 2-way mirror

    zpool attach rpool c0t1d0s0 c0t0d0s0
    

    Finally, to avoid an unbootable environment, check the ZFS Troubleshooting Guide to fix any known ZFS Boot Issues immediately.
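
    On sparc this usually boils down to installing the ZFS boot block on the newly attached disk (shown in more detail in step 9 of "Cleanup - Phase 2" below); on x86 one would use installgrub instead:

    # sparc
    installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0
    # x86 (not yet tested)
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0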

Live Upgrade to S10 U6

See also:

  1. remove the old Live Upgrade packages

    pkgrm SUNWluu SUNWlur SUNWlucfg
    
  2. add the Live Upgrade packages from the release/update to install

    pkgadd -d $CD/Solaris_10/Product SUNWluu SUNWlur SUNWlucfg
    
  3. check whether all required patches are installed

    E.g. wrt. Solaris™ Live Upgrade Software: Minimum Patch Requirements:

    checkpatches.sh -p 119081-25 124628-05 ...
    
    # Solaris
    gpatch -p0 -d / -b -z .orig < /local/misc/etc/lu-5.10.patch
    # Nevada
    gpatch -p0 -d / -b -z .orig < /local/misc/etc/lu-5.11.patch
    
  4. create the new root pool on HDD0

    This is usually not necessary if you already have a ZFS mirrored boot environment (in that case just use rpool instead of rpool0 in the following examples/scripts and omit this step). However, if e.g. s0 of HDD0 is smaller than s0 of HDD1, it cannot be attached to the rpool living on HDD1. So we need to "swap" the situation.

    zpool create -o failmode=continue rpool0 c0t0d0s0
    zpool status
    
  5. check and fix basic ownership/permissions

    pkgchk -v SUNWscpr
    
  6. speedup lu commands

    Some servers have a lot of filesystems which are completely meaningless wrt. LiveUpgrade (e.g. the users' home directories). Having the lu* commands ignore them prevents a lot of unnecessary work and saves a lot of time. E.g. excluding ~2200 ZFSs on an X4600M2 (4x DualCore Opteron 8222, 3 GHz) saves about 40 min per lumount, luactivate, etc. command. So to exclude all user home directories in our case, we put an appropriate regular expression (see regexp(5)) into /etc/lu/fs2ignore.regex - for more information see LiveUpgrade Troubleshooting.

    echo '/export/home' > /etc/lu/fs2ignore.regex
    
  7. create a new boot environment for upgrade

    rmdir /.alt.lucopy.*
    lucreate -n s10u6 -p rpool0
    

    ~30 min on V240

  8. mount the new bootenv on /mnt - fix any errors

    Do not continue before the command executes without any errors. If you run into problems, have a look at LiveUpgrade Troubleshooting.

    lumount s10u6 /mnt
    
  9. determine patches, which would be removed by luupgrade

    The resulting list can be used to re-apply them after the upgrade, if necessary.

    $CD/Solaris_10/Misc/analyze_patches -N $CD -R /mnt \
    	>/mnt/var/tmp/s10u6-rm.txt
    luumount s10u6
    

    ~4 min on V240
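
    If some of the listed patches are still needed after the upgrade, one way to re-apply them later is luupgrade -t against the new BE (the patch ID and patch directory below are just placeholders):

    # re-apply a single patch to the new BE
    luupgrade -t -n s10u6 -s /var/tmp/patches 123456-78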

  10. create the profile to use for the upgrade

    We use:

    $JUMPDIR/mkProfile.sh -u $CD/Solaris_10
    
    # remove U6 zone poison SUNWdrr on sparc
    echo 'cluster SUNWC4u1 delete' >> /tmp/profile.orig
    echo 'cluster SUNWCcvc delete' >> /tmp/profile.orig
    cp /tmp/profile.orig /var/tmp/profile
    
    # just in case one wants to know the initial setup of the system
    cp /var/tmp/profile /mnt/var/sadm/system/logs/profile.s10u6
    
  11. verify the profile (simulate upgrade)

    rm -f /var/tmp/upgrade.err /var/tmp/upgrade.out
    luupgrade -u -n s10u6 \
    	-l /var/tmp/upgrade.err -o /var/tmp/upgrade.out \
    	-s $CD -j /var/tmp/profile -D
    
  12. upgrade if /var/tmp/upgrade.err is empty

    rm -f /var/tmp/upgrade.err /var/tmp/upgrade.out
    luupgrade -u -n s10u6 \
    	-l /var/tmp/upgrade.err -o /var/tmp/upgrade.out \
    	-s $CD -j /var/tmp/profile
    

    ~95 min on V240

    NOTE: The last step of luupgrade - copying the failsafe miniroot - may fail. See luupgrade: Installing failsafe fails for how to do this manually.

  13. make sure that zone paths are not mounted

    # umount all zone ZFSs created by lu*, e.g.:
    zfs umount /pool1/zones/*-zfs1008BE-s10u6
    

    If zonepaths are mounted, lumount/luactivate and friends will usually fail!

  14. check infos and errors and fix them if necessary, re-apply your changes

    lumount s10u6 /mnt
    gpatch -p0 -d /mnt -b -z .orig < /local/misc/etc/lu-`uname -r`.patch
    cd /mnt/var/sadm/system/data/
    less upgrade_failed_pkgadds upgrade_cleanup locales_installed vfstab.unselected
    cleanup4humans.sh /mnt
    

    BTW: We use the script cleanup4humans.sh to get this job done faster and in a more reliable manner. It prepares the basic command lines one probably needs in order to decide whether to copy back replaced files or to replace files with the version suggested by the upgrade package.

  15. apply new patches

    We use PCA to find out which patches are available/recommended for the new BE and its zones, and finally apply them using pca -R ...

    However, to get reproducible results on all systems, we modified the script to have the patch download directory preset, to always use /usr/bin/perl no matter how the PATH variable is set, and finally to invoke a postinstall script (if available on the system) which automatically fixes questionable changes made by patches. For your convenience, you can download this patch and edit/adapt it to your needs.

    Of course you may try to accomplish the same with the smpatch command; however, IMHO this kind of bogus, poorly designed software is probably one reason why some people think Java is a bad thing. That's why we don't install the SUNWCpmgr, SUNWCswup, SUNWswupclr, SUNWswupcl junkware on any system.

    OK, let's start with creating patch lists for each zone using the convenience script lupatch.sh (download and adjust it to your needs if you don't have it already). BTW: This command also works for the current BE when one adds the '-R /' option.

    # download all potential patches and show available patch lists when finished
    lupatch.sh -d
    

    Now one should study the READMEs of all potentially new patches for the zones by invoking the command shown below. BTW: This command also works for the current BE when one adds the '-R /' option.

    lupatch.sh -r
    

    After that one should customize the available patch lists (per default /var/tmp/patchList.*), either by removing a patch list altogether or by removing the lines of patches which should not be installed.
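
    E.g. (using the default path mentioned above):

    vi /var/tmp/patchList.*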

    Finally one should apply the patches to the zones either using 'luupgrade -t -r /mnt patch ...' or the following command, which internally uses 'pca -i -R /mnt/$zonepath patch ...' to patch each zone using the patch lists mentioned before (and makes sure that the global zone gets patched first).

    lupatch.sh -i
    

    Make sure to always patch the 'global' zone FIRST!!!

  16. unmount the BE

    cd /
    luumount s10u6
    
  17. activate the new BE

    luactivate s10u6
    
  18. boot into the new BE

    init 6
    # after reboot, check whether you still have a swap device enabled
    swap -l
    # if not, add the one from the pool where the current BE lives and
    # add an appropriate entry to /etc/vfstab:
    swap -a /dev/zvol/dsk/rpool0/swap
    echo '/dev/zvol/dsk/rpool0/swap	-	-	swap	-	no	-' >>/etc/vfstab
    
  19. adjust your backup settings

    javaws -viewer
    

Cleanup - Phase 2

This section shows how one may transfer the remaining ZFSs from HDD1 aka rpool with the old ZFS BE (zfs1008BE) to HDD0 aka rpool0 where the current BE (s10u6) lives. This is usually not necessary when one was able to attach s0 of HDD0 to s0 of HDD1 before doing the upgrade to the "new" OS. However, for some reason this might not have been possible at the time of the upgrade, and that's why this case, including pretty detailed troubleshooting hints, is covered here.

To get the picture, we now have the following situation on our server:

Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
zfs1008BE                  yes      no     no        yes    -
s10u6                      yes      yes    yes       no     -

Filesystem             Size   Used  Available Capacity  Mounted on
rpool0/ROOT/s10u6       33G   4.7G        23G    18%    /
rpool0/ROOT/s10u6/var   33G   2.2G        23G     9%    /var
rpool/export            34G    20K        21G     1%    /export
rpool/export/scratch    34G  1021M        21G     5%    /export/scratch
pool1                   67G    22K        66G     1%    /pool1
pool1/home              67G    39M        66G     1%    /pool1/home
pool1/web               67G    27K        66G     1%    /pool1/web
pool1/web/iws2          67G   284K        66G     1%    /pool1/web/iws2
pool1/web/iws2/sites    67G   850K        66G     1%    /pool1/web/iws2/sites
pool1/web/theo2         67G    25K        66G     1%    /pool1/web/theo2
pool1/web/theo2/sites   67G    24K        66G     1%    /pool1/web/theo2/sites
pool1/zones             67G    29K        66G     1%    /pool1/zones
rpool                   34G    21K        21G     1%    /rpool
rpool/var               34G    19K        21G     1%    /rpool/var
rpool/var/log           34G    18K        21G     1%    /rpool/var/log
rpool0                  33G    21K        23G     1%    /rpool0
rpool0/ROOT             33G    18K        23G     1%    /rpool0/ROOT
rpool/var/log/web       34G  1012K        21G     1%    /var/log/web
pool1/zones/sdev-zfs1008BE-s10u6    67G   732M        66G     2%    /pool1/zones/sdev-zfs1008BE-s10u6

NAME                               USED  AVAIL  REFER  MOUNTPOINT
pool1                             1.10G  65.8G    22K  /pool1
pool1/home                        39.3M  65.8G  39.1M  /pool1/home
pool1/home@home                    196K      -  39.1M  -
pool1/web                         1.18M  65.8G  27.5K  /pool1/web
pool1/web/iws2                    1.11M  65.8G   284K  /pool1/web/iws2
pool1/web/iws2/sites               850K  65.8G   850K  /pool1/web/iws2/sites
pool1/web/theo2                     50K  65.8G  25.5K  /pool1/web/theo2
pool1/web/theo2/sites             24.5K  65.8G  24.5K  /pool1/web/theo2/sites
pool1/zones                       1.06G  65.8G    29K  /pool1/zones
pool1/zones/sdev                   942M  65.8G   942M  /pool1/zones/sdev
pool1/zones/sdev@zfs1008BE         264K      -   942M  -
pool1/zones/sdev-zfs1008BE        35.5M  65.8G   943M  /pool1/zones/sdev-zfs1008BE
pool1/zones/sdev-zfs1008BE@s10u6  7.00M      -   942M  -
pool1/zones/sdev-zfs1008BE-s10u6   104M  65.8G   732M  /pool1/zones/sdev-zfs1008BE-s10u6
rpool                             12.5G  21.2G    21K  /rpool
rpool/ROOT                        7.52G  21.2G    18K  legacy
rpool/ROOT/zfs1008BE              7.52G  21.2G  4.05G  /.alt.zfs1008BE
rpool/ROOT/zfs1008BE/var          3.47G  21.2G  3.47G  /.alt.zfs1008BE/var
rpool/dump                        2.01G  21.2G  2.01G  -
rpool/export                      1021M  21.2G    20K  /export
rpool/export/scratch              1021M  21.2G  1021M  /export/scratch
rpool/swap                        2.00G  23.2G  15.7M  -
rpool/var                         1.02M  21.2G    19K  /rpool/var
rpool/var/log                     1.01M  21.2G    18K  /rpool/var/log
rpool/var/log/web                 1012K  21.2G  1012K  /var/log/web
rpool0                            10.9G  22.6G  21.5K  /rpool0
rpool0/ROOT                       6.89G  22.6G    18K  /rpool0/ROOT
rpool0/ROOT/s10u6                 6.89G  22.6G  4.72G  /
rpool0/ROOT/s10u6/var             2.17G  22.6G  2.17G  /var
rpool0/dump                       2.01G  22.6G  2.01G  -
rpool0/swap                       2.00G  24.6G    16K  -

      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rpool0/dump (dedicated)
Savecore directory: /var/crash/joker
  Savecore enabled: yes

swapfile             dev  swaplo blocks   free
/dev/zvol/dsk/rpool0/swap 256,4      16 4196336 4196336
  1. mount the old BE

    This step is recommended: if lumount fails, ludelete would fail as well, so fixing it early makes troubleshooting easier.

    lumount zfs1008BE /mnt
    

    And ooops, we get the following error:

    ERROR: cannot mount '/mnt/var': directory is not empty
    ERROR: cannot mount mount point </mnt/var> device <rpool/ROOT/zfs1008BE/var>
    ERROR: failed to mount file system <rpool/ROOT/zfs1008BE/var> on </mnt/var>
    ERROR: unmounting partially mounted boot environment file systems
    ERROR: No such file or directory: error unmounting <rpool/ROOT/zfs1008BE>
    ERROR: umount: warning: rpool/ROOT/zfs1008BE not in mnttab
    umount: rpool/ROOT/zfs1008BE no such file or directory
    ERROR: cannot unmount <rpool/ROOT/zfs1008BE>
    ERROR: cannot mount boot environment by name <zfs1008BE>
    

    To check what's going wrong, one may use the procedure described in Debugging lucreate, lumount, luumount, luactivate, ludelete and have a look at lumount.trc.zfs1008BE.log :

    CPU  PID   TIME          	COMMAND
      1  48923 12341952594063	lumount zfs1008BE /mnt
      0  48923 12341971394229	/etc/lib/lu/plugins/lupi_zones plugin
      1  48923 12341972471729	/etc/lib/lu/plugins/lupi_svmio plugin
      0  48923 12341991109479	/etc/lib/lu/plugins/lupi_bebasic plugin
      1  48923 12342011131729	metadb
      0  48923 12342087965979	zfs set mountpoint=/mnt rpool/ROOT/zfs1008BE
      1  48923 12343091202978	zfs get -Ho value mounted rpool/ROOT/zfs1008BE
      1  48923 12343115644562	zfs mount rpool/ROOT/zfs1008BE
      1  48923 12343186977645	zfs get -Ho value mounted rpool/ROOT/zfs1008BE/var
      0  48923 12343212715811	zfs mount rpool/ROOT/zfs1008BE/var
      0  48923 12344023001311	lockfs -f /mnt/
      0  48923 12344027045227	umount -f /mnt/
      0  48923 12344031500477	umount -f /mnt
      1  48923 12344561966727	umount rpool/ROOT/zfs1008BE
      1  48923 12344568052643	umount -f rpool/ROOT/zfs1008BE
    

    Here we can see that lumount does nothing special or "magic". Mounting the old BE's / succeeds, however mounting its /var FS fails. Because the mountpoint of the BE's / has already been set to /mnt by lumount, we do:

    zfs mount rpool/ROOT/zfs1008BE
    ls -alR /mnt/var
    
    /mnt/var:
    total 14
    drwxr-xr-x   4 root     root           4 Nov 27 05:09 .
    drwxr-xr-x  33 root     root          35 Dec  2 03:41 ..
    drwx------   3 root     root           3 Nov 27 05:09 log
    drwx------   2 root     root           4 Nov 27 05:30 run
    
    /mnt/var/log:
    total 9
    drwx------   3 root     root           3 Nov 27 05:09 .
    drwxr-xr-x   4 root     root           4 Nov 27 05:09 ..
    drwx------   2 root     root           2 Nov 27 05:09 web
    
    /mnt/var/log/web:
    total 6
    drwx------   2 root     root           2 Nov 27 05:09 .
    drwx------   3 root     root           3 Nov 27 05:09 ..
    
    /mnt/var/run:
    total 10
    drwx------   2 root     root           4 Nov 27 05:30 .
    drwxr-xr-x   4 root     root           4 Nov 27 05:09 ..
    -rw-r--r--   1 root     root           4 Nov 29 06:30 bootadm.lock
    -rw-r--r--   1 root     root           3 Nov 29 06:30 ipmon.pid
    

    Because /var/run is usually mounted on swap and all other directories are empty, we can conclude that these are probably relics of the upgrade process which are not needed anymore. So we solve our problem and check again using the commands shown below.

    rm -rf /mnt/var/*
    zfs umount /mnt
    lumount zfs1008BE /mnt
    luumount zfs1008BE
    

    Well, now all this works and we are able to continue with the next step.

  2. delete the old BE

    Never delete a BE for which an error occurred during lucreate while snapshotting relevant datasets. E.g.:

    Creating snapshot for <pool1/zones/sdev-zfs1008BE-s10u6> on \
    	<pool1/zones/sdev-zfs1008BE-s10u6@s10u6_20081203>
    ERROR: cannot create snapshot \
    	'pool1/zones/sdev-zfs1008BE-s10u6@s10u6_20081203': dataset is busy
    ERROR: Unable to snapshot <pool1/zones/sdev-zfs1008BE-s10u6> on \
    	<pool1/zones/sdev-zfs1008BE-s10u6@s10u6_20081203>
    cannot open 'pool1/zones/sdev-zfs1008BE-s10u6_20081203': dataset does not exist
    cannot open 'pool1/zones/sdev-zfs1008BE-s10u6_20081203': dataset does not exist
    cannot open 'pool1/zones/sdev-zfs1008BE-s10u6_20081203': dataset does not exist
    Population of boot environment <s10u6_20081203> successful.
    

    In such a case, first create another snapshot of the original ZFS (here pool1/zones/sdev-zfs1008BE-s10u6@stillused) to prevent ludelete from destroying it (here pool1/zones/sdev-zfs1008BE-s10u6). This is probably a bug and should not happen; however, it happened - the whole zone from the current BE was gone as well (even though not reproducible)!
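
    For the example above that means:

    zfs snapshot pool1/zones/sdev-zfs1008BE-s10u6@stillused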

    ludelete zfs1008BE
    lustatus
    
    umount: warning: /.alt.tmp.b-c0.mnt/pool1/zones/sdev-zfs1008BE not in mnttab
    umount: /.alt.tmp.b-c0.mnt/pool1/zones/sdev-zfs1008BE not mounted
    Deleting ZFS dataset <pool1/zones/sdev-zfs1008BE>.
    ERROR: cannot destroy 'pool1/zones/sdev-zfs1008BE': filesystem has dependent clones
    use '-R' to destroy the following datasets:
    pool1/zones/sdev-zfs1008BE-s10u6
    ERROR: Unable to delete ZFS dataset <pool1/zones/sdev-zfs1008BE>.
    Determining the devices to be marked free.
    Updating boot environment configuration database.
    Updating boot environment description database on all BEs.
    Updating all boot environment configuration databases.
    Boot environment <zfs1008BE> deleted.
    
    Boot Environment           Is       Active Active    Can    Copy      
    Name                       Complete Now    On Reboot Delete Status    
    -------------------------- -------- ------ --------- ------ ----------
    s10u6                      yes      yes    yes       no     -         
    

    As we can see, the old zfs1008BE with its ZFSs rpool/ROOT/zfs1008BE, rpool/ROOT/zfs1008BE/var, pool1/zones/sdev-zfs1008BE and pool1/zones/sdev@zfs1008BE could be deleted successfully, but not without some "errors". Of course we do not want pool1/zones/sdev-zfs1008BE-s10u6 destroyed, since it is still used as the base of the zone sdev. So
    DON'T do what ludelete actually suggests!

    What we do instead is promote this ZFS clone of the pool1/zones/sdev@zfs1008BE snapshot (see zfs(1M)). Note that this is not really required (because it lives on pool1), but since the original BE doesn't exist anymore and the old zone (at least if it is a sparse zone) would use ZFSs from the new BE, a rollback to the old zone would not make sense - it would be more or less inconsistent and sooner or later cause trouble. So free its resources aka disk space!

  3. promote zone ZFSs which depend on the old BE

    # check potential candidates
    zoneadm list -cp | grep -v :global: | cut -f2,4 -d:
    # promote appropriate ZFSs
    zfs promote pool1/zones/*-zfs1008BE-s10u6
    # make sure the VALUE of origin is '-'
    zfs get origin pool1/zones/*-zfs1008BE-s10u6
    
  4. destroy snapshots and old parents of promoted ZFSs

    zfs destroy pool1/zones/sdev-zfs1008BE-s10u6@zfs1008BE
    zfs destroy pool1/zones/sdev
    
  5. transfer remaining ZFSs

    So what is left?

    zfs list | egrep '^rpool( |/)'
    ksh
    zoneadm list -cp | grep -v :global: | cut -f2 -d: | while read ZN ; do
    	echo $ZN
    	zonecfg -z $ZN info fs | /usr/xpg4/bin/grep -E '[[:space:]](dir|special):'
    	zonecfg -z $ZN info zonepath | grep ': rpool/'
    done
    exit
    # not shown below
    swap -l
    dumpadm
    ls -al /rpool /export /rpool/var /rpool/var/log
    
    rpool                             5.01G  28.7G    21K  /rpool
    rpool/ROOT                          18K  28.7G    18K  legacy
    rpool/dump                        2.01G  28.7G  2.01G  -
    rpool/export                      1021M  28.7G    20K  /export
    rpool/export/scratch              1021M  28.7G  1021M  /export/scratch
    rpool/swap                        2.00G  30.7G  15.7M  -
    rpool/var                         1.02M  28.7G    19K  /rpool/var
    rpool/var/log                     1.01M  28.7G    18K  /rpool/var/log
    rpool/var/log/web                 1012K  28.7G  1012K  /var/log/web
    
    sdev
            dir: /var/log/httpd
            special: /var/log/web/iws2
            dir: /data/sites
            special: /pool1/web/iws2/sites
    

    So we can see that the only ZFSs we have to consider for transfer are rpool/var/log/web and rpool/export/scratch, because all others are either not used anymore or contain no data (except the directory used as a mountpoint for another ZFS). For obvious reasons we do not transfer /export/scratch but just destroy it and create a new one in rpool0. So /var/log/web is the only one left over.

    However, since it is used by the zone sdev, we have to shut down the zone first, transfer the filesystem and finally restart the zone.

    # get rid of the old /export/scratch and create a new one on rpool0
    zfs destroy -rR rpool/export
    zfs create -o mountpoint=/export rpool0/export
    zfs create -o sharenfs='rw=sol:bsd:lnx,root=admhosts' \
    	-o quota=8G rpool0/export/scratch
    chmod 1777 /export/scratch
    # transfer /var/log/web
    zlogin sdev 'init 5'
    sh -c 'while [ -n "`zoneadm list | grep sdev`" ]; do sleep 1; done'
    zfs umount rpool/var/log/web
    zfs snapshot rpool/var/log/web@xfer
    zfs send -R rpool/var/log/web@xfer | zfs receive -d rpool0
    # restart the zone
    zoneadm -z sdev boot
    # fix /etc/power.conf
    gsed -i 's,/rpool/,/rpool0/,g' /etc/power.conf
    pmconfig
    # optional: destroy the xfer snapshot and the sources of the transferred ZFSs
    zfs destroy rpool0/var/log/web@xfer
    zfs destroy -r rpool/var
    zfs destroy -r rpool/export
    
    Since the new ZFS inherits all properties from the old one (thanks to the -R option of zfs send), changing its mountpoint back to /var/log/web etc. is not needed - zfs will even mount it automatically on /var/log/web as long as this directory is empty and not in use.
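
    A quick, optional way to verify this:

    zfs get mountpoint,sharenfs rpool0/var/log/web
    df -h /var/log/web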
  6. destroy the old pool

    Just to make sure that no valid data gets destroyed unintentionally, one should have a look at the pool first!

    zfs list | egrep '^rpool( |/)'
    zpool destroy rpool
    
    rpool                             4.01G  29.7G    20K  /rpool
    rpool/ROOT                          18K  29.7G    18K  legacy
    rpool/dump                        2.01G  29.7G  2.01G  -
    rpool/swap                        2.00G  31.7G  15.7M  -
    
  7. make lufslist and friends happy

    Since there is no lusync_icf, one needs to do this manually. The helper below should make and display the proper changes; however, one should of course verify that the changes being made are correct.

    ksh
    BE=`lucurr`
    ICF=`grep :${BE}: /etc/lutab | awk -F: '{ print "/etc/lu/ICF." $1 }'`
    cp -p $ICF ${ICF}.bak
    gsed -i -r -e '/:rpool:/ d' -e 's,:(/?rpool)([:/]),:\10\2,g' $ICF
    diff -u ${ICF}.bak $ICF
    print "\nVERIFY that $ICF is correct - if not\ncp -p ${ICF}.bak $ICF\n"
    lufslist $BE
    exit
    
  8. format HDD1 and zpool attach to HDD0

    Since we want the boot disk mirrored by ZFS, we have to make sure that s0 of HDD1 has at least the same size in blocks as s0 of HDD0. Otherwise zfs will refuse to attach it.

    # compare 'Sector Count' of 'Partition' 0
    prtvtoc /dev/rdsk/c0t0d0s2 | /usr/xpg4/bin/grep -E 'Count|^[[:space:]]*0'
    prtvtoc /dev/rdsk/c0t1d0s2 | /usr/xpg4/bin/grep -E 'Count|^[[:space:]]*0'
    # if size is not OK, adjust it via 'format' - see above
    # and finally attach the slice to form a 2-way mirror
    zpool attach rpool0 c0t0d0s0 c0t1d0s0
    zpool status
    sleep 900
    print "\n##### status #####\n"
    zpool status | grep scrub:
    
    * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
           0      2    00          0  71763164  71763163
    
    * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
           0      2    00          0  72169752  72169751
    
      pool: pool1
     state: ONLINE
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            pool1       ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c0t2d0  ONLINE       0     0     0
                c0t3d0  ONLINE       0     0     0
    
    errors: No known data errors
    
      pool: rpool0
     state: ONLINE
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
     scrub: resilver in progress for 0h0m, 3.62% done, 0h23m to go
    config:
    
            NAME          STATE     READ WRITE CKSUM
            rpool0        ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c0t0d0s0  ONLINE       0     0     0
                c0t1d0s0  ONLINE       0     0     0
    
    errors: No known data errors
    
    ##### status #####
    
     scrub: none requested
     scrub: resilver completed after 0h14m with 0 errors on Wed Dec  3 06:05:06 2008
    
  9. Resolve ZFS Boot Issues

    see ZFS Boot Issues and Solaris 10 10/08 Release and Installation Collection >> Solaris 10 10/08 Release Notes >> 2. Solaris Runtime Issues >> File Systems.

    # ------------ on sparc ---------
    dd if=/dev/rdsk/c0t1d0s0 of=/tmp/bb bs=1b iseek=1 count=15
    dd if=/dev/rdsk/c0t1d0s0 of=/tmp/bb bs=1b iseek=1024 oseek=15 count=16
    cmp /tmp/bb /usr/platform/`uname -i`/lib/fs/zfs/bootblk
    # if they differ, fix it:
    installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk \
    	/dev/rdsk/c0t1d0s0
    
    ls -al /rpool0/platform/`uname -m`/bootlst
    # if this failed
    mkdir -p /rpool0/platform/`uname -m`/
    cp -p /platform/`uname -m`/bootlst /rpool0/platform/`uname -m`/bootlst
    
    # ------------ on x86 (not yet tested) -----------
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0
    
  10. cleanup artifacts and have a beer

    rm -rf /.alt.*
    rmdir /rpool
    

Day-to-day Patching (ZFS)

When you have studied and understood all previous sections, you should have a pretty good understanding of how to do the day-to-day patching of a ZFS based Solaris system. However, for your convenience, here is the summary.

tcsh
mount /local/misc
set path = ( /usr/bin /usr/sbin /local/misc/sbin )
setenv RPOOL `df -h / | grep -v ^Filesystem | cut -f1 -d/`
setenv NBE s10u8_`date '+%Y%m%d'`

lupatch.sh -d -R /
# stop here - if no new patches are available

lupatch.sh -r -R /
# edit the patch lists for each zone if not all of the available patches should
# be installed:
# delete all unwanted patch lines - to permanently ignore a patch, put something
# like 'ignore=123456-07' into /etc/pca.conf
vi /var/tmp/patchList.*

lucreate -n $NBE -p $RPOOL
umount /mnt
lumount $NBE /mnt
# do not continue before lumount above works - see lumount trouble
lupatch.sh -i
# fix possible LU updates: assumes 121430-50 (121431-51) or greater is installed
gpatch -p0 -d /mnt -b -z .orig < /local/misc/etc/lu-`uname -r`.patch
gsed -i -e '/^\/var\/mail/ s,^,#,' /mnt/etc/lu/synclist
# S10u8 adds zfs entries to /etc/vfstab which prevent booting: check & correct
cat /mnt/etc/vfstab
gsed -i -e '$d' /mnt/etc/vfstab
gsed -i -e '$d' /mnt/etc/vfstab

cd /
luumount $NBE

luactivate $NBE
# when the next maintenance window starts
init 6

Complete Examples

Copyright (C) 2008 Jens Elkner (jel+lu@cs.uni-magdeburg.de)