Solaris 11.1: Moving your zones

Because the official documentation for moving zones around is IMHO a little bit sparse, partially wrong and lacking "real world" examples, I decided to write down how we move zones from one bare-metal box to another in an efficient way, reducing the time of zone unavailability to an absolute minimum.

However, to understand the stuff below, one should keep in mind the following simple rules, which we established to make the management of our zones and boxes a little bit easier:

  1. All server zones are based on the group/system/solaris-small-server package minus some unneeded bloat and documentation (see manifest.xml and the "Minimize your zones" article).

  2. The only additional software installed in global zones (GZ) are drivers, network and diagnostic utilities for troubleshooting, as well as tools related to the attached storage (e.g. see ai.xml).

  3. A global zone never shares any ZFS, neither via NFS nor CIFS/samba.

  4. No zone uses a physical NIC directly. All zones (including the global zones) use virtual NICs (vnics) with exclusive IP stacks only.

  5. The vnics have the same basename as the zone they are attached to. E.g. if a zone's name is www, the corresponding vnics are named www0, www1, ..., wwwN.

  6. All vnics have their MAC address assigned explicitly. The first 3 bytes are organization specific (default: 00:00:01 for private), the 4th and 5th bytes are the 3rd and 4th bytes of the IP address assigned to the primary interface of the global zone, and the last byte is a simple enumeration. We try to use zoneId * 16 + numberOfTheVnic of the attached zone. However, only the global zone has a static ID of 0; all other zone IDs may change on reboot (e.g. if a zone got removed), so the last byte can't always be safely calculated this way. E.g. if the primary interface of the GZ has the IP address 192.168.31.12, the default MAC address of that interface would be 00:00:01:1F:0C:00, the one of the first zone installed 00:00:01:1F:0C:10, the one of the 2nd interface of that zone 00:00:01:1F:0C:11, and so on (see the sketch after this list).

  7. The name of a non-global zone (NGZ) should be the same as the simple aka unqualified hostname of its primary interface.

  8. All NGZ zonepaths have the form /zones/${ZNAME}, where $ZNAME denotes the name of the zone.

  9. Each zonepath is a ZFS having rpool/zones as its parent (i.e. the mountpoint of rpool/zones is /zones).

  10. All important data are NOT stored in the zone's rpool (or its children). Instead a separate ZFS of the GZ gets delegated to the zone as a "virtual zpool" aka dataset aliased as pool1. The parent of this ZFS within the GZ is pool1/${ZNAME}. If more datasets are needed, the same schema should be used. E.g. if a 2nd dataset is needed for a zone named www, one would use the zonecfg commands: add dataset ; set name=pool2/www/pool2 ; set alias=pool2 ; end . This makes it easier to back up the right data (avoiding redundant OS/software bits) efficiently and to re-use them (e.g. for testing) in a completely different zone (just copy the ZFS and don't waste time searching for the required data).

  11. To be able to log in as early as possible, all zone-local accounts have their home directories placed on the rpool ZFS (i.e. GZ: rpool/${HOST}/local/home/${USER}, NGZs: rpool/zones/${ZNAME}/rpool/local/home/${USER}, always mounted to the zone's /local/home/${USER}), the home directory entry in /etc/passwd gets explicitly set to this directory, and the dependency on the fs-autofs service gets removed from the ssh:default, system-log:default (and usually smtp:sendmail) services.
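
For illustration, the following little ksh sketch derives such a MAC address from the GZ's primary IP address, the zone ID and the vnic number (the three variables at the top are assumptions to be adjusted by hand, they are not queried automatically):

#!/bin/ksh
# sketch only: derive a vnic MAC address according to rule 6
GZIP=192.168.31.12      # IP address of the GZ's primary interface
ZID=1                   # current ID of the attached zone (zoneadm list -p)
VNUM=0                  # number of the vnic within that zone

OIFS="$IFS"; IFS=.; set -- $GZIP; IFS="$OIFS"
printf '00:00:01:%02X:%02X:%02X\n' "$3" "$4" $(( ZID * 16 + VNUM ))
# prints 00:00:01:1F:0C:10, i.e. the first-zone example from rule 6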

The zinstall helper scripts mentioned in the "Minimize your zones" article help us to obey the rules shown above and avoid writing boilerplate code again and again.

Moving a zone

Within the following example, baremetal is the hostname of the GZ the NGZ named www should be transferred to. To get a more or less generic recipe, we first set the environment variable ZHOST to the target GZ's hostname and the variable ZNAME to the NGZ's name:

setenv ZHOST baremetal
setenv ZNAME www

NOTE: We are currently logged in as admin on the GZ where the NGZ to transfer is still running. Furthermore, we have the primary-admin package installed, which allows us to execute any command with root privileges by prefixing it with pfexec(1M) or simply + (which is a symlink to pfexec). The target GZ has the same account with the "Primary Administrator" profile assigned as well, and our public ssh key is in its .ssh/authorized_keys2 file.
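
Just to illustrate the notation (the symlink path below is an assumption, any directory in the PATH would do):

pfexec zfs list         # run zfs list with the privileges of our profile
+ zfs list              # same thing, + is just a symlink to pfexec
# the symlink may have been created e.g. via
pfexec ln -s pfexec /usr/bin/+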

What we will do is basically a) configure the zone on the target GZ, b) copy the corresponding ZFSs to the target GZ and finally c) attach the zone.

The manual mentions several times that the output of one zone-related utility can be piped into the input of a related command. However, since those utilities are still a little bit buggy, this sometimes doesn't work, and thus we use temporary files as a workaround.

Phase 1: Configure the new zone

According to the manual, the first step one should take is to check whether the target GZ is able to host the NGZ to transfer:

+ zoneadm -z $ZNAME detach -n | ssh $ZHOST 'cat > /tmp/zone.cfg'
ssh $ZHOST + zoneadm attach -n /tmp/zone.cfg || echo 'Zone xfer will not work'

The manual doesn't explain in detail what this actually does. The author's impression is that it just runs a zonecfg, records all errors and finally unconfigures the zone again. Anyway, as long as it doesn't annoy us with errors, we continue to configure the zone on the target GZ (note that we use gsed to fix the bogus output of the export):

zonecfg -z $ZNAME export | gsed -re '/bootargs=/ s,(=|$),\1",g' >/tmp/zone.cmd
cat /tmp/zone.cmd | ssh $ZHOST "+ zonecfg -z $ZNAME -f -"

Finally we check whether all vnics to be used in the zone exist:

gsed -ne '/ physical=/ { s,.*=,dladm show-vnic ,;p }' /tmp/zone.cmd | \
    ssh $ZHOST 'cat > /tmp/vnics.sh'
ssh $ZHOST ksh /tmp/vnics.sh

If you get something like "... object not found", create the missing vnics. To get a hint how to do it properly, we usually use the getnewvnic.sh script from our zinstall helper archive:

ssh $ZHOST /net/software/export/install/server/etc/getnewvnic.sh $ZNAME
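
If getnewvnic.sh is not at hand, the missing vnic can also be created manually with dladm. A minimal sketch (the physical link net0 is an assumption, pick the one reported by "dladm show-phys" on the target GZ; the MAC address has to follow rule 6):

ssh $ZHOST + dladm create-vnic -l net0 -m 00:00:01:1F:0C:10 ${ZNAME}0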

Phase 2: Copy over related ZFSs

To reduce the downtime of the zone, we first snapshot the related ZFSs and transfer the snapshots to the target GZ while the zone is still running. When all snapshots are copied, we shut down the zone properly, create a second snapshot and send the increment between the two, i.e. only the bits which have changed since the first snapshot was taken. Since this is an incremental send, it should be much faster than the initial copy.

So first create the initial snapshots:

+ zfs snapshot -r rpool/zones/${ZNAME}@move
+ zfs snapshot -r pool1/${ZNAME}@move

Now we check with a dry run whether the transfer to the target GZ works as intended. See zfs(1M) for more information.

+ zfs send -rc rpool/zones/${ZNAME}@move | \
    ssh $ZHOST + zfs receive -nv -e rpool/zones
+ zfs send -rc pool1/${ZNAME}@move | \
    ssh $ZHOST + zfs receive -nv -e pool1

Once you have an idea of what it will do, you can press Ctrl+C to terminate the dry run. If it didn't show what you expected, adjust the related zfs command options and try again.

Now copy over the ZFSs in question by omitting the dry-run option -n:

+ zfs send -rc rpool/zones/${ZNAME}@move | \
    ssh $ZHOST + zfs receive -v -e rpool/zones
+ zfs send -rc pool1/${ZNAME}@move | \
    ssh $ZHOST + zfs receive -v -e pool1

When this is finally done, we shut down the zone in an orderly fashion and wait until it has reached the state installed (don't use "zoneadm -z $ZNAME halt" as described in the manuals, since this is just like pulling the plug on your box):

+ zlogin $ZNAME init 5
# repeat until the zone shows up
zoneadm list -pi | grep  "^-:${ZNAME}:installed:"
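
Instead of re-running the grep by hand, a small loop like this sketch does the waiting for us:

while ! zoneadm list -pi | grep "^-:${ZNAME}:installed:" >/dev/null ; do
    sleep 5     # zone is still shutting down
done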

Now create the final snapshots and transfer the remaining bits, i.e. re-run the zfs send commands from the initial copy, but pass -i with the initial @move snapshot and send the new @move1 snapshot instead:

+ zfs snapshot -r rpool/zones/${ZNAME}@move1
+ zfs snapshot -r pool1/${ZNAME}@move1
+ zfs send -rci rpool/zones/${ZNAME}@move rpool/zones/${ZNAME}@move1 | \
    ssh $ZHOST + zfs receive -v -e rpool/zones
+ zfs send -rci pool1/${ZNAME}@move pool1/${ZNAME}@move1 | \
    ssh $ZHOST + zfs receive -v -e pool1

If you are not satisfied with the speed of the transfer, you may enable the rsh service on the target GZ and use rsh instead of ssh (unfortunately the "none" cipher is not available in recent ssh versions anymore :( ). Another option is to use mbuffer on both sides as sketched below (for more information see http://computing.thayer.dartmouth.edu/blog/2012/11/09/zfs-sendreceive-accross-different-transport-mechanisms/).
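
A minimal mbuffer sketch for the initial copy of the zone's rpool might look like this (mbuffer is not part of the default install, and the port as well as the buffer sizes are just assumptions to tune):

# on the target GZ (start the receiver first)
mbuffer -s 128k -m 1G -I 9090 | + zfs receive -v -e rpool/zones
# on the source GZ
+ zfs send -rc rpool/zones/${ZNAME}@move | mbuffer -s 128k -m 1G -O ${ZHOST}:9090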

Phase 3: Attach and Boot, Cleanup

To make the zone available again, we attach it to the target GZ:

ssh $ZHOST + zoneadm -z $ZNAME attach
ssh $ZHOST zoneadm list -pi | grep "^-:${ZNAME}:installed:"

The last command should display the attached zone. If so, boot it and check whether everything is ok:

ssh $ZHOST + zoneadm -z $ZNAME boot
ssh $ZHOST zoneadm list -pi | grep ":${ZNAME}:running:"

Sooner or later the zone should be running, and thus the last command shown above should display your transferred zone.
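
A quick way to check whether everything inside the freshly booted zone is ok is to list the services which did not come up cleanly:

# report services in a problem state inside the zone
ssh $ZHOST + zlogin $ZNAME svcs -xv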

What we should not forget is to clean up the zone/ZFSs on the current machine, or at least to disable the zone's autostart! To do the latter, issue a:

print "set autoboot=false\ncommit" | + zonecfg -z $ZNAME

The alternative is to destroy the zone and optionally all the ZFSs used by it:

+ zoneadm -z $ZNAME uninstall
+ zonecfg -z $ZNAME delete
+ zfs destroy -r pool1/$ZNAME
# usually the uninstall command deletes the zone's rpool incl. its children.
# But just in case:
+ zfs destroy -r rpool/zones/$ZNAME

Last but not least, you may remove the snapshots used to transfer the ZFSs on the target GZ:

ssh $ZHOST + zfs destroy -r rpool/zones/${ZNAME}@move1
ssh $ZHOST + zfs destroy -r rpool/zones/${ZNAME}@move
ssh $ZHOST + zfs destroy -r pool1/${ZNAME}@move1
ssh $ZHOST + zfs destroy -r pool1/${ZNAME}@move
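
If you kept the zone on the source GZ (i.e. you only disabled its autostart instead of destroying it), the @move and @move1 snapshots still exist there as well and can be removed the same way:

+ zfs destroy -r rpool/zones/${ZNAME}@move1
+ zfs destroy -r rpool/zones/${ZNAME}@move
+ zfs destroy -r pool1/${ZNAME}@move1
+ zfs destroy -r pool1/${ZNAME}@move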

That's it! Lean back and have a beer.

Copyright (C) 2013 Jens Elkner (jel+s11@cs.uni-magdeburg.de)