1) ISC DHCP

Since the SUN DHCP server is quite complex/cumbersome and hard to manage, we (like many other admins) use the ISC DHCP server as our first-choice DHCP server - some people still love the idea/paradigm "keep it small and simple". So the first hurdle was to determine the parameters required to get x86 jumpstart working. The Sun JumpStart/NetworkInstall manuals always refer to Sun's own option names (e.g. BootFile, BootSrvA) and give no example of how to do the same with probably one of the most used DHCP servers, the ISC DHCP server (remember, almost every Linux as well as *BSD system uses the ISC DHCP server as its default one).

Furthermore the documentation is pretty confusing wrt. what is really required for x86 jumpstart. E.g. 817-5504 table 6-5 on page 105 lists vendor category options "required to enable a DHCP server to support Solaris installation clients". This might be the case for sparc based DHCP jumpstart, however for x86 jumpstart it turns out that actually none of these options are required at all. Also the documentation explains a lot of more or less useful stuff, but not how it really works.

I think the following is what one really needs to know to get the picture, to be able to troubleshoot and finally to get it to work (actually, I don't know whether this is really what happens, but this is my best guess, adding 1 and 1):

a) PXE tries to look up via DHCP a tftp server and the bootloader to download, which will be used for booting the [solaris] kernel via network. Wrt. ISC DHCP the tftp server is described by the parameter 'next-server' (for SUN DHCP by 'BootSrvA') and the bootloader by the parameter 'filename' (for SUN DHCP by 'BootFile'). For x86 jumpstart the bootloader file is usually pxegrub, which is automatically copied to /tftpboot/pxegrub.I86PC.Solaris_$SREL-$N and linked to /tftpboot/01$ETHERADDR of the client by add_install_client.
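
Judging from the example client used in this post (0:14:4f:aa:bb:cc becomes 0100144FAABBCC), the 01$ETHERADDR name is "01" followed by the MAC address, each octet padded to two hex digits and written in upper case. A small sketch for deriving it; mac_to_pxe_name is a made-up helper name, not something add_install_client provides:

```shell
# mac_to_pxe_name MAC  (hypothetical helper)
# Prints "01" followed by the zero-padded, upper-cased octets of MAC.
mac_to_pxe_name() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # split the MAC at the colons into $1..$6
    IFS=$old_ifs
    name=01
    for octet in "$@"; do
        # pad single-digit octets such as "0" to "00"
        case $octet in
            ?) octet=0$octet ;;
        esac
        name=$name$octet
    done
    printf '%s\n' "$name" | tr 'abcdef' 'ABCDEF'
}

# mac_to_pxe_name 0:14:4f:aa:bb:cc   prints 0100144FAABBCC
```

The result is what goes into the 'filename' parameter of the client's host entry.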
So the appropriate section for ISC DHCP could look like this:

    group {
        # x86 jumpstart clients
        next-server 192.168.1.1;
        host dax {
            hardware ethernet 0:14:4f:aa:bb:cc;
            fixed-address dax;
            filename "0100144FAABBCC";
        }
    }
    subnet 192.168.1.0 netmask 255.255.255.0 {
        option domain-name-servers 192.168.1.2;
        option broadcast-address 192.168.1.255;
        option domain-name "deep.space.nine.net";
        option routers 192.168.1.254;
        option time-servers 192.168.1.3;
        option ntp-servers 192.168.1.3;
    }

b) PXE downloads pxegrub via tftp and starts it.

c) pxegrub tries to obtain a list of parameters required to boot solaris via network (sparc OBP firmware does this by asking a bootparamd in the same network). These parameters are listed in a text file named /tftpboot/menu.lst.01$ETHERADDR of the client, which gets automatically created by add_install_client using the -d and -e options. E.g.: /tftpboot/menu.lst.0100144FAABBCC

d) pxegrub downloads the solaris kernel and miniroot described in the downloaded menu.lst.01* file via tftp.

e) pxegrub loads and boots the kernel with the boot parameters given in the downloaded menu.lst.01* file.

f) The kernel takes over, starts the OS and usually the install process.

2) The next obstacle is that something like:

    name_service=NIS+ {domain_name=deep.space.nine.net \
        name_server=odo(192.168.1.3)}
    network_interface=PRIMARY {default_route=192.168.1.254 \
        netmask=255.255.255.0 protocol_ipv6=no}

does not work for x86 jumpstart (neither for u3 nor for b57). I used this for sparc based jumpstarts for more than 10 years and never had a problem with it. So I needed to manipulate the begin scripts to be able to troubleshoot why the installer always invokes the network setup dialogs first. My guess is that it doesn't get the hostname right (this one and the NIS+ name server name are the only fields which are not preset with an appropriate value). Since I couldn't find out how to get it to work this way, I switched over to 'network_interface=PRIMARY {dhcp protocol_ipv6=no}'.
This worked (i.e. the network setup dialogs are no longer fired up).

3) However 'network_interface=PRIMARY {dhcp protocol_ipv6=no}' causes the installer to not set the SI_HOSTNAME variable on an X4500 and to set it to 'localhost' on an Ultra 40. However the correct hostname is essential for our begin and finish scripts to adjust the common configuration for the specific machine. So a further (quick and dirty) modification of the scripts was required to fix this problem:

    SI_HOSTNAME=`nismatch -s, addr=$SI_HOSTADDRESS hosts.org_dir | cut -f1 -d,`

4) Furthermore 'network_interface=PRIMARY {dhcp protocol_ipv6=no}' causes the installer to set up the machine as a DHCP client, but we want static IP addresses for all Solaris machines. It turned out that just removing /a/etc/dhcp.* is not sufficient. So the finish script required additional work to remove /a/etc/dhcp/*.dhc as well, to determine the used NIC and to set up /a/etc/hostname.*, /a/etc/inet/hosts, /a/etc/inet/ipnodes and /a/etc/nodename "manually"...

BTW: The first time I got a jumpstarted x86 and thus DHCP enabled machine, Solaris was unable to determine its hostname as well. Pretty strange, since any non-Solaris machine (i.e. Linux, BSD and even Windows client) is able to get its hostname via ISC DHCP...

5) The next problem is that one can't do something like

    chroot /a $script

in the finish script anymore. E.g. [ -x /usr/bin/gconftool-2 ] returns true, but /usr/bin/gconftool-2 ... gives an exec error - strange. So I needed to "invent" a mechanism which starts the script when the machine comes up the first time after its installation, and of course at the right time in the boot sequence ...

6) Last but not least: Our install scripts (begin/finish) require sed/egrep with full regex support. Since SUNWxcu4 is missing in the miniroot, up to now (i.e. for sparc) we did a simple:

    pkgadd -R $DEST_PATH/$OS_RELEASE/Tools/Boot \
        -d $DEST_PATH/$OS_RELEASE/Product SUNWxcu4

and everything was ok.
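
If the sparc step gets re-run from a script, it can be guarded so that pkgadd is only called while the package is still missing. This is only a sketch: ensure_pkg is a made-up helper name, and it assumes that pkginfo's -q and -R options behave as on a stock Solaris 10 box.

```shell
# ensure_pkg ROOT SRC PKG  (hypothetical helper, not part of the Solaris tools)
# Adds PKG from the package directory SRC to the alternate root ROOT,
# unless pkginfo reports that it is already installed there.
ensure_pkg() {
    root=$1 src=$2 pkg=$3
    if pkginfo -q -R "$root" "$pkg"; then
        echo "$pkg already present in $root"
        return 0
    fi
    pkgadd -R "$root" -d "$src" "$pkg"
}

# Example call with the paths used above:
# ensure_pkg $DEST_PATH/$OS_RELEASE/Tools/Boot $DEST_PATH/$OS_RELEASE/Product SUNWxcu4
```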
For x86 it took some additional time to find out how to create a new miniroot, mainly because of the documentation bug in 817-5504 on page 127 point f). Finally I came up with:

    # on x86
    SCRATCH=/export/scratch/$USER/i386
    + rm -rf $SCRATCH ;
    + mkdir -p $SCRATCH
    + /boot/solaris/bin/root_archive unpackmedia /net/install/$DEST_PATH $SCRATCH
    + pkgadd -R $SCRATCH -d /net/install/$DEST_PATH/$OS_RELEASE/Product SUNWxcu4
    DST=$HOME/tmp/miniroot/${OS_RELEASE}_$OS_SUBRELEASE
    + rm -rf $DST
    + mkdir -p $DST/${OS_RELEASE}
    + mkdir -p $DST/boot/grub
    + /boot/solaris/bin/root_archive packmedia $DST $SCRATCH

    # on sparc install server
    DST=$HOME/tmp/miniroot/${OS_RELEASE}_$OS_SUBRELEASE
    cd $DEST_PATH/$OS_RELEASE/Tools/Boot
    + mkdir ../Boot.orig
    find . | + cpio -puvmd ../Boot.orig
    cd $DEST_PATH/boot
    + mkdir ../boot.orig
    find . | + cpio -puvmd ../boot.orig
    cd $DST
    find . | + cpio -puvmd $DEST_PATH

7) Last but not least I had to fix the x86 add_install_client, which (if unmodified) prevents the install server from coming up on reboot if the install dir resides on ZFS. So now I use:

    + patch -b -u -d $DEST_PATH/$OS_RELEASE/Tools \
        -i /pool1/install/jumpstart/misc/add_install_client_x86.patch

whereby the patch is this one:

#################### start patch ##########################
--- add_install_client.orig Thu Aug 31 10:52:27 2006
+++ add_install_client Wed Feb 14 04:07:58 2007
@@ -1899,40 +1899,37 @@
 
 
 if [ "${PGRP}" = "i86pc" ]; then
-    # lofs mount /boot directory under /tftpboot
+    # cp /boot directory under /tftpboot
     if [ ! -f ${IMAGE_PATH}/grub/pxegrub ]; then
         echo "${myname}: ${IMAGE_PATH}/grub/pxegrub does not exist," \
             "invalid boot image"
         cleanup_and_exit 1
     fi
+    if [ ! -f /tftpboot/vfstab ]; then
+        rm -rf /tftpboot/vfstab
+        echo '# Need this file to remember, where /tftpboot/I86PC.Solaris_10-* comes from.
+# So do not delete it - maintained by x86 add_install_client itself' \
+            >/tftpboot/vfstab
+    fi
 
-    # Check if it is already mounted
-    line=`grep "^${IMAGE_PATH}[ ]" /etc/vfstab`
+    # Check if it is already copied
+    DIR_EXISTS=""
+    line=`grep "^${IMAGE_PATH}[ ]" /tftpboot/vfstab`
     if [ $? = 0 ]; then
         mountpt=`echo $line | cut -d ' ' -f3`
         BootLofs=`basename "${mountpt}"`
         BootLofsdir=`dirname "${mountpt}"`
-        if [ ${BootLofsdir} != ${Bootdir} ]; then
-            printf "${myname}: ${IMAGE_PATH} mounted at"
-            printf " ${mountpt}\n"
-            printf "${myname}: retry after unmounting and deleting"
-            printf " entry form /etc/vfstab\n"
-            cleanup_and_exit 1
+        if [ ${BootLofsdir} != ${Bootdir} -o ! -f $mountpt/multiboot ]; then
+            rm -rf "${mountpt}"
+            tf=`mktemp /tmp/vfstab.XXXXX`
+            grep -v "^${IMAGE_PATH}[ ]" /tftpboot/vfstab >$tf
+            cp $tf /tftpboot/vfstab
+            rm -f $tf
+        else
+            DIR_EXISTS="y"
         fi
-
-        # Check to see if the mount is sane, if not, kick it.
-        #
-        # Note: One might think that the case when kicking the
-        # mounpoint won't work should then be handled, but
-        # if that were the case, the code path for no existing
-        # mount would have been taken resulting in a new
-        # mountpoint being created.
-        #
-        if [ ! -f $mountpt/multiboot ]; then
-            umount $mountpt
-            mount $mountpt
-        fi
-    else
+    fi
+    if [ -z "$DIR_EXISTS" ]; then
         # get a new directory name and mount IMAGE_PATH
         max=0
         for i in ${Bootdir}/I86PC.${VERSION}* ; do
@@ -1945,14 +1942,15 @@
         BootLofs=I86PC.${VERSION}-${max}
 
         mkdir -p ${Bootdir}/${BootLofs}
-        mount -F lofs -o ro ${IMAGE_PATH} ${Bootdir}/${BootLofs}
+        cd ${IMAGE_PATH}
+        find . | cpio -puvmd ${Bootdir}/${BootLofs}
         if [ $? != 0 ]; then
-            echo "${myname}: failed to mount ${IMAGE_PATH} on" \
+            echo "${myname}: failed to copy ${IMAGE_PATH} to" \
                 "${Bootdir}/${BootLofs}"
             cleanup_and_exit 1
         fi
-        printf "${IMAGE_PATH} - ${Bootdir}/${BootLofs} " >> /etc/vfstab
-        printf "lofs - yes ro\n" >> /etc/vfstab
+        printf "${IMAGE_PATH} - ${Bootdir}/${BootLofs} " >> /tftpboot/vfstab
+        printf "lofs - yes ro\n" >> /tftpboot/vfstab
     fi
 
     # cleanup of lofs mount is done after Menufile setup
#################### stop patch #######################################

8) The last thing I still need to find out is how to get nv b57 to install the correct bzipped boot_archives: Right now, it seems to be pretty inefficient wrt. the / FS. I.e. to get usable boot_archives at all, I need to manually

    rm -f /a/platform/i86pc/boot_archive
    rm -f /a/platform/i86pc/amd64/boot_archive
    bootadm update-archive -R /a

in the finish script. Braindeadly, bootadm then creates these files again with the same size AND UNCOMPRESSED!!! So when the system tries to boot the first time, one gets several "NOTICE: alloc: /: file system full" warnings and the system doesn't come up, but asks for the root password for system maintenance:

    # df -k
    Filesystem            kbytes    used   avail capacity  Mounted on
    /dev/dsk/c5t0d0s0     497312  483184       0   100%    /
    # ls -al /platform/i86pc/boot_archive /platform/i86pc/amd64/boot_archive
    -rw-r--r--   1 root     root     112349184 Feb 22 05:43 /platform/i86pc/amd64/boot_archive
    -rw-r--r--   1 root     root     112349184 Feb 22 05:43 /platform/i86pc/boot_archive

Hello, a / partition of 512 MB is not enough? That's ridiculous! IMHO 256 MB is still pretty large and should be sufficient for x86 Solaris as well!
To fix the problem, I need to set up an /a/etc/rcS.d/S80firsttime script, which recreates the boot archives early in the boot process:

    # rm /platform/i86pc/boot_archive /platform/i86pc/amd64/boot_archive
    # bootadm update-archive
    # df -k
    Filesystem            kbytes    used   avail capacity  Mounted on
    /dev/dsk/c5t0d0s0     497312  302576  145005    68%    /
    # ls -al /platform/i86pc/boot_archive /platform/i86pc/amd64/boot_archive
    -rw-r--r--   1 root     root     20664320 Feb 22 05:59 /platform/i86pc/amd64/boot_archive
    -rw-r--r--   1 root     root     19175424 Feb 22 05:59 /platform/i86pc/boot_archive

So this thing is pretty "unique" and not wanted. Just remember, on a Linux box with a / fs of 256 MB one is able to host about 10 kernels incl. their full blown driver archives aka /lib/modules/2.6.* ...

9) More or less minor: x86 probe_rootdisk does not find a boot disk on an X4500, which actually causes a div by zero awk error. So here one needs to do additional work again to be able to do SI_ROOTDISK related tasks in jumpstart...
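
Coming back to the S80firsttime workaround from 8): it could look roughly like the sketch below. It only uses the rm and bootadm calls shown there. The ROOT and BOOTADM variables and both function names are made up, so that the logic can be tried against a scratch directory, and deleting the script after a successful run is an assumption about how to keep it a one-shot.

```shell
# Sketch of a run-once first-boot script, e.g. /etc/rcS.d/S80firsttime.
# ROOT (alternate root, empty on the live system) and BOOTADM are
# hypothetical knobs, only there for dry runs outside a real Solaris boot.

rebuild_boot_archives() {
    root=${ROOT:-}
    bootadm=${BOOTADM:-bootadm}

    # Drop the oversized archives left behind by the installer.
    rm -f "$root/platform/i86pc/boot_archive" \
          "$root/platform/i86pc/amd64/boot_archive"

    # Let bootadm recreate them; -R is only needed for an alternate root.
    if [ -n "$root" ]; then
        "$bootadm" update-archive -R "$root"
    else
        "$bootadm" update-archive
    fi
}

run_once() {
    # $1: path of this script; it is deleted after a successful rebuild
    # so that it does not run again on the next boot.
    self=$1
    rebuild_boot_archives || return 1
    rm -f "$self"
}

# In the real script the last line would be something like:
# run_once /etc/rcS.d/S80firsttime
```

If bootadm fails, the script stays in place and is simply tried again on the next boot.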