Re: Nodes Install then fail to boot on successive boots


jprorama@gmail.com
 

Did some additional debugging.

The problem seems to relate to this stanza in the dhcpd.conf:

if exists user-class and option user-class = "iPXE" {
    filename "http://192.168.10.1/WW/ipxe/cfg/${mac}";
} else {
    if option architecture-type = 00:0B {
        filename "/warewulf/ipxe/bin-arm64-efi/snp.efi";
    } elsif option architecture-type = 00:0A {
        filename "/warewulf/ipxe/bin-arm32-efi/placeholder.efi";
    } elsif option architecture-type = 00:09 {
        filename "/warewulf/ipxe/bin-x86_64-efi/ipxe.efi";
    } elsif option architecture-type = 00:07 {
        filename "/warewulf/ipxe/bin-x86_64-efi/ipxe.efi";
    } elsif option architecture-type = 00:06 {
        filename "/warewulf/ipxe/bin-i386-efi/ipxe.efi";
    } elsif option architecture-type = 00:00 {
        filename "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe";
    }
}


When the VirtualBox iPXE runs after a cold start (power cycle), the user-class option appears to be set to "iPXE", and dhcpd correctly hands the client the filename "http://192.168.10.1/WW/ipxe/cfg/${mac}".  We can see the client load it in the httpd log files, and the client boots successfully.

When the VirtualBox VM is reset (no power cycle), the first conditional test on the user-class option fails. The conditions fall through to the last one:

    } elsif option architecture-type = 00:00 {
        filename "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe";
    }

Because this is not an HTTP URI, the client starts a TFTP request for the /warewulf/ipxe/bin-i386-pcbios/undionly.kpxe file.  We see that in the journalctl log.  It reads the file but can't boot using it.  Oddly, the architecture-type isn't even correct; it should be x86_64.
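
To make the fall-through concrete, here is a rough Python model of how I read that stanza. It is only a sketch: select_filename and transport are names made up for illustration, and the rule that a filename without a URI scheme gets fetched over TFTP is an assumption based on what the logs show, not on anything in dhcpd itself.

```python
from urllib.parse import urlparse

# Architecture codes and paths copied from the dhcpd.conf stanza above.
BOOT_FILES = {
    0x000B: "/warewulf/ipxe/bin-arm64-efi/snp.efi",
    0x000A: "/warewulf/ipxe/bin-arm32-efi/placeholder.efi",
    0x0009: "/warewulf/ipxe/bin-x86_64-efi/ipxe.efi",
    0x0007: "/warewulf/ipxe/bin-x86_64-efi/ipxe.efi",
    0x0006: "/warewulf/ipxe/bin-i386-efi/ipxe.efi",
    0x0000: "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe",
}


def select_filename(user_class, arch):
    """Hypothetical helper mirroring the if / elsif chain in the stanza."""
    if user_class == "iPXE":
        # ${mac} is left literal here; the client expands it, not dhcpd.
        return "http://192.168.10.1/WW/ipxe/cfg/${mac}"
    # No user-class match: fall through to the architecture-type branches.
    return BOOT_FILES.get(arch)  # None when no branch matches


def transport(filename):
    """Assumed rule: a URI scheme wins, a bare path is fetched over TFTP."""
    scheme = urlparse(filename).scheme
    return scheme if scheme else "tftp"


# Cold start: user-class is present in the request.
cold = select_filename("iPXE", 0x0000)
print(transport(cold), cold)

# Reset: user-class is missing, architecture-type 00:00 matches last.
warm = select_filename(None, 0x0000)
print(transport(warm), warm)
```

The only input that differs between the two calls is user_class, which matches what we see: the same VM gets the HTTP config URL after a power cycle and the undionly.kpxe TFTP path after a reset.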

It seems that some state in the iPXE firmware is stale or invalid when a VirtualBox image is reset, as opposed to started from the off state.  As a result, its DHCP requests do not include all the values the dhcpd server expects, and the boot fails.

The simple workaround is to power cycle the VM.

It's odd, however, that it behaves this way.
