PXE/Grub network problems after version 1.16 #113
Comments
For reference, this issue is listed as one of the issues blocking some automated deployment/provisioning scenarios (using MAAS) as explained here: https://discourse.maas.io/t/raspberry-pi-4-provisioning-and-kvm-pod-setup/3607
I think I just fixed this.
I think we also want
I just tested the code. It behaves slightly differently from version 1.16 because it still prints some "couldn’t send network packet" errors, but after a while it boots correctly. So, I think the patch fixes this issue. This is the serial log:
Thanks for testing this. The posted patch should remove all those errors: https://edk2.groups.io/g/devel/topic/patch_bug_0_2_rpi_fix_pxe/82125865?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,40,82125865
Particularly: https://edk2.groups.io/g/devel/topic/patch_1_2/82125863?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,40,82125863
Under normal circumstances GenetSimpleNetworkTransmit won't be called unless the rest of the network stack detects the link is up. So, during normal operation, when the adapter is initialized the link naturally transitions to link up, and then is ready for activity later in the boot sequence. If that hasn't happened by the time PXE runs then it will itself wait for the link.
OTOH, the normal distro PXE sequence involves PXE loading shim, which in turn loads grub, which tries to read machine-specific configs, modules, and grub.cfg in order to prepare the boot menu. Then, once a grub selection is picked, it might try to load the kernel+initrd. In this sequence the network stack is shut down and restarted multiple times. Grub, though, starts up, notices it's been network booted, reads saved network parameters and immediately tries to transmit data, assuming the link is still up. When that happens grub will print "couldn’t send network packet", and if that lasts long enough it fails to load grub.cfg and the user gets dropped to the grub prompt, because no one in the path bothers to ensure the link state has transitioned back up.
For reference: pftf/RPi4#113
This patch fixes that by polling the link state via GenericPhyUpdateConfig() for ten seconds in the transmit path when the link is down. If the link recovers within this timeout, the state machine is transitioned and we continue data transmission. If the ten seconds expire without the link resuming, we will fail as before.
While full link negotiation can be fast, it frequently can take a second or two, or longer, depending on the remote peer on the other end of the Ethernet cable. It seems auto MDI-X can slow this down, and certain vendors' products seem to be slower than the norm. Ten seconds may not cover some of these possibilities, but the user should validate cabling and the switch/peer's port configuration if resuming the link is taking > 10 seconds.
Picking a longer timeout is a tradeoff between the machine appearing to hang for extended periods of time (due to grub retries) if the link is actually down vs generally providing enough time for most endpoints to complete the negotiation.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Jared McNeill <jmcneill@invisible.ca>
Reviewed-by: Andrei Warkentin <awarkentin@vmware.com>
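For anyone skimming the thread, the behaviour that commit message describes can be sketched roughly as follows. This is a self-contained simulation and not the actual GenetDxe code: `fake_nic`, `fake_phy_update`, `fake_stall_ms`, `fake_transmit` and the 100 ms poll interval are all hypothetical names and values made up for illustration. Only the ten-second budget and the idea of re-polling the PHY from the transmit path come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not EDK2 identifiers. */
#define LINK_WAIT_MS 10000u /* ten-second budget from the commit message */
#define LINK_POLL_MS 100u   /* assumed poll interval, chosen for illustration */

typedef enum { TX_OK = 0, TX_NO_LINK = -1 } tx_status;

typedef struct {
    bool     link_up;      /* the driver's cached view of the link */
    uint32_t phy_ready_ms; /* simulated time at which negotiation completes */
    uint32_t now_ms;       /* simulated clock */
    uint32_t frames_sent;  /* frames that actually went out */
} fake_nic;

/* Stand-in for refreshing the PHY status (the real patch calls
 * GenericPhyUpdateConfig(); this only models its effect on link_up). */
static void fake_phy_update(fake_nic *nic)
{
    nic->link_up = nic->now_ms >= nic->phy_ready_ms;
}

/* Stand-in for a firmware stall; it just advances the simulated clock. */
static void fake_stall_ms(fake_nic *nic, uint32_t ms)
{
    nic->now_ms += ms;
}

/* Transmit path: if the cached state says "link down", re-poll the PHY
 * for up to LINK_WAIT_MS before giving up, instead of failing at once. */
static tx_status fake_transmit(fake_nic *nic)
{
    assert(nic != NULL);

    for (uint32_t waited = 0; !nic->link_up; waited += LINK_POLL_MS) {
        fake_phy_update(nic);
        if (nic->link_up || waited >= LINK_WAIT_MS) {
            break;
        }
        fake_stall_ms(nic, LINK_POLL_MS);
    }

    if (!nic->link_up) {
        return TX_NO_LINK; /* same failure the caller saw before the change */
    }

    nic->frames_sent++;    /* link is up: hand the frame to the hardware */
    return TX_OK;
}
```

With a `fake_nic` whose `phy_ready_ms` is 2000, a single `fake_transmit` call stalls for two simulated seconds and then sends. With a link that never comes up, it gives up after the full ten seconds and returns `TX_NO_LINK`, which is the "appears to hang" cost the commit message weighs against a longer timeout.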
I've got a similar issue with a different SoC (BE-M1000) and network driver. Perhaps the problem should be fixed in grub (instead of adding a workaround to every driver)?
The link down issue? I would say this particular issue is mostly a genet bug rather than a grub one. The link is resuming properly; the problem is that the genet transmit path doesn't update its own state to reflect that. So there isn't really anything grub should do in this case but retry, which it does. AFAIK, of course... If you're referring to the second "fix" in my series, which adjusts the rpi DMA constraints, then that's a double bug. The DMA constraints are wrong, but grub is also doing something wrong and not calling the UEFI map/unmap correctly somewhere (I haven't had a chance to find that). BTW: I don't remember hearing about that SoC; it looks pretty interesting.
This should be fixed with the 1.30 release.
The network does not seem to work properly in grub since version 1.17. I tried with a pendrive with only grub on it, and most of the network-related commands (like net_dhcp) show the error “couldn’t send network packet”. However, version 1.16 works. I also recompiled version 1.21 with the 1.16 network drivers and everything seems to work properly. I think this commit, introduced in version 1.17, broke something.