PXE/Grub network problems after version 1.16 #113

Closed
lory696 opened this issue Nov 20, 2020 · 9 comments

@lory696

lory696 commented Nov 20, 2020

The network does not seem to work properly in grub from version 1.17 onward. I tried with a pendrive with only grub on it, and most of the network-related commands (like net_dhcp) show the error “couldn’t send network packet”. However, version 1.16 works. I also recompiled version 1.21 with the 1.16 network drivers and everything seems to work properly. I think this commit, introduced in version 1.17, broke something.

@samerhaj samerhaj added the genet Pi4 NIC driver work items label Nov 25, 2020
@samerhaj
Member

@andreiw

@jlinton jlinton changed the title Grub network problems after version 1.16 PXE/Grub network problems after version 1.16 Feb 22, 2021
@samerhaj
Member

For reference, this issue is listed as one of the issues blocking some automated deployment/provisioning scenarios (using MAAS) as explained here: https://discourse.maas.io/t/raspberry-pi-4-provisioning-and-kvm-pod-setup/3607

Versions after 1.16 do not work. Apparently, a commit related to version 1.17 broke the network: the UEFI firmware can download the grub bootloader, but then it always shows the error message “couldn’t send network packet”. In grub, most of the network-related commands (like net_dhcp) show that error. However, version 1.16 works, and it managed to download the kernel files. I opened an issue 21.

@jlinton
Member

jlinton commented Apr 14, 2021

I think I just fixed this.

@@ -589,10 +617,16 @@ GenetSimpleNetworkTransmit (
   }

   if (!Genet->SnpMode.MediaPresent) {
-    //
-    // Don't bother transmitting if there's no link.
-    //
-    return EFI_NOT_READY;
+    Status = GenericPhyUpdateConfig (&Genet->Phy);
+    if (EFI_ERROR (Status)) {
+      //
+      // Don't bother transmitting if there's no link.
+      //
+      DEBUG ((DEBUG_ERROR, "%a: no link\n", __FUNCTION__));
+      return EFI_NOT_READY;
+    } else {
+      Genet->SnpMode.MediaPresent = TRUE;
+    }
   }

   if (HeaderSize != 0) {

@jlinton
Member

jlinton commented Apr 15, 2021

I think we also want

+++ b/Platform/RaspberryPi/RPi4/RPi4.dsc
@@ -719,7 +719,7 @@
   Silicon/Broadcom/Drivers/Net/BcmGenetDxe/BcmGenetDxe.inf {
     <PcdsFixedAtBuild>
       gEmbeddedTokenSpaceGuid.PcdDmaDeviceOffset|0x00000000
-      gEmbeddedTokenSpaceGuid.PcdDmaDeviceLimit|0xffffffff
+      gEmbeddedTokenSpaceGuid.PcdDmaDeviceLimit|0xffffffffff
   }

   #
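
To make the size of that change concrete: 0xffffffff is the top of a 32-bit address space (4 GiB - 1), while 0xffffffffff is the top of a 40-bit one (1 TiB - 1). The sketch below is only an illustration of what such a limit means for a buffer, assuming the PCD is treated as the highest address the device can reach for DMA. The macro names and `dma_range_ok` are hypothetical and are not part of EDK2 or the genet driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two values seen in the diff above. */
#define DMA_LIMIT_32BIT 0xffffffffULL    /* old value: 4 GiB - 1 */
#define DMA_LIMIT_40BIT 0xffffffffffULL  /* new value: 1 TiB - 1 */

/*
 * Returns true if the buffer [addr, addr + len) lies entirely at or
 * below `limit`. Written so that addr + len is never computed, which
 * avoids 64-bit overflow for buffers near the top of the address space.
 * This is an illustrative helper, not the check EDK2 performs.
 */
static bool dma_range_ok(uint64_t addr, uint64_t len, uint64_t limit)
{
  if (addr > limit) {
    return false;
  }
  if (len == 0) {
    return true;
  }
  /* The last byte is addr + len - 1; compare it to limit without overflow. */
  return (len - 1) <= (limit - addr);
}
```

With these definitions, a 4 KiB buffer at 0x100000000 (just above 4 GiB) is rejected under the 32-bit limit and accepted under the 40-bit one.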

@lory696
Author

lory696 commented Apr 16, 2021

I just tested the code. It behaves slightly differently from version 1.16 because it still prints some "couldn’t send network packet" errors, but after a while it boots correctly. So, I think the patch fixes this issue. This is the serial log:

>>Start PXE over IPv4InstallProtocolInterface: 245DCA21-FB7B-11D3-8F01-00A0C969723B 35386190
.InstallProtocolInterface: 41D94CD2-35B6-455A-8258-D4E51334AADD 3535C6A0
InstallProtocolInterface: 3AD9DF29-4501-478D-B1F8-7F7FE70E50F3 3535D138
InstallProtocolInterface: F4B427BB-BA21-4F16-BC4E-43E416AB619C 3535BEB0

  Station IP address is 192.168.10.105

  Server IP address is 192.168.10.1
  NBP filename is grubaa64.efi
  NBP filesize is 2157568 BytesInstallProtocolInterface: 245DCA21-FB7B-11D3-8F01-00A0C969723B 35386190
InstallProtocolInterface: 245DCA21-FB7B-11D3-8F01-00A0C969723B 35386190

 Downloading NBP file...

  NBP file downloaded successfully.
[Bds] Expand MAC(DCA632E22291,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0) -> MAC(DCA632E22291,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)
[Security] 3rd party image[0] can be loaded after EndOfDxe: MAC(DCA632E22291,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0).
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 3535AD40
Loading driver at 0x00033768000 EntryPoint=0x00033768400
Loading driver at 0x00033768000 EntryPoint=0x00033768400
FSOpen: Open 'RPI_EFI.FD' Success
Variables dumped!
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 35359F98
ProtectUefiImageCommon - 0x3535AD40
  - 0x0000000033768000 - 0x000000000020EC00
Unknown command `#'.
Try `help' for usage
Unknown command `#'.
Try `help' for usage

GenericPhyDetect: PHY detected at address 0x01 (PHYIDR1=0x600D, PHYIDR2=0x84A2)
GenetSimpleNetworkTransmit: no link
error: couldn't send network packet.
GenetSimpleNetworkTransmit: no link
error: couldn't send network packet.
GenetSimpleNetworkTransmit: no link
error: couldn't send network packet.
GenetSimpleNetworkTransmit: no link
error: couldn't send network packet.
GenetSimpleNetworkTransmit: no link
error: couldn't send network packet.
GenetSimpleNetworkTransmit: no link
error: couldn't send network packet.
GenetSimpleNetworkTransmit: no link
error: couldn't send network packet.
GenetSimpleNetworkTransmit: no link
error: couldn't send network packet.
GenericPhyGetConfig: Link speed 1000 Mbps, full-duplex
Booting under MAAS direction...  [ grub.cfg-dc:a6:32:e2  710B  100%  10.52B/s ]
EFI stub: Booting Linux Kernel...    [ boot-initrd  89.13MiB  100%  3.84MiB/s ]
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
XhcClearBiosOwnership: called to clear BIOS ownership
SetUefiImageMemoryAttributes - 0x0000000037140000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x00000000370F0000 - 0x0000000000040000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x00000000370B0000 - 0x0000000000030000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000033A60000 - 0x00000000000B0000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000037070000 - 0x0000000000030000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000037030000 - 0x0000000000030000 (0x0000000000000008)
SetUefiImageMemoryAttributes - 0x0000000033A20000 - 0x0000000000030000 (0x0000000000000008)
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] Linux version 5.8.0-34-generic

@jlinton
Member

jlinton commented Apr 16, 2021

jlinton pushed a commit to jlinton/edk2-platforms that referenced this issue Apr 16, 2021
Under normal circumstances GenetSimpleNetworkTransmit won't be
called unless the rest of the network stack detects the link is
up. So, during normal operation when the adapter is initialized
the link naturally transitions to link up, and then is ready for
activity later in the boot sequence. If that hasn't happened by
the time PXE runs then it will itself wait for the link.

OTOH, the normal distro PXE sequence involves PXE loading shim
which in turn loads grub, which tries to read machine specific
configs, modules, and grub.cfg in order to prepare the boot menu.
Then, once a grub selection is picked, it might try to load the
kernel+initrd.

In this sequence the network stack is shutdown and restarted
multiple times. Grub though, starts up, notices its been network
booted, reads saved network parameters and immediately tries to
transmit data assuming the link is still up.

When that happens grub will print "couldn’t send network packet"
and if that lasts long enough it fails to load grub.cfg and the
user gets dropped to the grub prompt because no one in the path
bothers to assure the link state has transitioned back up.

For reference: pftf/RPi4#113

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
jlinton pushed a commit to jlinton/edk2-platforms that referenced this issue May 18, 2021
Under normal circumstances GenetSimpleNetworkTransmit won't be
called unless the rest of the network stack detects the link is
up. So, during normal operation when the adapter is initialized
the link naturally transitions to link up, and then is ready for
activity later in the boot sequence. If that hasn't happened by
the time PXE runs then it will itself wait for the link.

OTOH, the normal distro PXE sequence involves PXE loading shim
which in turn loads grub, which tries to read machine specific
configs, modules, and grub.cfg in order to prepare the boot menu.
Then, once a grub selection is picked, it might try to load the
kernel+initrd.

In this sequence the network stack is shutdown and restarted
multiple times. Grub though, starts up, notices its been network
booted, reads saved network parameters and immediately tries to
transmit data assuming the link is still up.

When that happens grub will print "couldn’t send network packet"
and if that lasts long enough it fails to load grub.cfg and the
user gets dropped to the grub prompt because no one in the path
bothers to assure the link state has transitioned back up.

For reference: pftf/RPi4#113

This patch fixes that by polling the link state via
GenericPhyUpdateConfig() for ten seconds in the transmit path
when the link is down. If the link recovers within this timeout
the state machine is transitioned and we continue data
transmission. If the 10 seconds expires without the link
resuming we will fail as before. While full link negotiation
can be fast, it frequently can take a second or two, or longer
depending on the remote peer on the other end of the
Ethernet cable. It seems auto MDX can slow this down,
and certain vendors products seem to be slower than the
norm. Ten seconds may not cover some of these possibilities,
but the user should validate cabling and the switch/peer's
port configuration if resuming the link is taking > 10
seconds. Picking a longer timeout is a tradeoff between the
machine appearing to hang for extended periods of time
(due to grub retries) if the link is actually down vs
generally providing enough time for most endpoints to
complete the negotiation.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Jared McNeill <jmcneill@invisible.ca>
Reviewed-by: Andrei Warkentin <awarkentin@vmware.com>
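
The bounded-polling idea described in the commit message can be sketched in plain C. This is a simplified simulation, not the driver code: `fake_nic`, `fake_phy_update`, and `fake_transmit` are hypothetical stand-ins, the PHY is modelled as a counter, and the timeout is expressed as a number of polls rather than ten seconds of wall-clock time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the adapter; none of these names are from EDK2. */
typedef struct {
  uint32_t polls_down;     /* how many more polls will still report "link down" */
  bool     media_present;  /* cached link state, analogous to SnpMode.MediaPresent */
} fake_nic;

typedef enum { TX_OK = 0, TX_NOT_READY = 1 } tx_status;

/* One simulated PHY poll: returns true once the link reports up. */
static bool fake_phy_update(fake_nic *nic)
{
  if (nic->polls_down > 0) {
    nic->polls_down--;
    return false;
  }
  return true;
}

/*
 * Transmit path: when the cached state says there is no link, poll the
 * PHY up to max_polls times. If the link comes back, update the cached
 * state and carry on; otherwise fail as before.
 */
static tx_status fake_transmit(fake_nic *nic, uint32_t max_polls)
{
  assert(nic != NULL);

  if (!nic->media_present) {
    bool up = false;
    for (uint32_t i = 0; i < max_polls && !up; i++) {
      up = fake_phy_update(nic);
      /* A real driver would stall for a short interval between polls. */
    }
    if (!up) {
      return TX_NOT_READY;  /* timeout expired, link still down */
    }
    nic->media_present = true;
  }

  /* ... the frame would be queued for transmission here ... */
  return TX_OK;
}
```

The tradeoff mentioned in the commit message shows up as the choice of `max_polls`: a larger value tolerates slower negotiation, but makes each failed transmit take longer when the link really is down.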
@asheplyakov

I've got a similar issue with a different SoC (BE-M1000) and network driver. Perhaps the problem should be fixed in grub (instead of adding a workaround to every driver)?

@jlinton
Member

jlinton commented Jun 18, 2021

The link-down issue? I would say this particular issue is mostly a genet bug rather than a grub one. The link is resuming properly; the problem is that the genet transmit path doesn't update its own state to reflect that. So there isn't really anything grub should do in that case but retry, which it does. AFAIK, of course...

If you're referring to the second "fix" in my series, which adjusts the RPi DMA constraints, then that's a double bug. The DMA constraints are wrong, but grub is also doing something wrong and not calling the UEFI map/unmap correctly somewhere (I haven't had a chance to find that).

BTW: I don't remember hearing about that SoC; it looks pretty interesting.

jlinton pushed a commit to jlinton/edk2-platforms that referenced this issue Jul 2, 2021
ardbiesheuvel pushed a commit to tianocore/edk2-platforms that referenced this issue Aug 16, 2021
@pbatard
Member

pbatard commented Sep 1, 2021

This should be fixed with the 1.30 release.

@pbatard pbatard closed this as completed Sep 1, 2021