[OpenWrt-Devel] Problems with link down/up on boards with FSL P1010/P1020 (Gianfar driver) at 100 Mbit/s

Jonas Eymann J.Eymann at gmx.net
Tue Feb 6 13:46:09 EST 2018


Hi all,

I am running an OpenWrt/LEDE based system on two custom boards: one with a Freescale P1010 and a Realtek 8211 PHY, the other with a P1020 and a Marvell 88E111 PHY. With an older base using kernel 3.10.49 everything works fine, but with LEDE 17.01.3 (kernel 4.4.89) I see network issues: after setting the network interface down and then up again, it sometimes stops passing traffic. Drop counters etc. look fine, but no packets reach the peer. This only occurs when the link runs at 100 Mbit/s; 10 Mbit/s and 1 Gbit/s work fine.
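
For reference, a minimal sketch of how the counters can be read on the DUT. The interface name eth1 matches the test setup; the SYSFS_NET variable and the show_counters helper are only illustrative names, and the ethtool calls assume ethtool is installed on the image.

```shell
#!/bin/sh
# Print basic packet/drop/error counters for one interface from sysfs.
# IFACE and SYSFS_NET are assumed defaults, not part of the original setup.
IFACE="${IFACE:-eth1}"
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

show_counters() {
    for c in tx_packets rx_packets tx_dropped rx_dropped tx_errors rx_errors; do
        f="$SYSFS_NET/$IFACE/statistics/$c"
        if [ -r "$f" ]; then
            echo "$c=$(cat "$f")"
        else
            # counter file missing or unreadable
            echo "$c=n/a"
        fi
    done
}

show_counters

# Negotiated link parameters and driver-specific statistics, if available.
if command -v ethtool >/dev/null 2>&1; then
    ethtool "$IFACE" 2>/dev/null || true
    ethtool -S "$IFACE" 2>/dev/null || true
fi
```

Running this once before and once after a down/up cycle makes it easy to compare whether tx_packets still increases while nothing arrives at the peer.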

As both boards have different SoCs and PHYs, I suspect a problem with the Gianfar driver. I have seen NETDEV WATCHDOG warnings a few times (see end of mail), and similar issues were mentioned in some older Linux kernel commits to the Gianfar driver. I have tried hard to debug the problem, so far without success.

Maybe someone else with a board using a similar SoC could try to reproduce this, to see whether it is a more general issue with the Gianfar driver or with some other part (e.g. the PHY subsystem).

My test setup is pretty simple. The board under test is directly connected to another system with a 1 Gbit/s capable interface:

-------                                      -----------
|     |eth1                            eth1  |         |
| DUT |--------------------------------------| Peer    |
|     |11.1.1.2                     11.1.1.1 |         |
-------                                      -----------

On the peer, I configure the interface with a static IP address and different transmission speeds / duplex via ethtool.
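
A rough sketch of what the peer-side configuration can look like. The /24 prefix, the DRY_RUN switch and the run/set_link helpers are assumptions for illustration; whether autonegotiation is left on (as here) or switched off is also an assumption, so adjust it to the case under test.

```shell
#!/bin/sh
# Peer-side setup sketch. IFACE, the /24 prefix and DRY_RUN are assumed values.
IFACE="${IFACE:-eth1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_link SPEED DUPLEX, e.g. "set_link 100 full"
# Autonegotiation stays enabled here (assumption); speed and duplex
# limit the mode requested from ethtool.
set_link() {
    run ethtool -s "$IFACE" speed "$1" duplex "$2" autoneg on
}

run ip addr add 11.1.1.1/24 dev "$IFACE"
run ip link set dev "$IFACE" up
set_link 100 full
```

With DRY_RUN=1 the script only prints the commands, which is handy for checking them before running with DRY_RUN=0 as root.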

On the DUT:

# stop OpenWrt network configuration so it does not interfere
/etc/init.d/network stop
# make sure no bridges are configured (delete any if there are...)
brctl show
# configure eth1 interface with static IP address
ifconfig eth1 11.1.1.2

# Check connectivity to peer
ping 11.1.1.1
PING 11.1.1.1 (11.1.1.1) 56(84) bytes of data.
64 bytes from 11.1.1.1: icmp_req=1 ttl=64 time=0.936 ms
...

# Set interface down and wait a few seconds
ip link set dev eth1 down   # or 'ifconfig eth1 down'

# Set interface up and wait a few seconds (until a line similar to
# fsl-gianfar soc@ffe00000:ethernet@b1000 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
# appears)
ip link set dev eth1 up   # or 'ifconfig eth1 up'

ping 11.1.1.1
2 packets transmitted, 0 received, 100% packet loss, time 999ms


10 Mbit/s (half or full duplex) and 1 Gbit/s work, but at 100 Mbit/s (either half or full duplex) the ping fails after about 50% of the down/up cycles. On the P1010/RTL8211 system, unplugging and replugging the cable or changing the speed fixes the issue (until the next down/up cycle at 100 Mbit/s). On the P1020 board the interface sometimes seems more 'stuck', so that only a reboot resolves it. We saw the same issue with a 100 Mbit/s switch connected instead of the peer, so the peer is not the problem (we actually discovered the problem using the switch).
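
To get a number for the failure rate, the down/up cycle can be automated roughly like this. It is only a sketch: the cycle count, the sleep times and the helper names (wait_carrier, percent, run_cycles) are arbitrary choices, and it assumes the DUT is already configured as described above.

```shell
#!/bin/sh
# Repeat the down/up cycle and count how often ping fails afterwards.
# IFACE, PEER, CYCLES and all timeouts are assumed values.
IFACE="${IFACE:-eth1}"
PEER="${PEER:-11.1.1.1}"
CYCLES="${CYCLES:-20}"

# Wait up to $1 seconds until sysfs reports carrier on the interface.
wait_carrier() {
    t="$1"
    while [ "$t" -gt 0 ]; do
        [ "$(cat "/sys/class/net/$IFACE/carrier" 2>/dev/null)" = "1" ] && return 0
        sleep 1
        t=$((t - 1))
    done
    return 1
}

# percent FAILED TOTAL: integer percentage, 0 if TOTAL is 0
percent() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 * 100 / $2 ))
    fi
}

run_cycles() {
    fail=0
    i=1
    while [ "$i" -le "$CYCLES" ]; do
        ip link set dev "$IFACE" down
        sleep 3
        ip link set dev "$IFACE" up
        wait_carrier 15 || echo "cycle $i: no carrier after 15 s"
        sleep 2
        if ping -c 2 -W 1 "$PEER" >/dev/null 2>&1; then
            echo "cycle $i: ok"
        else
            echo "cycle $i: FAIL"
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "$fail of $CYCLES cycles failed ($(percent "$fail" "$CYCLES")%)"
}

# Only touch the interface when called with the argument "run".
if [ "${1:-}" = "run" ]; then
    run_cycles
fi
```

Saved under a name of your choice and started with the argument "run", it prints one line per cycle and a summary at the end. Note that on the P1020 board a stuck interface may make every following cycle fail, so the percentage is only meaningful as long as the link recovers between cycles.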


A NETDEV WATCHDOG warning sometimes occurs after a while:

# ping 11.1.1.1
PING 11.1.1.1 (11.1.1.1) 56(84) bytes of data.
[ 1814.129669] NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue 0 timed out
[ 1814.136772] ------------[ cut here ]------------
[ 1814.141397] WARNING: at net/sched/sch_generic.c:306
[ 1814.146278] Modules linked in: ath9k ath9k_common pppoe ppp_async option iptable_nat ebtable_nat ebtable_filter ebtable_broute ath9k_hw ath usb_wwan pppox ppp_generic pl2303 nf_nat_ipv4 c
[ 1814.264249] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.89 #0
[ 1814.270005] task: c0415300 ti: c043c000 task.ti: c043c000
[ 1814.275411] NIP: c0281380 LR: c0281380 CTR: c01ab92c
[ 1814.280382] REGS: c043dd00 TRAP: 0700   Not tainted  (4.4.89)
[ 1814.286133] MSR: 00029000 <CE,EE,ME>  CR: 28002242  XER: 00000000
[ 1814.292254] 
GPR00: c0281380 c043ddb0 c0415300 0000003f c041aedc c041c1cc c04504f0 000001a2 
GPR08: 00000007 00000000 00000000 00000004 24002844 00000000 c043c000 00000000 
GPR16: c0420000 0005c689 c0311f1c 00000088 c041b7bc c041b6bc c041b5bc c0420000 
GPR24: c041b4bc c043ddf8 c041b8c4 c041b0a0 c0430000 c0420000 00000000 c7288000 
[ 1814.322089] NIP [c0281380] dev_watchdog+0x148/0x230
[ 1814.326976] LR [c0281380] dev_watchdog+0x148/0x230
[ 1814.331770] Call Trace:
[ 1814.334221] [c043ddb0] [c0281380] dev_watchdog+0x148/0x230 (unreliable)
[ 1814.340862] [c043ddd0] [c0055ef4] call_timer_fn.isra.7+0x28/0x80
[ 1814.346883] [c043ddf0] [c00561f0] run_timer_softirq+0x16c/0x1ac
[ 1814.352819] [c043de40] [c002920c] __do_softirq+0xa4/0x1e4
[ 1814.358241] [c043dea0] [c0008a94] timer_interrupt+0x34/0x4c
[ 1814.363831] [c043deb0] [c000d49c] ret_from_except+0x0/0x18
[ 1814.369334] --- interrupt: 901 at arch_cpu_idle+0x24/0x60
[ 1814.369334]     LR = arch_cpu_idle+0x24/0x60
[ 1814.379013] [c043df70] [c0309f94] schedule_preempt_disabled+0x10/0x20 (unreliable)
[ 1814.386604] [c043df80] [c00489fc] cpu_startup_entry+0xa4/0x10c
[ 1814.392458] [c043dfb0] [c03f09a8] start_kernel+0x348/0x35c
[ 1814.397956] [c043dff0] [c0000394] set_ivor+0x120/0x15c
[ 1814.403099] Instruction dump:
[ 1814.406069] 3bde0001 38e70078 4200ffac 48000048 7fe3fb78 4bfe7185 7fc6f378 7c651b78 
[ 1814.413851] 3c60c03c 7fe4fb78 386317dc 4808b665 <0fe00000> 39200001 993c60c5 813f012c 
[ 1814.421809] ---[ end trace 1803236190cd5b79 ]---
[ 1814.766510] fsl-gianfar soc@ffe00000:ethernet@b1000 eth1: Link is Up - 100Mbps/Half - flow control off
^C
--- 11.1.1.1 ping statistics ---
14 packets transmitted, 0 received, 100% packet loss, time 12999ms


Any help or ideas how to further debug this are appreciated!

Best regards
Jonas

