[OpenWrt-Devel] [RFC] lantiq: SMP interrupts and ethernet driver backport from vanilla v5

Petr Cvek petrcvekcz at gmail.com
Sat Feb 2 05:28:49 EST 2019


Hi Yaroslav,


On 01. 02. 19 at 16:31, Yaroslav Petrov wrote:
> Hi Petr,
>  
> I tested it on vgv7510kw22 (100 Mbit ports):
>  
> 1. The backport of the vanilla eth driver patch doesn't work (interfaces
> are DOWN) -> no network

Since the OpenWrt version has its own PHY interface handling, it is possible the backport is simply not compatible with your device; I think I would have had to change the iteration over the PHY OF nodes to make the two variants fit together. Anyway, that isn't the critical section. The critical parts are xmit, poll and RX, which are really inefficient in the OpenWrt version.

> 2. The ICU patch gives quite an interesting result: if rx/tx are balanced
> between CPUs, I get ca. 88 Mbit/sec (max 65% sirq load) and ca. 92 Mbit/sec
> without balancing (max 50% sirq load):

I have a theory about that. I was fiddling with /proc/irq/72/smp_affinity and /proc/irq/73/smp_affinity (e.g. echoing a CPU mask such as 1 or 2 into them), and it seems there is a correlation between the SMP affinity of the IRQs and the process doing the communication.

With this test:

host (server, receiving):
	nc -l -p 4321 | pv > /dev/null
lantiq (client, sending):
	cat /dev/zero | nc 10.0.0.1 4321

In pv I can get up to 9.3 MiByte/s when both IRQs are on the same VPE. When the IRQs are on different VPEs I get about 8.3 MiByte/s. When I quickly switch both IRQs to the other VPE I get about 7.4 MiByte/s for a few seconds, until it reaches 9.3 MiByte/s again. That is probably the scheduler moving the netcat process onto the same VPE the interrupts are on.

So it seems there is some overhead between the two VPEs when one is sending data and the other is handling the returning TCP ACKs. For ethernet it therefore seems more efficient to keep both IRQs on the same VPE, but other peripherals could use this for a similar speedup: if all ethernet IRQs were on one VPE and all wifi IRQs (plus wpa_supplicant) on the other, wpa_supplicant should get faster in the same way netcat does here.

My iperf3 tests (iperf3 -s on the lantiq): all 5 patches applied, vrx200_rx on CPU0, vrx200_tx on CPU1, no irqbalance, TD-W9980B (2x 1G native PHY, 2x 1G external PHY; LAN: 20 m of cat5e UTP connected to port "lan 2", which is on the external PHY), rootfs over NFS, no wifi, no DSL, no USB, and with the kernel warnings about a full TX FIFO in the backported ethernet driver fixed.

iperf3 -c 10.0.0.80
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   354 MBytes   297 Mbits/sec  456             sender
[  4]   0.00-10.00  sec   353 MBytes   296 Mbits/sec                  receiver

iperf3 -c 10.0.0.80 -R
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   115 MBytes  96.5 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   115 MBytes  96.5 Mbits/sec                  receiver

iperf3 -c 10.0.0.80 -u -b 150M
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec   177 MBytes   149 Mbits/sec  715.205 ms  789/929 (85%)  
[  4] Sent 929 datagrams
... in lantiq console:
[  5]   0.00-10.00  sec  1.09 MBytes   917 Kbits/sec  715.205 ms  789/929 (85%)  receiver

iperf3 -c 10.0.0.80 -u -b 150M -R
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec   179 MBytes   150 Mbits/sec  0.157 ms  0/22887 (0%)  
[  4] Sent 22887 datagrams
... in lantiq console
[  5]   0.00-10.00  sec   179 MBytes   150 Mbits/sec  0.000 ms  0/22887 (0%)  sender

iperf3 -c 10.0.0.80 -u -b 1000M
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec  1.01 GBytes   865 Mbits/sec  0.284 ms  131600/132024 (1e+02%)  
[  4] Sent 132024 datagrams
... in lantiq console
[  5]   0.00-10.00  sec  3.31 MBytes  2.78 Mbits/sec  0.284 ms  131600/132024 (1e+02%)  receiver

iperf3 -c 10.0.0.80 -u -b 1000M -R
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec   248 MBytes   208 Mbits/sec  0.022 ms  0/31752 (0%)  
[  4] Sent 31752 datagrams
... in lantiq console
[  5]   0.00-10.00  sec   248 MBytes   208 Mbits/sec  0.000 ms  0/31752 (0%)  sender

===========================

BTW, before backporting the whole vanilla ethernet driver I had come up with some optimizations in the OpenWrt version.

static int xrx200_poll_rx(struct napi_struct *napi, int budget)
...
        if (complete || !rx) {
                napi_complete(&ch->napi);
                ltq_dma_enable_irq(&ch->dma);
        }

if changed to:

        if (complete || !rx) {
                /* re-enable the DMA IRQ only if NAPI really finished;
                 * napi_complete_done() returns false when polling
                 * should continue */
                if (napi_complete_done(&ch->napi, rx)) {
                        ltq_dma_enable_irq(&ch->dma);
                }
        }

it will reduce the IRQ load (the DMA IRQ is re-enabled only after the NAPI work is really complete, so it doesn't fire again while the poll is still in progress).

Another place is in:

static void xrx200_tx_housekeeping(unsigned long ptr)
...
        for (i = 0; i < XRX200_MAX_DEV && ch->devs[i]; i++)
                netif_wake_queue(ch->devs[i]);

every tasklet run will try to wake both queues (the second one wasn't even used on my device!). Reducing the driver to only a single TX queue increased the TX bitrate.
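
A lighter-weight variant (just my illustrative sketch, not code from either driver) would be to keep both queues but only wake the ones that xrx200_start_xmit() actually stopped, using the same fields as in the OpenWrt code quoted above:

        for (i = 0; i < XRX200_MAX_DEV && ch->devs[i]; i++) {
                /* skip devices whose queue was never stopped, so an
                 * idle second port isn't touched on every run */
                if (netif_queue_stopped(ch->devs[i]))
                        netif_wake_queue(ch->devs[i]);
        }

In my case, though, simply dropping the unused second queue was what measurably helped.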

(IMO there is also a TX timeout problem in the vanilla driver: xrx200_start_xmit() can stop the queue when all descriptors are filled, but there is no corresponding queue wake-up in xrx200_tx_housekeeping(); a rough sketch of the kind of fix I mean follows.)
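
Something along these lines at the end of xrx200_tx_housekeeping(), after the descriptor reclaim loop, should be enough. This is only a sketch assuming the vanilla driver's names (net_dev, and pkts as the number of reclaimed descriptors), not a tested patch:

        /* after the reclaim loop has freed TX descriptors, wake the
         * queue that xrx200_start_xmit() stopped when the ring was
         * full; without this the queue can stay stopped and the
         * netdev watchdog eventually reports a TX timeout */
        if (pkts && netif_queue_stopped(net_dev))
                netif_wake_queue(net_dev);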


best regards
Petr

_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel

