[OpenWrt-Devel] [RFC] lantiq: SMP interrupts and ethernet driver backport from vanilla v5
Petr Cvek
petrcvekcz at gmail.com
Sat Feb 2 05:28:49 EST 2019
Hi Yaroslav,
On 01. 02. 19 at 16:31, Yaroslav Petrov wrote:
> Hi Petr,
>
> I tested it on vgv7510kw22 (100Mbit ports):
>
> 1. The backport of the vanilla eth driver patch doesn't work (interfaces
> are DOWN) -> no network
The OpenWrt version handles the PHY interface differently, so it is possible the backport is not compatible with your device; I think I had to change the iteration over the PHY OF nodes to fit the two variants together. Anyway, that isn't the critical part. The critical parts are the xmit, poll and RX paths, which are really inefficient in the OpenWrt version.
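For reference, the iteration I mean looks roughly like this (a generic
sketch only, assuming the PHY nodes sit under an "mdio" container node in
the DTS; "priv" is hypothetical here and both driver variants differ in the
details):

struct device_node *mdio_np, *phy_np;

/* walk every enabled PHY child node and hook it up to the MAC */
mdio_np = of_get_child_by_name(priv->dev->of_node, "mdio");
if (mdio_np) {
	for_each_available_child_of_node(mdio_np, phy_np) {
		/* ... of_phy_connect() etc. for each PHY ... */
	}
	of_node_put(mdio_np);
}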
> 2. The ICU patch gives quite an interesting result: if RX/TX are
> balanced between CPUs, I have ca. 88 Mbit/s (max 65% sirq load) and
> ca. 92 Mbit/s without balancing (max 50% sirq load):
I have a theory about that. I was fiddling with /proc/irq/72/smp_affinity and /proc/irq/73/smp_affinity, and it seems there is a correlation between the SMP affinity of the IRQs and the communicating process.
With this test:
host (server, receiving):
nc -l -p 4321 | pv > /dev/null
lantiq (client, sending):
cat /dev/zero | nc 10.0.0.1 4321
I can get up to 9.3 MiByte/s in pv when both IRQs are on the same VPE. When the IRQs are on different VPEs I get about 8.3 MiByte/s. When I quickly switch both IRQs to the other VPE I get about 7.4 MiByte/s for a few seconds until it reaches 9.3 MiByte/s again. That is probably the scheduler moving the netcat process onto the same VPE the interrupts are on.
So it seems there is some overhead between the two VPEs when sending data and receiving the TCP ACKs, and the ethernet is more efficient with both IRQs on the same VPE. Other peripherals could use this for a similar speedup: if all ethernet IRQs were on one VPE and all wifi IRQs (+ wpa_supplicant) on the other, wpa_supplicant should get faster in the same way netcat does here.
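For anyone who wants to reproduce the pinning programmatically, a minimal
userspace sketch could look like this (the IRQ numbers 72/73 and the mask
values are from my board and will differ elsewhere; an illustration, not a
tested tool):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* write a CPU mask into /proc/irq/<irq>/smp_affinity */
static int pin_irq(int irq, unsigned int mask)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%x", mask);
	return fclose(f);
}

int main(void)
{
	cpu_set_t set;

	pin_irq(72, 0x1);	/* vrx200 RX -> VPE0 */
	pin_irq(73, 0x1);	/* vrx200 TX -> VPE0 */

	/* put the test process (e.g. netcat) on the same VPE */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	sched_setaffinity(0, sizeof(set), &set);

	/* exec the benchmark from here, e.g. nc or iperf3 */
	return 0;
}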
my iperf3 test setup (iperf3 -s on lantiq):
- all 5 patches, vrx200_rx on CPU0, vrx200_tx on CPU1, no irqbalance
- TD-W9980B (2x 1G native PHY, 2x 1G external PHY); LAN over 20 m cat 5e
  UTP connected to port "lan 2", i.e. on an external PHY
- rootfs over NFS, no wifi, no DSL, no USB
- kernel warnings about a full TX FIFO in the backported ethernet driver
  fixed
iperf3 -c 10.0.0.80
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 354 MBytes 297 Mbits/sec 456 sender
[ 4] 0.00-10.00 sec 353 MBytes 296 Mbits/sec receiver
iperf3 -c 10.0.0.80 -R
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 115 MBytes 96.5 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 115 MBytes 96.5 Mbits/sec receiver
iperf3 -c 10.0.0.80 -u -b 150M
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-10.00 sec 177 MBytes 149 Mbits/sec 715.205 ms 789/929 (85%)
[ 4] Sent 929 datagrams
... in lantiq console:
[ 5] 0.00-10.00 sec 1.09 MBytes 917 Kbits/sec 715.205 ms 789/929 (85%) receiver
iperf3 -c 10.0.0.80 -u -b 150M -R
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-10.00 sec 179 MBytes 150 Mbits/sec 0.157 ms 0/22887 (0%)
[ 4] Sent 22887 datagrams
... in lantiq console:
[ 5] 0.00-10.00 sec 179 MBytes 150 Mbits/sec 0.000 ms 0/22887 (0%) sender
iperf3 -c 10.0.0.80 -u -b 1000M
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-10.00 sec 1.01 GBytes 865 Mbits/sec 0.284 ms 131600/132024 (1e+02%)
[ 4] Sent 132024 datagrams
... in lantiq console:
[ 5] 0.00-10.00 sec 3.31 MBytes 2.78 Mbits/sec 0.284 ms 131600/132024 (1e+02%) receiver
iperf3 -c 10.0.0.80 -u -b 1000M -R
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-10.00 sec 248 MBytes 208 Mbits/sec 0.022 ms 0/31752 (0%)
[ 4] Sent 31752 datagrams
... in lantiq console:
[ 5] 0.00-10.00 sec 248 MBytes 208 Mbits/sec 0.000 ms 0/31752 (0%) sender
===========================
BTW, before backporting the whole vanilla ethernet driver I came up with some optimizations for the OpenWrt version:
static int xrx200_poll_rx(struct napi_struct *napi, int budget)
...
	if (complete || !rx) {
		napi_complete(&ch->napi);
		ltq_dma_enable_irq(&ch->dma);
	}
if changed to:
	if (complete || !rx) {
		if (napi_complete_done(&ch->napi, rx)) {
			ltq_dma_enable_irq(&ch->dma);
		}
	}
will reduce the IRQ load (the DMA IRQ is re-enabled only when napi_complete_done() confirms the polling is really complete, i.e. the interrupt is generated only after the work is done).
Another place at:
static void xrx200_tx_housekeeping(unsigned long ptr)
...
	for (i = 0; i < XRX200_MAX_DEV && ch->devs[i]; i++)
		netif_wake_queue(ch->devs[i]);
every tasklet run tries to wake both queues (the second one wasn't even used on my device!). Reducing the driver to only a single TX queue increased the TX bitrate.
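If the second queue has to stay for some devices, a smaller change might be
to wake only the queues that were actually stopped, along these lines (an
untested sketch of the idea):

	for (i = 0; i < XRX200_MAX_DEV && ch->devs[i]; i++)
		if (netif_queue_stopped(ch->devs[i]))
			netif_wake_queue(ch->devs[i]);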
(IMO there is a timeout problem in the vanilla driver: xrx200_start_xmit() can stop the queue when all descriptors are filled, but the queue is never woken up again in xrx200_tx_housekeeping().)
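If that is really the case, the fix should be along these lines: wake the
queue again once descriptors have been reclaimed. A sketch against the
vanilla driver as I remember it (field names may differ; not a tested
patch):

static int xrx200_tx_housekeeping(struct napi_struct *napi, int budget)
{
	struct xrx200_chan *ch = container_of(napi, struct xrx200_chan, napi);
	struct net_device *net_dev = ch->priv->net_dev;
	int pkts = 0;

	/* ... consume sent skbs and reclaim TX DMA descriptors,
	 * counting them in pkts ... */

	/* xrx200_start_xmit() stops the queue when the descriptor ring
	 * fills up; wake it again now that descriptors are free */
	if (netif_queue_stopped(net_dev))
		netif_wake_queue(net_dev);

	if (pkts < budget) {
		napi_complete(&ch->napi);
		ltq_dma_enable_irq(&ch->dma);
	}

	return pkts;
}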
best regards
Petr