Testing network / NAT performance
Rafał Miłecki
zajec5 at gmail.com
Sun Jun 12 12:58:42 PDT 2022
Over the last few years NAT performance on Northstar (bcm53xx) has
changed multiple times. No one keeps a close eye on this and Northstar
test results also seem very unstable. Over the last 2 months I tested
probably over a hundred OpenWrt commits going back to 2015.
I decided to do the testing with -falign-functions=32 and at some point
I disabled CONFIG_SMP. I also did some tests without the rtcache patch,
which was dropped later anyway. Below I'm sharing my notes.
1. afafbc0d7454 ("kernel: bgmac: add more DMA related fixes")
This commit introduced varying speeds across testing sessions. It seems
this could be caused by the removal of dma_sync_single_for_cpu(), which
could make rps_cpus actually work as expected.
2. 39f115707531 ("bcm53xx: switch to kernel 4.4")
Kernel 4.2 introduced commit 66e5133f19e9 ("vlan: Add GRO support for
non hardware accelerated vlan") which lowered Northstar / bgmac
performance as it introduced csum_partial() calls in new code paths [1].
The regression can be worked around with:
ethtool -K eth0 gro off
(note: with DSA, GRO also has to be disabled on the switch ports)
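A minimal sketch of applying the workaround to several interfaces at
once. The port names (lan1..lan4, wan) are assumptions and differ
between devices; the helper gro_off_cmds is hypothetical and only prints
the commands, so nothing is changed until its output is piped to sh.

```shell
#!/bin/sh
# Print one "ethtool -K <ifname> gro off" command per interface given.
# Nothing is executed here; pipe the output to `sh` to apply it.
gro_off_cmds() {
    for ifname in "$@"; do
        printf 'ethtool -K %s gro off\n' "$ifname"
    done
}

# Assumed names: eth0 is the CPU port, the rest are DSA switch ports.
# Check `ip link` for the names used on a given device.
gro_off_cmds eth0 lan1 lan2 lan3 lan4 wan
```

To apply the settings, run the script and pipe its output to sh.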
3. 916e33fa1e14 ("netifd: update to the latest version, rewrite RPS/XPS handling")
This changed the default values of rps_cpus and xps_cpus. The effect on
networking depends on the number of CPUs in the device and on its setup.
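For reference, rps_cpus takes a hexadecimal CPU bitmask written to a
per-queue sysfs file. The sketch below only shows how such a mask can be
built and where it would be written; it does not reproduce what netifd
picks by default. The helper names, the device eth0 and the queue rx-0
are assumptions, and the write command is printed, not executed.

```shell
#!/bin/sh
# Hex bitmask with the lowest N bits set, i.e. "CPUs 0..N-1".
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Print (do not run) the write that would set rps_cpus for RX queue 0.
rps_cmd() {
    dev=$1
    mask=$2
    printf 'echo %s > /sys/class/net/%s/queues/rx-0/rps_cpus\n' "$mask" "$dev"
}

# Example: a dual-core device, steering to both CPUs.
rps_cmd eth0 "$(cpu_mask 2)"
```

Comparing results between commits is easier when the mask is set
explicitly like this, so that a changed default does not get mixed up
with a kernel or driver change.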
4. 50c6938b95a0 ("bcm53xx: add v5.4 support")
This commit actually switched bcm53xx from kernel 4.14 to 4.19, which
somehow dropped network speed by 5%. It could be an actual net subsystem
change or just something unrelated. The difference is too small to make
the whole debugging worth it.
5. 7125323b81d7 ("bcm53xx: switch to kernel 5.4")
Improved network speed by 25% (256 Mb/s → 320 Mb/s).
I didn't have time to bisect this *improvement* to a single kernel
commit. I tried profiling but it isn't obvious to me what caused that
improvement.
Kernel 4.19:
11.94% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
7.06% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
3.37% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
2.80% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
2.67% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
2.63% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
2.43% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb_core
2.13% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_start_xmit
1.82% ksoftirqd/0 [kernel.kallsyms] [k] nf_hook_slow
1.54% ksoftirqd/0 [kernel.kallsyms] [k] ip_forward
1.50% ksoftirqd/0 [kernel.kallsyms] [k] dma_cache_maint_page
Kernel 5.4:
14.53% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
8.02% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
3.32% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
3.28% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
3.12% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb_core
2.70% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
2.46% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
2.26% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_start_xmit
1.73% ksoftirqd/0 [kernel.kallsyms] [k] __dma_page_dev_to_cpu
1.72% ksoftirqd/0 [kernel.kallsyms] [k] nf_hook_slow
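One way to compare two such profiles is to add up the share of the cache
maintenance symbols. This is a rough sketch: grouping the four
v7_dma_*_range / l2c210_*_range symbols together is an assumption made
here for illustration, and the helper dma_share is hypothetical. It
reads perf report lines in the format shown above from stdin.

```shell
#!/bin/sh
# Sum the overhead column (first field) of the lines whose symbol (last
# field) is one of the four cache invalidate/clean routines.
dma_share() {
    awk '$NF ~ /^(v7_dma_(inv|clean)_range|l2c210_(inv|clean)_range)$/ {
             sub(/%/, "", $1)
             total += $1
         }
         END { printf "%.2f\n", total }'
}

# Kernel 4.19 excerpt (matching lines plus one unrelated line).
dma_share <<'EOF'
11.94% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
7.06% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
3.37% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
2.80% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
2.67% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
EOF
```

For the kernel 4.19 listing this prints 25.17; feeding it the kernel 5.4
listing gives 28.53. These are shares of samples, not absolute time, so
they do not by themselves explain the speed difference.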
6. ba72ed537c4a ("kernel: backport GRO improvements")
Improved network speed by 10%.
7. 17576b1b2aea ("kernel: drop the conntrack rtcache patch")
Dropped network speed by 15%.
8. f55f1dbaad33 ("bcm53xx: switch to the kernel 5.10")
Kernel bump that introduced upstream commit 8c7da63978f1 ("bgmac:
configure MTU and add support for frames beyond 8192 byte size") which
dropped speed by 49%.
9. e9672b1a8fa4 ("bcm53xx: switch to the upstream DSA-based b53 driver")
At first it seemed like a 5% decrease in network performance.
Profiling revealed it was caused by an added csum_partial() call.
Further debugging showed it was tcp4_gro_receive() that started calling
it.
Long story short: with DSA, GRO needs to be disabled on all switch
interfaces.
After some further testing it seems DSA actually bumped network speed
from 404 Mb/s to 445 Mb/s. Again, it isn't clear from profiling why.
swconfig:
13.46% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
7.39% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
3.27% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
2.74% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb_core.constprop.0
2.72% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
2.71% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
2.56% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_start_xmit
2.31% ksoftirqd/0 [kernel.kallsyms] [k] fib_table_lookup
1.91% ksoftirqd/0 [kernel.kallsyms] [k] ip_route_input_slow
1.86% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
DSA:
11.88% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
6.59% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
3.91% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb_core.constprop.0
3.68% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
3.25% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
2.88% ksoftirqd/0 [kernel.kallsyms] [k] fib_table_lookup
2.61% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_start_xmit
2.20% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
1.74% ksoftirqd/0 [kernel.kallsyms] [k] fib_rules_lookup
1.72% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
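To put the absolute numbers next to the percentages quoted in these
notes, the relative change can be computed as (new - old) * 100 / old.
A small sketch; the helper name pct_change is made up for this example.

```shell
#!/bin/sh
# Relative change from $1 to $2 in percent, with one decimal place.
pct_change() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - old) * 100 / old }'
}

pct_change 404 445   # swconfig -> DSA, prints 10.1
pct_change 256 320   # kernel 4.19 -> 5.4, prints 25.0
```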
[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg293153.html