Testing network / NAT performance

Rafał Miłecki zajec5 at gmail.com
Sun Jun 12 12:58:42 PDT 2022


Over the last few years NAT performance on Northstar (bcm53xx) has
changed multiple times. No one keeps a close eye on this, and Northstar
testing results also seem very unstable. Over the last 2 months I have
tested probably over a hundred OpenWrt commits going back to 2015.

I decided to do the testing with -falign-functions=32 and at some point
I disabled CONFIG_SMP. I also did some tests without the rtcache patch,
which was dropped later anyway. Below I'm sharing my notes.


1. afafbc0d7454 ("kernel: bgmac: add more DMA related fixes")

This commit introduced varying speeds across testing sessions. It seems
this could be caused by the removal of dma_sync_single_for_cpu(), which
could make rps_cpus actually work as expected.


2. 39f115707531 ("bcm53xx: switch to kernel 4.4")

Kernel 4.2 introduced commit 66e5133f19e9 ("vlan: Add GRO support for
non hardware accelerated vlan") which lowered Northstar / bgmac
performance as it introduced csum_partial() calls in new code paths [1].

The regression can be worked around with:

ethtool -K eth0 gro off

(note: DSA requires disabling GRO also for switch ports)
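
A minimal sketch of applying the workaround to several interfaces at
once. The port names in the usage line (lan1..lan4, wan) are assumptions
and depend on the device. The ETHTOOL variable is not part of ethtool;
it is only there so the loop can be dry-run with a stub.

```shell
#!/bin/sh
# Command used to change offload settings; overridable for dry runs.
ETHTOOL="${ETHTOOL:-ethtool}"

# disable_gro DEV...: turn GRO off on every interface given as argument.
disable_gro() {
    for dev in "$@"; do
        "$ETHTOOL" -K "$dev" gro off || echo "failed to disable GRO on $dev" >&2
    done
}

# Usage (interface names are assumptions, adjust to the device):
#   disable_gro eth0 lan1 lan2 lan3 lan4 wan
```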


3. 916e33fa1e14 ("netifd: update to the latest version, rewrite RPS/XPS handling")

This changed the default rps_cpus and xps_cpus values. The effect on
networking depends on the number of CPUs in the device and on its setup.
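
When comparing builds from before and after this change, it may help to
pin rps_cpus explicitly instead of relying on defaults. A minimal
sketch, assuming the standard sysfs layout
(/sys/class/net/<dev>/queues/rx-*/rps_cpus; xps_cpus lives under tx-*
in the same way). The helper names and the SYSFS_NET variable are made
up for this example, and which mask performs best is not something
these notes establish.

```shell
#!/bin/sh
# Root of the per-interface sysfs tree; overridable so the helper can be
# tried against a scratch directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# cpu_mask N: hex bitmask with the lowest N CPU bits set (2 -> 3, 4 -> f).
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# set_rps DEV MASK: write MASK to rps_cpus of every RX queue of DEV.
set_rps() {
    for q in "$SYSFS_NET/$1"/queues/rx-*; do
        if [ -e "$q/rps_cpus" ]; then
            echo "$2" > "$q/rps_cpus"
        fi
    done
}

# Usage: allow both CPUs of a dual-core device to process eth0 RX:
#   set_rps eth0 "$(cpu_mask 2)"
```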


4. 50c6938b95a0 ("bcm53xx: add v5.4 support")

This commit actually switched bcm53xx from kernel 4.14 to 4.19, which
somehow dropped network speed by 5%. It could be an actual net subsystem
change or just something unrelated. The difference is too small to make
the debugging worth it.


5. 7125323b81d7 ("bcm53xx: switch to kernel 5.4")

Improved network speed by 25% (256 Mb/s → 320 Mb/s).
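
The percentage is the change relative to the earlier measurement. A
small helper for checking such numbers (pct_change is a name made up
for this example):

```shell
#!/bin/sh
# pct_change OLD NEW: relative change from OLD to NEW, in percent.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

pct_change 256 320   # prints 25.0
```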

I didn't have time to bisect this *improvement* to a single kernel
commit. I tried profiling but it isn't obvious to me what caused that
improvement.

Kernel 4.19:
     11.94%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_inv_range
      7.06%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_inv_range
      3.37%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_clean_range
      2.80%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_clean_range
      2.67%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_poll
      2.63%  ksoftirqd/0      [kernel.kallsyms]       [k] __dev_queue_xmit
      2.43%  ksoftirqd/0      [kernel.kallsyms]       [k] __netif_receive_skb_core
      2.13%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_start_xmit
      1.82%  ksoftirqd/0      [kernel.kallsyms]       [k] nf_hook_slow
      1.54%  ksoftirqd/0      [kernel.kallsyms]       [k] ip_forward
      1.50%  ksoftirqd/0      [kernel.kallsyms]       [k] dma_cache_maint_page

Kernel 5.4:
     14.53%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_inv_range
      8.02%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_inv_range
      3.32%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_poll
      3.28%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_clean_range
      3.12%  ksoftirqd/0      [kernel.kallsyms]       [k] __netif_receive_skb_core
      2.70%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_clean_range
      2.46%  ksoftirqd/0      [kernel.kallsyms]       [k] __dev_queue_xmit
      2.26%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_start_xmit
      1.73%  ksoftirqd/0      [kernel.kallsyms]       [k] __dma_page_dev_to_cpu
      1.72%  ksoftirqd/0      [kernel.kallsyms]       [k] nf_hook_slow
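
Two listings like the ones above are easier to compare when joined by
symbol. A minimal sketch, assuming each listing has been saved to its
own text file with the overhead in the first column and the symbol in
the last (perf_diff_txt and the file names are made up for this
example). When the raw perf.data files are still available, perf's own
"perf diff" subcommand is probably the better tool.

```shell
#!/bin/sh
# perf_diff_txt OLD NEW: print the per-symbol change in overhead
# (NEW minus OLD, in percentage points) for two saved listings.
perf_diff_txt() {
    awk '
        $1 !~ /%$/ { next }                 # skip anything that is not a sample line
        { sub(/%/, "", $1) }                # "11.94%" -> "11.94"
        NR == FNR { old[$NF] = $1; next }   # first file: remember old overhead
        { printf "%+6.2f  %s\n", $1 - old[$NF], $NF; seen[$NF] = 1 }
        END {                               # symbols present only in OLD
            for (s in old) if (!(s in seen)) printf "%+6.2f  %s\n", -old[s], s
        }
    ' "$1" "$2"
}

# Usage (file names are assumptions):
#   perf_diff_txt perf-4.19.txt perf-5.4.txt | sort -n
```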


6. ba72ed537c4a ("kernel: backport GRO improvements")

Improved network speed by 10%.


7. 17576b1b2aea ("kernel: drop the conntrack rtcache patch")

Dropped network speed by 15%.


8. f55f1dbaad33 ("bcm53xx: switch to the kernel 5.10")

Kernel bump that introduced upstream commit 8c7da63978f1 ("bgmac:
configure MTU and add support for frames beyond 8192 byte size") which
dropped speed by 49%.


9. e9672b1a8fa4 ("bcm53xx: switch to the upstream DSA-based b53 driver")

At first it seemed like a decrease of network performance by 5%.
Profiling revealed it was caused by an added csum_partial() call.
Further debugging showed it was tcp4_gro_receive() that started calling
it.

Long story short: with DSA, GRO needs to be disabled on all switch
interfaces.
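
Before a test run it is worth confirming that GRO really is off on each
port. A minimal sketch that reads the "generic-receive-offload" line
from "ethtool -k". The gro_state helper and the ETHTOOL variable are
made up for this example, and the port names in the usage line are
assumptions.

```shell
#!/bin/sh
# Command used to query offload settings; overridable for dry runs.
ETHTOOL="${ETHTOOL:-ethtool}"

# gro_state DEV: print the GRO setting of DEV as reported by "ethtool -k"
# (for example "on" or "off").
gro_state() {
    "$ETHTOOL" -k "$1" 2>/dev/null |
        awk -F': *' '$1 == "generic-receive-offload" { print $2 }'
}

# Usage (interface names are assumptions, adjust to the device):
#   for dev in eth0 lan1 lan2 lan3 lan4 wan; do
#       echo "$dev: $(gro_state "$dev")"
#   done
```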

After some further testing it seems DSA actually bumped network speed
from 404 Mb/s to 445 Mb/s. From profiling it again isn't clear why.

swconfig:
     13.46%  ksoftirqd/0      [kernel.kallsyms]    [k] v7_dma_inv_range
      7.39%  ksoftirqd/0      [kernel.kallsyms]    [k] l2c210_inv_range
      3.27%  ksoftirqd/0      [kernel.kallsyms]    [k] v7_dma_clean_range
      2.74%  ksoftirqd/0      [kernel.kallsyms]    [k] __netif_receive_skb_core.constprop.0
      2.72%  ksoftirqd/0      [kernel.kallsyms]    [k] l2c210_clean_range
      2.71%  ksoftirqd/0      [kernel.kallsyms]    [k] bgmac_poll
      2.56%  ksoftirqd/0      [kernel.kallsyms]    [k] bgmac_start_xmit
      2.31%  ksoftirqd/0      [kernel.kallsyms]    [k] fib_table_lookup
      1.91%  ksoftirqd/0      [kernel.kallsyms]    [k] ip_route_input_slow
      1.86%  ksoftirqd/0      [kernel.kallsyms]    [k] __dev_queue_xmit

DSA:
     11.88%  ksoftirqd/0      [kernel.kallsyms]    [k] v7_dma_inv_range
      6.59%  ksoftirqd/0      [kernel.kallsyms]    [k] l2c210_inv_range
      3.91%  ksoftirqd/0      [kernel.kallsyms]    [k] __netif_receive_skb_core.constprop.0
      3.68%  ksoftirqd/0      [kernel.kallsyms]    [k] v7_dma_clean_range
      3.25%  ksoftirqd/0      [kernel.kallsyms]    [k] l2c210_clean_range
      2.88%  ksoftirqd/0      [kernel.kallsyms]    [k] fib_table_lookup
      2.61%  ksoftirqd/0      [kernel.kallsyms]    [k] bgmac_start_xmit
      2.20%  ksoftirqd/0      [kernel.kallsyms]    [k] bgmac_poll
      1.74%  ksoftirqd/0      [kernel.kallsyms]    [k] fib_rules_lookup
      1.72%  ksoftirqd/0      [kernel.kallsyms]    [k] __dev_queue_xmit



[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg293153.html


