Testing network / NAT performance

Hauke Mehrtens hauke at hauke-m.de
Fri Jun 17 04:47:16 PDT 2022


Hi Rafal,

Thank you for the detailed analysis and the detailed report.
This is very helpful, as I ran into this problem as well.

Can we somehow automate this so that we get notified about a performance
regression a day after a bad change is committed, and not a year later?
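
A minimal sketch of what such a check could look like, assuming a nightly
job records one throughput number (in Mb/s) per build. The function name,
the 10% threshold, the window size and the commit ids are hypothetical;
only the 256 and 320 Mb/s figures come from Rafał's report:

```python
from statistics import median


def find_regressions(history, threshold=0.10, window=5):
    """Flag builds whose throughput dropped noticeably.

    history: list of (commit_id, mbps) tuples, oldest first.
    A build is flagged when its throughput is more than `threshold`
    below the median of the preceding `window` builds.
    Returns a list of (commit_id, baseline_mbps, measured_mbps).
    """
    regressions = []
    for i in range(1, len(history)):
        previous = [mbps for _, mbps in history[max(0, i - window):i]]
        baseline = median(previous)
        commit, mbps = history[i]
        if mbps < baseline * (1.0 - threshold):
            regressions.append((commit, baseline, mbps))
    return regressions


if __name__ == "__main__":
    # Hypothetical nightly results; the commit ids are placeholders.
    nightly = [
        ("aaaa", 320.0),
        ("bbbb", 318.0),
        ("cccc", 321.0),
        ("dddd", 256.0),  # the kind of drop we want to hear about next day
    ]
    for commit, baseline, mbps in find_regressions(nightly):
        print(f"{commit}: {mbps:.0f} Mb/s vs. baseline {baseline:.0f} Mb/s")
```

Using the median of a few earlier builds rather than only the previous
build is meant to tolerate a single noisy measurement; the right threshold
would depend on how stable the test setup is.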

On 6/14/22 15:16, Rafał Miłecki wrote:
> On 12.06.2022 21:58, Rafał Miłecki wrote:
>> 5. 7125323b81d7 ("bcm53xx: switch to kernel 5.4")
>>
>> Improved network speed by 25% (256 Mb/s → 320 Mb/s).
>>
>> I didn't have time to bisect this *improvement* to a single kernel
>> commit. I tried profiling but it isn't obvious to me what caused that
>> improvement.
>>
>> Kernel 4.19:
>>      11.94%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_inv_range
>>       7.06%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_inv_range
>>       3.37%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_clean_range
>>       2.80%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_clean_range
>>       2.67%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_poll
>>       2.63%  ksoftirqd/0      [kernel.kallsyms]       [k] __dev_queue_xmit
>>       2.43%  ksoftirqd/0      [kernel.kallsyms]       [k] __netif_receive_skb_core
>>       2.13%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_start_xmit
>>       1.82%  ksoftirqd/0      [kernel.kallsyms]       [k] nf_hook_slow
>>       1.54%  ksoftirqd/0      [kernel.kallsyms]       [k] ip_forward
>>       1.50%  ksoftirqd/0      [kernel.kallsyms]       [k] dma_cache_maint_page
>>
>> Kernel 5.4:
>>      14.53%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_inv_range
>>       8.02%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_inv_range
>>       3.32%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_poll
>>       3.28%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_clean_range
>>       3.12%  ksoftirqd/0      [kernel.kallsyms]       [k] __netif_receive_skb_core
>>       2.70%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_clean_range
>>       2.46%  ksoftirqd/0      [kernel.kallsyms]       [k] __dev_queue_xmit
>>       2.26%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_start_xmit
>>       1.73%  ksoftirqd/0      [kernel.kallsyms]       [k] __dma_page_dev_to_cpu
>>       1.72%  ksoftirqd/0      [kernel.kallsyms]       [k] nf_hook_slow
> 
> Riddle solved. Change to bless/blame: 4e0c54bc5bc8 ("kernel: add support
> for kernel 5.4").
> 
> First of all bcm53xx uses
> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
> 
> 
> OpenWrt's kernel Makefile in kernel 4.19:
> 
> ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
> KBUILD_CFLAGS    += -Os $(EXTRA_OPTIMIZATION)
> else
> KBUILD_CFLAGS   += -O2 -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
> endif
> 
> 
> OpenWrt's kernel Makefile in 5.4:
> 
> ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
> KBUILD_CFLAGS += -O2 $(EXTRA_OPTIMIZATION)
> else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3
> KBUILD_CFLAGS += -O3 $(EXTRA_OPTIMIZATION)
> else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
> KBUILD_CFLAGS += -Os -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
> endif
> 
> 
> As you can see 4e0c54bc5bc8 has accidentally moved -fno-reorder-blocks
> from !CONFIG_CC_OPTIMIZE_FOR_SIZE to CONFIG_CC_OPTIMIZE_FOR_SIZE.

This looks like an accident to me.
All targets except mediatek/mt7629 set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in master. In OpenWrt 21.02 the
ARCHS38 target set CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, but now it is
also set to the normal CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE.
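
To make the difference concrete, here is a small Python model of the two
Makefile fragments quoted above. It is only an illustration: the function
names are made up, `ifdef` is approximated by set membership, and
$(EXTRA_OPTIMIZATION) is left out because it appears in every branch:

```python
def cflags_4_19(config):
    """Optimization flags selected by the quoted 4.19 Makefile fragment."""
    if "CONFIG_CC_OPTIMIZE_FOR_SIZE" in config:
        return ["-Os"]
    # Everything that is not optimized for size got the extra flags.
    return ["-O2", "-fno-reorder-blocks", "-fno-tree-ch"]


def cflags_5_4(config):
    """Optimization flags selected by the quoted 5.4 Makefile fragment."""
    if "CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE" in config:
        return ["-O2"]
    if "CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3" in config:
        return ["-O3"]
    if "CONFIG_CC_OPTIMIZE_FOR_SIZE" in config:
        # The extra flags ended up in the size-optimized branch.
        return ["-Os", "-fno-reorder-blocks", "-fno-tree-ch"]
    return []


# bcm53xx uses CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
bcm53xx = {"CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE"}
print(cflags_4_19(bcm53xx))  # ['-O2', '-fno-reorder-blocks', '-fno-tree-ch']
print(cflags_5_4(bcm53xx))   # ['-O2']
```

So for a target like bcm53xx the only difference between the two builds
is that -fno-reorder-blocks and -fno-tree-ch are no longer passed.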

We should probably switch mediatek/mt7629 to
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE as well. Does anyone have such a
device and could test a patch?

> I noticed a problem with -fno-reorder-blocks a long time ago, see:
> [PATCH RFC] kernel: drop -fno-reorder-blocks
> https://patchwork.ozlabs.org/project/openwrt/patch/20190409093046.13401-1-zajec5@gmail.com/ 
> 
> 
> It should really get sorted out...

I would suggest removing the -fno-reorder-blocks and -fno-tree-ch
options, as they are now only applied in the CONFIG_CC_OPTIMIZE_FOR_SIZE
case and are therefore practically unused.


The next step could be profile-guided optimization:
https://lwn.net/Articles/830300/
If the toolchain works properly, I expect big improvements there, as
routing, forwarding and NAT are done completely in the kernel and we use
devices with small caches. Profile-guided optimization should be able to
avoid many cache misses by laying out the code in the binary better.

Hauke


