Testing network / NAT performance

Ansuel Smith ansuelsmth at gmail.com
Fri Jun 17 05:19:47 PDT 2022


On Fri, 17 Jun 2022 at 13:51, Hauke Mehrtens
<hauke at hauke-m.de> wrote:
>
> Hi Rafal,
>
> Thank you for your detailed analysis and also for the detailed report.
> This is very helpful when one runs into this problem.
>
> Can we somehow automate this so that we are notified about a performance
> regression a day after a bad change is committed, and not a year later?
>
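
A minimal sketch of what such an automated check could look like,
assuming a nightly job already produces one throughput number per build.
The function names and the 5% tolerance are hypothetical and not part of
any existing OpenWrt tooling; the 256 and 320 Mb/s figures are the ones
Rafał reports for the 4.19 to 5.4 switch.

```python
def relative_change(baseline_mbps: float, measured_mbps: float) -> float:
    """Return the fractional change of measured vs. baseline (0.25 == +25%)."""
    if baseline_mbps <= 0:
        raise ValueError("baseline must be positive")
    return (measured_mbps - baseline_mbps) / baseline_mbps


def is_regression(baseline_mbps: float, measured_mbps: float,
                  tolerance: float = 0.05) -> bool:
    """Flag a run whose throughput dropped by more than `tolerance`.

    The 5% default is an assumed noise margin, not a measured one.
    """
    return relative_change(baseline_mbps, measured_mbps) < -tolerance


if __name__ == "__main__":
    # 4.19 -> 5.4 on bcm53xx: an improvement, so it is not flagged.
    print(f"{relative_change(256, 320):+.0%}", is_regression(256, 320))
    # The same numbers in the opposite direction would be flagged.
    print(f"{relative_change(320, 256):+.0%}", is_regression(320, 256))
```

The tolerance would have to be tuned per device, since run-to-run noise
differs between targets.
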
> On 6/14/22 15:16, Rafał Miłecki wrote:
> > On 12.06.2022 21:58, Rafał Miłecki wrote:
> >> 5. 7125323b81d7 ("bcm53xx: switch to kernel 5.4")
> >>
> >> Improved network speed by 25% (256 Mb/s → 320 Mb/s).
> >>
> >> I didn't have time to bisect this *improvement* to a single kernel
> >> commit. I tried profiling but it isn't obvious to me what caused that
> >> improvement.
> >>
> >> Kernel 4.19:
> >>      11.94%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_inv_range
> >>       7.06%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_inv_range
> >>       3.37%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_clean_range
> >>       2.80%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_clean_range
> >>       2.67%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_poll
> >>       2.63%  ksoftirqd/0      [kernel.kallsyms]       [k] __dev_queue_xmit
> >>       2.43%  ksoftirqd/0      [kernel.kallsyms]       [k] __netif_receive_skb_core
> >>       2.13%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_start_xmit
> >>       1.82%  ksoftirqd/0      [kernel.kallsyms]       [k] nf_hook_slow
> >>       1.54%  ksoftirqd/0      [kernel.kallsyms]       [k] ip_forward
> >>       1.50%  ksoftirqd/0      [kernel.kallsyms]       [k] dma_cache_maint_page
> >>
> >> Kernel 5.4:
> >>      14.53%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_inv_range
> >>       8.02%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_inv_range
> >>       3.32%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_poll
> >>       3.28%  ksoftirqd/0      [kernel.kallsyms]       [k] v7_dma_clean_range
> >>       3.12%  ksoftirqd/0      [kernel.kallsyms]       [k] __netif_receive_skb_core
> >>       2.70%  ksoftirqd/0      [kernel.kallsyms]       [k] l2c210_clean_range
> >>       2.46%  ksoftirqd/0      [kernel.kallsyms]       [k] __dev_queue_xmit
> >>       2.26%  ksoftirqd/0      [kernel.kallsyms]       [k] bgmac_start_xmit
> >>       1.73%  ksoftirqd/0      [kernel.kallsyms]       [k] __dma_page_dev_to_cpu
> >>       1.72%  ksoftirqd/0      [kernel.kallsyms]       [k] nf_hook_slow
> >
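
For comparing two such profiles side by side, here is a rough sketch of a
parser, assuming each perf entry sits on a single line (mail line wrapping
undone). The helper names are made up for illustration, and only three
rows per kernel are copied from the listings above.

```python
import re

# One perf-report row: overhead, command, shared object, [type], symbol.
ROW = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+(\S+)\s+\[(\w)\]\s+(\S+)\s*$")


def parse_perf(text: str) -> dict[str, float]:
    """Map symbol name -> overhead percentage for every row that matches."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            result[m.group(5)] = float(m.group(1))
    return result


def overhead_delta(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Percentage-point change per symbol; a symbol absent on one side counts as 0."""
    symbols = set(old) | set(new)
    return {s: round(new.get(s, 0.0) - old.get(s, 0.0), 2) for s in symbols}


K419 = """
    11.94%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_inv_range
     7.06%  ksoftirqd/0  [kernel.kallsyms]  [k] l2c210_inv_range
     2.67%  ksoftirqd/0  [kernel.kallsyms]  [k] bgmac_poll
"""
K54 = """
    14.53%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_inv_range
     8.02%  ksoftirqd/0  [kernel.kallsyms]  [k] l2c210_inv_range
     3.32%  ksoftirqd/0  [kernel.kallsyms]  [k] bgmac_poll
"""

if __name__ == "__main__":
    delta = overhead_delta(parse_perf(K419), parse_perf(K54))
    # Largest absolute change first.
    for symbol, change in sorted(delta.items(), key=lambda kv: -abs(kv[1])):
        print(f"{change:+.2f}  {symbol}")
```

Keep in mind that these numbers are shares of samples, so a larger share
for the cache maintenance functions on the faster kernel does not by
itself mean those functions got slower.
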
> > Riddle solved. Change to bless/blame: 4e0c54bc5bc8 ("kernel: add support
> > for kernel 5.4").
> >
> > First of all bcm53xx uses
> > CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
> >
> >
> > OpenWrt's kernel Makefile in kernel 4.19:
> >
> > ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
> > KBUILD_CFLAGS    += -Os $(EXTRA_OPTIMIZATION)
> > else
> > KBUILD_CFLAGS   += -O2 -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
> > endif
> >
> >
> > OpenWrt's kernel Makefile in 5.4:
> >
> > ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
> > KBUILD_CFLAGS += -O2 $(EXTRA_OPTIMIZATION)
> > else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3
> > KBUILD_CFLAGS += -O3 $(EXTRA_OPTIMIZATION)
> > else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
> > KBUILD_CFLAGS += -Os -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
> > endif
> >
> >
> > As you can see 4e0c54bc5bc8 has accidentally moved -fno-reorder-blocks
> > from !CONFIG_CC_OPTIMIZE_FOR_SIZE to CONFIG_CC_OPTIMIZE_FOR_SIZE.
>
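
To make the difference explicit, here is a small Python model of the two
Makefile snippets quoted above. It is only an illustration of their
branching, not OpenWrt code; the function names are hypothetical and
$(EXTRA_OPTIMIZATION) is kept as an opaque string.

```python
EXTRA = "$(EXTRA_OPTIMIZATION)"


def cflags_4_19(config: set[str]) -> list[str]:
    """Flag selection of the 4.19 snippet: -Os for SIZE, otherwise -O2 plus extras."""
    if "CONFIG_CC_OPTIMIZE_FOR_SIZE" in config:
        return ["-Os", EXTRA]
    return ["-O2", "-fno-reorder-blocks", "-fno-tree-ch", EXTRA]


def cflags_5_4(config: set[str]) -> list[str]:
    """Flag selection of the 5.4 snippet; if no branch matches, nothing is added."""
    if "CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE" in config:
        return ["-O2", EXTRA]
    if "CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3" in config:
        return ["-O3", EXTRA]
    if "CONFIG_CC_OPTIMIZE_FOR_SIZE" in config:
        return ["-Os", "-fno-reorder-blocks", "-fno-tree-ch", EXTRA]
    return []


if __name__ == "__main__":
    # bcm53xx sets CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y.
    bcm53xx = {"CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE"}
    print("4.19:", " ".join(cflags_4_19(bcm53xx)))
    print("5.4: ", " ".join(cflags_5_4(bcm53xx)))
```

With the bcm53xx config, the 4.19 variant includes -fno-reorder-blocks
and -fno-tree-ch while the 5.4 variant does not.
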
> This looks like an accident to me.
> All targets except mediatek/mt7629 set
> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in master. In OpenWrt 21.02 the
> ARCHS38 target set CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, but now it is
> also set to normal performance.
>
> We should probably switch mediatek/mt7629 to
> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE, does anyone have such a device and
> could test a patch?
>
> > I noticed a problem with -fno-reorder-blocks a long time ago, see:
> > [PATCH RFC] kernel: drop -fno-reorder-blocks
> > https://patchwork.ozlabs.org/project/openwrt/patch/20190409093046.13401-1-zajec5@gmail.com/
> >
> >
> > It should really get sorted out...
>
> I would suggest removing the -fno-reorder-blocks -fno-tree-ch options
> as they are not used.
>
>
> The next step could be profile-guided optimization:
> https://lwn.net/Articles/830300/
> If the toolchain works properly, I expect big improvements there, as
> routing, forwarding and NAT are handled completely in the kernel and we use
> devices with small caches. Profile-guided optimization should be able to
> avoid many cache misses by laying out the binary better.
>

PGO would be a dream to accomplish, but it's a nightmare to actually use.
The kernel size grows a lot and it needs to be done correctly...
Also, AFAIK it's not that easy to add support for it, and generating the
profile data is problematic on some devices.

> Hauke
>
> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel at lists.openwrt.org
> https://lists.openwrt.org/mailman/listinfo/openwrt-devel


