Optimizing kernel compilation / alignments for network performance

Rafał Miłecki zajec5 at gmail.com
Wed Apr 27 05:04:54 PDT 2022


I noticed years ago that kernel changes touching code - that I don't use
at all - can affect network performance for me.

I work with home routers based on Broadcom Northstar platform. Those
are SoCs with not-so-powerful 2 x ARM Cortex-A9 CPU cores. Main task of
those devices is NAT masquerade and that is what I test with iperf
running on two x86 machines.


Example of such unused code change:
ce5013ff3bec ("mtd: spi-nor: Add support for XM25QH64A and XM25QH128A").
It lowered my NAT speed from 381 Mb/s to 367 Mb/s (-3,5%).

I first reported that issue it in the e-mail thread:
ARM router NAT performance affected by random/unrelated commits

Back then it was commit 5b0890a97204 ("flow_dissector: Parse batman-adv
unicast headers")
that increased my NAT speed from 741 Mb/s to 773 Mb/s (+4,3%).


It appears Northstar CPUs have little cache size and so any change in
location of kernel symbols can affect NAT performance. That explains why
changing unrelated code affects anything & it has been partially proven
aligning some of cache-v7.S code.

My question is: is there a way to find out & force an optimal symbols

Adding .align 5 to the cache-v7.S is a partial success. I'd like to find
out what other functions are worth optimizing (aligning) and force that
(I guess  __attribute__((aligned(32))) could be used).

I can't really draw any conclusions from comparing System.map before and
after above commits as they relocate thousands of symbols in one go.

Optimizing is pretty important for me for two reasons:
1. I want to reach maximum possible NAT masquerade performance
2. I need stable performance across random commits to detect regressions

More information about the openwrt-devel mailing list