Optimizing kernel compilation / alignments for network performance

Rafał Miłecki zajec5 at gmail.com
Thu May 5 08:42:56 PDT 2022

On 29.04.2022 16:49, Arnd Bergmann wrote:
> On Wed, Apr 27, 2022 at 7:31 PM Rafał Miłecki <zajec5 at gmail.com> wrote:
>> On 27.04.2022 14:56, Alexander Lobakin wrote:
>> Thank you Alexander, this appears to be helpful! I decided to ignore
>> manually.
>> 1. Without ce5013ff3bec and with -falign-functions=32
>> 387 Mb/s
>> 2. Without ce5013ff3bec and with -falign-functions=64
>> 377 Mb/s
>> 3. With ce5013ff3bec and with -falign-functions=32
>> 384 Mb/s
>> 4. With ce5013ff3bec and with -falign-functions=64
>> 377 Mb/s
>> So it seems that:
>> 1. -falign-functions=32 = pretty stable high speed
>> 2. -falign-functions=64 = very stable slightly lower speed
>> I'm going to perform tests on more commits but if it stays so reliable
>> as above that will be a huge success for me.
> Note that the problem may not just be the alignment of a particular
> function, but also how different functions map into your cache.
> The Cortex-A9 has a 4-way set-associative L1 cache of 16KB, 32KB or
> 64KB, with a line size of 32 bytes. If you are unlucky and you get
> five different functions that are frequently called and are spaced
> exactly the wrong distance apart, so that they need more than
> four ways, calling them in sequence would always evict the other
> ones. The same could of course happen if the problem is the D-cache
> or the L2.
> Can you try to get a profile using 'perf record' to see where most
> time is spent, in both the slowest and the fastest versions?
> If the instruction cache is the issue, you should see how the hottest
> addresses line up.

Your explanation sounds sane of course.
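
To make the conflict scenario concrete, here is a minimal sketch of the set mapping for the cache geometry described above. It assumes the 32KB variant, uses made-up function addresses (not taken from any real System.map), looks only at the first cache line of each function, and ignores the virtual/physical indexing distinction. The helper names are hypothetical.

```python
# Cache geometry from the quoted description (Cortex-A9 L1).
LINE_SIZE = 32            # bytes per cache line
WAYS = 4                  # 4-way set-associative
CACHE_SIZE = 32 * 1024    # assumption: the 32KB variant

NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 256 sets
WAY_SIZE = CACHE_SIZE // WAYS                 # 8KB: addresses this far apart share a set


def cache_set(addr):
    """Return the index of the cache set that the line containing addr maps to."""
    return (addr // LINE_SIZE) % NUM_SETS


def conflicting_sets(addrs):
    """Group addresses by set; keep only sets holding more addresses than ways."""
    by_set = {}
    for addr in addrs:
        by_set.setdefault(cache_set(addr), []).append(addr)
    return {s: a for s, a in by_set.items() if len(a) > WAYS}


# Hypothetical example: five hot functions whose start addresses are
# spaced exactly one way size (8KB) apart.
hot = [0xC0100000 + i * WAY_SIZE for i in range(5)]

for s, addrs in conflicting_sets(hot).items():
    print("set", s, "is shared by", [hex(a) for a in addrs])
```

With five entries competing for four ways of the same set, calling those functions in sequence keeps evicting one of them. Feeding the hottest addresses from a 'perf record' profile into cache_set() would be one way to check how they line up.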

If you take a look at my old e-mail
"ARM router NAT performance affected by random/unrelated commits"

you'll see that most used functions are:

Is there a way to optimize the kernel for better cache usage by the
selected (above) functions?

Meanwhile I tested -fno-reorder-blocks, which some OpenWrt folks
reported as worth trying. It's yet another source of randomness: it
stabilizes NAT performance across some commits and breaks stability
across others.
