Reduced throughput with mt7621 and DSA

I am currently performing some performance measurements, comparing the
(wired) routing throughput (WAN <-> LAN) of 19.07, 21.02 and master on
mt7621 (ZBT WG-3526). I have connected one client to my LAN and one to
the WAN, and use iperf3 to measure. I run parallel TCP flows (in order
to take advantage of the multiple CPU cores) and let iperf3 run for
30 seconds per test.
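
For reference, the test procedure described above could be scripted
roughly as follows. This is only a minimal sketch: the flow count of 4
and the helper names are assumptions made for illustration, and it
assumes an iperf3 server is already listening on the other client. The
flags used (-c, -P, -t, -J) are standard iperf3 client options.

```python
import json
import subprocess


def build_iperf3_cmd(server, parallel=4, seconds=30):
    """Build the iperf3 client command line.

    parallel=4 is an assumed flow count, not a value from the measurements;
    seconds=30 matches the test duration described in the text.
    TCP is the iperf3 default, so no protocol flag is needed.
    """
    return ["iperf3", "-c", server, "-P", str(parallel),
            "-t", str(seconds), "-J"]


def received_mbit_per_s(report):
    """Extract the summed receiver-side TCP throughput in Mbit/s
    from a parsed iperf3 JSON report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def run_test(server, parallel=4, seconds=30):
    """Run one iperf3 test against `server` and return Mbit/s.

    Requires iperf3 to be installed locally and a server running remotely.
    """
    result = subprocess.run(build_iperf3_cmd(server, parallel, seconds),
                            check=True, capture_output=True, text=True)
    return received_mbit_per_s(json.loads(result.stdout))
```

Calling run_test() with the address of the client on the other side of
the router returns the aggregate throughput of all flows for one run.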

Based on my measurements, the throughput is reduced by ~50% going from
19.07 to 21.02/master (~900Mbit/s on 19.07 vs. ~450Mbit/s on
21.02/master). I do not have a particular commit I can point to, but I
believe the regression is caused by the introduction of DSA. Restoring
the old swconfig driver
brings my 21.02/master throughput up to roughly the same level as

I am able to alleviate the reduction in throughput by enabling flow
offloading, but there are several cases where flow offloading does not
have an effect. When performing a similar measurement to the one above
over a WireGuard tunnel, I see a similar reduction in performance (and
no help from flow offloading).

Does anyone know what could be the reason and if there is anything
that can be done to improve the performance when using DSA? Are there
for example any out of tree/not yet accepted patches that I should

Thanks in advance for any help,

