[OpenWrt-Devel] [RFC] [PATCH v4] lantiq: IRQ balancing, ethernet driver, wave300

Hauke Mehrtens hauke at hauke-m.de
Mon Mar 25 19:45:14 EDT 2019


On 3/26/19 12:24 AM, Hauke Mehrtens wrote:
> Hi Petr
> 
> On 3/14/19 6:46 AM, Petr Cvek wrote:
>> Hello again,
>>
>> I've managed to enhance a few drivers for the lantiq platform. They
>> are still in a rough, heavily commented form (the ethernet part
>> especially), but I need some hints before the final version. The
>> patches are based on kernel 4.14.99. Copy them into
>> target/linux/lantiq/patches-4.14 (cleaned of any of my previous
>> patches).
> 
> Thanks for working on this.
> 
>> The eth+irq speedup is up to 360/260 Mbps (vanilla was 170/80 Mbps on
>> my setup). The iperf3 benchmarks (2 passes each for the vanilla and
>> changed versions), together with the script, are in the attachment.
>>
>> 1) IRQ with SMP and balancing support:
>>
>> 	0901-add-icu-smp-support.patch
>> 	0902-enable-external-irqs-for-second-vpe.patch
>> 	0903-add-icu1-node-for-smp.patch
>>
>> As requested I've changed the patch heavily. The original locking from
>> the k3b source code (probably from UGW) didn't work, and under heavy
>> load the system could freeze (SMP affinity change during IRQ
>> handling). This version fixes that by using generic raw spinlocks with
>> IRQs disabled.
>>
>> The SMP IRQ handling now works in such a way that before every
>> irq_enable (which also serves as unmask) the target VPE is switched.
>> This can be limited by writing to /proc/irq/X/smp_affinity (so it can
>> also be balanced from userspace).
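>>
>> Roughly like this (just an illustrative sketch, not the actual patch;
>> ltq_icu_unmask_on_vpe() is a placeholder name for the real register
>> write):
>>
>> 	#include <linux/cpumask.h>
>> 	#include <linux/irq.h>
>> 	#include <linux/smp.h>
>>
>> 	/* placeholder for the IER write on the selected VPE's ICU */
>> 	void ltq_icu_unmask_on_vpe(struct irq_data *d, unsigned int vpe);
>>
>> 	static void ltq_enable_irq(struct irq_data *d)
>> 	{
>> 		const struct cpumask *mask = irq_data_get_affinity_mask(d);
>> 		unsigned int vpe;
>>
>> 		/* pick the next VPE allowed by smp_affinity, wrapping around */
>> 		vpe = cpumask_next(smp_processor_id(), mask);
>> 		if (vpe >= nr_cpu_ids)
>> 			vpe = cpumask_first(mask);
>>
>> 		ltq_icu_unmask_on_vpe(d, vpe);
>> 	}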
>>
>> I've rewritten the device tree reg fields so there are only 2 arrays
>> now, one per ICU controller. The original one-per-module layout was
>> redundant as the ranges were contiguous. The modules of a single ICU
>> are now explicitly computed in a macro:
>>
>> 	ltq_w32((x), ltq_icu_membase[vpe] + m*0x28 + (y))
>> 	ltq_r32(ltq_icu_membase[vpe] + m*0x28 + (x))
>>
>> Before, there was a pointer for every 0x28 block (there shouldn't be a
>> speed penalty, only a multiplication and an addition per register
>> access).
>>
>> I've also simplified the register names from LTQ_ICU_IM0_ISR to
>> LTQ_ICU_ISR, as "IM0" (module) was confusing (the real module number
>> 0-4 is part of the macro).
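>>
>> For illustration, the simplified names and helpers look roughly like
>> this (a sketch; the offsets are the same ones the current
>> arch/mips/lantiq/irq.c uses, 0x28 apart per module, and the helper
>> names with the vpe/module parameters are just how I'd write it):
>>
>> 	#define LTQ_ICU_ISR	0x0000	/* interrupt status */
>> 	#define LTQ_ICU_IER	0x0008	/* interrupt enable */
>> 	#define LTQ_ICU_IOSR	0x0010
>> 	#define LTQ_ICU_IRSR	0x0018
>> 	#define LTQ_ICU_IMR	0x0020
>>
>> 	#define ltq_icu_w32(vpe, m, x, y) \
>> 		ltq_w32((x), ltq_icu_membase[vpe] + (m) * 0x28 + (y))
>> 	#define ltq_icu_r32(vpe, m, x) \
>> 		ltq_r32(ltq_icu_membase[vpe] + (m) * 0x28 + (x))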
>>
>> The code is written in a way that it should work fine in a
>> uniprocessor configuration (the for_each_present_cpu etc. macros will
>> just cycle over a single VPE there). I haven't tested a build without
>> CONFIG_SMP yet, but I did check it with the "nosmp" kernel parameter
>> and it works.
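>>
>> E.g. the per-VPE iomap loop can look like this (sketch only, names are
>> illustrative; on a UP kernel for_each_present_cpu() simply iterates
>> once):
>>
>> 	#include <linux/cpumask.h>
>> 	#include <linux/init.h>
>> 	#include <linux/kernel.h>
>> 	#include <linux/of.h>
>> 	#include <linux/of_address.h>
>>
>> 	static void __iomem *ltq_icu_membase[NR_CPUS];
>>
>> 	static void __init icu_map_per_vpe(struct device_node *node)
>> 	{
>> 		unsigned int vpe;
>>
>> 		/* one "reg" entry per ICU block, indexed by VPE */
>> 		for_each_present_cpu(vpe) {
>> 			ltq_icu_membase[vpe] = of_iomap(node, vpe);
>> 			if (!ltq_icu_membase[vpe])
>> 				panic("Failed to remap icu%u memory", vpe);
>> 		}
>> 	}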
>>
>> Anyway, please test it if you have a board where the second VPE is
>> used for FXS.
>>
>> The new device tree structure is incompatible with the old version of
>> the driver (and the old device tree with the new driver too). It seems
>> the ICU driver is also used on the Danube, AR9, AmazonSE and Falcon
>> chipsets. I don't know the hardware of these boards, so before a final
>> patch I would like to know whether they also have a second ICU (at
>> offset 0x80300).
> 
> Normally the device tree should stay stable, but I already thought
> about the same change and I am not aware of any device that ships a
> U-Boot with an embedded device tree, so this should be fine.
> 
> The Amazon and Amazon SE only have one ICU block because they only have
> one CPU with one VPE.
> The Danube SoC has two ICU blocks, one for each CPU; each CPU only has
> one VPE. The CPUs are not cache coherent, so SMP is not possible.
> 
> Falcon, AR9, VR9, AR10, ARX300, GRX300 and GRX330 have two ICU blocks,
> one for each VPE of the single CPU.
> GRX350 uses a MIPS InterAptiv CPU with a MIPS GIC.
> 
>> More could probably be done with the locking, as only accesses within
>> a single module (= 1 set of registers) can actually race with each
>> other. But since the most contended interrupts are in the same module,
>> there won't be much of a speed increase IMO. I can add it if requested
>> (just a spinlock array and some lookup code).
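>>
>> Something along these lines (only a sketch; ltq_icu_w32/r32 are the
>> helpers above and the lock array would still need raw_spin_lock_init()
>> calls during probe):
>>
>> 	#include <linux/spinlock.h>
>> 	#include <linux/types.h>
>>
>> 	#define LTQ_ICU_MODULES	5
>>
>> 	/* one lock per module: only accesses to the same 0x28 block contend */
>> 	static raw_spinlock_t ltq_icu_module_lock[LTQ_ICU_MODULES];
>>
>> 	static void ltq_icu_rmw(unsigned int vpe, unsigned int m,
>> 				u32 clear, u32 set, u32 reg)
>> 	{
>> 		unsigned long flags;
>> 		u32 val;
>>
>> 		raw_spin_lock_irqsave(&ltq_icu_module_lock[m], flags);
>> 		val = (ltq_icu_r32(vpe, m, reg) & ~clear) | set;
>> 		ltq_icu_w32(vpe, m, val, reg);
>> 		raw_spin_unlock_irqrestore(&ltq_icu_module_lock[m], flags);
>> 	}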
> 
> I do not think that this would improve the performance significantly;
> I assume the CPUs only have to wait there in rare cases anyway.
> 
>> 2) Reworked lantiq xrx200 ethernet driver:
>>
>> 	0904-backport-vanilla-eth-driver.patch
>> 	0905-increase-dma-descriptors.patch
>> 	0906-increase-dma-burst-size.patch
>>
>> The code is still ugly, but stable now. There is fragmented skb
>> support and NAPI polling. The DMA ring buffer was increased so it can
>> handle faster speeds, and I've fixed some code weirdness. I can split
>> the changes into separate patches in the future.
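>>
>> The fragmented skb (scatter-gather) part of the TX path boils down to
>> roughly this (sketch only; xrx200_queue_desc() stands in for the real
>> descriptor setup and the ring bookkeeping is left out):
>>
>> 	#include <linux/skbuff.h>
>> 	#include <xway_dma.h>	/* LTQ_DMA_SOP / LTQ_DMA_EOP */
>>
>> 	struct xrx200_chan;
>>
>> 	/* placeholder: fill one TX descriptor with addr/len and SOP/EOP bits */
>> 	void xrx200_queue_desc(struct xrx200_chan *ch, void *data,
>> 			       unsigned int len, u32 ctl_bits);
>>
>> 	static void xrx200_map_skb(struct xrx200_chan *ch, struct sk_buff *skb)
>> 	{
>> 		int nr_frags = skb_shinfo(skb)->nr_frags;
>> 		int i;
>>
>> 		/* the linear part gets Start Of Packet */
>> 		xrx200_queue_desc(ch, skb->data, skb_headlen(skb),
>> 				  LTQ_DMA_SOP | (nr_frags ? 0 : LTQ_DMA_EOP));
>>
>> 		/* one descriptor per page fragment, EOP on the last one */
>> 		for (i = 0; i < nr_frags; i++) {
>> 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>>
>> 			xrx200_queue_desc(ch, skb_frag_address(frag),
>> 					  skb_frag_size(frag),
>> 					  (i == nr_frags - 1) ? LTQ_DMA_EOP : 0);
>> 		}
>> 	}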
> 
> It would be nice if you could also make the same changes to the
> upstream driver in the mainline Linux kernel and send them for
> inclusion in mainline Linux.
> 
>> I didn't test the ICU and eth patches separately, but I've tested the
>> ethernet driver on a single VPE only (by setting the SMP affinity and
>> nosmp). This version of the ethernet driver was used for root over NFS
>> on the debug setup for about two weeks (without problems).
>>
>> Tell me if we should pursue using the second DMA channel to the PPE so
>> that both VPEs can send frames at the same time.
> 
> I think it should be ok to use both DMA channels for the CPU traffic.
> 
>> 3) WAVE300
>>
>> In the past two weeks I've tried to mash together various versions of
>> the wave300 wifi driver (there are partial versions in the GPL sources
>> from router vendors), and I've managed to get the driver into a "not
>> immediately crashing" state. If you are interested in the development,
>> there is a thread on the OpenWrt forum. The source repos are here:
>>
>> https://repo.or.cz/wave300.git
>> https://repo.or.cz/wave300_rflib.git
>>
>> (the second one must be copied into the first one)
>>
>> The driver will often crash when it meets an unknown packet, a request
>> for encryption (there is no encryption support), an unusual
>> combination of configuration options, or just on module unloading. The
>> code is _really_ ugly and it will serve only as a hardware
>> specification for the development of a better GPL driver. If you want
>> to help or you have some tips, you can join the forum (there are links
>> to firmwares and to extensive research of the available vendor source
>> code).
>>
>> Links:
>> https://forum.openwrt.org/t/support-for-wave-300-wi-fi-chip/24690/129
>> https://forum.openwrt.org/t/how-can-we-make-the-lantiq-xrx200-devices-faster/9724/70
>> https://forum.openwrt.org/t/xrx200-irq-balancing-between-vpes/29732/25
>>
>> Petr
> Hauke

It would be nice if you could send your patches as individual, inline
mails so I can easily comment on them.

The DMA handling in the OpenWrt Ethernet driver is only more flexible in
that it can handle an arbitrary number of DMA channels, but I think this
is not needed.

The DMA memory is already 16 byte aligned (see the byte_offset variable
in xmit), so it should not be a problem to use the 4W DMA burst mode; I
assume the hardware also takes care of this.
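
For reference, the existing xmit path handles the alignment roughly like
this (quoted from memory, error handling and the surrounding function
trimmed):

	u32 byte_offset;
	dma_addr_t mapping;
	int len = skb->len;

	/* DMA wants a 16 byte aligned start address, so point the
	 * descriptor at the aligned address below skb->data and pass the
	 * remainder to the hardware as the TX offset */
	byte_offset = CPHYSADDR(skb->data) % 16;

	mapping = dma_map_single(priv->dev, skb->data, len, DMA_TO_DEVICE);

	desc->addr = mapping - byte_offset;
	desc->ctl = LTQ_DMA_OWN | LTQ_DMA_SOP | LTQ_DMA_EOP |
		    LTQ_DMA_TX_OFFSET(byte_offset) | (len & LTQ_DMA_SIZE_MASK);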

Why are the changes in arch/mips/kernel/smp-mt.c needed? This looks
strange to me.

Changing LTQ_DMA_CPOLL could affect the latency of the system, but I
think your increase should not hurt significantly.

Hauke



