mvebu: armada 3720 cpufreq reverts

Robert Marko robert.marko at sartura.hr
Wed Jun 30 10:21:15 PDT 2021


On Wed, Jun 30, 2021 at 7:07 PM Marek Behún <marek.behun at nic.cz> wrote:
>
> On Wed, 30 Jun 2021 17:51:24 +0200
> Robert Marko <robert.marko at sartura.hr> wrote:
>
> > On Wed, Jun 30, 2021 at 3:19 PM Marek Behún <marek.behun at nic.cz>
> > wrote:
> > >
> > > Hello Robert,
> > >
> > > I am writing regarding commit
> > >   mvebu: 5.10 fix DVFS caused random boot crashes
> > >   https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=080a0b74e39d159eecf69c468debec42f28bf4d8
> > > in OpenWRT.
> > >
> > > This commit reverts one patch of the a3720 cpufreq driver, but not
> > > the subsequent ones.
> > >
> > > Your commit message says that some 1.2 GHz SOCs are unstable with
> > > the fix. Did you also test this with the subsequent patches, which
> > > are now in stable kernels? I guess the answer is yes, because all
> > > these patches were backported to 5.10.37.
> >
> > Hi Marek,
> >
> > Yes, the rest of the patches were there as well.
> > >
> > > I am of the opinion that a better approach would be to
> > > - either disable cpufreq for 1.2 GHz variants
> > > - or fix the a3720 cpufreq driver to only scale up to 1 GHz on the
> > >   1.2 GHz variant
> >
> > I would prefer limiting it to 1GHz as that would not cause
> > performance issues, but 1GHz models could have the same issue as well.
> > This is because the voltages that are set as a minimum are from the
> > testing that Pali and the Turris guys did, but it really depends on
> > the SoC batch you receive.
>
> The thing is you cannot limit it to 1 GHz in kernel, because when the
> device is booted to 1.2 GHz the dividers are {1, 2, 4, 6}, so the
> available frequencies are 1200 MHz, 600 MHz, 300 MHz, 200 MHz.
>
> If you want to limit it to 1 GHz, you need to build the flash-image.bin
> with CLOCKSPRESET=CPU_1000_DDR_800 and reflash the device.

This is an issue, and it is the reason why I still have devices running the
old ATF+U-Boot: the customer deployed more than a thousand of these and I
can't really pull the devices for reflashing.

>
> With your revert the cpufreq scaling may be stable, but the CPU clock
> switches to TBG-A-P, which is 750 MHz.
> The result is that you are scaling, but you are scaling between
>   750 MHz, 375 MHz, 187.5 MHz, 125 MHz
>
> Which is even worse than the 1 GHz variant, where the top frequency with
> your revert is 800 MHz.
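
For reference, the arithmetic behind the two frequency sets quoted above can
be sketched as follows. This is only an illustration, not driver code: the
function name scaled_freqs_mhz is made up here, and the only inputs are the
dividers {1, 2, 4, 6} and the 1200 MHz / 750 MHz parent clocks mentioned in
the quoted text.

```python
# Illustrative only: reproduces the divider arithmetic from the quoted text.
# DIVIDERS and the parent clock values come from the thread; the function
# name is hypothetical and does not exist in the a3720 cpufreq driver.
DIVIDERS = (1, 2, 4, 6)  # dividers when the device is booted at 1.2 GHz


def scaled_freqs_mhz(parent_mhz, dividers=DIVIDERS):
    """Return the CPU frequencies (MHz) reachable from a given parent clock."""
    return [parent_mhz / d for d in dividers]


if __name__ == "__main__":
    # Expected parent clock for the 1.2 GHz variant.
    print(scaled_freqs_mhz(1200))  # [1200.0, 600.0, 300.0, 200.0]
    # Parent clock after the revert (TBG-A-P at 750 MHz).
    print(scaled_freqs_mhz(750))   # [750.0, 375.0, 187.5, 125.0]
```

The same dividers applied to the 750 MHz parent give a top frequency well
below the nominal 1.2 GHz, which is the point being made above.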

Yes, I gathered that from the commit itself as previously they were running at
750/800 MHz and that hid the whole voltage issue for a while.
>
> > >
> > > Since the approach you've taken now (reverting the patch) basically
> > > changes the CPU parent clock to the DDR clock, which is just wrong.
> > > Worse is that you are doing this for everybody, not just for the 1.2
> > > GHz variants.
> > >
> > > What do you think?
> >
> > I understand that it was not the best solution, but something had to
> > be done as I was not able to even finish booting on multiple boards
> > before crashing. It just reverted the things back to the previous
> > state.
> >
> > I really could not figure out a proper solution, even after being in
> > touch with Pali and contacting GlobalScale.
> >
> > This is an issue caused by Marvell simply ignoring the problem and
> > refusing to publish a fix or release the OTP and AVS docs, as they
> > all have a validated voltage in the OTP somewhere.
>
> I have sent a patch to the upstream kernel disabling cpufreq on 1.2 GHz
> models. I think this is the most sane solution for now, since we
> simply do not know how to scale properly on this variant.
>
> Once the patch is accepted, would you please remove your revert?

Sure, not an issue.
Hopefully, Marvell will finally step up and provide some clarity.

Regards,
Robert
>
> Marek



-- 
Robert Marko
Staff Embedded Linux Engineer
Sartura Ltd.
Lendavska ulica 16a
10000 Zagreb, Croatia
Email: robert.marko at sartura.hr
Web: www.sartura.hr


