ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes

Rafał Miłecki zajec5 at gmail.com
Wed Nov 29 13:20:49 PST 2023


Hi,

it's a late reply but I didn't find enough determination earlier.

On 8.09.2023 10:10, Linus Walleij wrote:
> On Mon, Sep 4, 2023 at 10:34 AM Rafał Miłecki <zajec5 at gmail.com> wrote:
> 
>> I'm clueless at this point.
>> Maybe someone can come up with an idea of actual issue & ideally a
>> solution.
> 
> Damn this is frustrating.
> 
>> 2. Clock (arm,armv7-timer)
>>
>> While comparing main clock in Broadcom's SDK with upstream one I noticed
>> a tiny difference: mask value. I don't know it it makes any sense but
>> switching from CLOCKSOURCE_MASK(56) to CLOCKSOURCE_MASK(64) in
>> arm_arch_timer.c (to match SDK) increases average uptime (time before a
>> hang/lockup happens) from 4 minutes to 36 minutes.
> 
> This could be related to how often the system goes to idle.
> 
>> +       if (cpu_idle_force_poll == 1234)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 5678)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 1234)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 5678)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 1234)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 5678)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 1234)
>> +               arch_cpu_idle();
> 
> Idle again.
> 
> I would have tried to see what arch_cpu_idle() is doing.
> 
> arm_pm_idle() or cpu_do_idle()?

In my case arm_pm_idle is NULL.


> What happens if you just put return in arch_cpu_idle()
> so it does nothing?

Doesn't help. I also tried putting:
udelay(10);
and
udelay(1000);
at the arch_cpu_idle() beginning. None helped.


Here comes more interesting experiment though. Putting there:

if (!(foo++ % 10000)) {
	pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
}

doesn't seem to help.


Putting following however seems to make kernel/device stable:

if (!(foo++ % 100)) {
	pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
}


I think I'm just going to assume those chipsets are simply hw broken.



More information about the openwrt-devel mailing list