[FS#4158] Master regression - boot loop due to kernel panic (mt7621) (Attachment added)

OpenWrt Bugs openwrt-bugs at lists.openwrt.org
Sat Nov 27 05:34:32 PST 2021


THIS IS AN AUTOMATED MESSAGE, DO NOT REPLY.

A new Flyspray task has been opened.  Details are below. 

User who did this - d.souza (dsouza) 

Attached to Project - OpenWrt/LEDE Project
Summary - Master regression - boot loop due to kernel panic (mt7621)
Task Type - Bug Report
Category - Kernel
Status - Unconfirmed
Assigned To - 
Operating System - All
Severity - Critical
Priority - Very Low
Reported Version - Trunk
Due in Version - Undecided
Due Date - Undecided
Details - I built an image from master yesterday for the Archer C6 v3.2, and the device no longer boots: it is stuck in a boot loop caused by a kernel panic.

Running an OpenWrt SNAPSHOT custom build from master (r18195-d1c7df9c4b).

To reproduce, just install the build and reboot the device.

With a UART attached, I can see the kernel panic below:


```
(...)
[    0.797651] CPU 1 Unable to handle kernel paging request at virtual address 5050404, epc == 80588ef8, ra == 801fe360
[    0.808162] Oops[#1]:
[    0.810387] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.159 #0
[    0.816345] $ 0   : 00000000 00000001 8fc304a4 00000108
[    0.821525] $ 4   : 05050404 80621000 8064d18f 00000061
[    0.826712] $ 8   : fffffffc 80594b3c 00000045 006d6873
[    0.831893] $12   : 015ede76 08fca8f3 9715a5ed 5c2e1039
[    0.837079] $16   : 8ff9cc00 8fc2523c 05050404 8064d184
[    0.842261] $20   : 0000000b 8fc06e00 8ff9cc8c 806ebd24
[    0.847448] $24   : 00000010 76ec2f43
[    0.852629] $28   : 8fc40000 8fc41d50 38e38e39 801fe360
[    0.857816] Hi    : 00000000
[    0.860665] Lo    : 006c0400
[    0.863546] epc   : 80588ef8 strlen+0x0/0x2c
[    0.867771] ra    : 801fe360 insert_header+0x140/0x4f8
[    0.872847] Status: 11007c03 KERNEL EXL IE
[    0.876997] Cause : 40800008 (ExcCode 02)
[    0.880969] BadVA : 05050404
[    0.883821] PrId  : 0001992f (MIPS 1004Kc)
[    0.887882] Modules linked in:
[    0.890910] Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), ts=00000000)
[    0.898940] Stack : 08fca8f3 00000000 2ab4a599 00000dc0 00000000 8fc06e30 8f06e00 801fc880
[    0.907235]         80862190 00000000 00000000 8fe57007 00000000 8ff9cc00 8f06e00 80860000
[    0.915528]         8fc06e00 00000001 00000000 801feaec 806f0000 80830000 8024aec 00000000
[    0.923822]         8064d008 8fd64a00 806eb18c 8fc06e00 8063ab2c 8063ab50 0000001 8fc06e00
[    0.932118]         8fe57007 806ebc4c 8fe57000 806ebc04 00000001 806eb18c 8030000 80830000
[    0.940412]         ...
[    0.942832] Call Trace:
[    0.945261] [] strlen+0x0/0x2c
[    0.949146] [] insert_header+0x140/0x4f8
[    0.953897] [] __register_sysctl_table+0x30c/0x630
[    0.959516] [] __register_sysctl_paths+0xf4/0x1e8
[    0.965067] [] ipc_sysctl_init+0x14/0x24
[    0.969793] [] do_one_initcall+0x50/0x1a8
[    0.974641] [] kernel_init_freeable+0x1ec/0x2d0
[    0.979997] [] kernel_init+0x10/0xf8
[    0.984398] [] ret_from_kernel_thread+0x14/0x1c
[    0.989755] Code: a066ffff  1000fff7  00000000  10400007  00000000 00801025  80430001  1460fffe
[    0.999424]
[    1.000995] ---[ end trace d1818afedd9795ac ]---
```


Reverting to a build I made a couple of weeks ago (also from master) solves the problem.

I believe I have already identified the root cause.

It seems that the amount of RAM is sometimes not identified correctly. When the boot fails, **the boot loader seems to identify a "HighMem" region that does not exist on this device**:

**Wrong HighMem detected, causing a kernel panic during boot:**

(...)
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x0000000023ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x0000000023ffffff]
(...)
[    0.000000] Memory: 510004K/524288K available (5739K kernel code, 200K rwdata, 1196K rodata, 1236K init, 226K bss, 14284K reserved, 0K cma-reserved, 262144K highmem)


According to the log above, the detected amount of RAM is 512MB, when in fact this device has only 128MB. When this happens, the boot fails with a kernel panic.
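
The 512MB figure can be cross-checked by summing the "Early memory node ranges" from the failing boot. The following is a minimal sketch, not part of the original report: the helper names (`node_ranges`, `total_kib`, `kib_above`) are hypothetical, and the HighMem boundary `0x10000000` is taken from the "Zone ranges" lines of the failing log.

```python
import re

# Matches lines such as "node   0: [mem 0x0000000000000000-0x000000001bffffff]".
NODE_RE = re.compile(r"node\s+\d+:\s+\[mem 0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\]")

# Node ranges copied from the failing boot log in this report.
FAILING_BOOT = """
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x0000000023ffffff]
"""

# First address of the HighMem zone in the failing log (assumption: read
# from its "Zone ranges" lines, not from any kernel header).
HIGHMEM_START = 0x10000000


def node_ranges(log_text):
    """Return (start, end) pairs, end inclusive, for each memory node line."""
    return [(int(s, 16), int(e, 16)) for s, e in NODE_RE.findall(log_text)]


def total_kib(ranges):
    """Total size of all ranges in KiB."""
    return sum(end - start + 1 for start, end in ranges) // 1024


def kib_above(ranges, boundary):
    """KiB of memory located at or above `boundary`."""
    total = 0
    for start, end in ranges:
        low = max(start, boundary)
        if low <= end:
            total += end - low + 1
    return total // 1024


ranges = node_ranges(FAILING_BOOT)
print(total_kib(ranges), "KiB total")                   # 524288 KiB total
print(kib_above(ranges, HIGHMEM_START), "KiB highmem")  # 262144 KiB highmem
```

Both values agree with the kernel's own "Memory: 510004K/524288K available (... 262144K highmem)" line. Running the same functions on the single range `0x00000000-0x07ffffff` from the working boot gives 131072 KiB (128MB) and no memory above the boundary.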

After a couple of power cycles, the memory is correctly identified as 128MB (no HighMem), as shown below, and the device boots OK:

**Correct Memory Size Detected boots OK:**

(...)
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x0000000007ffffff]
[    0.000000]   HighMem  empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000007ffffff]
(...)
[    0.000000] Memory: 120916K/131072K available (5739K kernel code, 200K rwdata, 1196K rodata, 1236K init, 226K bss, 10156K reserved, 0K cma-reserved, 0K highmem)
(...)


Full log is attached.

One or more files have been attached.

More information can be found at the following URL:
https://bugs.openwrt.org/index.php?do=details&task_id=4158

You are receiving this message because you have requested it from the Flyspray bugtracking system.  If you did not expect this message or don't want to receive mails in future, you can change your notification settings at the URL shown above.


