[FS#3099] ipq806x: kernel 5.4 crash related to CPU frequency scaling

OpenWrt Bugs openwrt-bugs at lists.openwrt.org
Thu May 27 11:38:43 PDT 2021


THIS IS AN AUTOMATED MESSAGE, DO NOT REPLY.

The following task has a new comment added:

FS#3099 - ipq806x: kernel 5.4 crash related to CPU frequency scaling
User who did this - Shane (digitalcircuit)

----------
Update that I feel warrants a new comment/email notification:

I have added a hardware serial connection to my router [[https://openwrt.org/toh/zyxel/nbg6817#serial|according to the OpenWrt wiki page on the NBG6817]], and recorded two reboots during the SFTP backup process.

====Hardware serial console logs====

For the first reboot, nothing at all shows in the kernel log over the serial console at the time of the reboot (the invalid VHT messages were logged some time before it):


[   53.792100] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0-2: link becomes ready
[   82.627616] ath10k_pci 0000:01:00.0: Invalid peer id 1 or peer stats buffer, peer: fc03a643  sta: 00000000
[ 2239.209010] ath10k_pci 0000:01:00.0: htt tx: fixing invalid VHT TX rate code 0xff
[ 2262.956279] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats
[59332.200618] ath10k_pci 0001:01:00.0: wmi: fixing invalid VHT TX rate code 0xff
�

U-Boot 2012.07 [Standard IPQ806X.LN,unknown] (Oct 03 2018 - 18:59:17)

([[https://zorro.casa/sync/Hosting/Utilities/Development/OpenWRT/issues/3099/2021-05-25%2020-33-18-public.log.txt|Full log available here: 2021-05-25 20-33-18-public.log.txt]])

For the second reboot, "rcu_sched" reported stalls on CPUs/tasks at some point before the reboot:


[   54.350758] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0-2: link becomes ready
[   55.250942] ath10k_pci 0001:01:00.0: Invalid peer id 1 or peer stats buffer, peer: fa53385a  sta: 00000000
[   70.277303] ath10k_pci 0001:01:00.0: Invalid VHT mcs 15 peer stats
[  954.621190] ath10k_pci 0000:01:00.0: htt tx: fixing invalid VHT TX rate code 0xff
[25315.944981] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[25315.945024] rcu: 	1-...0: (1 GPs behind) idle=8f2/1/0x40000000 softirq=431343/431344 fqs=1050 
[25315.949715] 	(detected by 0, t=2102 jiffies, g=2009085, q=250)
[25315.958391] Sending NMI from CPU 0 to CPUs 1:
^@

U-Boot 2012.07 [Standard IPQ806X.LN,unknown] (Oct 03 2018 - 18:59:17)


([[https://zorro.casa/sync/Hosting/Utilities/Development/OpenWRT/issues/3099/2021-05-26%2022-22-26-public.log.txt|Full log available here: 2021-05-26 22-22-26-public.log.txt]])

(//I switched the serial connection to a spare machine with an older version of screen, etc, hence different encoding of non-ASCII characters.//)
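
To pull the same kind of excerpt out of a long capture, something like the following could be used. This is only a minimal sketch: the function name "last_lines_before_reboot" and the default of 5 lines are made up for illustration, and it assumes kernel lines start with a bracketed timestamp and that each new boot starts with a line beginning "U-Boot ", as in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: print the last kernel log lines recorded before each
# U-Boot banner in a serial capture file.
#   $1: path to the serial log
#   $2: number of lines to show (illustrative default: 5)
last_lines_before_reboot() {
	logfile="$1"
	count="${2:-5}"
	awk -v n="$count" '
		# Kernel log lines start with a bracketed timestamp, e.g. "[   53.792100] "
		/^\[ *[0-9]+\.[0-9]+\]/ { buf[++total] = $0; next }
		# A U-Boot banner marks the start of the next boot
		/^U-Boot / {
			print "--- reboot detected, last kernel lines:"
			start = total - n + 1
			if (start < 1) start = 1
			for (i = start; i <= total; i++) print buf[i]
			total = 0
		}
	' "$logfile"
}
```

For example, running last_lines_before_reboot "2021-05-26 22-22-26-public.log.txt" 4 on the second log should list the rcu_sched stall lines that come right before the U-Boot banner.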

====Monitoring methods for serial console, SFTP connection====

Method of recording serial console output, run on a computer connected to the router via [[https://www.adafruit.com/product/954|a USB serial console cable]]:

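# Create the log directory
mkdir --parents "$HOME/Desktop/logs/serial/"
# Hand the serial device to the current user's primary group
sudo chown "root:$(id --group --name)" /dev/ttyUSB0
# Attach at 115200 baud and log the session to a timestamped file
screen -L -Logfile "$HOME/Desktop/logs/serial/$(date "+%F %H-%M-%S").log" /dev/ttyUSB0 115200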
# Alternative for Ubuntu 16.04 version of screen without "-Logfile" option:
# LOGFILE="$HOME/Desktop/logs/serial/$(date "+%F %H-%M-%S").log"
# screen -L /dev/ttyUSB0 115200
# mv screenlog.0 "$LOGFILE"


Method of monitoring the router for OpenSSH SFTP connection drops, run on the router as the default dropbear root user:

while true; do
	echo
	echo "$(date -R): Watching usbdrive SFTP connection..."
	# Report every 10 minutes while an sshd process for "usbdrive" exists
	while ps | grep -v "grep" | grep -q "sshd: usbdrive"; do
		echo "$(date -I): usbdrive SFTP active - $(uptime)"
		sleep 10m
	done
	echo "-/!\---------------------------------------------"
	echo "$(date -I): usbdrive SFTP disconnected! - $(uptime)"
	echo "-------------------------------------------------"
	echo -n "Waiting for connection.."
	# Poll until the usbdrive session comes back
	while ! ps | grep -v "grep" | grep -q "sshd: usbdrive"; do
		echo -n "."
		sleep 10m
	done
	echo " connected!"
done
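
The session check appears twice in the loop above, so it could be factored into a small function. This is a sketch only: "has_sftp_session" is a made-up name, and it assumes the process listing shows the session as "sshd: <user>", as the grep pattern above does.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the process listing read from stdin
# contains an sshd session for the given account.
#   $1: account name, e.g. usbdrive
has_sftp_session() {
	# Drop lines from grep itself, then look for the sshd session entry
	grep -v "grep" | grep -q "sshd: $1"
}
```

Usage would be "while ps | has_sftp_session usbdrive; do ... done", which behaves like the original pipeline and keeps the pattern in one place.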


====What to do next..?====

At this point, I'm unsure what exactly could be causing this problem.  If I should file a new bug report, let me know!  If any particular testing would help, I'm willing to try more drastic steps, including a "git bisect / build / flash / test" cycle over the OpenWrt repository changes from the 21.02 branch-off point to the May 18th snapshot, if that's even possible given the multi-repository layout and the extra kernel modules pulled in via opkg.

It's particularly strange that this worked reliably on 19.07.x and the snapshots, but not 21.02.0rc1.  I'm tempted to retry the snapshot, but it was only May 18th since I had 6/6 consecutive successful backups.  I've not had //any// consecutive successful backups on 21.02.0rc1 even with the [[https://github.com/openwrt/openwrt/commit/861b82d36ae43efec8d16e61b82482e38996af92|/etc/init.d/cpufreq]] script, let alone six.

I apologize for the lack of clear information; I had sincerely hoped that a hardware serial console would be much more insightful, only to get almost nothing from it so far (except for the fun of successfully modding an out-of-warranty device).

In the meantime, I will continue testing, and I'll keep updating my previous comment with the backup results (currently, only 1/19 successful) on 21.02.0rc1 while using the [[https://github.com/openwrt/openwrt/commit/861b82d36ae43efec8d16e61b82482e38996af92|/etc/init.d/cpufreq script]].

Thank you very much for your time!
----------

More information can be found at the following URL:
https://bugs.openwrt.org/index.php?do=details&task_id=3099#comment9729

You are receiving this message because you have requested it from the Flyspray bugtracking system.  If you did not expect this message or don't want to receive mails in future, you can change your notification settings at the URL shown above.


