Packages buildbot is erratic, both master and 23.05 packages fail often

Hannu Nyman hannu.nyman at iki.fi
Sat Jun 3 01:27:03 PDT 2023


Petr Štetiar wrote on 2.6.2023 at 22.07:
> So, the buildbot log shows the following:
>
>   2023-06-01 23:53:12+0000 [-] command timed out: 3600 seconds without output running [b'make', b'-j7', b'IGNORE_ERRORS=n m y', b'BUILD_LOG=1', b'CONFIG_AUTOREMOVE=y', b'CONFIG_SIGNED_PACKAGES='], attempting to kill
>   2023-06-01 23:53:12+0000 [-] trying to kill process group 1528179
>
> I've looked at the system logs around that time and found the following:
>
>   Jun 01 22:23:19 audit[3844576]: AVC apparmor="DENIED" operation="mkdir" info="Failed name lookup - name too long"
>                   error=-36 profile="docker-default"
> 		 name="/shared-workdir/build/sdk/build_dir/hostpkg/gettext-0.21.1/gettext-tools/confdir3/confdir3/confdir3/confdir3...[snip very long repeating pattern]...
> 		 confdir3/confdir3/confdir3/confdir3/confdir3" pid=3844576 comm="conftest" requested_mask="c" denied_mask="c" fsuid=1000 ouid=1000
>   Jun 01 22:23:45 kernel: conftest[3855174]: segfault at 0 ip 00007fe9581067e7 sp 00007ffd94ca2118 error 4 in libc-2.31.so[7fe958085000+159000]
>   ...
>
> Since the host is shared with 3 other build workers, I can't be sure that it
> originated from that timed-out build.
>
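The kernel segfault line in the quoted log can be decoded field by field. The Python sketch below assumes the build host is x86-64 and uses the x86 page-fault error-code bits (bit 0 protection violation vs. page not present, bit 1 write, bit 2 user mode, bit 4 instruction fetch). `decode_segfault` is a hypothetical helper, not an existing tool. For the quoted line it reports a user-mode read of address 0 (a NULL pointer read) at offset 0x817e7 inside libc-2.31.so. It cannot tell which build the crashing conftest belonged to.

```python
import re

# Shape of the kernel "segfault at ..." line quoted above (x86 format assumed).
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# x86 page-fault error-code bits (assumption: the build host is x86-64).
PF_PROT, PF_WRITE, PF_USER, PF_INSTR = 0x1, 0x2, 0x4, 0x10


def decode_segfault(line: str) -> dict:
    """Split a kernel segfault line into its fields and decode the error code."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m["err"], 16)  # the x86 kernel prints the error code in hex
    ip, base, size = (int(m[k], 16) for k in ("ip", "base", "size"))
    offset = ip - base
    return {
        "fault_addr": int(m["addr"], 16),
        "protection_violation": bool(err & PF_PROT),  # False: page was not present
        "write": bool(err & PF_WRITE),                # False: the access was a read
        "user_mode": bool(err & PF_USER),
        "instr_fetch": bool(err & PF_INSTR),
        "lib": m["lib"],
        # Offset of the faulting instruction inside the mapped library, if it lies there.
        "lib_offset": offset if 0 <= offset < size else None,
    }


line = ("conftest[3855174]: segfault at 0 ip 00007fe9581067e7 sp 00007ffd94ca2118 "
        "error 4 in libc-2.31.so[7fe958085000+159000]")
info = decode_segfault(line)
```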

Looking at that observation about gettext and the recursive "confdir3/", it is 
plausible that gettext has a problem that manifests in some builds, or has trouble 
with parallelism on some occasions.
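The AppArmor record can be summarized mechanically as well. The Python sketch below assumes the record is available as one string. It maps the negated `error=` value to an errno name with Python's `errno.errorcode` (on Linux, 36 is `ENAMETOOLONG`) and counts the nested `confdir3` components. `summarize_avc` is a hypothetical helper. The nesting itself looks like a configure probe for `getcwd` behaviour with very long paths (the record shows comm="conftest"); attributing it to gnulib's getcwd checks bundled with gettext is an assumption here.

```python
import errno
import re


def summarize_avc(record: str) -> tuple[str, int, int]:
    """Return (errno name, confdir3 nesting depth, path length) for an AppArmor AVC record."""
    err_m = re.search(r"\berror=(-?\d+)", record)
    name_m = re.search(r'\bname="([^"]*)"', record)
    if err_m is None or name_m is None:
        raise ValueError("record lacks error= or name= fields")
    code = abs(int(err_m.group(1)))  # the audit record stores a negated errno
    # On Linux, errno 36 is ENAMETOOLONG.
    errname = errno.errorcode.get(code, f"errno {code}")
    path = name_m.group(1)
    # Count path components that are exactly "confdir3".
    depth = sum(1 for part in path.split("/") if part == "confdir3")
    return errname, depth, len(path)
```

For the quoted record this would report ENAMETOOLONG. The actual depth cannot be recovered from the quoted log because the path there was snipped.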

Gettext was heavily reorganised in May, at about the same time as the buildbot 
code was revamped. So this might well be related to the gettext 
package rather than to the new buildbot code.


Looking at one failing build log (4030 lines long):

https://buildbot.staging.openwrt.org/master/packages/#/builders/14/builds/16/steps/24/logs/stdio

* gettext-full host build starts on line 1102 and completes on line 1675

* gettext-full normal compile starts on line 1676 and never completes



1102 make[3] -C feeds/base/package/libs/gettext-full host-compile
...

1675 make[3] -C feeds/base/package/libs/gettext-full clean-build
1676 make[3] -C feeds/base/package/libs/gettext-full compile
...
4025 make[3] -C feeds/packages/admin/zabbix compile
4026
4027 command timed out: 3600 seconds without output running [b'make', b'-j7', b'IGNORE_ERRORS=n m y', b'BUILD_LOG=1', b'CONFIG_AUTOREMOVE=y', b'CONFIG_SIGNED_PACKAGES='], attempting to kill
4028 process killed by signal 9
4029 program finished with exit code -1
4030 elapsedTime=49718.148698
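The timeout message above describes an inactivity limit, not a cap on total runtime: the step ran for 49718 s (about 13.8 hours) and was killed only after 3600 s passed without new output. The Python sketch below illustrates that kind of watchdog. It is not buildbot's implementation, and `run_with_silence_timeout` is a hypothetical name. Like the worker in the log, it kills the whole process group rather than just the top-level command.

```python
import os
import selectors
import signal
import subprocess
import time


def run_with_silence_timeout(cmd: list[str], silence_timeout: float) -> tuple[int, bool]:
    """Run cmd, killing its process group after silence_timeout seconds without output.

    Returns (returncode, timed_out). A negative returncode means "killed by that signal".
    """
    # start_new_session=True gives the child its own process group, so killpg()
    # also reaches grandchildren such as make's sub-makes and compilers.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            start_new_session=True)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    last_output = time.monotonic()
    timed_out = False
    while True:
        remaining = silence_timeout - (time.monotonic() - last_output)
        if remaining <= 0:
            timed_out = True
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group already exited on its own
            break
        if not sel.select(timeout=remaining):
            continue  # no output yet; loop back and re-check the deadline
        chunk = os.read(proc.stdout.fileno(), 4096)  # output is discarded in this sketch
        if not chunk:
            break  # EOF: every writer closed the pipe
        last_output = time.monotonic()  # any output resets the inactivity clock
    sel.close()
    proc.stdout.close()
    return proc.wait(), timed_out
```

Python reports a process killed by SIGKILL as return code -9, whereas the buildbot log shows "process killed by signal 9" followed by exit code -1. Assuming make keeps running unrelated jobs under -j7, output from any job resets the clock. A single stuck job would then trip the timeout only once nothing else is printing, which would be consistent with many other packages starting after gettext-full in this log.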

No gettext completion appears before the final timeout error. So hundreds of other 
packages were compiled while gettext was being recursively 
compiled?
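The count of packages started after gettext-full can be reproduced mechanically. The Python sketch below assumes the stdio log contains `make[N] -C <package dir> <target>` lines like those excerpted above. `started_after` is a hypothetical helper. It lists only what started after the gettext-full compile step. The excerpt shows no explicit completion marker, so the helper cannot by itself prove that the step never finished.

```python
import re

# Matches build-log lines like "make[3] -C feeds/base/package/libs/gettext-full compile".
MAKE_RE = re.compile(r"make\[\d+\] -C (?P<pkg>\S+) (?P<target>\S+)")


def started_after(lines: list[str], pkg_suffix: str, target: str):
    """Find the first make line for (pkg_suffix, target) and every other package started later.

    Returns (1-based line number of that step or None, {package dir: first later line number}).
    """
    start = None
    later: dict[str, int] = {}
    for lineno, line in enumerate(lines, start=1):
        m = MAKE_RE.search(line)
        if m is None:
            continue
        pkg = m["pkg"]
        if start is None:
            if pkg.endswith(pkg_suffix) and m["target"] == target:
                start = lineno
        elif not pkg.endswith(pkg_suffix):
            later.setdefault(pkg, lineno)  # keep only each package's first appearance
    return start, later
```

Called as `started_after(log_text.splitlines(), "/gettext-full", "compile")` on the full log, and assuming the line numbers quoted above are positions in the stdio log, the start should come out as 1676 and `len(later)` gives the number of distinct packages that started afterwards.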


Context for Christian and Michael:


Hannu Nyman wrote on 1.6.2023 at 19.11:
> Looks like the new buildbot code and new instances (also for 23.05) are not 
> yet quite stable...
>
> Packages of some popular architectures like aarch64_cortex-a53 for mt7622 
> and ipq807x have not been built for a week in master.
>
> There have been many timeouts of "3600 seconds without output" in master, 
> and far too many "out of space" errors in the 23.05 packages buildbot.
>
> ...
>
> Far too many buildbot-specific errors compared to proper build failures due 
> to source code...
> Something strange/unstable in the buildbot?
> Or just some newly updated problematic packages causing havoc?
>
> ...
>
> https://buildbot.staging.openwrt.org/master/packages/#/builders




