Buildbot infrastructure upgrade

Hannu Nyman hannu.nyman at iki.fi
Fri Mar 19 07:12:35 GMT 2021


Petr Štetiar wrote on 19.3.2021 at 8:39:
> Hannu Nyman <hannu.nyman at iki.fi> [2021-03-18 19:52:18]:
>
>> Petr Štetiar wrote on 18.3.2021 at 12:12:
>>> I'm still not that happy with the round-robin scheduler[1], but it's
>>> better than the previous state, so I'm going to deploy it soon to all
>>> masters.
>>>
>>> ...
>>>
>>> 1. https://github.com/buildbot/buildbot/issues/4592#issuecomment-801163587
>> I noticed that the master packages buildbot just started a new
>> mips64_octeonplus build, but only removed one of the pending build requests
>> (9 hours old) from the queue. The newer build request 1344, which is 41 minutes
>> old, is still in the queue.
>>
>> https://buildbot.openwrt.org/master/packages/#/builders/4
>> https://buildbot.openwrt.org/master/packages/#/pendingbuildrequests
>>
>> The same seems to have happened to i386_pentium-mmx (while I was writing this).
>>
>> So, a started build for a target does not always clear the build request
>> queue, as intended.
> it looks like the scheduler/database update issue I've referenced and reported:
>
>   2021-03-18 17:32:12+0000 [-] prioritizeBuilders:    mips64_octeonplus complete_at: 2021-03-16 12:50:30+00:00
>   2021-03-18 17:32:13+0000 [-] starting build <Build mips64_octeonplus number:None results:success> using worker <WorkerForBuilder builder='mips64_octeonplus' worker='fsf-dock-22' state=AVAILABLE>
>   2021-03-18 17:32:18+0000 [-] starting build <Build mips64_octeonplus number:7 results:success>.. pinging the worker <WorkerForBuilder builder='mips64_octeonplus' worker='fsf-dock-22' state=BUILDING>
>   2021-03-18 17:56:30+0000 [-] prioritizeBuilders:    mips64_octeonplus complete_at: 2021-03-16 12:50:30+00:00
>   2021-03-19 00:23:21+0000 [-]  <Build mips64_octeonplus number:7 results:success>: build finished
>
> here the previous build finishes, so the next complete_at should return the time
> 00:23:21, but it actually still returns the old timestamp:
>
>   2021-03-19 00:23:22+0000 [-] prioritizeBuilders:    mips64_octeonplus complete_at: 2021-03-16 12:50:30+00:00
>
> so mips64_octeonplus is still considered the oldest and is scheduled for another build:
>
>   2021-03-19 00:23:24+0000 [-] starting build <Build mips64_octeonplus number:None results:success> using worker <WorkerForBuilder builder='mips64_octeonplus' worker='fsf-dock-22' state=AVAILABLE>
>   2021-03-19 00:23:31+0000 [-] starting build <Build mips64_octeonplus number:8 results:success>.. pinging the worker <WorkerForBuilder builder='mips64_octeonplus' worker='fsf-dock-22' state=BUILDING>
>
> Cheers,
>
> Petr
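
A minimal sketch of the effect described above, assuming the scheduler simply orders builders oldest-first by the complete_at value it reads from the database. `BuilderState`, `prioritize`, `order_after_build` and the `other_target` builder with its timestamp are hypothetical illustrations, not buildbot's API or the actual scheduler code; only the mips64_octeonplus timestamps come from the log.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BuilderState:
    """Hypothetical stand-in for a builder (not buildbot's Builder class)."""
    name: str
    complete_at: datetime  # newest completed build request, as read from the DB


def prioritize(builders):
    """Round-robin sketch: the builder whose last build finished longest ago goes first."""
    return sorted(builders, key=lambda b: b.complete_at)


def order_after_build(builders, finished, finished_at, db_updated):
    """Return the builder order seen right after `finished` completes a build.

    If the database update has not landed yet (db_updated=False), the
    scheduler still sees the old complete_at value for that builder.
    """
    snapshot = []
    for b in builders:
        ts = finished_at if (b.name == finished and db_updated) else b.complete_at
        snapshot.append(BuilderState(b.name, ts))
    return [b.name for b in prioritize(snapshot)]


builders = [
    BuilderState("mips64_octeonplus", datetime(2021, 3, 16, 12, 50, 30)),
    BuilderState("other_target", datetime(2021, 3, 18, 20, 0, 0)),  # hypothetical
]
done = datetime(2021, 3, 19, 0, 23, 21)

# Stale read: mips64_octeonplus stays first. -> ['mips64_octeonplus', 'other_target']
print(order_after_build(builders, "mips64_octeonplus", done, db_updated=False))
# Fresh read: it drops to the end. -> ['other_target', 'mips64_octeonplus']
print(order_after_build(builders, "mips64_octeonplus", done, db_updated=True))
```

With the stale timestamp, the builder that has just finished still looks like the one that has waited longest, so it is picked again ahead of the others.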


I think this might be the problem that rjarry tried to overcome in the
upstream buildbot issue you referenced, with "cooldown_seconds" defined and
set to 4 seconds. I think he made the queue evaluation wait 4 seconds
before actually starting, so that all asynchronous updates would have been
written first. (I didn't look into the logic too deeply, but that was my
impression at first glance.)

We might try something similar, perhaps with a slightly longer waiting time.
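
A minimal asyncio sketch of the cooldown idea, assuming the race is simply "the scheduler reads complete_at before the asynchronous DB write has landed". This is not rjarry's code; `FakeDB`, `evaluate_queue`, `on_build_finished` and `write_delay` are hypothetical, and the delays are scaled down from seconds to fractions of a second.

```python
import asyncio


class FakeDB:
    """Hypothetical stand-in for the build database; writes land asynchronously."""

    def __init__(self):
        self.complete_at = {"mips64_octeonplus": "2021-03-16 12:50:30"}

    async def record_completion(self, builder, when, write_delay):
        await asyncio.sleep(write_delay)  # simulate a slow asynchronous DB write
        self.complete_at[builder] = when


async def evaluate_queue(db, cooldown_seconds):
    # Cooldown: give in-flight DB writes time to land before reading timestamps.
    await asyncio.sleep(cooldown_seconds)
    return dict(db.complete_at)


async def on_build_finished(cooldown_seconds, write_delay=0.05):
    db = FakeDB()
    # The completion write is started but not awaited, like a fire-and-forget update.
    write = asyncio.create_task(
        db.record_completion("mips64_octeonplus", "2021-03-19 00:23:21", write_delay)
    )
    seen = await evaluate_queue(db, cooldown_seconds)
    await write
    return seen["mips64_octeonplus"]


print(asyncio.run(on_build_finished(cooldown_seconds=0)))    # stale timestamp
print(asyncio.run(on_build_finished(cooldown_seconds=0.2)))  # updated timestamp
```

A fixed cooldown only hides the race when the write finishes within the waiting time: in the sketch, any write_delay longer than cooldown_seconds still returns the stale timestamp, which is the argument for a somewhat longer value.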
