[RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection

Sun May 28 10:12:05 PDT 2023

Hi Julian,

On 2023-05-29 at 01:32:02 +1000, Julian Calaby <julian.calaby at gmail.com> wrote:
> Hi Frank,
>
> On Sun, May 28, 2023 at 8:10 PM Frank Oltmanns <frank at oltmanns.dev> wrote:
>>
>> Hi Julian,
>>
>> On 2023-05-28 at 09:19:36 +1000, Julian Calaby <julian.calaby at gmail.com> wrote:
>> > Hi Frank,
>> >
>> > On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank at oltmanns.dev> wrote:
>> >>
>> >> Add a new precalculation method for NKM clock rate selection in the
>> >> sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a
>> >> precalculated table of valid NKM combinations (struct clk_nkm_table and
>> >> struct clk_nkm_combo) to find the best rate. This approach provides
>> >> faster rate selection by searching a table of valid combinations rather
>> >> than calculating for all possible combinations.
>> >>
>> >> The table of NKM combinations needs to be initialized with meaningful
>> >> combinations only, i.e. removing redundant combinations that result in
>> >> the same rate.
>> >>
>> >> Keep the existing ccu_nkm_find_best function in place and use it as a
>> >> fallback if no precalculated table is provided.
>> >>
>> >> Signed-off-by: Frank Oltmanns <frank at oltmanns.dev>
>> >> ---
>> >>  drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++-------
>> >>  drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++
>> >>  2 files changed, 94 insertions(+), 16 deletions(-)
>> >>
>> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> index 94d2a83992b2..9652f6df17bd 100644
>> >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>> >>         return best_rate;
>> >>  }
>> >>
>> >> +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent,
>> >> +                                              unsigned long rate,
>> >> +                                              struct _ccu_nkm *nkm,
>> >> +                                              struct clk_nkm_table *table)
>> >> +{
>> >> +       unsigned long best_rate = 0, best_diff = ULONG_MAX;
>> >> +       unsigned long best_n = 0, best_k = 0, best_m = 0;
>> >> +       int start = 0, end = table->num - 1, mid;
>> >> +
>> >> +       while (start <= end) {
>> >> +               unsigned long tmp_rate;
>> >> +               unsigned long tmp_diff;
>> >> +
>> >> +               mid = (start + end) / 2;
>> >> +
>> >> +               tmp_rate = parent * table->combos[mid].n * table->combos[mid].k /
>> >> +                          table->combos[mid].m;
>> >> +
>> >> +               tmp_diff = abs(rate - tmp_rate);
>> >> +
>> >> +               if (tmp_diff < best_diff) {
>> >> +                       best_rate = tmp_rate;
>> >> +                       best_diff = tmp_diff;
>> >> +                       best_n = table->combos[mid].n;
>> >> +                       best_k = table->combos[mid].k;
>> >> +                       best_m = table->combos[mid].m;
>> >> +                       if (best_diff == 0)
>> >> +                               goto out;
>> >> +               }
>> >
>>
>> Thank you for your feedback!
>>
>> In my proposal, the code performs a binary search by
>>  1. taking the element in the middle (mid)
>>  2. calculating the rate of the element (tmp_rate)
>>  3. calculating the difference to the requested rate (tmp_diff)
>>  4. if the diff is better than best_diff, making it the new best
>>     n-k-m-combo (the if block)
>
> I'm so sorry, I thought that this was still doing a linear search as
> it's so close to the original code.
>
>>
>> > If the table was sorted by n * k / m, this could just be a process of
>>
>> Please note, the table already has to be sorted for the function to
>> work, as is the nature of a binary search. I should definitely add
>> comments. I'm sorry, the code was intended more as a basis to discuss
>> the general idea that I described in the cover letter. I should have
>> made that clearer.
>>
>> > searching through until we either:
>> > - find that the first rate in the table is too high
>>
>> I could see that I could add two steps in the beginning, before the loop:
>>  - Take the first element and see if its rate is greater than the
>>    requested rate, if so immediately return it
>>  - Take the last element and see if its rate is less than the requested
>>    rate, if so immediately return it
>>
>> Is that what you mean? I'd have to run some simulations to see if this
>> is a real improvement, because we would need two additional rate
>> calculations. Worst case would therefore be 2+log(n) calculations
>> instead of log(n) and the code would be slightly more complicated in my
>> opinion. But if we run this function with all possible parent rates (as
>> suggested at the end of my cover letter) these two special cases could
>> very well apply often. Thanks!
>>
>> > - find an exact rate
>>
>> What do you mean by "exact rate"? Do you mean a rate that matches the
>> requested rate exactly? This is what the code is already trying to do.
>> But, as this is not always possible, in cases where it does not find an
>> exact match, it takes the closest match instead.
>>
>> > - go above the requested rate, then there's only two to compare: our
>> > current rate and the previous one
>>
>> Sorry, you've lost me here. How would I go above the requested rate? You
>> would have to do the binary search to find that rate, but then why not
>> search for the closest rate directly (as the code does) instead of
>> searching for the closest rate above the requested one (as you proposed)?
>> I feel like either one of us is missing something. :)
>
> What we're missing is that I'm not explaining this well.
>
> Let's take a very simple table: (value = parent * n * k / m)
>
> 0. 100
> 1. 200
> 2. 300
> 3. 400
>
> If we search for 50, our closest is the first rate, so index 0: this
> is the "find that the first rate in the table is too high" case.
>
> If we search for 300, we'll converge on index 2: this is the "exact
> rate" situation.
>
> If we search for 275, then we'll converge on either 200 or 300: this
> is the "two to compare" situation: if we converge until we get to the
> lowest rate above our target, we only need to check the rate
> immediately before it in the table and the one we converged on to find
> the closest.
>
> So in pseudo-code, we'd end up with something like this:
>
> --------
>
> start = 0;
>
> cur_rate = parent * table[start].n * table[start].k / table[start].m;
>
> if (cur_rate >= target)
>     return table[start];
>
> while (start <= end) {
>     mid = (start + end) / 2;

Thanks for the thorough explanation!

This needs to be (start + end + 1) / 2

Otherwise, if we extend your hypothetical list above with another item,
let's say 500 and look for 199, this would result in the loop finishing
with mid = 0, if I'm not mistaken, and hence an access to table[-1] when
calculating prev_rate below. Not good.

But I *think*, with (start + end + 1) / 2 it works in all cases.

>
>     cur_rate = parent * table[mid].n * table[mid].k / table[mid].m;
>
>     if (cur_rate == target)
>         return table[mid];
>
>     if (target < cur_rate)
>         end = mid - 1;
>     else
>         start = mid + 1;
> }
>
> prev_rate = parent * table[mid - 1].n * table[mid - 1].k / table[mid - 1].m;
>
> if (abs(target - prev_rate) < abs(target - cur_rate))
>     return table[mid - 1];
>
> return table[mid];
>
> --------
>
> Which seems simpler to my eye and moves all the difference
> calculations out of the loop so they only have to be done once,
> effectively trading a difference calculation on each checked rate for
> a rate calculation, and dropping some variables in the process.

At least it's shorter. I'm not sure it's simpler (after all, it contained
a mistake, I think ;-)). Still, it looks neat, so I might use your
(revised) algorithm.
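
For reference, here is a minimal, self-contained sketch of the "converge on
the lowest rate at or above the target, then compare it with the entry
before it" idea. It is written as a lower-bound search, so the predecessor
index is only read when it actually exists. This is an illustration under
assumptions, not the driver code: struct nkm_combo, combo_rate() and
nkm_find_closest() are hypothetical names, the table is assumed to be
non-empty and sorted by ascending n * k / m, and overflow of
parent * n * k is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct clk_nkm_combo from the patch. */
struct nkm_combo {
	unsigned long n, k, m;
};

/* Rate produced by one table entry; overflow is not handled here. */
static unsigned long combo_rate(unsigned long parent,
				const struct nkm_combo *c)
{
	return parent * c->n * c->k / c->m;
}

/*
 * Return the index of the entry whose rate is closest to target.
 * Assumes num > 0 and table sorted by ascending n * k / m.
 * On a tie, the higher of the two rates wins.
 */
static size_t nkm_find_closest(unsigned long parent, unsigned long target,
			       const struct nkm_combo *table, size_t num)
{
	size_t lo = 0, hi = num;	/* search window is [lo, hi) */
	unsigned long above, below;

	assert(table != NULL && num > 0);

	/* Lower bound: first index whose rate is >= target (num if none). */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (combo_rate(parent, &table[mid]) < target)
			lo = mid + 1;
		else
			hi = mid;
	}

	if (lo == 0)
		return 0;		/* every rate is >= target */
	if (lo == num)
		return num - 1;		/* every rate is < target */

	above = combo_rate(parent, &table[lo]);
	below = combo_rate(parent, &table[lo - 1]);

	/* below < target <= above, so neither subtraction can wrap. */
	return (target - below < above - target) ? lo - 1 : lo;
}
```

With the example table from above extended by 500 and a parent rate of 100,
a request for 199 selects the 200 entry and a request for 275 selects the
300 entry. The difference is computed once, after the loop, and the loop
itself needs one rate calculation per step.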

Thanks,
  Frank
>
> Thanks,