[OpenWrt-Devel] [RFC] build-system: NAND: Concerns around bad-block reservation and kernel / image size

Jeff Kletsky lede at allycomm.com
Mon Nov 11 15:13:57 EST 2019


TL;DR

   NAND-resident kernels seem likely to have bad blocks in the partition.

   `KERNEL_SIZE := 2048k` seems likely to overflow a 2 MB partition
   that has even a single bad block

   The ath79-nand kernel is already over 1,900,000 bytes

   What should the bad-block reservation be for a 2-MB partition?
   A 4-MB partition?

   Is there a way to handle "blit-it-down", "factory" images?

   What is the best way to implement bad-block reservation in the build
   system for kernels and for images in general?


-----

Perhaps my memory of probability is poor, but it seems unlikely that
all the blocks in even a 2 MB kernel partition stay good over the life
of a device.

Typical SPI NAND has 128 kB blocks, with a bad-block reservation of
20 out of 1024 blocks.

A 2 MB kernel then has 16 blocks, so, assuming random failures, the
probability that they are all good would be

     (1 - (20/1024))^16 ~ 73%

A 27% failure rate for devices in the field doesn't seem reasonable.

Even with dual-kernel devices with U-Boot failover, the probability of
both being bad is around 7%, which still seems unreasonable.
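Those two figures can be sanity-checked in a couple of lines (a sketch,
using the 16-block count and 20/1024 reservation rate from above, and
assuming independent failures):

```python
p_bad = 20 / 1024   # datasheet bad-block reservation rate
blocks = 16         # 2 MB partition / 128 kB blocks

p_all_good = (1 - p_bad) ** blocks   # probability every block is good
p_one_fails = 1 - p_all_good         # at least one bad block, ~27%
p_both_fail = p_one_fails ** 2       # both kernels hit, ~7%

print(f"{p_all_good:.1%}  {p_one_fails:.1%}  {p_both_fail:.1%}")
```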


-----

Leaving aside poor boot-loader implementations, it would then seem
that U-Boot manages bad blocks in some way.

It looks like the `nand write` and `nand read` commands skip bad
blocks[1]. It is my understanding that the OpenWrt `mtd` executable
functions similarly.

If I've got a device with a single, bad, 128 kB block in a 2 MB kernel
partition, then I've only got 1,966,080 bytes of writable space. Lose
another block and there's only 1,835,008 bytes available.
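The arithmetic, for reference (128 kB blocks, 2 MB partition):

```python
BLOCK = 128 * 1024               # erase-block size in bytes
PARTITION = 2 * 1024 * 1024      # 2 MB kernel partition

# bytes still writable after skipping one or two bad blocks
writable = {bad: PARTITION - bad * BLOCK for bad in (1, 2)}
print(writable[1], writable[2])  # 1966080 1835008
```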

First off, this, coupled with seeing ath79-nand kernels already at
1,933,844 bytes, argues for 3- or 4-MB kernel partitions.

Second, `KERNEL_SIZE := 2048k` may not be the right way to handle
this. Not only is it used for a kernel-size check, but it is often
used in the construction of "factory" images as the size of the kernel
partition:

   IMAGE/factory.bin := append-kernel | pad-to $$$$(KERNEL_SIZE) | append-ubi

It seems that either bad-block reservation needs to be "built in" to
the KERNEL_SIZE check, or a new parameter introduced that is the
"pad-to" size.

There seems to be an additional complication with pad-to, as it often
is used in "factory" images to move the UBI image to its desired
starting address. The build system can't know how many bad blocks
there already may be on a specific end-user's device.


-----

As far as how much to reserve, my recollection of combinations and
permutations probably isn't what it should be. If I've made a mistake
in my assumptions or analysis, please let me know!

For the SPI NAND that I've looked at, as well as for something like
the Micron MT29F1-series parallel NAND[2], it looks like

* Blocks are 128 kB of data
* Number of valid blocks is typically 1004 per 1024

At least as far as I know, when a block "goes bad", the entire 128 kB
of data is no longer available.

If my dusty memory of probability is correct, then the probability of a
given number of bad blocks is the probability that it happens in one
pattern

   p_one_way = (p_bad ** bad_count) *
               (p_good ** (partition_blocks - bad_count))

multiplied by the number of ways that many bad blocks can be arranged
among the blocks being examined.

   combinations(partition_blocks, bad_count)

The probability of ending up with more than a given number of failed
blocks is then one minus the cumulative probability of that number of
failed blocks or fewer.
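As a spot check, this per-count probability is just the binomial PMF;
Python 3.8+ ships math.comb, so the "one bad block in sixteen" case
from the first table below can be reproduced directly:

```python
from math import comb

p_bad = 20 / 1024
n, k = 16, 1   # 16 blocks, exactly one bad

# binomial PMF: C(n, k) * p^k * (1-p)^(n-k)
pmf = comb(n, k) * p_bad ** k * (1 - p_bad) ** (n - k)
print(f"{pmf:.6f}")   # 0.232464, matching the "1 in 16" row
```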

If I got this all right (Python code below), the results are ugly

(The second column is the space left for the kernel after that many
bad blocks; "One in" gives the odds of seeing more bad blocks than
that, i.e. 1/(1 - Pcumul).)

   p_bad = 20/1024

2 MB partition:                                     Pnbb Pcumul
  0 in 16    2,048 kB    One in              3      0.729357 0.729357
  1 in 16    1,920 kB    One in             26      0.232464 0.961821
  2 in 16    1,792 kB    One in            290      0.034731 0.996552
  3 in 16    1,664 kB    One in          4,558      0.003229 0.999781
  4 in 16    1,536 kB    One in         96,443      0.000209 0.999990
  5 in 16    1,408 kB    One in      2,662,549      0.000010 1.000000

4 MB partition:                                     Pnbb Pcumul
  0 in 32    4,096 kB    One in              2      0.531961 0.531961
  1 in 32    3,968 kB    One in              7      0.339099 0.871060
  2 in 32    3,840 kB    One in             41      0.104702 0.975762
  3 in 32    3,712 kB    One in            295      0.020857 0.996619
  4 in 32    3,584 kB    One in          2,713      0.003012 0.999631
  5 in 32    3,456 kB    One in         30,772      0.000336 0.999968
  6 in 32    3,328 kB    One in        421,026      0.000030 0.999998
  7 in 32    3,200 kB    One in      6,827,817      0.000002 1.000000


If 1 in ~100,000 is an "acceptable" failure rate for a given kernel
(see later note on dual-kernel layouts)

2 MB partition can "safely" hold 1,536 kB
4 MB partition can "safely" hold 3,328 kB
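If the build system grew a reservation-aware size check, the core of
it is a one-liner. This is only a sketch with made-up names
(usable_kernel_kb is not an existing macro), using the reserve counts
implied by the tables above:

```python
BLOCK_KB = 128   # erase-block size in kB

def usable_kernel_kb(partition_kb, reserve_blocks):
    """Kernel-size limit once reserve_blocks are set aside for bad blocks."""
    return partition_kb - reserve_blocks * BLOCK_KB

print(usable_kernel_kb(2048, 4))   # 1536, the 2 MB "safe" figure
print(usable_kernel_kb(4096, 6))   # 3328, the 4 MB "safe" figure
```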


Dropping to p_bad = 5/1024 "helps" a little

2 MB partition:                                     Pnbb Pcumul
  0 in 16    2,048 kB    One in             13      0.924672 0.924672
  1 in 16    1,920 kB    One in            365      0.072594 0.997266
  2 in 16    1,792 kB    One in         16,087      0.002672 0.999938
  3 in 16    1,664 kB    One in      1,013,041      0.000061 0.999999

4 MB partition:                                     Pnbb Pcumul
  0 in 32    4,096 kB    One in              6      0.855018 0.855018
  1 in 32    3,968 kB    One in             93      0.134252 0.989270
  2 in 32    3,840 kB    One in          1,925      0.010211 0.999481
  3 in 32    3,712 kB    One in         54,574      0.000501 0.999982
  4 in 32    3,584 kB    One in      1,997,005      0.000018 0.999999


Still not great

2 MB partition can "safely" hold 1,664 kB
4 MB partition can "safely" hold 3,584 kB


You can "play" with the Python code for other values. I have seen two
bad blocks in 128 MB NAND.


For dual-firmware devices, there will be a "helpful" effect in that
until both kernels are "bad", the device is still functional. If the
failures are independent, then, for example, a ~1/316 chance of
failure of one kernel would be a 1/100,000 chance of failure of
both.
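Checking that arithmetic (independence assumed):

```python
p_one = 1 / 316           # per-kernel chance of too many bad blocks
p_both = p_one ** 2       # both kernels fail
print(round(1 / p_both))  # 99856, roughly one in 100,000
```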

Looking at the data, there would still be the need to reserve at least
one or two blocks for each of the kernels. (sysupgrade messaging and
NAND upgrade would need to be improved, as it would be reasonably
likely that the partition didn't "switch every time" as it does now.)


Yes, UBI-resident kernels[3] help this as the bad blocks are dealt with
over the span of the UBI partition, but very few devices I know of
"natively" boot a kernel from UBI.



Jeff



[1] http://www.denx.de/wiki/publish/DULG/DULG-enbw_cmc.html#Section_5.9.9.2.

[2] 
https://datasheet.octopart.com/MT29F1G08ABADAWP-IT%3AD-Micron-datasheet-11552893.pdf

[3] http://www.denx.de/wiki/publish/DULG/DULG-enbw_cmc.html#Section_5.9.3.6.


Calculate probability table:

8<

from math import factorial


def combinations(n, k):
    """Binomial coefficient: number of ways to choose k of n blocks."""
    return factorial(n) // (factorial(k) * factorial(n - k))


if __name__ == '__main__':

    p_bad = 20/1024

    p_good = 1 - p_bad

    kB = 1024
    MB = 1024*1024

    block_size = 128 * kB

    for partition_mb in (2, 4):

        print(f"{partition_mb} MB partition: {'Pnbb':>40s}       Pcumul")

        partition_size = partition_mb * MB

        partition_blocks = partition_size // block_size

        p_cumulative = 0

        for bad_count in range(0, partition_blocks + 1):

            # Probability of one specific pattern of bad_count bad blocks...
            p_one_way = (p_bad ** bad_count) * (p_good ** (partition_blocks - bad_count))

            # ...times the number of such patterns (binomial PMF)
            p_all_ways = p_one_way * combinations(partition_blocks, bad_count)

            p_cumulative += p_all_ways

            print(f"{bad_count:2d} in {partition_blocks:2d}   ", end='')
            print(f"{round((partition_size - bad_count * block_size)/kB):6,d} kB   ", end='')
            try:
                print(f" One in {int(1/(1 - p_cumulative)):14,d}", end='')
            except ZeroDivisionError:
                print()
                break
            print(f"      {p_all_ways:.6f}   {p_cumulative:.6f}")
            if int(1/(1 - p_cumulative)) > 1e8:
                break

        print()

>8


Run Monte Carlo simulation:

8<

import random

runs = 1000000

p_bad = 20/1024

n_blocks = 16

# counts[k] tallies runs that saw exactly k bad blocks
counts = [0] * (n_blocks + 1)

for run in range(0, runs):
    bad_count = 0
    for block in range(0, n_blocks):
        if random.random() < p_bad:
            bad_count += 1
    counts[bad_count] += 1

cumulative = 0

for count in range(0, len(counts)):
    cumulative += counts[count]
    print(f"{count:2d}  {counts[count]:7,d}  {cumulative:7,d}  "
          f"{counts[count] / runs:0.7f}  {cumulative / runs:0.7f}")

>8



_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel

