'tr' character class support?

Rosen Penev rosenp at gmail.com
Fri Jul 10 23:33:27 EDT 2020


On Fri, Jul 10, 2020 at 5:15 PM Jordan Geoghegan <jordan at geoghegan.ca> wrote:
>
>
>
> On 2020-07-10 16:59, Rosen Penev wrote:
> > On Fri, Jul 10, 2020 at 4:17 PM Jordan Geoghegan <jordan at geoghegan.ca> wrote:
> >>
> >>
> >> On 2020-07-10 14:54, Rosen Penev wrote:
> >>> On Fri, Jul 10, 2020 at 2:29 PM Jordan Geoghegan <jordan at geoghegan.ca> wrote:
> >>>>
> >>>> On 2020-07-10 14:15, Magnus Kroken wrote:
> >>>>> Hi Jordan
> >>>>>
> >>>>> On 10.07.2020 22:45, Jordan Geoghegan wrote:
> >>>>>> Hey folks,
> >>>>>>
> >>>>>> Does the 'tr' utility support character classes in OpenWRT? I was
> >>>>>> playing around with an OpenWRT x86_64 VM and I noticed that 'tr'
> >>>>>> doesn't seem to support character classes.
> >>>>>> The command " echo HELLO | tr '[:upper:]' '[:lower:]' "  does not
> >>>>>> convert the text to lowercase as it should (and as required by
> >>>>>> POSIX).
> >>>>> This would be expected behavior. OpenWrt disables tr character classes
> >>>>> in BusyBox by default, see [1]:
> >>>>>
> >>>>> config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
> >>>>>           bool
> >>>>>           default n
> >>>>> config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
> >>>>>           bool
> >>>>>           default n
> >>>>>
> >>>>> I don't know what the size cost in the BusyBox binary is, but that
> >>>>> will likely be the deciding factor for such a change.
> >>>>>
> >>>>> 1:
> >>>>> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/utils/busybox/Config-defaults.in
> >>>>>
> >>>>> Regards,
> >>>>> Magnus Kroken
> >>>> Hi Magnus,
> >>>>
> >>>> Thanks for confirming that so quickly.
> >>>>
> >>>> I obviously understand that space saving is essential to OpenWRT, but
> >>>> POSIX does require[1] that 'tr' support character classes:
> >>> awk '{print toupper($0)}' is an alternative.
> >> Yes, but this means that any script expecting tr to work correctly could
> >> explode, as tr silently ignores the character class and treats all the
> >> characters literally.
> > git grep upper | grep tr\ | wc -l
> > 3
> >
> > In the packages feed. All those results are things that run on the
> > host, not on OpenWrt.
> >
> > tr a-z A-Z works as an alternative and is used in many places.
> tr a-z A-Z is bad practice as it can behave unexpectedly in different
> locales; I've also heard tales of folks with Turkish locales having
> issues with '0-9' for example.
> Is a couple kb of space worth such a loss in portability (not to mention
> deviating heavily from POSIX)?
Patches welcome to replace usage of tr with awk.

I don't think anyone runs OpenWrt with any locale other than the default.
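
For scripts that have to run on both a default OpenWrt BusyBox and a full
userland, a rough sketch of the awk route follows. The function names
to_lower and tr_has_classes are made up for illustration and are not part
of any OpenWrt package; the probe assumes a tr without class support
leaves uppercase input untouched, as described earlier in the thread.

```shell
#!/bin/sh

# Hypothetical helper: lowercase stdin with awk's tolower(), which does
# not depend on tr having character-class support compiled in.
to_lower() {
    awk '{ print tolower($0) }'
}

# Hypothetical probe: succeeds only if this tr understands
# [:upper:] / [:lower:]. A tr built without class support treats the
# bracket expressions as literal characters and leaves 'ABC' unchanged.
tr_has_classes() {
    [ "$(printf 'ABC' | tr '[:upper:]' '[:lower:]')" = "abc" ]
}

echo HELLO | to_lower
```

A script could call tr_has_classes once at startup and fall back to
to_lower when it fails, rather than assuming either behaviour.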
> >>>> :class:
> >>>>                 Represents all characters belonging to the defined character
> >>>>                 class, as defined by the current setting of the LC_CTYPE
> >>>>                 locale category. The following character class names shall be
> >>>>                 accepted when specified in string1:
> >>>>
> >>>>                   alnum    blank   digit   lower   punct   upper
> >>>>                   alpha    cntrl   graph   print   space   xdigit
> >>>>
> >>>>
> >>>> 1: https://www.unix.com/man-page/posix/1posix/tr/
> >>>>
> >>>>
> >>>> Regards,
> >>>> Jordan
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> openwrt-devel mailing list
> >>>> openwrt-devel at lists.openwrt.org
> >>>> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
>
