'tr' character class support?

Jordan Geoghegan jordan at geoghegan.ca
Sat Jul 11 00:13:44 EDT 2020


Please find a patch to enable character classes in 'tr' below.

On 2020-07-10 20:33, Rosen Penev wrote:
> On Fri, Jul 10, 2020 at 5:15 PM Jordan Geoghegan <jordan at geoghegan.ca> wrote:
>>
>>
>> On 2020-07-10 16:59, Rosen Penev wrote:
>>> On Fri, Jul 10, 2020 at 4:17 PM Jordan Geoghegan <jordan at geoghegan.ca> wrote:
>>>>
>>>> On 2020-07-10 14:54, Rosen Penev wrote:
>>>>> On Fri, Jul 10, 2020 at 2:29 PM Jordan Geoghegan <jordan at geoghegan.ca> wrote:
>>>>>> On 2020-07-10 14:15, Magnus Kroken wrote:
>>>>>>> Hi Jordan
>>>>>>>
>>>>>>> On 10.07.2020 22:45, Jordan Geoghegan wrote:
>>>>>>>> Hey folks,
>>>>>>>>
>>>>>>>> Does the 'tr' utility support character classes in OpenWRT? I was
>>>>>>>> playing around with an OpenWRT x86_64 VM and I noticed that 'tr'
>>>>>>>> doesn't seem to support character classes.
>>>>>>>> The command " echo HELLO | tr '[:upper:]' '[:lower:]' "  does not
>>>>>>>> convert the text to lowercase as it should (and as required by
>>>>>>>> POSIX).
>>>>>>> This would be expected behavior. OpenWrt disables tr character classes
>>>>>>> in BusyBox by default, see [1]:
>>>>>>>
>>>>>>> config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
>>>>>>>            bool
>>>>>>>            default n
>>>>>>> config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
>>>>>>>            bool
>>>>>>>            default n
>>>>>>>
>>>>>>> I don't know what the size cost in the BusyBox binary is, but that
>>>>>>> will likely be the deciding factor for such a change.
>>>>>>>
>>>>>>> 1:
>>>>>>> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/utils/busybox/Config-defaults.in
>>>>>>>
>>>>>>> Regards,
>>>>>>> Magnus Kroken
>>>>>> Hi Magnus,
>>>>>>
>>>>>> Thanks for confirming that so quickly.
>>>>>>
>>>>>> I obviously understand that space saving is essential to OpenWRT, but
>>>>>> POSIX does require[1] that 'tr' support character classes:
>>>>> awk '{print toupper($0)}' is an alternative.
>>>> Yes, but this means that any script expecting tr to work correctly could
>>>> explode, as tr silently ignores the character class and treats all the
>>>> characters literally.
>>> git grep upper | grep tr\ | wc -l
>>> 3
>>>
>>> In the packages feed. All those results are things that run on the
>>> host, not on OpenWrt.
>>>
>>> tr a-z A-Z works as an alternative and is used in many places.
>> tr a-z A-Z is bad practice as it can behave unexpectedly in different
>> locales; I've also heard tales of folks with Turkish locales having
>> issues with '0-9' for example.
>> Is a couple kb of space worth such a loss in portability (not to mention
>> deviating heavily from POSIX)?
> Patches welcome to replace usage of tr with awk.
>
> I don't think anyone runs OpenWrt with any locale other than the default.
I don't think it makes sense to replace usage of 'tr' with awk; it makes
more sense to just make tr work correctly. As requested, the patch is
below.
>>>>>> :class:
>>>>>>                  Represents all characters belonging to the defined character class,
>>>>>>                  as defined by the current setting of the LC_CTYPE locale category.
>>>>>>                  The following character class names shall be accepted when specified
>>>>>>                  in string1:
>>>>>>
>>>>>>                    alnum    blank   digit   lower   punct   upper
>>>>>>                    alpha    cntrl   graph   print   space   xdigit
>>>>>>
>>>>>>
>>>>>> 1: https://www.unix.com/man-page/posix/1posix/tr/
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Jordan
>>>>>>
>>>>>>

--- Config-defaults.in.orig     Fri Jul 10 21:03:57 2020
+++ Config-defaults.in  Fri Jul 10 21:03:22 2020
@@ -837,7 +837,7 @@
         default y
  config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
         bool
-       default n
+       default y
  config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
         bool
         default n
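
For anyone who wants to check an image built with this change, something
along these lines should do. This is only a sketch: tr_has_classes is a
helper name I made up for illustration, and it assumes nothing beyond a
POSIX sh with printf and tr on the PATH.

```shell
#!/bin/sh
# Sketch: detect whether the tr on PATH honours POSIX character classes.
# tr_has_classes is a made-up helper name, not part of BusyBox or OpenWrt.
tr_has_classes() {
    # With class support, HELLO becomes hello. Without it, tr treats
    # '[:upper:]' as a set of literal characters, none of which appear
    # in HELLO, so the comparison fails.
    [ "$(printf 'HELLO\n' | tr '[:upper:]' '[:lower:]')" = "hello" ]
}

if tr_has_classes; then
    echo "tr: character classes supported"
else
    echo "tr: character classes NOT supported" >&2
fi
```

On a build without FEATURE_TR_CLASSES I would expect the second message,
since tr exits successfully there and gives no other hint that the class
was ignored.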


