'tr' character class support?

Jordan Geoghegan jordan at geoghegan.ca
Sat Jul 11 00:21:32 EDT 2020


Whoops, I accidentally mangled the whitespace in my last diff. Fixed version below.

On 2020-07-10 21:13, Jordan Geoghegan wrote:
> Please find patch to enable character classes in 'tr' below.
>
> On 2020-07-10 20:33, Rosen Penev wrote:
>> On Fri, Jul 10, 2020 at 5:15 PM Jordan Geoghegan 
>> <jordan at geoghegan.ca> wrote:
>>>
>>>
>>> On 2020-07-10 16:59, Rosen Penev wrote:
>>>> On Fri, Jul 10, 2020 at 4:17 PM Jordan Geoghegan 
>>>> <jordan at geoghegan.ca> wrote:
>>>>>
>>>>> On 2020-07-10 14:54, Rosen Penev wrote:
>>>>>> On Fri, Jul 10, 2020 at 2:29 PM Jordan Geoghegan 
>>>>>> <jordan at geoghegan.ca> wrote:
>>>>>>> On 2020-07-10 14:15, Magnus Kroken wrote:
>>>>>>>> Hi Jordan
>>>>>>>>
>>>>>>>> On 10.07.2020 22:45, Jordan Geoghegan wrote:
>>>>>>>>> Hey folks,
>>>>>>>>>
>>>>>>>>> Does the 'tr' utility support character classes in OpenWRT? I was
>>>>>>>>> playing around with an OpenWRT x86_64 VM and I noticed that 'tr'
>>>>>>>>> doesn't seem to support character classes.
>>>>>>>>> The command " echo HELLO | tr '[:upper:]' '[:lower:]' "  does not
>>>>>>>>> convert the text to lowercase as it should (and as required by
>>>>>>>>> POSIX).
>>>>>>>> This would be expected behavior. OpenWrt disables tr character
>>>>>>>> classes in BusyBox by default, see [1]:
>>>>>>>>
>>>>>>>> config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
>>>>>>>>            bool
>>>>>>>>            default n
>>>>>>>> config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
>>>>>>>>            bool
>>>>>>>>            default n
>>>>>>>>
>>>>>>>> I don't know what the size cost in the BusyBox binary is, but that
>>>>>>>> will likely be the deciding factor for such a change.
>>>>>>>>
>>>>>>>> 1:
>>>>>>>> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/utils/busybox/Config-defaults.in 
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Magnus Kroken
>>>>>>> Hi Magnus,
>>>>>>>
>>>>>>> Thanks for confirming that so quickly.
>>>>>>>
>>>>>>> I obviously understand that space saving is essential to OpenWRT,
>>>>>>> but POSIX does require[1] that 'tr' support character classes:
>>>>>> awk '{print toupper($0)}' is an alternative.
>>>>> Yes, but this means that any script expecting tr to work correctly
>>>>> could explode, as tr silently ignores the character class and treats
>>>>> all the characters literally.
>>>> git grep upper | grep tr\ | wc -l
>>>> 3
>>>>
>>>> In the packages feed. All those results are things that run on the
>>>> host, not on OpenWrt.
>>>>
>>>> tr a-z A-Z works as an alternative and is used in many places.
>>> tr a-z A-Z is bad practice as it can behave unexpectedly in different
>>> locales; I've also heard tales of folks with Turkish locales having
>>> issues with '0-9' for example.
>>> Is a couple of kB of space worth such a loss in portability (not to
>>> mention deviating heavily from POSIX)?
>> Patches welcome to replace usage of tr with awk.
>>
>> I don't think anyone runs OpenWrt with any locale other than the
>> default.
> I don't think it makes sense to replace usage of 'tr' with awk; it
> makes more sense to just make tr work correctly. As requested, here's
> a patch below.
>>>>>>> :class:
>>>>>>>                  Represents all characters belonging to the defined
>>>>>>>                  character class, as defined by the current setting
>>>>>>>                  of the LC_CTYPE locale category. The following
>>>>>>>                  character class names shall be accepted when
>>>>>>>                  specified in string1:
>>>>>>>
>>>>>>>                    alnum    blank    digit    lower    punct    upper
>>>>>>>                    alpha    cntrl    graph    print    space    xdigit
>>>>>>>
>>>>>>>
>>>>>>> 1: https://www.unix.com/man-page/posix/1posix/tr/
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jordan
>>>>>>>
>>>>>>>
>

--- Config-defaults.in.orig     Fri Jul 10 21:03:57 2020
+++ Config-defaults.in  Fri Jul 10 21:03:22 2020
@@ -837,7 +837,7 @@
         default y
  config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
         bool
-       default n
+       default y
  config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
         bool
         default n



