'tr' character class support?

Jordan Geoghegan jordan at geoghegan.ca
Fri Jul 10 20:15:40 EDT 2020



On 2020-07-10 16:59, Rosen Penev wrote:
> On Fri, Jul 10, 2020 at 4:17 PM Jordan Geoghegan <jordan at geoghegan.ca> wrote:
>>
>>
>> On 2020-07-10 14:54, Rosen Penev wrote:
>>> On Fri, Jul 10, 2020 at 2:29 PM Jordan Geoghegan <jordan at geoghegan.ca> wrote:
>>>>
>>>> On 2020-07-10 14:15, Magnus Kroken wrote:
>>>>> Hi Jordan
>>>>>
>>>>> On 10.07.2020 22:45, Jordan Geoghegan wrote:
>>>>>> Hey folks,
>>>>>>
>>>>>> Does the 'tr' utility support character classes in OpenWrt? I was
>>>>>> playing around with an OpenWrt x86_64 VM and I noticed that 'tr'
>>>>>> doesn't seem to support character classes.
>>>>>> The command " echo HELLO | tr '[:upper:]' '[:lower:]' "  does not
>>>>>> convert the text to lowercase as it should (and as required by
>>>>>> POSIX).
>>>>> This would be expected behavior. OpenWrt disables tr character classes
>>>>> in BusyBox by default, see [1]:
>>>>>
>>>>> config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
>>>>>           bool
>>>>>           default n
>>>>> config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
>>>>>           bool
>>>>>           default n
>>>>>
>>>>> I don't know what the size cost in the BusyBox binary is, but that
>>>>> will likely be the deciding factor for such a change.
>>>>>
>>>>> 1:
>>>>> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/utils/busybox/Config-defaults.in
>>>>>
>>>>> Regards,
>>>>> Magnus Kroken
>>>> Hi Magnus,
>>>>
>>>> Thanks for confirming that so quickly.
>>>>
>>>> I obviously understand that space saving is essential to OpenWrt, but
>>>> POSIX does require[1] that 'tr' support character classes:
>>> awk '{print toupper($0)}' is an alternative.
>> Yes, but this means that any script expecting tr to work correctly could
>> explode, as tr silently ignores the character class and treats all the
>> characters literally.
> git grep upper | grep tr\ | wc -l
> 3
>
> In the packages feed. All those results are things that run on the
> host, not on OpenWrt.
>
> tr a-z A-Z works as an alternative and is used in many places.
tr a-z A-Z is bad practice as it can behave unexpectedly in different 
locales; I've also heard tales of folks with Turkish locales having 
issues with '0-9' for example.
Is saving a couple of kB of space worth such a loss in portability (not to
mention deviating heavily from POSIX)?
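
For scripts that need to run on both kinds of system in the meantime, a
defensive wrapper is one option. The following is only a minimal sketch, not
anything that exists in OpenWrt: the helper names tr_has_classes and to_lower
are made up for illustration, and it assumes awk is available on the target.
It probes whether tr honours character classes and falls back to awk's
tolower() otherwise:

```sh
#!/bin/sh

# tr_has_classes (hypothetical helper): succeed only if this tr honours
# POSIX character classes. With classes disabled, the class names are
# treated as literal characters, so "HELLO" comes back unchanged.
tr_has_classes() {
    [ "$(printf 'HELLO' | tr '[:upper:]' '[:lower:]')" = "hello" ]
}

# to_lower (hypothetical helper): lowercase stdin, preferring tr with
# character classes and falling back to awk's tolower() when the probe fails.
to_lower() {
    if tr_has_classes; then
        tr '[:upper:]' '[:lower:]'
    else
        awk '{ print tolower($0) }'
    fi
}

echo HELLO | to_lower
```

One difference to be aware of: awk appends a trailing newline if the input
lacks one, while tr passes the input through byte for byte.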
>>>> [:class:]
>>>>                 Represents all characters belonging to the defined character
>>>>                 class, as defined by the current setting of the LC_CTYPE locale
>>>>                 category. The following character class names shall be accepted
>>>>                 when specified in string1:
>>>>
>>>>                   alnum    blank   digit   lower   punct   upper
>>>>                   alpha    cntrl   graph   print   space   xdigit
>>>>
>>>>
>>>> 1: https://www.unix.com/man-page/posix/1posix/tr/
>>>>
>>>>
>>>> Regards,
>>>> Jordan
>>>>
>>>>
>>>> _______________________________________________
>>>> openwrt-devel mailing list
>>>> openwrt-devel at lists.openwrt.org
>>>> https://lists.openwrt.org/mailman/listinfo/openwrt-devel



