'tr' character class support?
Jordan Geoghegan
jordan at geoghegan.ca
Sat Jul 11 00:21:32 EDT 2020
Whoops, I accidentally mangled the whitespace in my last diff. Fixed version below.
On 2020-07-10 21:13, Jordan Geoghegan wrote:
> Please find a patch to enable character classes in 'tr' below.
>
> On 2020-07-10 20:33, Rosen Penev wrote:
>> On Fri, Jul 10, 2020 at 5:15 PM Jordan Geoghegan
>> <jordan at geoghegan.ca> wrote:
>>>
>>>
>>> On 2020-07-10 16:59, Rosen Penev wrote:
>>>> On Fri, Jul 10, 2020 at 4:17 PM Jordan Geoghegan
>>>> <jordan at geoghegan.ca> wrote:
>>>>>
>>>>> On 2020-07-10 14:54, Rosen Penev wrote:
>>>>>> On Fri, Jul 10, 2020 at 2:29 PM Jordan Geoghegan
>>>>>> <jordan at geoghegan.ca> wrote:
>>>>>>> On 2020-07-10 14:15, Magnus Kroken wrote:
>>>>>>>> Hi Jordan
>>>>>>>>
>>>>>>>> On 10.07.2020 22:45, Jordan Geoghegan wrote:
>>>>>>>>> Hey folks,
>>>>>>>>>
>>>>>>>>> Does the 'tr' utility support character classes in OpenWRT? I was
>>>>>>>>> playing around with an OpenWRT x86_64 VM and I noticed that 'tr'
>>>>>>>>> doesn't seem to support character classes.
>>>>>>>>> The command " echo HELLO | tr '[:upper:]' '[:lower:]' " does not
>>>>>>>>> convert the text to lowercase as it should (and as required by
>>>>>>>>> POSIX).
>>>>>>>> This would be expected behavior. OpenWrt disables tr character
>>>>>>>> classes
>>>>>>>> in BusyBox by default, see [1]:
>>>>>>>>
>>>>>>>> config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
>>>>>>>>     bool
>>>>>>>>     default n
>>>>>>>> config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
>>>>>>>>     bool
>>>>>>>>     default n
>>>>>>>>
>>>>>>>> I don't know what the size cost in the BusyBox binary is, but that
>>>>>>>> will likely be the deciding factor for such a change.
>>>>>>>>
>>>>>>>> 1:
>>>>>>>> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/utils/busybox/Config-defaults.in
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Magnus Kroken
>>>>>>> Hi Magnus,
>>>>>>>
>>>>>>> Thanks for confirming that so quickly.
>>>>>>>
>>>>>>> I obviously understand that space saving is essential to
>>>>>>> OpenWRT, but
>>>>>>> POSIX does require[1] that 'tr' support character classes:
>>>>>> awk '{print toupper($0)}' is an alternative.
>>>>> Yes, but this means that any script expecting tr to work correctly
>>>>> could
>>>>> explode, as tr silently ignores the character class and treats all
>>>>> the
>>>>> characters literally.
>>>> git grep upper | grep tr\ | wc -l
>>>> 3
>>>>
>>>> In the packages feed. All those results are things that run on the
>>>> host, not on OpenWrt.
>>>>
>>>> tr a-z A-Z works as an alternative and is used in many places.
>>> tr a-z A-Z is bad practice as it can behave unexpectedly in different
>>> locales; I've also heard tales of folks with Turkish locales having
>>> issues with '0-9' for example.
>>> Is a couple kb of space worth such a loss in portability (not to
>>> mention
>>> deviating heavily from POSIX)?
>> Patches welcome to replace usage of tr with awk.
>>
>> I don't think anyone runs OpenWrt with any locale other than the
>> default.
> I don't think it makes sense to replace usage of 'tr' with awk; it
> makes more sense to just make tr work correctly. As requested, here's
> a patch:
>>>>>>> [:class:]
>>>>>>> Represents all characters belonging to the
>>>>>>> defined character class, as defined by the current setting of
>>>>>>> the LC_CTYPE locale category.
>>>>>>> The following character class names shall
>>>>>>> be accepted when specified in string1:
>>>>>>>
>>>>>>> alnum blank digit lower punct upper
>>>>>>> alpha cntrl graph print space xdigit
>>>>>>>
>>>>>>>
>>>>>>> 1: https://www.unix.com/man-page/posix/1posix/tr/
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jordan
>>>>>>>
>>>>>>>
>
--- Config-defaults.in.orig Fri Jul 10 21:03:57 2020
+++ Config-defaults.in Fri Jul 10 21:03:22 2020
@@ -837,7 +837,7 @@
 	default y
 config BUSYBOX_DEFAULT_FEATURE_TR_CLASSES
 	bool
-	default n
+	default y
 config BUSYBOX_DEFAULT_FEATURE_TR_EQUIV
 	bool
 	default n
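
For scripts that have to run on images built without this option, here is a
minimal sketch of a runtime check with an awk fallback, along the lines
discussed in this thread. The function names tr_has_classes and to_lower are
hypothetical and only for illustration; the only assumption is that a
POSIX-style awk with tolower() is available on the target.

```shell
#!/bin/sh

# Hypothetical helper: succeeds only if this tr honors character classes.
# A tr built without class support treats '[:upper:]' as literal characters,
# so the uppercase input comes back unchanged and the comparison fails.
tr_has_classes() {
    [ "$(printf 'HELLO' | tr '[:upper:]' '[:lower:]')" = "hello" ]
}

# Hypothetical helper: lowercase stdin, preferring tr and falling back
# to awk's tolower() when tr lacks character class support.
to_lower() {
    if tr_has_classes; then
        tr '[:upper:]' '[:lower:]'
    else
        awk '{ print tolower($0) }'
    fi
}

echo HELLO | to_lower
```

The check costs one extra tr invocation per call, so a script that lowercases
many strings may prefer to run it once and cache the result in a variable.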