'en_US.UTF-8' locale error in Termux #507

Open
matte22ladde opened this issue Sep 24, 2024 · 23 comments
Labels
compatibility External Problem/Bug Problems/Bugs of other projects

Comments

@matte22ladde

matte22ladde commented Sep 24, 2024

ble version: 0.4.0-devel4+32f290d
Bash version: 5.2.32(1)-release (aarch64-unknown-linux-android)

ble.sh: The locale 'en_US.UTF-8' (LC_CTYPE) seems broken. Please check that the locale exists in the system.

@akinomyoga
Owner

Please check that the locale exists in the system. What are the results of the following commands?

$ locale
$ locale -a | grep en_US

I started to check the locale in commit 537c650 since broken system locales turned out to cause problems.

@akinomyoga akinomyoga added External Problem/Bug Problems/Bugs of other projects question Question / Mis-usage and removed External Problem/Bug Problems/Bugs of other projects labels Sep 24, 2024
@matte22ladde
Author

matte22ladde commented Sep 25, 2024

No command locale found, did you mean:
 Command locate in package mlocate

@akinomyoga
Owner

No, I didn't mean locate. locale is one of the standard utilities required by POSIX.

@matte22ladde
Author

pkg list-all | grep locale

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

[ble: exit 1]

@akinomyoga akinomyoga added compatibility External Problem/Bug Problems/Bugs of other projects and removed question Question / Mis-usage labels Sep 25, 2024
@akinomyoga
Owner

akinomyoga commented Sep 25, 2024

If you want to search for the package that provides the command locale, pkg list-all is not the right command; it only prints the names of packages. Since locale is part of the basic utilities, I don't think the package that contains it would be named locale. After some searching, pkg doesn't seem to provide a way to search for a file in packages that haven't been installed. Instead, it seems one can install and use apt-file. The correct way seems to be this:

$ pkg install apt-file
$ apt-file search /locale | grep 'locale$'

However, it turned out no package provides the command locale in the Termux repository. The above command only lists a file that is not a command.
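Since Termux ships no locale command, one rough substitute is to print the locale-related variables the shell sees. This is only a sketch: the function name show_locale_vars is hypothetical, and unlike locale it neither computes the effective value for each category nor checks that the named locale actually exists.

```bash
# Hypothetical helper: print the locale-related variables visible to the shell.
# Unlike `locale`, this does not resolve the effective value per category.
show_locale_vars() {
  for var in LANG LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES; do
    eval "val=\${$var-}"            # value of the variable named by $var (empty if unset)
    printf '%s=%s\n' "$var" "$val"
  done
}
```

Running show_locale_vars prints one NAME=value line per variable, which at least shows what ble.sh would pick up from the environment.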


I confirmed the problem on my Android device using Termux. Surprisingly, the C locale doesn't seem to work in Termux, while LANG=en_US.UTF-8 works. The following discussion says that Termux actually doesn't support the C locale:

I found another thread that seems to imply that one can set up a locale in a Termux environment:

The first thread talked about dpkg-reconfigure and /etc/environment, but I can't find them in my Termux environment. The second thread seems to imply that those locales are only available with proot-distro, which doesn't seem to be Termux's default package management.

ble.sh hasn't assumed environments that don't support C, the most basic locale (though it does assume the opposite case, where only the C locale is available). This implies that many places in the codebase could potentially break in Termux. I need to investigate the impact.

@akinomyoga
Owner

I'll investigate the impact later and fix the parts that can be affected by the non-working C locale.

@NoNameWasDefined

If it helps: Termux doesn't include the locale package and doesn't have dpkg-reconfigure.
PRoot-Distro is just a utility to run Linux distributions without most Termux problems. It cannot help (even if ble.sh works under it) because it uses another environment and it's slower (the OP's solution). Isn't it possible to use the en_US.UTF-8 locale for Termux?

@akinomyoga
Owner

If it helps: Termux doesn't include the locale package and doesn't have dpkg-reconfigure. PRoot-Distro is just a utility to run Linux distributions without most Termux problems. It cannot help (even if ble.sh works under it) because it uses another environment and it's slower (the OP's solution).

Thanks for the information.

Isn't it possible to use the en_US.UTF-8 locale for Termux?

ble.sh basically uses the locale supplied by the user, i.e., en_US.UTF-8 in the case of Termux.

However, for some specific operations, ble.sh needs to use C for a specific locale category.

  • For example, when ble.sh processes the settings of Bash's builtin bind, it needs to use C for the LC_CTYPE category because Bash's builtin bind generally handles binary data supplied in its arguments. Unless the binary data happens to form a valid UTF-8 string, the data would be corrupted.
  • There are also Vim bindings that operate on the text in a binary-oriented way, which require processing with LC_CTYPE=C.
  • In many places, POSIX bracket expressions (such as [a-z]) are used to parse the Bash syntax, process the prompt strings supplied by the user, etc. In this case, one needs to set the LC_COLLATE category to C. Otherwise, depending on the system locale configuration, [a-z] may also match uppercase characters such as B, as well as variants of letters like à, á, â, ã, ä, å, etc. For example, you can read this StackOverflow question and the answer. I'm not sure whether the en_US.UTF-8 locale in Termux provides the same collation order as C.
  • There are still other places.
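To make the LC_COLLATE point concrete, here is a minimal sketch (the function name is_ascii_lower is hypothetical and this is not ble.sh's code) that pins LC_COLLATE=C in a subshell before testing a string against the bracket expression [a-z], so that only the 26 ASCII lowercase letters match regardless of the user's locale. LC_ALL is unset first because it would otherwise override LC_COLLATE.

```bash
# Hypothetical helper: succeed only if $1 is a single ASCII lowercase letter.
# The function body is a subshell, so the locale change does not leak out.
is_ascii_lower() (
  unset -v LC_ALL      # LC_ALL would take precedence over LC_COLLATE
  LC_COLLATE=C         # [a-z] now covers only the code points 'a'..'z'
  case $1 in
    [a-z]) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

For example, is_ascii_lower q succeeds, while is_ascii_lower B fails.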

@akinomyoga
Owner

I confirmed the problem on my Android device using Termux. Surprisingly, the C locale doesn't seem to work in Termux, while LANG=en_US.UTF-8 works. The following discussion says that Termux actually doesn't support the C locale:

I searched the Termux organization again and found the following comment from 2022:

The issue termux/termux-packages#5845 was opened in 2020, so they might have improved the support for the C locale since then. However, it seems strange that the C locale apparently still does not work.

@NoNameWasDefined

[Screenshot]
Here is my current issue. I tried LANG=C(.UTF-8) source ..., LANG=en_US.UTF-8 source ..., LC_ALL=... source ..., and LC_CTYPES=... in ~/.bashrc.

@akinomyoga
Owner

akinomyoga commented Jan 21, 2025

So there is still an issue with LC_CTYPE=C in Termux.

The content of the error message "The locale 'en_US.UTF-8' (LC_CTYPE) seems broken" is actually wrong in Termux. The code that detects support for the current locale (en_US.UTF-8 in the case of Termux) assumes that LC_CTYPE=C works as POSIX specifies. In Termux, this assumption is broken, so the detection gets confused and wrongly reports that 'en_US.UTF-8' is broken. Actually, en_US.UTF-8 works in Termux, but C doesn't.

@akinomyoga
Owner

Specifically, for the following command, Termux wrongly outputs "1" instead of the expected "3":

$ (LC_CTYPE=C; a=$'\xE3\x81\x82'; echo "${#a}")
3
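The command above can be wrapped in a function that a script might use to detect whether LC_CTYPE=C really switches Bash to byte semantics. This is a sketch, not ble.sh's actual detection code, and the name c_ctype_counts_bytes is hypothetical. The octal bytes \343\201\202 are the same as $'\xE3\x81\x82', the UTF-8 encoding of U+3042, so a conforming C locale should report a length of 3, whereas Termux reportedly yields 1.

```bash
# Hypothetical helper: succeed if LC_CTYPE=C makes ${#var} count bytes (POSIX behavior).
c_ctype_counts_bytes() (
  unset -v LC_ALL                  # LC_ALL would override LC_CTYPE
  LC_CTYPE=C                       # Bash applies the new LC_CTYPE on assignment
  s=$(printf '\343\201\202')       # UTF-8 bytes of U+3042, same as $'\xE3\x81\x82'
  [ "${#s}" -eq 3 ]
)

if c_ctype_counts_bytes; then
  echo 'LC_CTYPE=C counts bytes as expected'
else
  echo 'LC_CTYPE=C does not count bytes; the C locale seems broken'
fi
```

Running the check in a subshell keeps the temporary locale settings from affecting the interactive session.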

@akinomyoga
Owner

The current status of the locale support by Termux is really unclear, so I raised a discussion at termux/termux-packages#23010.

@NoNameWasDefined

I don't think I can help more. You know more than me about locales (not that I know a lot: Termux doesn't use glibc).

@NoNameWasDefined

NoNameWasDefined commented Jan 26, 2025

I've got some strange news: in Termux, I have set up GLIBC to do some operations:

grun -s

ble.sh: The locale 'en_US.UTF-8' (LC_CTYPE) seems broken. Please check that the locale exists in the system.
And after generating locales like in any GLIBC Linux distribution and setting the locale with

LANG=en_US.UTF-8 $0

ble.sh does not show any error.

@akinomyoga
Owner

akinomyoga commented Jan 26, 2025

And after generating locales like in any GLIBC Linux distribution

I'm not familiar with Termux. Could you explain it step by step? I seem to have been able to install a glibc-based system by pkg install glibc-repo and start a session by grun -s, but locale-gen en_US.UTF-8 doesn't seem to work, and I don't have permission to edit /etc/locale.gen. The package sudo is not available, and su doesn't seem to work either although the binary seems to exist. I don't find any langpack packages either.

@NoNameWasDefined

NoNameWasDefined commented Jan 26, 2025

In Termux, you can consider the root to be /data/data/com.termux/files/, because Android itself uses /, and Termux doesn't use chroot or proot, for faster operation.
When using GLIBC packages, you can change $PREFIX from /data/data/com.termux/files/usr/ to /data/data/com.termux/files/usr/glibc/. Normally /data/data/com.termux/files/usr/etc/ is used, but glibc loads its configuration from /data/data/com.termux/files/usr/glibc/etc/.
So /etc/locale.gen is at /data/data/com.termux/files/usr/glibc/etc/locale.gen.

@NoNameWasDefined

The su command is just a utility that starts Android's su, wherever it is located, but to get su the device needs to be "rooted". Almost nothing requires su in Termux; just use proot instead of chroot, mount, etc.
There are several sudo implementations in Termux. The one in the tsu package only starts a program as root with a different PATH (almost always $PATH:/system/bin:/system/xbin/:...; /system is like / on Linux, but Termux doesn't give permission to use this partition), like the Android shell (nothing special).

@akinomyoga
Owner

akinomyoga commented Jan 26, 2025

OK, thanks. I could generate en_US.UTF-8 by setting PREFIX=$PREFIX/glibc, editing $PREFIX/etc/locale.gen, and running locale-gen. Then what would be the next step? I tried to run bash with LANG=en_US.UTF-8 $0, LANG=en_US.UTF-8 bash, LANG=en_US.utf8 $0, and LANG=en_US.utf8 bash outside/inside grun -s, but nothing worked.

@akinomyoga
Copy link
Owner

Sorry, I had actually generated es_US.UTF-8; the two names are easy to confuse. I commented out es_US.UTF-8, uncommented en_US.UTF-8, and generated the locale again. It now works with bash inside grun -s (i.e., $PREFIX/glibc/bin/bash) but not with the default Termux bash (i.e., $PREFIX/bin/bash).
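The generation steps described in this exchange (point PREFIX at $PREFIX/glibc, uncomment the wanted entry in $PREFIX/etc/locale.gen, then run locale-gen) can be sketched as below. The helper enable_locale_gen_entry is hypothetical; it matches the locale name literally, which avoids accidentally enabling a neighboring entry such as es_US.UTF-8. It assumes entries look like "#en_US.UTF-8 UTF-8", as in the usual glibc locale.gen.

```bash
# Hypothetical helper: uncomment exactly one entry (e.g. en_US.UTF-8) in a locale.gen file.
enable_locale_gen_entry() {
  gen_file=$1 gen_name=$2
  # Escape dots so that "en_US.UTF-8" is matched literally by sed.
  gen_pat=$(printf '%s' "$gen_name" | sed 's/\./\\./g')
  # "#en_US.UTF-8 UTF-8" or "# en_US.UTF-8 UTF-8" -> "en_US.UTF-8 UTF-8"
  sed "s/^#[[:space:]]*\\($gen_pat[[:space:]]\\)/\\1/" "$gen_file" > "$gen_file.tmp" &&
    mv "$gen_file.tmp" "$gen_file"
}
```

Following the thread, it would be used with PREFIX already set to the glibc prefix, e.g. enable_locale_gen_entry "$PREFIX/etc/locale.gen" en_US.UTF-8 && locale-gen.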

@NoNameWasDefined

Normally, $PREFIX/glibc/bin executables take precedence over the $PREFIX/bin ones.

@akinomyoga
Copy link
Owner

akinomyoga commented Jan 26, 2025

Normally, $PREFIX/glibc/bin executables take precedence over the $PREFIX/bin ones.

I only have source /path/to/ble.sh in my ~/.bashrc, but $PREFIX/glibc/bin doesn't seem to take precedence over $PREFIX/bin outside grun -s. I rebooted the phone, but the situation didn't change. $PATH only contains /data/data/com.termux/files/usr/bin.
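To see which directory wins, one can check whether $PREFIX/glibc/bin appears in PATH before $PREFIX/bin. This is a minimal sketch; the function dir_precedes is hypothetical and only inspects a PATH-style string, it does not change anything.

```bash
# Hypothetical helper: succeed if directory $2 appears in the colon-separated list $1
# before directory $3 (or if $3 does not appear at all).
dir_precedes() {
  list=:$1: first=$2 second=$3
  case $list in
    *":$first:"*) ;;
    *) return 1 ;;                      # $first is not in the list at all
  esac
  before_first=${list%%":$first:"*}     # everything before the first occurrence of $first
  case $before_first: in
    *":$second:"*) return 1 ;;          # $second already appeared earlier
    *) return 0 ;;
  esac
}
```

For example, dir_precedes "$PATH" "$PREFIX/glibc/bin" "$PREFIX/bin" fails when PATH contains only $PREFIX/bin, which matches the situation described above; command -v bash then shows which bash is actually found.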

@NoNameWasDefined

Android uses Bionic libc. GLIBC packages are not native to Android, so by default every package uses Bionic. We have to enable GLIBC to make locales work, which is what grun -s does: it starts a child process with another PREFIX while keeping the environment variables, so that the base packages remain accessible. I don't think using GLIBC packages without configuring such an environment is a good idea.
I think the final solution would be to start ble.sh after running some commands manually with GLIBC. (NOTE: Termux doesn't have any packages that use locales; LANG=en_US.UTF-8 is fake, for compatibility.)
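Following that suggestion, one could keep a single ~/.bashrc and load ble.sh only in the glibc Bash started by grun -s ($PREFIX/glibc/bin/bash, where en_US.UTF-8 was made to work in this thread). This is an untested sketch: the helper is_glibc_bash is hypothetical, and /path/to/ble.sh is the placeholder path used earlier in the thread.

```bash
# Hypothetical helper: succeed if the given Bash binary path looks like the glibc build.
is_glibc_bash() {
  case $1 in
    */glibc/bin/bash) return 0 ;;
    *) return 1 ;;
  esac
}

# ~/.bashrc fragment: $BASH holds the full path of the running Bash binary.
if is_glibc_bash "$BASH"; then
  source /path/to/ble.sh
fi
```

The default Bionic-based bash ($PREFIX/bin/bash) then skips ble.sh entirely instead of showing the misleading locale error.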
