charsearch

Small Linux app to search for Unicode characters by name and copy them to the clipboard, using a shell script with Zenity.

System Requirements

First, it's assumed that Zenity is already installed. Zenity is a program that allows GTK+ dialog boxes to be used from shell scripts.

It's also assumed that you're using the usual GNU base system that most Linux systems have. Alternatives like BSD or BusyBox have not been tested. The shell syntax itself is standard (so using Dash instead of Bash on Debian/Ubuntu is supported), but I can't rule out that the script still relies on GNU-specific features of grep and/or sed.

It's also assumed that you're using the X11 graphics stack. Wayland has not been tested; it may or may not work depending on how good the clipboard integration with Xwayland is.
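
The assumptions in this section can be checked before installing. The following is a minimal sketch of such a pre-flight check, not part of charsearch itself. The function names `have` and `preflight` are made up for illustration, and the session check relies on `XDG_SESSION_TYPE` being set by the login session, which is common but not guaranteed.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the assumptions above (not part of charsearch).

# Succeed if the named command is available on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Print one warning per assumption that does not appear to hold.
preflight() {
    have zenity || echo "warning: zenity not found on PATH"

    # GNU tools identify themselves as e.g. "sed (GNU sed) 4.8" on --version.
    for tool in grep sed; do
        if ! "$tool" --version 2>/dev/null | grep "(GNU $tool)" >/dev/null; then
            echo "warning: $tool does not look like the GNU version"
        fi
    done

    # Assumption: the login session exports XDG_SESSION_TYPE (x11, wayland, ...).
    session="${XDG_SESSION_TYPE:-unknown}"
    if [ "$session" != "x11" ]; then
        echo "warning: session type is '$session', not x11"
    fi
    return 0
}

preflight
```

A warning here does not mean charsearch will fail, only that the setup differs from the one it was written for.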

Installation

The shell script, 'charsearch', and the data file, 'codepoint_sorted_by_name.txt', must both be installed into /usr/local/bin. The script needs its executable bit set. So you would run something like:

$ sudo cp --target-directory=/usr/local/bin charsearch codepoint_sorted_by_name.txt
$ sudo chmod +x /usr/local/bin/charsearch
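
To confirm that both files landed in the right place, a small check along these lines may help. It is only a sketch; `verify_install` is a hypothetical helper, not something shipped with the project.

```shell
#!/bin/sh
# Hypothetical post-install check (not shipped with charsearch).
# Usage: verify_install [DIR]   (DIR defaults to /usr/local/bin)
verify_install() {
    dir="${1:-/usr/local/bin}"
    status=0
    # The script must be a regular file with its executable bit set.
    if [ ! -f "$dir/charsearch" ] || [ ! -x "$dir/charsearch" ]; then
        echo "missing or not executable: $dir/charsearch"
        status=1
    fi
    # The data file only needs to be readable.
    if [ ! -r "$dir/codepoint_sorted_by_name.txt" ]; then
        echo "missing or unreadable: $dir/codepoint_sorted_by_name.txt"
        status=1
    fi
    return "$status"
}

verify_install /usr/local/bin || echo "charsearch is not fully installed"
```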

After this, you might want to set up a keyboard shortcut in your window manager or desktop environment to bring it up quickly (I prefer WinKey + U). The exact method to accomplish this may vary; as an Xfce user I used Xfconf Settings Editor.
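
On Xfce, the same binding can probably also be created from a terminal with `xfconf-query`, the command-line counterpart of the settings editor. This sketch assumes custom command shortcuts live in the `xfce4-keyboard-shortcuts` channel under `/commands/custom/`, which is the usual layout but may differ between Xfce versions. The helper names `shortcut_property` and `bind_shortcut` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: bind a key combination to a command on Xfce via xfconf-query.

# Property path for a custom command shortcut (assumed layout, see above).
shortcut_property() {
    printf '/commands/custom/%s\n' "$1"
}

# bind_shortcut KEYS COMMAND, e.g. bind_shortcut '<Super>u' /usr/local/bin/charsearch
bind_shortcut() {
    xfconf-query --channel xfce4-keyboard-shortcuts \
        --property "$(shortcut_property "$1")" \
        --create --type string --set "$2"
}

# Not run automatically; call it from an Xfce session:
#   bind_shortcut '<Super>u' /usr/local/bin/charsearch
```

Other desktop environments have their own mechanisms, so treat this as Xfce-specific.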

Usage

When run, the charsearch script will produce a dialog box asking for your search terms. Enter them and confirm with the OK button or the Enter/Return key, and a list of matches will come up. Double-clicking one will copy the character to the X11 clipboard and exit. It can then be pasted into a text field in a web browser or anything else that supports pasting from the clipboard.
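
The flow described above can be sketched roughly as follows. This is not the actual charsearch script, just an illustration of the idea. It assumes each line of the data file has the form "<character> <NAME>", that `xclip` is the clipboard tool, and that the whole query is matched as one case-insensitive substring; the real script may differ on all three points. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the search / pick / copy flow; NOT the actual charsearch script.
# Assumptions: data-file lines look like "<character> <NAME>", clipboard via xclip.

DATA_FILE="${DATA_FILE:-/usr/local/bin/codepoint_sorted_by_name.txt}"

# Print data-file lines that contain the query, ignoring case.
search_names() {
    grep -i -F -- "$1" "$DATA_FILE"
}

# Extract the character (first space-separated field) from a selected line.
char_of() {
    printf '%s\n' "$1" | cut -d ' ' -f 1
}

# Ask for terms, show the matches, copy the chosen character to the clipboard.
charsearch_sketch() {
    query=$(zenity --entry --title="charsearch" --text="Search terms:") || return
    choice=$(search_names "$query" |
        zenity --list --title="Matches" --column="Match") || return
    char_of "$choice" | tr -d '\n' | xclip -selection clipboard
}

# To try it interactively under X11: charsearch_sketch
```

Zenity exits with a non-zero status when a dialog is cancelled, which is why each step returns early on failure.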

Background Info

The data file is current as of Unicode 12.1.

Only the "main" name is used, not any aliases. Characters without non-alias names are omitted. Certain other characters with names not considered of interest for my purposes were also omitted.

It was produced with the following shell pipeline:

grep '<char' <ucd.all.flat.xml | sed '/na=""/d' | sed 's/^\s*<char cp="//' | sed 's/"\sage=.*\sna="/ /' | sed 's/"\sJSN=.*$//' | sed '/#/d' | sed '/VARIATION\sSELECTOR/d' | sed '/ACTIVATE/d' | sort -dk 2 | sed 's/^/0x/' | ./hex-to-unicode-filter.py > codepoint_sorted_by_name.txt

I'm sure there's a more elegant way to do that, but it worked, anyway.
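
As one possible tidier form, the grep and sed stages can be folded into a single sed call. The sketch below does that and runs it on a few sample lines to show what each stage contributes. The sample lines are abbreviated and made up to match the shape the pipeline expects (attributes in the order cp, age, na, JSN); they are not copied from the real file. It uses POSIX `[[:space:]]` where the original uses `\s`, a GNU extension, and it stops before the final hex-to-unicode-filter.py step. `extract_names` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: the grep and sed stages of the pipeline folded into one sed call,
# followed by the same sort and "0x" prefix. The Python filter is not included.
extract_names() {
    sed -e '/<char/!d' \
        -e '/na=""/d' \
        -e 's/^[[:space:]]*<char cp="//' \
        -e 's/"[[:space:]]age=.*[[:space:]]na="/ /' \
        -e 's/"[[:space:]]JSN=.*$//' \
        -e '/#/d' \
        -e '/VARIATION[[:space:]]SELECTOR/d' \
        -e '/ACTIVATE/d' |
    sort -dk 2 |
    sed 's/^/0x/'
}

# Abbreviated, made-up sample lines in the shape the pipeline expects.
sample='  <char cp="0041" age="1.1" na="LATIN CAPITAL LETTER A" JSN="" gc="Lu"/>
  <char cp="0000" age="1.1" na="" JSN="" gc="Cc"/>
  <char cp="0026" age="1.1" na="AMPERSAND" JSN="" gc="Po"/>
  <char cp="FE00" age="3.2" na="VARIATION SELECTOR-1" JSN="" gc="Mn"/>'

printf '%s\n' "$sample" | extract_names
# Prints:
# 0x0026 AMPERSAND
# 0x0041 LATIN CAPITAL LETTER A
```

The unnamed character and the variation selector are dropped, and the remaining lines come out sorted by name rather than by code point.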

The 'ucd.all.flat.xml' file mentioned in the command can be found here in the 'ucd.all.flat.zip' archive. It is the property of the Unicode Consortium, and its license is listed here.

The hex-to-unicode-filter.py command can be found here. It does not use the same license as the rest of this project; you can consider it public domain if you want to use it.
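
For readers who only want to see what that last step amounts to, here is a purely illustrative stand-in written in POSIX shell. It is not the Python script. It assumes the filter replaces the leading `0x...` code point on each line with the character itself, encoded as UTF-8, and leaves the name untouched; the real output format may differ. `utf8_char` and `hex_to_unicode` are hypothetical names.

```shell
#!/bin/sh
# Illustrative stand-in for the filter step; NOT hex-to-unicode-filter.py.
# Assumed behaviour: "0x0041 NAME" becomes "A NAME" (UTF-8 output).

# Write the UTF-8 encoding of one code point (e.g. 0x20AC) to stdout.
utf8_char() {
    cp=$(($1))
    if [ "$cp" -lt 128 ]; then
        set -- "$cp"
    elif [ "$cp" -lt 2048 ]; then
        set -- $((192 | (cp >> 6))) $((128 | (cp & 63)))
    elif [ "$cp" -lt 65536 ]; then
        set -- $((224 | (cp >> 12))) $((128 | ((cp >> 6) & 63))) \
               $((128 | (cp & 63)))
    else
        set -- $((240 | (cp >> 18))) $((128 | ((cp >> 12) & 63))) \
               $((128 | ((cp >> 6) & 63))) $((128 | (cp & 63)))
    fi
    # Emit each byte through an octal escape, which POSIX printf supports.
    for byte in "$@"; do
        printf "\\$(printf '%03o' "$byte")"
    done
}

# Read "0xHEX NAME" lines on stdin, write "<character> NAME" lines.
hex_to_unicode() {
    while read -r hex name; do
        printf '%s %s\n' "$(utf8_char "$hex")" "$name"
    done
}

printf '0x0041 LATIN CAPITAL LETTER A\n' | hex_to_unicode
# Prints: A LATIN CAPITAL LETTER A
```

Encoding the bytes by hand keeps the sketch independent of shell-specific escapes such as Bash's `\U`, at the cost of being slow on a file with many lines.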
