Provides functionality to transliterate all UTF-8 characters to ASCII in pure PostgreSQL.
Example:
SELECT system.transliterate_to_ascii('Příliš žluťoučký kůň úpěl ďábelské ódy'); -- will print "Prilis zlutoucky kun upel dabelske ody"
SELECT system.transliterate_to_ascii('stößt'); -- will print "stosst"
SELECT system.transliterate_to_ascii('鍖椾喊'); -- will print "Chen Zhan Han"
SELECT system.transliterate_to_ascii('Питер. Лето. Любов'); -- will print "Piter. Leto. Liubov"
SELECT system.transliterate_to_ascii('10°C'); -- will print "10degC"
First, import dist/transliterate-structure.sql
file, which create schema system
with table transliterate_to_ascii_rules
and function transliterate_to_ascii
.
Second, import dist/transliterate-data.sql
file with rules to transliterate all UTF-8 characters to ASCII.
Optional third, import dist/transliterate-webalize.sql
file with DB function webalize()
that provide converting string for nice URLs
psql dbname < dist/transliterate-structure.sql
psql dbname < dist/transliterate-data.sql
psql dbname < dist/transliterate-webalize.sql
Now, you can use function system.transliterate_to_ascii
in your SQL queries or in PL/pgSQL functions, and you will always get pure ASCII string.
-- in query
SELECT system.transliterate_to_ascii('Příliš žluťoučký kůň úpěl ďábelské ódy');
-- in Pl/pgSQL
CREATE FUNCTION lower_unaccent(in_string character varying)
RETURNS character varying AS
$BODY$
RETURN lower(system.transliterate_to_ascii(in_string));
$BODY$
LANGUAGE plpgsql IMMUTABLE;
In system.transliterate_to_ascii_rules
table are all transliterations form UTF-8
chars to ASCII
chars. You can update existing rules, if you're not satisfied with the original one.
Existing rules are taken from the great Perl library Text-Unidecode by Sean M. Burke. You can use PHP script bin/build-sql
to generate actual rules dist/transliterate-data.sql
from library source.
Copy all lib/Text/Unidecode/*.pm
files to data/Unidecode
and run bin/build-sql
. You will get new rules definition and log in data
directory.
Database function system.transliterate_to_ascii
is written in PL/pgSQL
and just quick replace all UTF-8 characters in a given string by rules from system.transliterate_to_ascii_rules
table.
This mechanism is great for example for creating URL slugs from strings. Just import into DB also file dist/transliterate-webalize.sql
.
Then create URL slugs like this:
SELECT system.webalize('Příliš žluťoučký kůň úpěl ďábelské ódy'); -- will print "prilis-zlutoucky-kun-upel-dabelske-ody"
SELECT system.webalize('stößt'); -- will print "stosst"
SELECT system.webalize('鍖椾喊'); -- will print "chen-zhan-han"
SELECT system.webalize('Питер. Лето. Любов'); -- will print "piter-leto-liubov"
SELECT system.webalize('10°C'); -- will print "10-c"
SELECT system.webalize('@utonomous'); -- will print "autonomous"
The easiest way to get Perl library source is in Debian like Linux system with cpan
command.
cpan Text::Unidecode
Sources are placed in /usr/share/perl5/Text
directory. Just copy *.pm
from /usr/share/perl5/Text/Unidecode
directory to data/Unidecode
and run bin/build-sql
(you will need PHP > 8.0 installed on the system).