lower() function does not encode Sigma correctly #24229

Jason-Waldrop · 2024-11-22T14:15:13Z

https://en.wikipedia.org/wiki/Sigma

Sigma: uppercase Σ, lowercase σ, lowercase in word-final position ς;

Trino currently converts every "Σ" into "σ", even in word-final position, where it should become "ς".

select
	a,
	lower(a),
	lower(a) = 'νεστορας βλσχος i.k.e.',   -- is false, but should be true
	LOWER(regexp_replace(a, 'Σ\b', 'ς')),
	LOWER(regexp_replace(a, 'Σ\b', 'ς')) = 'νεστορας βλσχος i.k.e.'  -- will be true
from (values('ΝΕΣΤΟΡΑΣ ΒΛΣΧΟΣ I.K.E.')) as t(a)

This can be used as a quick fix:

LOWER(regexp_replace(lower_me_col, 'Σ\b', 'ς'))

wendigo · 2024-11-22T19:13:06Z

@martint can you confirm that this is expected behaviour?

Converts slice to lower case code point by code point. This method does not perform locale-sensitive, context-sensitive, or one-to-many mappings required for some languages. Specifically, this will return incorrect results for Lithuanian, Turkish, and Azeri.
Note: Invalid UTF-8 sequences are copied directly to the output.
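
To illustrate the difference the quoted description points at, here is a minimal Python sketch. It is not Trino's implementation: `per_code_point_lower` and `workaround_lower` are hypothetical helpers that only mimic a context-free, code-point-by-code-point mapping and the `regexp_replace` quick fix from the report. Python's built-in `str.lower()` is used as the context-sensitive reference, since it applies the final-sigma rule.

```python
import re


def per_code_point_lower(text: str) -> str:
    """Lowercase each code point in isolation, ignoring its neighbours."""
    return "".join(ch.lower() for ch in text)


def workaround_lower(text: str) -> str:
    """Rewrite a capital sigma at a word boundary to final sigma, then lowercase."""
    return per_code_point_lower(re.sub(r"Σ\b", "ς", text))


name = "ΝΕΣΤΟΡΑΣ ΒΛΣΧΟΣ"

print(per_code_point_lower(name))  # every Σ becomes σ, even at the end of a word
print(name.lower())                # context-sensitive: word-final Σ becomes ς
print(workaround_lower(name))      # the quick fix applied before a context-free lower
```

Under these assumptions, the first call leaves "σ" at the end of each word, while the other two produce "ς" there, which matches the behaviour described in the report.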
