heiglandreas edited this page Dec 1, 2011 · 2 revisions

h1. Hyphenate texts with PHP

Hyphenation is not widely used on the internet.

But sometimes you need it in browser-based applications.

Just think of a PDF file created on the fly by an application, where the text is hyphenated in very strange ways or not at all, all because the algorithm in use might hyphenate English texts one way or another but certainly not German, French or other texts.

But it is possible with a bit of LaTeX (don't worry, you need no knowledge of it whatsoever).

A long time ago, Franklin Mark Liang wrote his thesis "Word Hy-phen-a-tion by Com-put-er", and the algorithm described there is the one I have adapted to PHP.
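As a rough illustration of the idea behind Liang's algorithm, here is a minimal sketch. It is not the code of Org_Heigl_Hyphenator: the function name @sketchHyphenate@ and the tiny pattern list are assumptions made up for this example, and real TeX pattern files contain thousands of patterns. Each pattern is a letter sequence with digits between the letters; for every gap in the word the highest digit of any matching pattern wins, and an odd value allows a break.

```php
<?php
// Hypothetical sketch of Liang-style pattern matching (ASCII words only).
function sketchHyphenate(string $word, array $patterns, int $leftMin = 2, int $rightMin = 2): array
{
    // Dots mark the word boundaries, as in TeX pattern files.
    $padded = '.' . strtolower($word) . '.';
    // $scores[$k] belongs to the gap in front of $padded[$k].
    $scores = array_fill(0, strlen($padded) + 1, 0);

    foreach ($patterns as $pattern) {
        $letters = preg_replace('/\d/', '', $pattern);
        if ($letters === '') {
            continue; // nothing to match
        }
        // $values[$i] is the digit standing in front of letter $i.
        $values = array_fill(0, strlen($letters) + 1, 0);
        $pos = 0;
        foreach (str_split($pattern) as $ch) {
            if (is_numeric($ch)) {
                $values[$pos] = (int) $ch;
            } else {
                $pos++;
            }
        }
        // Apply the pattern at every position where its letters occur.
        $offset = 0;
        while (($found = strpos($padded, $letters, $offset)) !== false) {
            foreach ($values as $i => $value) {
                $scores[$found + $i] = max($scores[$found + $i], $value);
            }
            $offset = $found + 1;
        }
    }

    // An odd score allows a break; respect the minimum fragment lengths.
    $parts = [];
    $start = 0;
    for ($j = $leftMin; $j <= strlen($word) - $rightMin; $j++) {
        if ($scores[$j + 1] % 2 === 1) {
            $parts[] = substr($word, $start, $j - $start);
            $start = $j;
        }
    }
    $parts[] = substr($word, $start);

    return $parts;
}

// The handful of patterns usually quoted for the word "hyphenation".
$patterns = ['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n'];
echo implode('-', sketchHyphenate('hyphenation', $patterns)), "\n"; // prints hy-phen-ation
```

Even digits forbid a break and odd digits permit one, which is why the higher value from @hen5at@ overrides @n2at@ at the same gap.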

I would not have stumbled upon it without the code from Mathias Nater, who adapted the algorithm for JavaScript.

So here you can get a hyphenation algorithm for PHP based on the TeX hyphenation files.

Basically, hyphenation works as shown in the following example.

include_once 'Org/Heigl/Hyphenator.php';
// Create an instance for the locale of your choice.
// Note that a hyphenation file has to be present
// in the folder /Org/Heigl/Hyphenator/parsedFiles/ for that locale!
$hyphenator = Org_Heigl_Hyphenator::getInstance ( 'de_DE' );
// Which character to use as the hyphenation character.
// This defaults to the soft hyphen (character code 173 in ISO-8859-1).
$hyphenator->setHyphen ( '-' )
// Minimum number of characters that have to stay to the right of the
// hyphenation character.
           ->setRightMin ( 2 )
// Which characters are treated in a special way.
           ->setSpecialChars ( 'äüöß' ); 
 
$string = 'This is the String you want to be hyphenated'; 
 
$hyphenatedString = $hyphenator->hyphenate ( $string );

Alternatively, you can simply use:

$hyphenatedString = Org_Heigl_Hyphenator::parse ( $string );

CAVEAT: Org_Heigl_Hyphenator is currently not UTF-8-safe! We are working on that.
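To illustrate what "not UTF-8-safe" typically means in PHP, here is a small sketch. It assumes the cause is byte-oriented string handling, which this page does not state explicitly. It uses only core functions and writes the sample word with byte escapes, so it does not depend on the encoding of the source file.

```php
<?php
// "Grüße" encoded as UTF-8: ü is C3 BC, ß is C3 9F.
$word = "Gr\xC3\xBC\xC3\x9Fe";

// strlen() counts bytes, not characters: G, r, ü (2), ß (2), e.
$byteLength = strlen($word);

// With the /u modifier PCRE treats the subject as UTF-8,
// so the empty pattern splits between whole characters.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
$charLength = count($chars);

// Cutting at a byte offset can tear a multi-byte character apart:
// this keeps "Gr" plus only the first byte of "ü".
$broken = substr($word, 0, 3);

// preg_match() with /u fails on malformed UTF-8 and returns false.
$isValidUtf8 = preg_match('//u', $broken) === 1;

echo $byteLength, ' bytes, ', $charLength, " characters\n"; // prints 7 bytes, 5 characters
```

A hyphenator that inserts its hyphen at byte offsets could place it in the middle of such a character, so until this is fixed it is probably safest to pass text in a single-byte encoding.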
