Transliteration

Transliterate from Unicode to US-ASCII

Transliteration is the process of translating individual non-US-ASCII characters into ASCII characters. It specifically does not transform non-printable or punctuation characters in any way. The process is always both inexact and language-dependent. For instance, the character Ö (O with an umlaut) is commonly transliterated as O, but in German text the convention is to transliterate it as Oe or OE, depending on the context (Oe at the beginning of a capitalized word, OE in an all-capitals context).
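
The language dependence described above can be sketched in a few lines of PHP. This is a minimal illustration, not Drupal's implementation: the function example_transliterate() and both tables are hypothetical, and the tables hold only the two characters needed for the example.

```php
<?php
// Hypothetical tables for illustration; these are not Drupal's real data.
// Keys are UTF-8 byte sequences: "\xC3\x96" is Ö and "\xC3\xB6" is ö.
$generic = array("\xC3\x96" => 'O', "\xC3\xB6" => 'o');
$overrides = array(
  'de' => array("\xC3\x96" => 'Oe', "\xC3\xB6" => 'oe'),
);

/**
 * Replaces known characters, preferring language-specific overrides.
 *
 * Hypothetical helper, not part of Drupal.
 */
function example_transliterate($string, $langcode, array $generic, array $overrides) {
  $map = $generic;
  if (isset($overrides[$langcode])) {
    // Entries for the requested language win over the generic ones.
    $map = array_merge($map, $overrides[$langcode]);
  }
  return strtr($string, $map);
}

echo example_transliterate("\xC3\x96l", 'en', $generic, $overrides), "\n"; // Ol
echo example_transliterate("\xC3\x96l", 'de', $generic, $overrides), "\n"; // Oel
```

The same input yields a different result depending only on the language code, which is why a language code has to accompany the text being transliterated.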

The Drupal default transliteration process transliterates text character by character, using a database of generic character transliterations and language-specific overrides. Character context (such as all-capitals vs. initial capital letter only) is not taken into account: when the transliteration of a capital letter results in two or more letters, by convention only the first is capitalized in the Drupal result. Also, only Unicode characters of 4 bytes or fewer can be transliterated in the base system; language-specific overrides can be made for longer Unicode characters. The process therefore has limitations, but since transliteration is typically used to create machine names or file names, these limitations are rarely a problem in practice. After transliteration, other transformation or validation may be necessary, such as converting spaces to another character, removing non-printable characters, or lower-casing.
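
The follow-up transformations mentioned above might look like the sketch below. It assumes the input has already been transliterated to ASCII; the helper example_machine_name() and its choice of underscore as the separator are illustrative assumptions, not Drupal behavior.

```php
<?php
/**
 * Turns already-transliterated ASCII text into a machine name.
 *
 * Hypothetical helper, not part of Drupal.
 */
function example_machine_name($ascii) {
  // Lower-case the text.
  $name = strtolower($ascii);
  // Collapse every run of characters other than a-z, 0-9 and underscore
  // (spaces, punctuation, non-printable characters) into one underscore.
  $name = preg_replace('/[^a-z0-9_]+/', '_', $name);
  // Drop underscores left over at either end.
  return trim($name, '_');
}

echo example_machine_name("Oel & Gas\tReport 2013!"), "\n"; // oel_gas_report_2013
```

Because transliteration itself leaves punctuation and non-printable characters untouched, a cleanup step of this kind is what makes the result safe to use as an identifier.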

Here is a code snippet to transliterate some text:

// Use the current default interface language.
$langcode = language(\Drupal\Core\Language\Language::TYPE_INTERFACE)->langcode;

// Retrieve the transliteration service from the container.
$trans = drupal_container()->get('transliteration');

// Use this to transliterate some text ($string holds the text to convert).
$transformed = $trans->transliterate($string, $langcode);

Drupal core provides the generic transliteration character tables and overrides for a few common languages. Modules can implement hook_transliteration_overrides_alter() to provide further language-specific overrides, including transliterations for Unicode characters that are longer than 4 bytes. Modules can also completely override the transliteration classes registered in \Drupal\Core\CoreBundle.
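
An implementation of the alter hook might look like the sketch below. The module name "mymodule" is a placeholder, and the parameter list (an overrides array passed by reference, keyed by Unicode code point, plus a language code) is an assumption about the hook's contract that should be checked against the hook's own documentation.

```php
<?php
/**
 * Implements hook_transliteration_overrides_alter().
 *
 * "mymodule" is a placeholder module name. The parameter list and the
 * code-point-keyed array are assumptions about the hook's contract.
 */
function mymodule_transliteration_overrides_alter(&$overrides, $langcode) {
  if ($langcode == 'de') {
    // Prefer a plain "A" for Ä (U+00C4) on this hypothetical site.
    $overrides[0xC4] = 'A';
  }
}
```

Restricting the change to a single language code keeps the override from affecting transliteration of text in other languages.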

File

drupal/core/modules/system/language.api.php, line 161
Hooks provided by the base system for language support.

Functions

Name | Location | Description
hook_transliteration_overrides_alter | drupal/core/modules/system/language.api.php | Provide language-specific overrides for transliteration.

Classes

Name | Location | Description
PHPTransliteration | drupal/core/lib/Drupal/Core/Transliteration/PHPTransliteration.php | Enhances PHPTransliteration with an alter hook.

Interfaces

Name | Location | Description
TransliterationInterface | drupal/core/lib/Drupal/Component/Transliteration/TransliterationInterface.php | Defines an interface for classes providing transliteration.