Umlaut Handling
The handling of umlauts is of particular relevance to digital documents such as websites and their URLs. Umlauts can cause problems for crawlability and display, so consistent umlaut handling should be part of marketing and SEO planning.
Umlaut handling in URLs
There is a consensus in search engine optimization that umlauts should be avoided in URLs. This prevents search engines from misinterpreting a URL and either indexing it incorrectly or not at all. To read a URL with umlauts, it first has to be “translated” into ASCII code, and it is this intermediate step that can pose problems for crawlers. Although bots could not read umlaut URLs initially, this is technically possible today.
To represent umlauts in the domain part of a URL, the “Punycode” system is used. It converts umlauts into ASCII-compatible character strings: each umlaut or special character is encoded as a specific sequence of letters and digits.
The following rules are used when converting URLs:
- If the URL contains only ASCII characters, it is not changed
- If the URL contains base characters as well as special characters or umlauts, the base characters are retained and the umlauts are converted to Punycode and appended after a hyphen
- If the URL contains only special characters, these are converted to code and arranged in sequence
If a domain name with special characters is represented according to the IDNA standard, the prefix “xn--” is prepended. Here is an example:
Kombüse.de becomes xn--kombse-6ya.de in IDNA standard and Punycode. The disadvantages of umlauts in the domain name or URL are particularly noticeable when the scripts of a website cannot handle IDN code sequences correctly (regardless of the work of the search engine spiders).
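As a rough illustration of this conversion, the following sketch uses Python's built-in `punycode` and `idna` codecs. The helper name `to_ascii_domain` is made up for this example, and the `idna` codec follows the older IDNA 2003 rules, which is sufficient for a simple umlaut but may differ from newer IDNA versions for other characters.

```python
def to_ascii_domain(domain: str) -> str:
    """Convert a domain name with umlauts to its ASCII (IDNA) form.

    Hypothetical helper for illustration; relies on Python's built-in
    "idna" codec, which implements the IDNA 2003 rules.
    """
    # Lowercase first so the result does not depend on the input's case.
    return domain.lower().encode("idna").decode("ascii")


# Raw Punycode of a single label: the ASCII letters are kept, followed by
# a hyphen and the encoded position and code point of the umlaut.
raw_label = "kombüse".encode("punycode").decode("ascii")

print(raw_label)                      # kombse-6ya
print(to_ascii_domain("Kombüse.de"))  # xn--kombse-6ya.de
print(to_ascii_domain("example.de"))  # example.de (pure ASCII is unchanged)
```

Decoding works the same way in reverse: `b"xn--kombse-6ya.de".decode("idna")` yields the readable form again.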
Another way to convert umlauts is to use UTF-8.
Consistent umlaut handling using UTF-8
If you would like to avoid umlaut problems, use the following tips:
- Specify UTF-8 as the charset in the Content-Type HTTP header
- Specify the charset in HTML pages:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
- Transmit form data as UTF-8
- Set the character sets of databases to UTF-8
- Convert embedded data to UTF-8
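The first two tips can be sketched as a minimal WSGI application. This is only an illustration under assumptions: the function name `app` and the page content are made up, and a real site would normally set the header in its web server or framework configuration.

```python
def app(environ, start_response):
    """Minimal WSGI application that serves a page with an umlaut as UTF-8."""
    html = (
        "<!DOCTYPE html>\n"
        '<html><head><meta http-equiv="content-type" '
        'content="text/html; charset=utf-8" /></head>\n'
        "<body><p>Kombüse</p></body></html>\n"
    )
    # Encode the text with exactly the charset that is declared below.
    body = html.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # The length is counted in bytes, not characters ("ü" is 2 bytes).
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For local experiments, such an application could be served with `make_server` from Python's `wsgiref.simple_server` module. The important point is that the HTTP header, the meta tag, and the actual byte encoding all agree.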
URLs in percent-encoded UTF-8 notation look as cryptic to users as domain names in Punycode. Such URLs can therefore be rewritten into readable, search engine-friendly URLs using mod_rewrite.
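To make the difference concrete, the following sketch compares the percent-encoded UTF-8 form of a path with a transliterated, readable variant. It does not show a mod_rewrite configuration; the helper names and the transliteration table (ü to ue and so on) are assumptions for this example, reflecting a common convention rather than a fixed rule.

```python
import re
from urllib.parse import quote

# Assumed transliteration table for German umlauts and ß.
TRANSLITERATION = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def percent_encode_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8, keeping the slashes."""
    return quote(path, safe="/")  # quote() encodes as UTF-8 by default


def slugify(title: str) -> str:
    """Build a readable ASCII URL segment from a title (hypothetical helper)."""
    text = title.lower()
    for umlaut, replacement in TRANSLITERATION.items():
        text = text.replace(umlaut, replacement)
    # Collapse every run of other characters into a single hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


print(percent_encode_path("/küche/möbel"))  # /k%C3%BCche/m%C3%B6bel
print(slugify("Küche & Möbel"))             # kueche-moebel
```

A rewrite rule would then map the readable segment to the internal resource, so that users and crawlers only ever see the ASCII form.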
Umlauts in digital documents
If you want umlauts in the content of your website to be displayed correctly by all browsers, you should ensure that UTF-8 is declared as the character encoding in the head of the HTML document. That way, the browser recognizes the character encoding and can correctly display the contents, including special characters and umlauts.
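What happens when the declared encoding does not match the actual one can be reproduced in a few lines. The sketch below assumes a client that falls back to Latin-1 (ISO-8859-1) when no charset is given; actual browser fallbacks vary.

```python
text = "Kombüse"

# The page is stored and transmitted as UTF-8: "ü" becomes two bytes.
utf8_bytes = text.encode("utf-8")

# Decoding with the declared (correct) encoding restores the text.
correct = utf8_bytes.decode("utf-8")

# Decoding the same bytes as Latin-1 turns each byte into one character.
garbled = utf8_bytes.decode("latin-1")

print(correct)  # Kombüse
print(garbled)  # KombÃ¼se
```

The garbled form is the typical symptom of a missing or wrong charset declaration.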