URL Encoding


URL encoding (also called percent-encoding) is a method that allows browsers and servers to transmit and interpret special or otherwise invalid characters in a URL. Such characters are replaced by a sequence of ASCII characters. It is nevertheless recommended to use only ASCII characters in URLs wherever possible. The generic syntax of URLs is defined in RFC 3986.

Background

Data transfer on the Internet has traditionally been based on the American Standard Code for Information Interchange, abbreviated ASCII. The available characters are based on an English typewriter keyboard and include the Latin alphabet in uppercase and lowercase as well as Arabic numerals and some punctuation marks. ASCII characters were originally assigned to a 7-bit pattern, which allows 2^7 = 128 combinations. Today, characters are stored and transmitted in 8-bit bytes, which allow 2^8 = 256 combinations. Characters outside the ASCII range, for example in UTF-8, are represented as sequences of several bytes.

URL encoding is based on ASCII and provides a solution for spaces and other special characters in URLs. Such characters occur primarily in automatically generated URLs, for example when product or article titles are converted into URLs. An encoded character always begins with a percent sign (%), followed by the two-digit hexadecimal value of the byte.
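
The rule described above can be sketched in a few lines of Python. This is only an illustration: the function name percent_encode_char is hypothetical, and the sketch assumes that characters are converted to bytes with UTF-8 before encoding.

```python
def percent_encode_char(ch: str) -> str:
    """Encode one character as %XX triplets, one triplet per UTF-8 byte.

    Hypothetical helper for illustration; it encodes every character it
    is given, including ones that would not need encoding.
    """
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


print(percent_encode_char(" "))  # %20
print(percent_encode_char("ü"))  # %C3%BC (two bytes in UTF-8)
```

A character outside the ASCII range produces one %XX triplet per byte, which is why a single letter such as ü turns into two triplets.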

Example

A space is usually interpreted as the end of a URL. A space in the middle of a URL (for example, www.example.com/new site.html) therefore causes an error because browsers cannot resolve the URL. Users who request such a URL may get a 404 error code. URL encoding replaces the space with a percent sign followed by its ASCII code in hexadecimal, in this case 20 (%20):

http://www.example.com/new%20site.html
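
The same replacement can be reproduced with Python's standard library. The following is a minimal sketch that uses urllib.parse.quote on the path from the example; the variable names are chosen only for illustration.

```python
from urllib.parse import quote, unquote

path = "/new site.html"

# quote() leaves "/" untouched by default and encodes the space.
encoded_path = quote(path)
url = "http://www.example.com" + encoded_path

print(url)                    # http://www.example.com/new%20site.html
print(unquote(encoded_path))  # /new site.html
```

Only the path is passed to quote() here. Passing the complete URL would also encode the colon after the scheme.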

Invalid characters

Some characters are not allowed in a URL in their literal form, and there is a risk that they will not be interpreted correctly. It is recommended to encode the following characters in any case:

" < > # % { } \ | ^ [ ] ` and spaces
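
As an illustration, the encoded form of each of these characters can be listed with urllib.parse.quote. The sketch assumes that the first character in the list is the double quotation mark; the names UNSAFE and table are hypothetical.

```python
from urllib.parse import quote

# The characters from the list above, plus the space at the end.
UNSAFE = '"<>#%{}\\|^[]` '

# safe="" tells quote() not to exempt any character from encoding.
table = {ch: quote(ch, safe="") for ch in UNSAFE}

for ch, encoded in table.items():
    print(repr(ch), "->", encoded)
```

Each character maps to a single triplet, for example the space to %20 and the percent sign itself to %25.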

Reserved characters

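The following characters are reserved and have a special meaning as delimiters within a URL. They must not be encoded when they are used in that role, but have to be encoded when they appear as ordinary data. The percent sign (%) is not part of this set in RFC 3986; it introduces an encoded character and must itself be encoded as %25 when it is meant literally. The reserved characters are:

! # $ & ' ( ) * + , / : ; = ? @ [ ]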

For example, the # in a URL denotes a jump mark (fragment) within a web page. The question mark (?) marks the start of the query string, the & sign separates the individual parameters from one another, and the equals sign (=) assigns a value to a parameter.
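
The roles of these delimiters can be made visible by splitting a URL into its components. The sketch below uses urllib.parse from the Python standard library; the URL and the parameter names are made up for illustration. Note that urlencode() writes spaces in parameter values as +, which is the form-style convention for query strings.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical URL with a path, a query string and a fragment.
url = "http://www.example.com/search?q=url+encoding&page=2#results"
parts = urlsplit(url)

print(parts.path)             # /search
print(parts.query)            # q=url+encoding&page=2
print(parts.fragment)         # results
print(parse_qs(parts.query))  # {'q': ['url encoding'], 'page': ['2']}

# Reserved characters inside a value are data, so they get encoded.
query = urlencode({"title": "Fish & Chips", "ref": "a=b"})
print(query)                  # title=Fish+%26+Chips&ref=a%3Db
```

The & and = that separate and assign parameters stay literal, while the & and = inside the values become %26 and %3D.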

Non-reserved characters

These characters are not reserved and have no predefined meaning for the URL. The non-reserved characters include:

Letters [A-Z, a-z], digits [0-9] and - _ . ~
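
A quick check shows that an encoder leaves exactly these characters alone. The sketch assumes Python 3.7 or later, where urllib.parse.quote treats ~ as a character that is never encoded; UNRESERVED and is_unreserved are hypothetical names.

```python
import string
from urllib.parse import quote

# Letters, digits and the four punctuation marks listed above.
UNRESERVED = string.ascii_letters + string.digits + "-_.~"


def is_unreserved(ch: str) -> bool:
    """Return True if the character never needs percent-encoding."""
    return ch in UNRESERVED


# Even with safe="" none of these characters is changed.
print(quote(UNRESERVED, safe="") == UNRESERVED)  # True
print(is_unreserved("~"), is_unreserved("/"))    # True False
```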

Encoding tools

There are many tools available on the Internet that quickly convert an invalid URL into a valid one. Manual URL encoding is still feasible for small websites. For large web projects, however, webmasters and SEOs should make sure that URLs are encoded in advance, ideally at the point where they are generated, so that browsers and servers can interpret them reliably.
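
For automatically generated URLs, the encoding step can be built into the code that assembles the URL. The following is a minimal sketch under that assumption, not a complete URL builder; build_url and the sample titles are hypothetical.

```python
from urllib.parse import quote


def build_url(base: str, *segments: str) -> str:
    """Join path segments onto base, percent-encoding each segment.

    safe="" ensures that a "/" inside a segment is encoded as data
    instead of being read as a path separator.
    """
    encoded = (quote(segment, safe="") for segment in segments)
    return base.rstrip("/") + "/" + "/".join(encoded)


print(build_url("http://www.example.com", "products", "Café table 50% off"))
# http://www.example.com/products/Caf%C3%A9%20table%2050%25%20off
print(build_url("http://www.example.com/", "bands", "AC/DC"))
# http://www.example.com/bands/AC%2FDC
```

Encoding each segment separately keeps the slashes that structure the path apart from slashes that belong to a title.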

Relevance to SEO

URL encoding is important so that users and servers can correctly interpret and retrieve URLs. Incorrect URLs can result in a high number of error codes. Each error code, in turn, can be interpreted by search engines as a sign of poor website maintenance. Users themselves send negative signals to Google and other search engines when they quickly bounce from error pages caused by URLs broken by spaces. When encoding URLs, it is important on the one hand that parameter characters and other reserved characters that act as delimiters are not encoded. On the other hand, converting to search-engine-friendly (SEF) URLs can result in double encoding, for example %20 becoming %2520, and thus cause problems when retrieving URLs. To avoid problems with non-ASCII characters, UTF-8 should be used as the character encoding.
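
Double encoding is easy to reproduce. The sketch below shows the effect with urllib.parse.quote and one possible guard against it; encode_once is a hypothetical helper, and it assumes that any % sequence already present in the input is an earlier encoding rather than literal text.

```python
from urllib.parse import quote, unquote

once = quote("new site.html")
twice = quote(once)  # the "%" of "%20" is encoded again

print(once)   # new%20site.html
print(twice)  # new%2520site.html


def encode_once(path: str) -> str:
    """Decode first, then encode, so repeated calls give the same result.

    Assumption: existing %XX sequences in path are encodings, not data.
    """
    return quote(unquote(path))


print(encode_once("new site.html"))    # new%20site.html
print(encode_once("new%20site.html"))  # new%20site.html

# quote() uses UTF-8 by default for non-ASCII characters.
print(quote("Köln"))  # K%C3%B6ln
```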
