Document Type Definition


The Document Type Definition, or DTD, specifies which HTML version is used in the source code of a website. It is referenced by the document type declaration (DOCTYPE) at the top of the source code, so that a browser or other reading software can detect what type of document it is dealing with and display the content accordingly. In the past, many browsers displayed content incorrectly when the DTD was missing or incorrect. The current standard, HTML5, no longer refers to a DTD at all. The short declaration <!DOCTYPE html> is still required for a valid document and tells browsers to render the page according to current standards. The document type definitions for HTML 4 and XHTML were published by the W3C.

Origin

HTML has been in use as a markup language for many years and therefore exists in many versions. The latest version is HTML5. Older versions such as XHTML or HTML 4 are now found mostly on older sites. HTML was originally based on the meta markup language SGML. The DTD described the structure of documents created with SGML applications such as HTML or with its subset XML. With the development of HTML5, this dependency was removed. HTML5 is no longer an application of SGML but a language with its own parsing rules that remains compatible with the previous versions.

The document type definition for HTML specifies which version is being used in the source code of a web document. This information allows the application (the browser) to detect which document type it is and what markup is allowed. Modern browsers will display the content of an HTML document even when the DTD is missing, but an HTML file is only valid if it begins with a clearly defined document type. In HTML5, this is declared with <!DOCTYPE html>. A document type definition specifies which elements may be used in a document, which attributes they may or must carry, and how they may be nested. To be valid, an HTML document must also have a head and a body element arranged in a logical tree structure.

<!DOCTYPE html>
<html lang="de">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
  </body>
</html>
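
The structural requirements described above (a doctype at the start, plus a head and a body element) can be checked roughly with Python's standard html.parser module. This is a minimal sketch and not a validator: the names StructureChecker and check_structure are hypothetical, and the check only looks for the presence of the declaration and the two tags.

```python
from html.parser import HTMLParser


class StructureChecker(HTMLParser):
    """Records the doctype declaration and the start tags of a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None         # content of the first <!DOCTYPE ...> seen
        self.doctype_first = False  # True if no start tag preceded the doctype
        self.tags = []              # start tags in document order (lowercased)

    def handle_decl(self, decl):
        # html.parser passes the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None:
            self.doctype = decl
            self.doctype_first = not self.tags

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def check_structure(source):
    """Return a small report on the doctype, head and body of `source`."""
    checker = StructureChecker()
    checker.feed(source)
    checker.close()
    return {
        "has_doctype": checker.doctype is not None,
        "doctype_first": checker.doctype_first,
        "has_head": "head" in checker.tags,
        "has_body": "body" in checker.tags,
    }


PAGE = """<!DOCTYPE html>
<html lang="de">
  <head>
    <meta charset="utf-8">
    <title></title>
  </head>
  <body>
  </body>
</html>"""

print(check_structure(PAGE))
```

For a full check against a document type, a dedicated validator remains the appropriate tool. The sketch only shows what "introduced with a document type" means in practice.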

Document type definitions for browsers

When a website is accessed using a browser, the browser reads the document type declaration in the source code to decide how to interpret the document. In practice, browsers do not download the referenced document type definition. They recognize the declaration and select a rendering mode based on it. Browsers such as Mozilla Firefox, Microsoft's Internet Explorer, Google Chrome and Opera can interpret the different document type definitions. However, websites that use older versions of the document type definition, or none at all, may be rendered in a backward-compatible “quirks” mode, which can cause errors in the display and in the way CSS is applied. Declaring a current document type reduces this risk. The DTD thus serves to mark up valid HTML documents, and the probability of correct representation increases when they are valid.

Construction

The W3C publishes the common HTML document type definitions, but webmasters may still have to make a few adjustments to their websites to ensure they are displayed correctly in all browsers. The specification of the document type is the first entry in the source code of an HTML page. In HTML, the keyword DOCTYPE is not case-sensitive, but XHTML requires it in capitals, so the doctype string should be capitalized.

<!DOCTYPE html>

In XHTML, the XML declaration can be listed first, before the doctype:

<?xml version="1.0"?>
<!DOCTYPE ... >

An example:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="de">

The address after the doctype specifies where the document type definition can be found. A validator or other software can download the definition from there and check the current document against it. The keyword PUBLIC indicates that the document type definition is a publicly available standard, which is identified by the quoted string that follows. The language code in that string, i.e. EN, refers in this case not to the language that is used in the content, but to the language in which the definition itself is written. A document type definition for HTML should therefore always refer to EN.
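
The parts of a declaration such as the XHTML example above (root element, public identifier, address of the DTD) can be pulled apart with a few lines of Python. This is only a sketch under simplifying assumptions: the function names parse_doctype and split_public_id are hypothetical, the regular expression handles only double-quoted PUBLIC declarations and the short HTML5 form, and it is not a full SGML or XML parser.

```python
import re
from typing import NamedTuple, Optional


class Doctype(NamedTuple):
    root: str                  # name of the root element, e.g. "html"
    public_id: Optional[str]   # quoted string after PUBLIC, if present
    system_id: Optional[str]   # address of the DTD, if present


# Simplified pattern: <!DOCTYPE root [PUBLIC "public id" ["system id"]]>
_DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[^\s>]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system>[^"]*)")?)?'
    r'\s*>',
    re.IGNORECASE,
)


def parse_doctype(declaration: str) -> Optional[Doctype]:
    """Split a doctype declaration into its parts, or return None."""
    match = _DOCTYPE_RE.match(declaration.strip())
    if match is None:
        return None
    return Doctype(match.group("root"), match.group("public"), match.group("system"))


def split_public_id(public_id: str) -> dict:
    """Split an identifier like "-//W3C//DTD XHTML 1.0 Transitional//EN"."""
    parts = public_id.split("//")
    if len(parts) != 4:
        raise ValueError("unexpected public identifier: " + public_id)
    registration, owner, description, language = parts
    return {
        "registration": registration,
        "owner": owner,
        "description": description,
        "language": language,  # language of the DTD itself, not of the page
    }


XHTML_TRANSITIONAL = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)

doctype = parse_doctype(XHTML_TRANSITIONAL)
print(doctype.system_id)
print(split_public_id(doctype.public_id)["language"])  # prints "EN"
```

The last line illustrates the point made above: EN is part of the identifier of the definition and says nothing about the lang attribute of the page.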

Common document type definitions for HTML/XHTML websites

There are some differences in how the DTD is indicated in different versions of HTML.

DTD for HTML 4

There are three different definitions for HTML 4:

  • HTML 4.01 strict: In such an HTML document, only structuring elements are allowed in the source code. The formatting and design are controlled by style sheets. Thus, the HTML source code is very easy and quick to read.

Example line for this type of document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

  • HTML 4.01 transitional: This document type specifies that both presentational HTML elements and attributes and style sheets are used in this document. It is a transitional definition.

The sample line for this type is as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

  • HTML 4.01 frameset: With this document type, you specify that your HTML file contains a frameset, i.e. that it divides the browser window into several frames. The document structure is declared differently here: the BODY element is replaced by the FRAMESET element.

This definition applies for pages with framesets:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

DTD for XHTML

XHTML 1.0 is based on the same rules as HTML 4.01. The only difference is that this markup language is based on XML and not SGML. As with HTML 4.01, three different document types can be specified for XHTML 1.0, and the same characteristics apply to them. The corresponding declarations are:

  • XHTML 1.0 strict:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  • XHTML 1.0 transitional:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  • XHTML 1.0 frameset:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
  • XHTML 1.1 is no longer bound to the rules of HTML 4.01. Therefore, only one document type definition remains:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

DTD for HTML5

The document type definition has been simplified with HTML5. The declaration no longer contains a public identifier or an address:

<!DOCTYPE html>
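
The declarations listed in this section can be told apart by their public identifier, or by its absence in the case of HTML5. The following sketch maps a declaration to a version label. The function name identify_version and the labels are hypothetical, and the table covers only the declarations mentioned in this article.

```python
import re

# Public identifiers of the declarations listed in this article.
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "HTML 4.01 Frameset",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "XHTML 1.0 Frameset",
    "-//W3C//DTD XHTML 1.1//EN": "XHTML 1.1",
}


def identify_version(declaration):
    """Return a version label for a doctype declaration, or "unknown"."""
    text = declaration.strip()
    # The short HTML5 form has no public identifier and is not case-sensitive.
    if re.fullmatch(r"<!DOCTYPE\s+html\s*>", text, re.IGNORECASE):
        return "HTML5"
    match = re.search(r'PUBLIC\s+"([^"]*)"', text, re.IGNORECASE)
    if match:
        return KNOWN_PUBLIC_IDS.get(match.group(1), "unknown")
    return "unknown"


print(identify_version("<!DOCTYPE html>"))  # prints "HTML5"
print(identify_version(
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
))  # prints "HTML 4.01 Strict"
```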

Document type definition for websites and SEO

Because the document type definition for HTML pages has been an important factor for the correct display of web content, checking that it is correct is still part of on-page optimization. Only if all the contents of a website are displayed as intended to virtually every user can analyses produce reliable results and marketing objectives be achieved. Some users still use older browsers that depend on the DTD specification. If it is missing or wrong, the look of the website is not necessarily affected, because most browsers can still display the content. But the HTML documents may contain many errors that the browser has to work around while parsing. This takes time, and users may have to wait longer.

Choosing the appropriate document type definition also influences how lean the source code of a website is and what its text-to-code ratio is. Because load speed is a search engine ranking criterion, choosing the right document type definition can be an advantage: the browser can detect the type of document quickly and render the content. The ratio of text to source code is also considered relevant from an SEO perspective. A page should contain more text and less source code.
