If-Modified-Since


The If-Modified-Since header is a field in HTTP requests exchanged between servers and clients such as browsers and search engine crawlers. When a client requests a resource from a server that supports this header, the server checks whether the content has changed since the client last retrieved it. If it has not changed, the server does not send the content again. Instead, it responds with status code 304 (Not Modified), which tells the client that it does not need to download the content. The client can then use a cached copy of the page. For crawlers, this is the version retrieved on the previous visit. For browsers, it is the version stored in the browser cache since the page was first loaded. In line with the Google Webmaster Guidelines, we recommend using the If-Modified-Since header so that Google's crawler does not load resources unnecessarily.

General Information

In HTTP communication, developers and webmasters have various options for influencing how servers and clients behave. HTTP caching offers several mechanisms for controlling requests and responses. Firefox, for example, follows two principles: expiration and validation.[1] The If-Modified-Since and Last-Modified headers belong to validation, in which the client checks whether its cached copy of a document or resource is still current. Expiration lets the client skip requests entirely while a cached copy is still fresh. Validation avoids re-sending unchanged content when a request is made. Together they shorten waiting times for the client and reduce both the load on the server and the bandwidth used for data transfer. Search engine crawlers also use fewer resources, and the overhead (data needed to manage the transfer) is reduced.
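The difference between the two principles can be sketched as a small decision function. This is a simplified illustration and does not reflect how Firefox or any other browser actually implements its cache. It assumes a plain dict of response headers with exact-case keys and ignores Cache-Control and heuristic freshness. The helper name `cache_decision` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def cache_decision(headers, now):
    """Classify a cached response (hypothetical helper, simplified model).

    headers: dict of response headers with exact-case keys (an assumption).
    now:     timezone-aware datetime representing the current time.
    """
    expires = headers.get("Expires")
    if expires is not None and parsedate_to_datetime(expires) > now:
        # Expiration: the copy is still fresh, so no request is needed at all.
        return "fresh"
    if "Last-Modified" in headers:
        # Validation: ask the server via If-Modified-Since whether it changed.
        return "revalidate"
    # No validator available: the document has to be downloaded again.
    return "refetch"


example = {
    "Expires": "Fri, 24 Nov 2000 11:21:39 GMT",
    "Last-Modified": "Wed, 15 Nov 2000 13:11:22 GMT",
}
print(cache_decision(example, datetime(2000, 11, 23, 12, 0, tzinfo=timezone.utc)))  # fresh
print(cache_decision(example, datetime(2000, 11, 25, 12, 0, tzinfo=timezone.utc)))  # revalidate
```

Before the Expires date, the cached copy is used without contacting the server. After that date, the Last-Modified value turns the next request into a conditional one.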

Both headers rely on a timestamp: the server reports it in Last-Modified, and the client sends it back in If-Modified-Since. Without this timestamp, no conditional request can take place, because there is no condition to check. HTTP/1.1 also provides so-called ETags (entity tags). An ETag identifies a specific version of a resource and can likewise control whether content is delivered and how it is cached on the client. How an ETag is generated depends on the server. By default, Apache versions before 2.4 built it from the file's inode, modification time and size, whereas nginx uses the file's modification time and size. The client sends its stored ETag back in an If-None-Match header. The server compares it with the current ETag and, if they match, responds with a status code such as 304.
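To make the nginx approach concrete, here is a sketch that builds an ETag in the format nginx uses for static files: the hexadecimal modification time and the hexadecimal length, joined by a hyphen and wrapped in double quotes. It also includes a simplified comparison against an If-None-Match request header. The helper names are hypothetical, and the comparison ignores edge cases such as commas inside tags. The sample values (15643 bytes, Wed, 15 Nov 2000 13:11:22 GMT) are taken from the HEAD response example in the Operating principle section.

```python
import os


def nginx_style_etag(mtime, size):
    """ETag in the style nginx uses for static files: hex mtime, hex size."""
    return '"%x-%x"' % (int(mtime), size)


def etag_for_file(path):
    """Apply nginx_style_etag to a file on disk via os.stat."""
    st = os.stat(path)
    return nginx_style_etag(st.st_mtime, st.st_size)


def etag_matches(if_none_match, current_etag):
    """Simplified If-None-Match check (weak comparison).

    "*" matches any existing resource, a leading W/ marker is ignored, and
    the header is split on commas, which assumes no tag contains a comma.
    """
    if if_none_match.strip() == "*":
        return True

    def opaque(tag):
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    return any(opaque(t) == opaque(current_etag) for t in if_none_match.split(","))


etag = nginx_style_etag(974293882, 15643)  # 15 Nov 2000 13:11:22 GMT, 15643 bytes
print(etag)                                # "3a128b7a-3d1b"
print(etag_matches(etag, etag))            # True: the server may answer 304
```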

Operating principle

The If-Modified-Since field is used together with the GET method. The client sends a GET request whose header contains If-Modified-Since, and the server normally responds by returning the requested data. If the requested document has not changed, however, the client does not need the entire document, only the header. When the conditional check (If-Modified-Since) shows that the document is unchanged, no document body is sent. The server performs this check by comparing the timestamp in the request with the date on which the document was last modified.[2]

  • The server responds with a 200 status code: The document has been modified and must be loaded.
  • The server responds with a 304 status code: The document has not been modified and does not need to be loaded.
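The server-side check that yields these two outcomes can be sketched as follows. `conditional_status` is a hypothetical helper built on Python's standard `email.utils` date functions, not the code of any particular web server. It assumes the document's modification time is available as a timezone-aware `datetime`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, otherwise 200.

    if_modified_since: raw If-Modified-Since header value, or None if absent.
    last_modified:     timezone-aware datetime of the document's last change.
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the full document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError, IndexError, OverflowError):
        return 200  # malformed date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # no zone given: assume GMT
    # HTTP dates have one-second resolution, so compare without microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # not modified: headers only, no body
    return 200      # modified since that date: send the full document


last_modified = datetime(2000, 11, 15, 13, 11, 22, tzinfo=timezone.utc)
header_value = format_datetime(last_modified, usegmt=True)
print(header_value)                                                        # Wed, 15 Nov 2000 13:11:22 GMT
print(conditional_status(header_value, last_modified))                     # 304
print(conditional_status("Mon, 13 Nov 2000 08:00:00 GMT", last_modified))  # 200
```

The comparison uses the document's own modification date. A malformed or missing header simply falls back to an ordinary 200 response.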

A conventional HTTP request with the HEAD method, which retrieves only the header data of a document, looks like this together with the server's response:

HEAD /~si/index.html HTTP/1.0
 
HTTP/1.1 200 OK
Date: Thu, 23 Nov 2000 11:21:36 GMT
Server: Apache/1.3.12 ...
Expires: Fri, 24 Nov 2000 11:21:39 GMT
Content-Length: 15643
Last-Modified: Wed, 15 Nov 2000 13:11:22 GMT
Connection: close
Content-Type: text/html


The third-to-last line of the response contains the Last-Modified field with its timestamp in standardized GMT. The server answers with a normal 200 status code and reports when the file was last modified. For a GET request, the client (a browser or a search engine crawler) would then receive and load the full content. The document's timestamp is essential, since no conditional request can take place without it. It records the time on the server. Because it is always expressed in GMT, client and server interpret it identically, regardless of their local time zones.
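Python's standard library can illustrate why the GMT format matters. Parsing the Last-Modified value from the response above yields one unambiguous instant. Converting that instant to another time zone and back reproduces exactly the same HTTP date. The UTC+1 offset used here is an arbitrary example.

```python
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Last-Modified value from the HEAD response shown above.
header = "Wed, 15 Nov 2000 13:11:22 GMT"
stamp = parsedate_to_datetime(header)
print(stamp.isoformat())  # 2000-11-15T13:11:22+00:00

# The same instant on a clock set to UTC+1 (an arbitrary example offset).
local = stamp.astimezone(timezone(timedelta(hours=1)))
print(local.isoformat())  # 2000-11-15T14:11:22+01:00

# Converting back to UTC reproduces exactly the same HTTP date string.
print(format_datetime(local.astimezone(timezone.utc), usegmt=True) == header)  # True
```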

Suppose the client instead sends a GET request with the If-Modified-Since header, repeating the Last-Modified timestamp from above. If the document has not changed since then, the exchange looks like this:

GET /~si/index.html HTTP/1.0
If-Modified-Since: Wed, 15 Nov 2000 13:11:22 GMT
 
HTTP/1.1 304 Not Modified
Date: Thu, 23 Nov 2000 11:21:36 GMT
Server: Apache/1.3.12 ...
Expires: Fri, 24 Nov 2000 11:21:39 GMT
Connection: close


A 304 response is only possible if the client sends the timestamp and the server supports conditional requests. If the document has changed, the server sends a 200 status code instead. You can check whether a given server supports the 304 status code with a simple HTTP request, using various tools. In some cases, it may be necessary to ask the hosting provider. The setup depends on the server version and the technology used (Unix, Apache, PHP, or a CMS such as WordPress or TYPO3). Support is configured differently in each case and sometimes requires its own implementation. Special plug-ins are available for some CMSs.
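A simple check of this kind can be scripted with Python's standard `http.client` module. The sketch below first sends a HEAD request to learn the Last-Modified timestamp. It then sends a GET request that returns this timestamp in If-Modified-Since and reports whether the server answers with 304. The function names are hypothetical, and the sketch assumes plain HTTP on a known host and port.

```python
import http.client


def _fetch(host, port, method, path, headers=None, timeout=10.0):
    """Send one request on a fresh connection; return (status, Last-Modified)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path, headers=headers or {})
        response = conn.getresponse()
        response.read()  # drain the body (empty for HEAD and 304 responses)
        return response.status, response.getheader("Last-Modified")
    finally:
        conn.close()


def supports_304(host, port, path, timeout=10.0):
    """Return True if the server answers a conditional GET with 304.

    1. A HEAD request retrieves the document's Last-Modified timestamp.
    2. A GET request sends that timestamp back in If-Modified-Since.
    Plain HTTP only; an HTTPS site would need http.client.HTTPSConnection.
    """
    status, last_modified = _fetch(host, port, "HEAD", path, timeout=timeout)
    if status != 200 or last_modified is None:
        return False  # no timestamp, so there is no condition to check
    status, _ = _fetch(host, port, "GET", path,
                       {"If-Modified-Since": last_modified}, timeout)
    return status == 304
```

For example, `supports_304("www.example.com", 80, "/index.html")` would run both requests against a hypothetical host. A result of `False` can also mean that the server sends no Last-Modified header for that URL, which is common for dynamically generated pages.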

Significance for Search Engine Optimization

Google recommends using the If-Modified-Since header for all websites.[6] Webmasters, website operators and developers should follow this recommendation if the content of their websites does not change frequently. If Googlebot has retrieved a URL before, it automatically includes the If-Modified-Since field in its requests. Because the date and time are formatted in GMT as HTTP requires, the server can determine whether the document has been modified since then. The server then tells Googlebot whether anything has changed since its last visit (200 with the full body versus a header-only 304). If nothing has changed, the bot does not need to retrieve the document again. This saves time and bandwidth and avoids unnecessary overhead (metadata generated during HTTP communication). It also reduces the load that Googlebot places on the server.[3]
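The revisit behavior described above can be modeled in a few lines. This is a hypothetical, in-memory illustration of a crawler that remembers Last-Modified per URL. It is not Googlebot's actual implementation and performs no network I/O. The class name and the example URL are assumptions.

```python
class RevisitCache:
    """In-memory model of a crawler that revalidates URLs it has seen before.

    Hypothetical illustration only: it performs no network I/O and is not
    Googlebot's actual implementation.
    """

    def __init__(self):
        self._entries = {}  # url -> (Last-Modified value, cached body)

    def request_headers(self, url):
        """Headers for the next request: If-Modified-Since only on a revisit."""
        entry = self._entries.get(url)
        return {"If-Modified-Since": entry[0]} if entry else {}

    def handle_response(self, url, status, headers, body):
        """Return the current document, reusing the cached body on a 304."""
        if status == 304:
            # Header-only answer: the stored copy is still current.
            return self._entries[url][1]
        if status == 200 and "Last-Modified" in headers:
            # Full answer: remember its timestamp for the next visit.
            self._entries[url] = (headers["Last-Modified"], body)
        return body


cache = RevisitCache()
url = "http://www.example.com/~si/index.html"  # hypothetical URL
print(cache.request_headers(url))  # {} on the first visit
cache.handle_response(url, 200, {"Last-Modified": "Wed, 15 Nov 2000 13:11:22 GMT"},
                      b"<html>first version</html>")
print(cache.request_headers(url))  # {'If-Modified-Since': 'Wed, 15 Nov 2000 13:11:22 GMT'}
print(cache.handle_response(url, 304, {}, b""))  # b'<html>first version</html>'
```

On the second visit, only headers travel over the wire, and the crawler keeps working with the copy it already has.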

References

  1. HTTP Caching FAQ developer.mozilla.org. Accessed on 12/02/2015
  2. HTTP Caching httpwatch.com. Accessed on 12/02/2015
  3. Date with Googlebot, Part II: HTTP status codes and If-Modified-Since googlewebmastercentral.blogspot.de. Accessed on 12/02/2015

Web Links