Deep Web


The deep web, or “hidden” web, is the part of the World Wide Web that generally cannot be found through a normal search engine. It largely consists of specialized databases and of websites whose pages are generated dynamically in response to database queries. The size of the deep web has not been clearly determined, but it is estimated to be many times larger than the part that is visible to search engines, known as the visible web or surface web.
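To illustrate why dynamically generated pages stay hidden, here is a minimal sketch of a crawler that only follows links. The toy site, the database records and the URL scheme are all hypothetical, and real search engine crawlers are considerably more sophisticated; the point is only that pages served in response to a query are never reached when no link leads to them.

```python
from collections import deque

# Hypothetical toy site: static pages and the links they contain.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # holds only a query form, no links to result pages
}

# Hypothetical database behind the search form. Each record is served
# only as a dynamically generated page in response to a query.
DATABASE = {"101": "Patent record", "102": "Court ruling", "103": "Journal article"}


def result_url(record_id):
    """URL of the page generated for one record (assumed URL scheme)."""
    return f"/search?id={record_id}"


def crawl(start):
    """Breadth-first crawl that follows links only; returns discovered URLs."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


surface = crawl("/")
deep = {result_url(rid) for rid in DATABASE} - surface

print(sorted(surface))  # pages the crawler found by following links
print(sorted(deep))     # generated pages the crawler never reached
```

In this sketch the crawler finds the three static pages, while the three result pages remain undiscovered because they exist only once a query is submitted.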

Features

Accurate data on the deep web is difficult to obtain. One frequently cited estimate comes from a study conducted by BrightPlanet (Bergman 2001) [1], which reported the following properties:

  • The deep web is about 400 to 500 times larger than the surface web
  • There are probably more than 200,000 deep websites
  • On average, deep web sites receive 50% more visits per month and are more highly linked than surface websites

In 2003, the University of California, Berkeley published the following figures on the size of the Internet:

  • Surface web: 167 terabytes
  • Deep web: 91,850 terabytes
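As a quick check of how the Berkeley figures relate to each other, the ratio can be computed directly. This is a small sketch using only the two values listed above; the variable and function names are chosen for illustration.

```python
# Figures as listed above, in terabytes.
SURFACE_WEB_TB = 167
DEEP_WEB_TB = 91_850


def size_ratio(deep_tb, surface_tb):
    """How many times larger the deep web is than the surface web."""
    return deep_tb / surface_tb


ratio = size_ratio(DEEP_WEB_TB, SURFACE_WEB_TB)
deep_share = DEEP_WEB_TB / (DEEP_WEB_TB + SURFACE_WEB_TB)

print(f"deep/surface ratio: {ratio:.0f}")             # 550
print(f"deep web share of total: {deep_share:.2%}")   # 99.82%
```

By these figures the deep web is about 550 times the size of the surface web, which is of the same order as the 400 to 500 times reported in the BrightPlanet study.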

Types of Deep Web

According to Sherman & Price (2001), one can distinguish five types of deep web:

  • Opaque web

The opaque web designates websites that could be indexed, but currently are not, for reasons of technical efficiency or cost-effectiveness. Since search engines do not crawl every directory level and sub-page of a website, relevant documents at lower levels of the hierarchy may not be captured. Particularly affected are websites without hyperlinks or a navigation system, as well as websites that no other page links to.

  • Private web

The private web includes websites that could be indexed, but are not because the respective webmaster restricts access. Examples include internal websites, password-protected data, and pages accessible only from specific IP addresses.

  • Proprietary web

The proprietary web refers to sites that are accessible only after accepting usage conditions or entering a password, and thus cannot be indexed. These sites are usually available only after identification.

  • Invisible web

The invisible web includes sites not indexed for strategic or commercial reasons. From a technical perspective, indexing could be done without a problem.

  • Truly invisible web

Websites of the truly invisible web are not indexed for technical reasons. These may be documents that cannot be displayed directly in a browser, file formats that cannot be captured because of their complexity (mostly graphic formats), or non-standard formats (for example, Flash).
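The five types above differ mainly in the reason a site is not indexed. The following sketch expresses that mapping as a small decision function. The `Site` flags and the order of the checks are assumptions made for this illustration and are not taken from Sherman & Price; the function also assumes the site is already known to be absent from search engine indexes.

```python
from dataclasses import dataclass
from enum import Enum


class DeepWebType(Enum):
    OPAQUE = "opaque web"
    PRIVATE = "private web"
    PROPRIETARY = "proprietary web"
    INVISIBLE = "invisible web"
    TRULY_INVISIBLE = "truly invisible web"


@dataclass
class Site:
    # All fields are hypothetical flags chosen for this illustration.
    unsupported_format: bool = False       # format a crawler cannot process
    blocked_by_webmaster: bool = False     # internal site, password, IP restriction
    requires_identification: bool = False  # usage conditions or login required
    excluded_by_choice: bool = False       # strategic or commercial reasons


def classify(site):
    """Map the reason a site is not indexed to one of the five types."""
    if site.unsupported_format:
        return DeepWebType.TRULY_INVISIBLE
    if site.blocked_by_webmaster:
        return DeepWebType.PRIVATE
    if site.requires_identification:
        return DeepWebType.PROPRIETARY
    if site.excluded_by_choice:
        return DeepWebType.INVISIBLE
    # Indexable in principle, but skipped for efficiency or cost reasons.
    return DeepWebType.OPAQUE


print(classify(Site(blocked_by_webmaster=True)).value)
print(classify(Site()).value)
```

In practice the categories can overlap, for example a password-protected site that also serves unusual file formats, so the fixed order of checks here is a simplification.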

Conclusion

The deep web contains far more data than the surface web. Integrating deep web results into ordinary search could benefit users, because additional relevant results could be found. However, efficiently implementing a search engine that covers both the surface and the deep web is difficult, and selecting appropriate sources for a given search query can be problematic. Alongside scientific and legal data, the deep web also contains many dubious sites, including a huge black market and websites run by cybercriminals and political extremists (neo-Nazis, revolutionaries). The deep web should therefore be used with caution, despite its large supply of helpful documents and data.

Relevance to SEO

Search engine optimization specialists strive to achieve favorable search results for users. Well-linked, contextually relevant websites should therefore achieve a correspondingly high ranking in the Google index. The exact procedure used to crawl and rank deep web content is not publicly known; nevertheless, SEO specialists are developing strategies to make deep web documents efficiently accessible to search engine users.

References

  1. [1] Bergman (2001), published in the Journal of Electronic Publishing. Accessed on 07/23/2013