Invisible Web


The Invisible Web is the part of the World Wide Web that search engines cannot index and that is therefore invisible to their users. In contrast to the Surface Web, it consists of data and information that, for various reasons, cannot be found through traditional search engines. Non-indexed websites, apps, and resources include protected information in areas such as email, online banking, specialized databases, and paid services, for example content behind a paywall. In addition, there are non-linked and password-protected websites as well as media types and archives that cannot be crawled with current search engine technology.

General information

The names for the different parts of the World Wide Web are often confused. Terms such as Dark Web, Deep Web, Invisible / Visible Web, or the so-called Darknet lack clear demarcations and definitions. The most important empirical study on this subject dates from 2001 and is likely outdated now.[1] That study examined the size of the various parts of the Web; a further study from the same year established definitions, which are explained in the article Deep Web.[2] Talking of an Invisible Web makes sense only if the search engine in question is also named, because search engines can make resources from the Invisible Web visible as well, and everything that a given search engine has not yet indexed is, in principle, invisible to most of its users.

How it works

The Invisible Web can be viewed as an area of the Internet that has either not (yet) been indexed or is subject to access restrictions. A metaphor often used in this context is that of the ocean: in cross section, the information that makes up the World Wide Web can be pictured as the various depths and layers of the sea. A search engine like Google is then a fishing boat working the shallow waters. A great deal of other information remains out of reach for the boat because its nets do not extend that far down. Accordingly, the following terms are common among IT experts:

  • Surface Web: The information resources are connected by hyperlinks. Search engines can crawl and index this information. This is the part of the Web that most users think of as the Internet and that they reach, for example, through a search.
  • Shallow Web: The Shallow Web is the technical background of many pages. It includes databases, servers, and the programming instructions stored in those databases. Websites are, for example, generated directly from these databases when users request them. This applies in particular to scripted and dynamic websites that are reachable via hyperlinks and are created using PHP and other programming languages. The links lead search engines to these websites, but the search engines usually index only the static versions of the pages.
  • Deep Web or Hidden Web: This information and these resources are usually hidden, and no links point to them. Finding them requires specific search engines and technologies. The Tor browser, which is required to reach .onion services, is one example of such a technology. Deep Web directories act as specialized search engines that can be used to reach the information, unless it is subject to further access restrictions (such as passwords, encryption, or firewalls).

In the above terminology, the Invisible Web is a combination of the Shallow Web and the Deep Web. Reaching its content and information requires either individual queries that match the programming languages used or specific search engines that provide an index. Because many subject databases and server resources are organized by topic and can only be queried through their own interfaces, searching this information is almost impossible for general search engines like Google, Yahoo, or Bing. In this sense, the content is invisible, but it can in principle be reached using vertical search engines, specific technologies, and the correct programming instructions.
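The role of hyperlinks in this layering can be illustrated with a small simulation. The following is a minimal sketch, not a description of how any real search engine works: the page names and the link graph in LINKS are invented for illustration, and the crawler simply follows links breadth-first from one seed page. Pages that nothing links to, or that exist only after a query has been submitted, never enter its index.

```python
from collections import deque

# Hypothetical toy web (assumption for illustration): page -> pages it links to.
LINKS = {
    "home": ["about", "catalog"],
    "about": ["home"],
    "catalog": ["home", "search-form"],
    "search-form": [],            # results appear only after a query is submitted
    "result?q=shoes": ["home"],   # generated on demand, nothing links to it
    "intranet": ["home"],         # not linked from anywhere
}


def crawl(seed, links):
    """Return every page reachable from seed by following hyperlinks."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = crawl("home", LINKS)
invisible = set(LINKS) - indexed

print("indexed:  ", sorted(indexed))
print("invisible:", sorted(invisible))
```

In this toy graph the crawler indexes the four linked pages, while the on-demand result page and the unlinked intranet page stay invisible to it, even though both exist.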

Examples

Some examples of the Invisible Web:

  • Databases from which websites are generated (dynamic websites).
  • Academic databases that require registration.
  • Non-linked and password-protected websites.
  • Access-restricted networks that require specific technologies.

Relevance to online marketing

While links from academic networks (.edu links) and government websites are quite popular in online marketing, caution is advised with content from the Invisible Web. Each resource should be evaluated individually, and the link profile of a website should be reviewed regularly. As with the Dark Web, certain links may look to Google as if they come from a Bad Neighborhood. The linking sites do not have the trust of the search engines, and such links can therefore have a negative effect on the resource they point to.

However, a first link to a website serves as a sign of trust for traditional search engines, and it is often through this link that they first learn of a new resource. It is also advisable to check whether the website to be marketed is accessible to conventional search engines. For example, the .htaccess file, the meta tags, and the robots.txt file should be examined to see whether they grant search engines access or whether the content is hidden or invisible.[3] Crawlability and indexability are central prerequisites for success if online marketing is to be done for a website.
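Part of such a check can be automated. The sketch below is an illustration under stated assumptions, not a complete audit: the robots.txt content, the example.com URLs, the HTML snippets, and the user agent name ExampleBot are all invented, and the .htaccess file is not covered. It uses Python's standard urllib.robotparser module to evaluate robots.txt rules and the standard html.parser module to look for a robots meta tag that forbids indexing.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt, url, user_agent="ExampleBot"):
    """Check whether the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_indexable(html):
    """Return False if a robots meta tag contains noindex or none."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)


print(is_crawlable(ROBOTS_TXT, "https://example.com/index.html"))
print(is_crawlable(ROBOTS_TXT, "https://example.com/private/report.html"))
print(is_indexable('<head><meta name="robots" content="noindex, nofollow"></head>'))
print(is_indexable("<head><title>Shop</title></head>"))
```

A page is only a candidate for the index if both checks pass: a URL blocked in robots.txt is not crawled, and a page carrying noindex is crawled but kept out of the index.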

References

  1. White Paper: The Deep Web: Surfacing Hidden Value. quod.lib.umich.edu. Retrieved on October 24, 2016.
  2. The Invisible Web: Uncovering Sources Search Engines Can’t See. ideals.illinois.edu. Retrieved on October 24, 2016.
  3. The Ultimate Guide to the Invisible Web. oedb.org. Retrieved on October 24, 2016.
