Scraping usually refers to screen scraping or, more precisely, “web scraping.” In this practice, the content of websites is extracted, copied, and stored, either manually or with the aid of software, and, if necessary, reused in modified form on your own website. Used constructively, web scraping can add value to a website with content from other websites. Misused, however, it violates copyright and is considered spam.
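To make the software-assisted case concrete, the following is a minimal sketch of extracting content from a page, using only Python's standard library. The class name `TextExtractor`, the function `scrape`, and the sample markup are hypothetical and chosen for illustration; the sketch also assumes the captured elements are not nested inside each other. A real scraper would first download the page and would have to respect the site's terms and copyright.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of <h1> and <p> elements (illustrative only)."""

    CAPTURED_TAGS = {"h1", "p"}

    def __init__(self):
        super().__init__()
        self.records = []          # list of (tag, text) tuples
        self._current_tag = None   # tag currently being captured, if any
        self._buffer = []          # text fragments of the current element

    def handle_starttag(self, tag, attrs):
        if tag in self.CAPTURED_TAGS:
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current_tag:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            self.records.append((tag, text))
            self._current_tag = None


def scrape(html):
    """Return the (tag, text) pairs found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical sample page standing in for downloaded content.
SAMPLE_PAGE = """
<html><body>
  <h1>Blue Widget</h1>
  <p>A sturdy widget,
     available in <b>blue</b>.</p>
</body></html>
"""

if __name__ == "__main__":
    for tag, text in scrape(SAMPLE_PAGE):
        print(tag, text)
```

The extracted pairs could then be stored or republished, which is exactly the step where the difference between legitimate reuse and spam arises.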
Scraping can be done with different techniques. The most prevalent are briefly described here:
Scraping is used for many purposes. Here are just a few examples:
Within the context of content syndication, content from websites can be distributed to other publishers under agreed rules. Scraping, however, often violates these rules. Some websites consist solely of content that has been scraped from other websites. Pages containing information copied directly from Wikipedia without citing the source are very common. Another form of spam scraping occurs when online stores copy product descriptions from successful competitors, often keeping even the formatting.
It is important for webmasters to know whether their content is being copied by other websites, because in extreme cases Google may attribute the scraping to the original author, which can lead to the scraped domain being ranked lower in the SERPs. Alerts can be set up in Google Analytics to monitor whether content is being copied by other websites.
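Apart from such alerts, a webmaster who already suspects a particular page can compare its text with the original. The sketch below is one possible approach and not a method described in this article: it measures the overlap of three-word sequences ("shingles") between two texts with the Jaccard index. The function names, the shingle size, and the sample texts are assumptions made for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of consecutive word sequences of length `size`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Very short texts are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical texts: the original, a near-verbatim copy, an unrelated page.
ORIGINAL = "The blue widget is sturdy and ships within two days."
COPY = "The blue widget is sturdy and ships within two days!"
UNRELATED = "Our red gadget is fragile and arrives next week."

if __name__ == "__main__":
    print(jaccard_similarity(ORIGINAL, COPY))
    print(jaccard_similarity(ORIGINAL, UNRELATED))
```

A score close to 1.0 suggests that one text was copied from the other. Any threshold for flagging a page would need to be tuned on real content.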
Search engines such as Google use scraping to enhance their own content with relevant information from other sources. Google, in particular, uses scraping methods to populate its OneBox and to build the Knowledge Graph. Google also scrapes the web to add entries to Google Maps that have not yet been claimed by companies. Moreover, Google collects relevant data from websites that have marked up their content with microformats in order to create rich snippets.
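How structured markup makes such data easy to collect can be sketched as follows. The example assumes microformats2-style class names, where plain-text properties carry a `p-` prefix, and it assumes that property elements contain only text and are not nested. The names `MicroformatReader` and `read_properties` and the sample markup are hypothetical. It says nothing about how Google processes pages internally.

```python
from html.parser import HTMLParser


class MicroformatReader(HTMLParser):
    """Collects text of elements whose class starts with 'p-' (illustrative)."""

    def __init__(self):
        super().__init__()
        self.properties = {}   # property name -> text
        self._current = None   # (tag, property name) being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        names = [c[2:] for c in classes if c.startswith("p-")]
        if names and self._current is None:
            self._current = (tag, names[0])
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            self.properties[self._current[1]] = "".join(self._buffer).strip()
            self._current = None


def read_properties(html):
    """Return a dict of property names and values found in the markup."""
    reader = MicroformatReader()
    reader.feed(html)
    reader.close()
    return reader.properties


# Hypothetical business listing marked up as an h-card.
SAMPLE_MARKUP = """
<div class="h-card">
  <span class="p-name">Example Bakery</span>
  <span class="p-locality">Springfield</span>
</div>
"""

if __name__ == "__main__":
    print(read_properties(SAMPLE_MARKUP))
```

Because the markup labels each value explicitly, no guessing about page layout is needed, which is what makes annotated pages attractive sources for rich snippets.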
There are several simple measures webmasters can use to prevent their websites from being affected by scraping: