The term reCAPTCHA refers to an automated test that distinguishes people from machines by means of their interaction patterns and associated parameters. The test serves as access control for websites, online services, input forms, forums, or guestbooks. It is essentially a classic Captcha service based on the Turing test: it verifies that a human user is trying to gain access and prevents access by machines (bots, scripts, and malicious software). While older Captcha versions displayed one or two optically distorted terms, the latest version requires only a single click from the user, since Google also evaluates parameters such as IP addresses, cookies, mouse movements, and time spent on the page to identify a human user. The current version is also called NoCAPTCHA reCAPTCHA.
Captchas have long been used to control spam and malware, but from a usability point of view they are an obstacle to using a website or online service: users must first complete an entry before they can interact with the medium. Accordingly, new methods have been tested continually in an attempt to improve accessibility. People solve Captchas more reliably than machines because they have experience and skills that machines lack. For example, they can arrange the objects in an image so that it makes sense, recognize when objects are related by topic, or associate terms with objects. With most Captcha versions, accessibility has always been a problem, especially since users with physical or cognitive impairments also want to use a website easily. The challenge is to keep machines and bots away from the medium without neglecting accessibility and usability.
The Turing test is the functional basis of the Captcha method (Completely Automated Public Turing test to tell Computers and Humans Apart). Three participants (A, B, and C) take part in the test. The questioner (C) tries to decide which of the other two participants is the person (A) and which is the computer (B). Both the human being and the computer try to convince the questioner that they are human and therefore capable of thinking or awareness. If the computer succeeds, it has passed the test. Captchas are based on this test but modify it in some respects: the questioner is not a person but a computer, which is supposed to identify a user as a human being based on their input. The terms “challenge-response test” and “human interaction proof” (HIP) are often used synonymously for such tests.
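The challenge-response idea described above can be illustrated with a minimal sketch in which the computer acts as the questioner. It is not how reCAPTCHA itself is implemented; the function names, the HMAC-based token, and the answer normalization are assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real service would load it from configuration.
SERVER_KEY = secrets.token_bytes(32)


def _sign(challenge_id: str, answer: str) -> str:
    """Bind a challenge id to an answer, ignoring case and surrounding whitespace."""
    normalized = answer.strip().lower()
    message = f"{challenge_id}:{normalized}".encode("utf-8")
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge(expected_answer: str) -> tuple[str, str]:
    """Return a random challenge id and a token derived from the expected answer."""
    challenge_id = secrets.token_hex(8)
    return challenge_id, _sign(challenge_id, expected_answer)


def verify_response(challenge_id: str, token: str, response: str) -> bool:
    """Accept the response only if it reproduces the token issued with the challenge."""
    return hmac.compare_digest(token, _sign(challenge_id, response))
```

The server would show the distorted term to the user, keep or embed the token, and call `verify_response` with whatever the user typed. The sketch does not cover replay protection or expiry, which a real deployment would need.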
According to a study by Carnegie Mellon University, which initiated the reCAPTCHA project, hundreds of thousands of hours were spent every day in the year 2000 on solving Captchas. The project, which was taken over by Google in 2009, puts this input to use for machine learning. The data entered by the crowd supports the digitization of different media. One of the words displayed in the Captcha input field is one that could not be digitized automatically, and the user is asked to transcribe it. The user input is then used for Google Books and Google News to help digitize scanned books, magazines, and periodicals. The automatic conversion of scanned pages into text is called Optical Character Recognition (OCR). The company also uses the technology for Google Street View and Google Maps, for example, to identify places from photos of road signs. The idea of using inputs for crowdsourcing or crowd testing is still being pursued. However, the type of input has changed, and the computer has become “wiser” through machine learning.
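The crowdsourcing step can be sketched roughly as follows. The sketch assumes that the second displayed word is a control word whose answer is already known, and that an unknown word counts as digitized once enough users agree on the same transcription. The class name, the `votes_needed` threshold, and the word identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


class WordPairCaptcha:
    """Toy model: a known control word gates access, an unknown word collects votes."""

    def __init__(self, votes_needed: int = 3) -> None:
        self.votes_needed = votes_needed  # assumed agreement threshold
        self.votes: dict = defaultdict(Counter)  # unknown word id -> transcription counts

    def submit(self, control_answer: str, control_input: str,
               unknown_id: str, unknown_input: str) -> bool:
        """Pass the user only if the control word is correct; then record the vote."""
        if control_input.strip().lower() != control_answer.strip().lower():
            return False
        self.votes[unknown_id][unknown_input.strip().lower()] += 1
        return True

    def accepted_transcription(self, unknown_id: str) -> Optional[str]:
        """Return the most common transcription once it has reached the threshold."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, count = counts.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

Only input from users who solve the control word is counted, so failed attempts, for instance from bots, do not contribute transcriptions for the unknown word.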
NoCAPTCHA reCAPTCHA is a further development of the previous Captcha procedures. Users no longer need to enter terms; instead, they confirm that they are human by clicking the “I’m not a robot” box. Every time an input form or online service is called up, the system, which works like an artificial intelligence, checks additional parameters of the user and matches them against the data already collected. If the data indicates a machine, a classical test is offered, and the user must, for example, enter terms or identify objects in photos. Each of these interactions helps the NoCAPTCHA reCAPTCHA project to distinguish people from machines and to digitize more data.
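The decision flow described here, a single click for likely humans and a classical test as fallback, might look roughly like the sketch below. Google does not publish its actual signals or weights, so the `Signals` fields, the point values, and the threshold are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical parameters collected when the checkbox is clicked."""
    has_prior_cookie: bool
    mouse_events: int
    seconds_on_page: float
    ip_flagged: bool


def risk_points(s: Signals) -> int:
    """Return 0 (looks human) to 10 (looks automated); the weights are made up."""
    points = 0
    if not s.has_prior_cookie:
        points += 2
    if s.mouse_events < 5:
        points += 3
    if s.seconds_on_page < 2.0:
        points += 2
    if s.ip_flagged:
        points += 3
    return points


def decide(s: Signals, threshold: int = 5) -> str:
    """Accept the click for low-risk visitors, otherwise fall back to a classical test."""
    return "show_challenge" if risk_points(s) >= threshold else "accept_click"
```

A visitor with an existing cookie, natural mouse movement, and a normal time on the page would pass with one click, while a request with no history that submits within a fraction of a second would be sent to the fallback challenge.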
To distinguish people from machines, the system uses surfing behavior, browsing history, the user’s device, various features of the network configuration, and some undisclosed parameters, because, according to Google, bots or machines cannot simulate these parameters. Metrics from web analytics are used to create a canvas fingerprint and identify the user as a human being. Human users can operate the website easily, while bots and machines are locked out. For accessibility, the system provides an audio version of the Captcha procedure for certain users.
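The basic idea of a fingerprint, reducing many observed parameters to one stable identifier, can be shown with a simplified stand-in. Actual canvas fingerprinting hashes the pixels a browser renders to a canvas element; the sketch below only hashes a dictionary of attributes, and the attribute names are hypothetical.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of device and network attributes into a short, stable identifier."""
    # Sorting the keys makes the result independent of the order of collection.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes a site might observe for one visitor.
visitor = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Berlin",
    "language": "en-US",
}
visitor_id = fingerprint(visitor)
```

The same attributes always yield the same identifier, so a returning visitor can be matched against data already collected, while a change in any attribute produces a different value.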
Although the current Captcha procedure is relatively simple for regular users, it is not a long-term solution to the problem of spam and malware, because attackers can use machine learning as well. Spambots, for example, could be trained to solve certain questions and problems more reliably than before and thus masquerade as human beings. In one case, the audio version of the Captcha process was hacked, and developers were able to get machines accepted as human beings. Google responded immediately and changed the source code of the system to fix the vulnerability. This example shows that methods of spam control must be developed continually if they are to remain effective. It also shows that these methods are not always compatible with accessibility and usability, even though combining all three would be the ideal solution.