Natural Language Processing


Natural Language Processing (NLP) is a technology that enables computers and people to communicate with each other in natural language. NLP combines linguistic findings with current methods from computer science and artificial intelligence.

For Natural Language Processing to work on spoken input, speech recognition must first take place. NLP is regarded as a promising technology in the field of Human-Computer Interaction (HCI) for controlling devices or web applications. Chatbots and digital voice assistants, for example, are based on this principle.

Background

The development of NLP dates back to the 1950s, when the scientist Alan Turing published an article entitled "Computing Machinery and Intelligence", in which he proposed a method for assessing machine intelligence. The resulting "Turing Test" is still in use today.

As early as 1954, researchers succeeded in machine-translating more than sixty Russian sentences into English. Euphoric about this start, many computer scientists assumed that fully automatic machine translation was only a matter of time. In practice, however, the first systems for statistical machine translation were not developed until the 1980s. Since then, various approaches have been found to translate information from the "real" world into a form computers can process.

A major evolutionary step came in the late 1980s, when machine learning became popular. Together with the ever-increasing computing power of computers, statistical NLP algorithms could now be applied in practice. The linguist Noam Chomsky was, and remains, one of the pioneers in this field, and the software company IBM also drove the development of Natural Language Processing forward.

Today, NLP-based computer programs are no longer limited to manually compiled data sets; they can also analyze text corpora such as websites, or even spoken language, directly.


Requirements

NLP rests on the basic idea that any form of language, spoken or written, must first be recognized. Language, however, is a highly complex system of signs: what matters is not only the individual word, but its connection to other words, whole sentences and facts.

What people learn naturally from birth, computers must achieve with the help of algorithms. While humans can draw on their life experience, a computer must draw on artificially generated experience. The challenge in the machine processing of natural language is therefore not so much producing language as understanding it.

How it works

Modern NLP is based on algorithms, which in turn rely on statistical machine learning. What makes this special is that computers do not merely reproduce previously seen examples; they can also independently identify problems and solve new ones on the basis of large bodies of documents. Computers do not learn one fixed solution for every problem; they learn general patterns that help them solve individual problems. This makes NLP a precursor to artificial intelligence.
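This idea of learning general patterns rather than fixed solutions can be illustrated with a minimal naive Bayes text classifier. The training sentences and labels below are purely hypothetical example data; the sketch only shows the principle, not a production system.

```python
from collections import Counter
import math

# Tiny labelled corpus (hypothetical example data).
training = [
    ("book a flight to berlin", "travel"),
    ("reserve two train tickets", "travel"),
    ("what is the weather tomorrow", "weather"),
    ("will it rain this weekend", "weather"),
]

# Count word frequencies per label: these counts are the "general patterns".
word_counts = {}
label_counts = Counter()
for text, label in training:
    label_counts[label] += 1
    word_counts.setdefault(label, Counter()).update(text.split())

def classify(text):
    """Score each label with a naive Bayes estimate (add-one smoothing)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(label_counts[label] / len(training))
        for word in text.split():
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# The classifier never saw this exact sentence, yet the learned word
# patterns generalize to it.
print(classify("book a train to munich"))  # → travel
```

The point is that the system answers correctly for an input it has never seen, because individual word statistics, not whole memorized sentences, carry the decision.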

One example is Google Translate. When the project was first launched, it was widely ridiculed. Today, the program can translate many kinds of text, and even the spoken word, fairly fluently.

Google's RankBrain also uses Natural Language Processing to deliver matching results for previously unseen search queries. The interpretation of the input is supplemented by artificial intelligence.

Computer programs based on NLP must perform the following tasks:

  • Stemming
  • Simplifying text
  • Converting text to spoken language
  • Converting spoken language to text
  • Understanding searches phrased in natural language
  • Detecting advanced and follow-up questions
  • Checking the plausibility of answers
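The first task in the list, stemming, reduces inflected word forms to a common stem. The following is a deliberately crude suffix-stripping sketch of the idea; real systems use far more careful rules (the Porter stemmer is a classic example), and the suffix list here is an illustrative assumption, not a complete one.

```python
# Toy suffix list for illustration only; a real stemmer covers many more
# cases and applies contextual rules.
SUFFIXES = ["edly", "ing", "ed", "es", "ly", "s"]

def stem(word):
    """Strip the longest matching suffix, keeping at least 3 characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# All three inflected forms collapse to the same stem.
print([stem(w) for w in ["translating", "translated", "translates"]])
# → ['translat', 'translat', 'translat']
```

Collapsing inflected forms this way lets a search for "translate" also match documents containing "translating" or "translated".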

Areas

NLP touches on many individual areas. These include:

  • Information Retrieval: in the general processing of information
  • Information Extraction: for semantic questions
  • Speech Processing: Speech recognition or text-to-speech functions

Tasks

Speech recognition, as a central task area of NLP, depends on many different factors. The most important ones are summarized briefly here.

  • Automated summarization: The programs must be able to reduce large texts to their essentials automatically.
  • Relationships between words within a sentence: Here, NLP must recognize which sentence elements relate to each other.

Example: "I'm sitting in the back seat of my car." Here, the program must recognize that the back seat belongs to the car.

  • Discourse analysis: NLP software must be able to recognize the register of a text (formal, colloquial). The program must also recognize the type of text (receipt, invoice, inquiry).
  • Machine translation: NLP-based programs must be able to translate one human language into another, mastering grammar, semantics and other linguistic sub-areas.
  • Morphological segmentation: This term describes the decomposition of a word into its individual components (morphemes).
  • NER (Named Entity Recognition): An NLP program must recognize whether a text contains proper names of places, persons or organizations, and must be able to classify them. For text output in Western languages, the program must therefore also know whether the words in question are capitalized.
  • Converting to human speech: Digitally stored words are translated into spoken human language.
  • Understanding human language
  • Optical character recognition (OCR): A form of image recognition that converts images into text, as some scanners do today.
  • Recognition of feelings
  • Recognition of spoken language
  • Recognizing styles such as irony
  • Recognition of word meanings: In German, for example, "buchen" can mean both the act of booking a ticket and, capitalized as "Buchen", the plural of the tree "Buche" (beech).
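One of the tasks above, Named Entity Recognition, can be sketched in its simplest form as a gazetteer lookup: matching the text against a fixed list of known names. Real NER systems use statistical models rather than lookup tables, and the names and labels below are illustrative assumptions.

```python
# Minimal gazetteer (lookup-table) NER sketch. Real systems learn to
# recognize unseen names from context; this table is illustrative only.
GAZETTEER = {
    "Alan Turing": "PERSON",
    "IBM": "ORGANIZATION",
    "Berlin": "LOCATION",
}

def find_entities(text):
    """Return sorted (name, label) pairs for gazetteer entries in the text."""
    found = [(name, label) for name, label in GAZETTEER.items() if name in text]
    return sorted(found)

print(find_entities("Alan Turing never worked for IBM."))
# → [('Alan Turing', 'PERSON'), ('IBM', 'ORGANIZATION')]
```

The weakness of this approach is exactly what motivates statistical NER: a pure lookup table can never label a name it has not seen before, while a learned model can infer "person" or "place" from the surrounding words.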

Areas of application and outlook

NLP is an important building block in the development of artificial intelligence. Language plays a central role in the creation of autonomous computers. The Natural Language Processing approach thus forms the important interface between human beings and computers.

Today, these techniques are used in translating and processing documents, but also in call centres. There are now also programs that can create texts on their own.

Services such as Skype are expected soon to be able to translate telephone conversations live. Already today, users can "talk" to chatbots from selected providers on Skype to book tickets or make simple queries. Google also wants to turn its Translate service into a live interpreter. At the same time, the technology is used by numerous digital assistants from major Internet companies, such as Amazon's Echo, Microsoft's Cortana or Apple's Siri.