Parser


A parser (from "to parse": to analyze a string or text and break it into its logical syntactic components) is a program that is usually part of a compiler. The compiler translates source code into a machine-executable language; within it, the parser's task is to decompose the input and transform it into a format suitable for further processing. A sequence of instructions in a programming language is analyzed and then broken down into its individual components.

How it works

To analyze a given text, parsers usually rely on a separate lexical analyzer (called a lexer), which breaks the input data down into tokens (input symbols such as words). Lexers are usually implemented as finite-state machines that recognize a regular grammar, which ensures a well-defined breakdown. The tokens obtained this way then serve as the input symbols for the parser.
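The lexing step can be illustrated with a minimal sketch. The token names (NUMBER, IDENT, OP), the helper tokenize, and the sample input are assumptions chosen for this example, not part of any particular compiler; the regular expressions stand in for the regular grammar mentioned above.

```python
import re

# Hypothetical token categories for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),            # integer literals
    ("IDENT", r"[A-Za-z_]\w*"),    # names such as variables
    ("OP", r"[+\-*/=()]"),         # single-character operators and parentheses
    ("SKIP", r"\s+"),              # whitespace, discarded
    ("MISMATCH", r"."),            # any other character is an error
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Break text into (kind, value) tokens, skipping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        value = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {value!r} at position {match.start()}"
            )
        tokens.append((kind, value))
    return tokens


tokens = tokenize("x = 3 + 42")
# tokens is now:
# [("IDENT", "x"), ("OP", "="), ("NUMBER", "3"), ("OP", "+"), ("NUMBER", "42")]
```

A parser would consume this token list rather than the raw characters, so it never has to deal with whitespace or with deciding where one word ends and the next begins.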

The actual parser handles the grammar of the input data, performs a syntactic analysis, and as a general rule creates a syntax tree (parse tree). This tree can be used for further processing, for example code generation by a compiler or direct execution by an interpreter. Thus, the parser is the software component that checks the instructions in the source code, processes them further, and forwards them.
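As a rough illustration of a syntax tree and its further processing, the following sketch uses Python's own built-in parser, exposed through the standard ast module, as a stand-in for "a parser". The sample expression is an assumption made for this example.

```python
import ast

source = "1 + 2 * 3"  # hypothetical input

# Syntactic analysis: the parser checks the input and builds a syntax tree.
tree = ast.parse(source, mode="eval")

# The root of the expression is the addition; the multiplication is nested
# beneath it, which is how the tree encodes operator precedence.
root = tree.body
print(type(root.op).__name__)        # Add
print(type(root.right.op).__name__)  # Mult

# Further processing: compile the tree to bytecode, then execute it.
code = compile(tree, filename="<example>", mode="eval")
result = eval(code)
print(result)  # 7
```

Input that violates the grammar, such as "1 + * 3", would make ast.parse raise a SyntaxError, which corresponds to the checking role of the parser described above.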

[Figure: Example of a parse tree]

Types of parsers

There are two basic parsing methods: top-down parsing and bottom-up parsing. They differ mainly in the order in which the nodes of the syntax tree are created.

  • Top-down: The top-down parser works in a goal-oriented way: it begins at the start symbol of the grammar and searches for a syntactic derivation that fits the input. The parse tree therefore grows from top to bottom, toward an increasingly detailed breakdown.
  • Bottom-up: The bottom-up parser starts with the tokens of the input string and attempts to combine them into progressively larger syntactic units. This continues until the start symbol of the grammar has been reached.
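The two strategies can be contrasted on a deliberately tiny grammar, assumed here purely for illustration: an expression is a number followed by any number of "+ number" pairs. The function names, the tuple-based tree format, and the pre-split token list are all hypothetical. The first function is a recursive-descent (top-down) sketch; the second is a simplified shift-reduce (bottom-up) sketch rather than a full table-driven LR parser.

```python
def parse_top_down(tokens):
    """Start at the start symbol (expr) and expand it downwards."""
    pos = 0

    def parse_num():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected a number at position {pos}")
        node = ("num", tokens[pos])
        pos += 1
        return node

    def parse_expr():
        nonlocal pos
        node = parse_num()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("add", node, parse_num())
        return node

    tree = parse_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


def parse_bottom_up(tokens):
    """Shift tokens onto a stack and reduce them into larger units."""
    stack = []
    for tok in tokens:
        # Shift: push the next token (numbers become leaf nodes).
        if tok.isdigit():
            stack.append(("num", tok))
        elif tok == "+":
            stack.append("+")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Reduce: replace "expr + num" on top of the stack by one node.
        while (
            len(stack) >= 3
            and isinstance(stack[-3], tuple)
            and stack[-2] == "+"
            and isinstance(stack[-1], tuple)
        ):
            right = stack.pop()
            stack.pop()  # the "+" token
            left = stack.pop()
            stack.append(("add", left, right))
    # Success means everything was reduced to a single start symbol.
    if len(stack) != 1 or not isinstance(stack[0], tuple):
        raise SyntaxError("input could not be reduced to one expression")
    return stack[0]


tokens = ["1", "+", "2", "+", "3"]
# Both calls return:
# ("add", ("add", ("num", "1"), ("num", "2")), ("num", "3"))
top_down_tree = parse_top_down(tokens)
bottom_up_tree = parse_bottom_up(tokens)
```

Both functions produce the same tree; they differ only in the order of construction. The top-down version commits to "this is an expression" before reading any token, while the bottom-up version creates the leaves first and only arrives at the complete expression at the end.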

Applications

A parser is often used to convert text into a new structure, for example, a syntax tree, which expresses the hierarchical arrangement of elements. In the following applications the use of a parser is usually essential:

  • Source code in a programming language is read by a parser, which provides the compiler with a data structure from which machine code or bytecode can be generated.
  • HTML code is initially just a string of characters to a computer and must be analyzed by the parser contained in the web browser. The parser provides a description of the webpage as a data structure, which a layout engine can then render on the screen.
  • Special XML parsers are responsible for the analysis of XML documents and prepare the information contained therein for further use.
  • URI parsers break down complex identifiers such as URLs into their hierarchical components (scheme, host, path, and so on).
  • Search engines such as Google use crawlers to download webpages and then extract (parse) the text that is relevant to them. This text is processed, and the parsed data can then be searched.
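Several of these applications can be tried with parsers from the Python standard library. The sketch below is only an illustration: the class LinkCollector, the helper collect_links, and the sample HTML, URL, and XML strings are assumptions made for this example. It extracts a link from an HTML string, splits the resulting URL into its components, and reads a value from an XML document.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# HTML parsing: from a flat string of characters to structured data.
html_doc = (
    "<html><body><p>See "
    '<a href="https://example.com/docs?page=2#intro">the docs</a>.'
    "</p></body></html>"
)
links = collect_links(html_doc)

# URI parsing: split the URL into its components.
parts = urlsplit(links[0])
# parts.scheme is "https", parts.netloc is "example.com",
# parts.path is "/docs", parts.query is "page=2", parts.fragment is "intro"

# XML parsing: build an element tree and query it.
xml_root = ET.fromstring(
    "<catalog><book id='1'><title>Parsing</title></book></catalog>"
)
titles = [book.findtext("title") for book in xml_root.findall("book")]
# titles is ["Parsing"]
```

A crawler combines the same building blocks: it parses downloaded HTML, pulls out the text and links, and parses each link as a URL before fetching the next page.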

Conclusion

Finer classifications of parser types exist in addition to the coarse subdivision into top-down and bottom-up parsing. With a parser suited to the grammar being analyzed, webpages can be crawled more effectively. Search engines continually aim to optimize this process of website analysis in order to provide users with quick and informative search results.
