What is and how does ascending and descending parsing work?

Asked

Viewed 3,311 times

5

According to some research I did, Parsing in computing, known as Parsing in English, it is the process of analyzing an input sequence (read from a computer file or keyboard, for example) to determine its grammatical structure according to a given formal grammar.

Parsing transforms a text in the input into a data, usually a tree, which is convenient for processing and captures the implicit hierarchy of this entry. Through the Lexical analysis is obtained a group of tokens, so that the analyzer syntactic use a set of rules to build a tree syntactic structure.

What is and how the ascending and descending syntactic analysis procedure is performed?

1 answer

3


So abridged:

  • Upward syntactic analysis, also called bottom-up, the parser can start with a data input and try to rewrite it to the initial symbol. Intuitively, the parser try to locate the most basic elements, and then larger elements that contain the most basic elements, and so on. Example: LR parser (LEft-to-right Right-Most-Derivation).

  • Descending syntactic analysis, also called top-down, the analyzer can start with the initial symbol and try to turn it into the data input. Intuitively, the parser starts from the largest elements and breaks them into smaller elements. Example: LL parser (LEft-to-right LEft-Most-Derivation)


Syntactic analysis

The syntactic analysis is the second stage of the compilation process (the first is the lexical analysis) and in most cases uses context-free grammar to specify the syntax of a programming language. The main task of the parser, known as Parsing, is to determine whether the input program represented by the token stream has valid sentences for the programming language. If so, we additionally want to figure out the way (or one of the ways) the string can be derived by following the grammar rules.

Making a grammar analogy of the Portuguese language, which studies the arrangement of words in a sentence, this part of the compilation is responsible for determining whether a chain of lexical symbols can be generated by a grammar.

Context-free grammars represent a formal grammar and can be written through algorithms deriving all possible language constructs. These derivations aim to determine whether a word stream fits the syntax of the programming language.

A derivation is a sequence of substitutions of nonterminals by a choice of grammatical production rules. When we make the derivation we must apply the production rule to replace each nonterminal symbol with a terminal symbol, this allows us to identify if certain strings belong to the language, the rules expand all possible productions. As a result of this process we have the derivation tree, which is a graphical alternative to show the derivation process of a sentence in a grammar.

The way in which the derivation tree of the analyzed chain x is constructed, says if the syntactic analysis is descending or ascending.

  • Descending syntactic analysis, also called top-down: the branch tree corresponding to x is built from top to bottom, i.e., from the root (the initial symbol S) for leaves, where it is x. In this kind of analysis, you have to decide which rule A→β shall be applied to a node labelled by a non-terminal A. To expansion of A is made by creating child nodes labeled with the symbols of β.

  • Upward syntactic analysis, also called bottom-up: the branch tree corresponding to x is built from bottom to top, i.e., from the leaves, where it is located x, to the root, where the initial symbol is S. In this kind of analysis, you have to decide when the rule A→β shall be applied, and we shall find neighbouring nodes labelled with the symbols of β. To reducing by the rule A→β consists of adding a node to the tree A, whose children are the nodes corresponding to the symbols of β.

In both types, the trees are built from left to right. The reason for this is that the choice of rules must be based on the string to be generated, which is read from left to right.

For example, consider the following grammar and string x = a+a*a:

1. E → E + T
2. E → T
3. T → T*F
4. T → F
5. F → (E)
6. F → a

In descending analysis:

inserir a descrição da imagem aqui

The rules are considered in order 1 2 4 6 3 4 6 6, the same order in which the rules are used in the left derivation:

E ⇒ E+T ⇒ T+T⇒ a+T ⇒ a+T*F ⇒ a+F*F ⇒ a+a*F ⇒ a+a*a

With the ascending analysis, on the other hand, the rules are identified in the order 6 4 2 6 4 6 3 1, in this case the order of the rules corresponds to the right derivation, reversed:

inserir a descrição da imagem aqui

a+a*a ⇐ F+a*a ⇐ T+a*a ⇐ E+a*a ⇐ E+F*a ⇐ E+T*a

That is to say:

E ⇒ E+T ⇒ E+T∗F ⇒ E+T∗a ⇒ E+F∗a ⇒ E+a∗a ⇒ T+a∗a ⇒ F+a∗a ⇒ a+a∗a

References:

Browser other questions tagged

You are not signed in. Login or sign up in order to post.