LL
An LL parser works by trying to predict which production rule comes next and then applying that rule to the input symbols it reads. If the prediction fails, the parser returns to the state before the prediction, puts the symbols it read back into the input (backtracking), and tries a new prediction. The parser fails when every possible prediction has been tried, i.e., when the initial prediction (that the input is valid for the language) proves false.
It is a fairly easy type of parser to write by hand because of its recursive structure. It becomes trivial if the language is regular or never needs backtracking (in which case it is LL(1)). The parser code also closely mirrors the grammar that produced it and is easy to read.
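As a rough illustration of the "predict, then backtrack" idea, here is a minimal sketch in Python; the toy grammar (S → 'a' 'b' | 'a' 'c') and the function name are illustrative, not taken from the text above. Each predicted production is tried in order, and when one fails the starting position is simply restored, which amounts to returning the already-read symbols to the input.

```python
# Minimal backtracking LL sketch for the illustrative grammar
#   S → 'a' 'b' | 'a' 'c'
def parse_S(tokens, pos):
    for alternative in (['a', 'b'], ['a', 'c']):   # predicted productions
        p = pos                                    # remember where we started
        ok = True
        for symbol in alternative:
            if p < len(tokens) and tokens[p] == symbol:
                p += 1                             # consume one input symbol
            else:
                ok = False                         # prediction failed...
                break                              # ...backtrack to `pos`
        if ok:
            return p                               # position after the match
    return None                                    # every prediction failed

print(parse_S(list("ac"), 0))   # 2    (second alternative matches)
print(parse_S(list("ad"), 0))   # None (all predictions fail: parse error)
```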
However, its running time is hard to predict (although it is close to linear in the input size in most practical cases), and not every grammar can be handled by an LL parser. A classic example is left recursion:
Num → [0-9]+
Base → Num | '(' Sum ')'
Prod → (Prod '*')? Base
Sum → (Sum '+')? Prod
This grammar cannot be processed directly: when the LL parser tries to read a Sum, it must first assume that another Sum comes right after and check that possibility, which leads to infinite recursion. To make it work you must first transform the grammar into a form the LL algorithm accepts, like this:
Num → [0-9]+
Base → Num | '(' Sum ')'
Prod → (Base '*')* Base
Sum → (Prod '+')* Prod
It still accepts exactly the same language, but it does not produce the same syntax tree (Sum and Prod now become n-ary operations instead of binary ones) and needs additional processing to adjust it.
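To make this concrete, here is a minimal recursive-descent (LL(1)) sketch for the transformed grammar in Python; the class and method names are my own. Each production maps to one function, and the repetition in the transformed rules becomes a loop instead of the infinite self-call that the left-recursive form would produce; re-nesting the operands as binary nodes inside that loop is one way of doing the "additional processing" mentioned above.

```python
import re

def tokenize(text):
    # Num → [0-9]+ plus the punctuation terminals
    return re.findall(r"[0-9]+|[()*+]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    # Sum → (Prod '+')* Prod  -- a loop instead of left recursion
    def sum(self):
        node = self.prod()
        while self.peek() == '+':
            self.eat('+')
            node = ('+', node, self.prod())   # re-nest as binary nodes
        return node

    # Prod → (Base '*')* Base
    def prod(self):
        node = self.base()
        while self.peek() == '*':
            self.eat('*')
            node = ('*', node, self.base())
        return node

    # Base → Num | '(' Sum ')'
    def base(self):
        if self.peek() == '(':
            self.eat('(')
            node = self.sum()
            self.eat(')')
            return node
        return ('num', int(self.eat()))

print(Parser(tokenize("1+2*(3+4)")).sum())
# ('+', ('num', 1), ('*', ('num', 2), ('+', ('num', 3), ('num', 4))))
```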
LR
An LR parser operates on a list in which, at each step, one of the following operations is performed:
- Shift: a symbol is read from the input and appended to the end of the list as a terminal.
- Reduce: a production rule is applied to the last N terminal or nonterminal symbols in the list, replacing them with a single nonterminal.
These operations are repeated until the input is empty and the list contains a single nonterminal, which represents the whole language. If that state cannot be reached (the input ends and no rule can be applied to reduce the list), the parser fails.
The interesting thing here is that no input symbol is ever read more than once and that the processing time is clearly linear in the size of the input (assuming the number of production rules is not absurdly large, which holds in practical cases). This mechanism handles recursion with no trouble. Some variants are LALR and SLR.
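As a sketch of the shift/reduce cycle (not an actual table-driven LR parser), the following Python code handles the single left-recursive rule Expr → Expr '+' Num | Num by hard-coding the two reduce decisions; a real LR parser takes those decisions from tables generated ahead of time from the grammar.

```python
# Shift/reduce sketch for the grammar
#   Expr → Expr '+' Num | Num
def parse(tokens):
    stack, pos = [], 0
    while True:
        if stack[-3:] == ['Expr', '+', 'Num']:
            stack[-3:] = ['Expr']                 # Reduce: Expr → Expr '+' Num
        elif stack[-1:] == ['Num']:
            stack[-1:] = ['Expr']                 # Reduce: Expr → Num
        elif pos < len(tokens):
            stack.append('Num' if tokens[pos].isdigit() else tokens[pos])
            pos += 1                              # Shift: read one input symbol
        else:
            break                                 # nothing left to shift or reduce
        print(stack)                              # show the list after each step
    return stack == ['Expr']                      # accept iff fully reduced

print(parse(['1', '+', '2', '+', '3']))           # True
```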
An LR parser accepts a larger set of grammars, since it has no problems with recursion. The following grammar, for example, is perfectly acceptable:
Num → [0-9]+
Expr → Expr '+' Expr | Num
Since the grammar is ambiguous, the rightmost derivation is used (hence the R in LR): 1+2+3 is parsed as 1+(2+3).
This type of parser is hard to write by hand because its logic is expressed mostly as transition tables rather than code. On the other hand, it is easy to generate automatically from the grammar by another program, a parser generator. Some examples are Bison and Yacc, which produce LALR parsers.
Conclusion
If you intend to write the parser yourself, without the help of any tool, lean toward an LL parser, especially if the language is regular or simple enough to be LL(1). It is an easy model to program, and in most cases the difference in performance is too small to matter. (There are grammars that can make an LL parser take exponential time in the input size, but I am assuming real, practical cases.) It is a good choice for those who are starting out on the subject, because it lets you really understand what is happening in the code.
But if you are planning a more complex parser, or one where efficiency matters, the ideal is to rely on a tool that generates the parser for you. In most cases it will apply grammar simplifications that make the final parser code unreadable, but faster. Almost all such tools produce LALR parsers, although some use LL (such as ANTLR). In that case you describe the grammar, usually in PEG notation. The diagnostics for syntax errors also tend to be better.
However, you may need fine control in your parser, for example to handle some unconventional syntax element for which you cannot write a grammar directly. An example is the C/C++ preprocessor. You can do the processing in two stages, submitting the preprocessed code to the parser, but then you lose important diagnostic information and the error messages may have nothing to do with the code as originally written. The alternative is to integrate the preprocessor into the parser, but in that case there is no way to express it as a grammar; you need to write the parser in code.
Personally, I choose to write a parser by hand (LL) only when the language is extremely simple. For any real project I use a tool to generate the parser.
It’s always good to see that we don’t have only web developers here :) You have already given answers to questions I thought no one would answer, and for which I hadn’t even found good material in English.
– Maniero
In the production Expr → Expr '+' Expr, it seems to me the text would read better if you did not use an ambiguous grammar (Expr → Expr '+' Num instead). Ambiguity will cause unnecessary and semantically dangerous conflicts. I think you could review the line that explains the LR.
– JJoao
@JJoao In fact the LL example was incorrect; I have corrected it. As for the LR, the intent was to show an ambiguous grammar, precisely to demonstrate that this is possible.
– Guilherme Bernal
@GuilhermeBernal, sorry to insist. The idea behind the R in LR is not quite that; it is more about waiting for the rightmost corner of the derivation tree before making decisions. With ambiguous grammars like this, all the tools I have used (without an extra associativity declaration) complain about reduce-reduce conflicts and warn that the semantics will go wrong.
– JJoao
@JJoao I will review this calmly and read more on the subject. Later on I will make the proper edits. Thank you very much for pointing out the error; I will study it.
– Guilherme Bernal