What is the difference between automata and grammars?

Asked

Viewed 886 times

10

I am studying about compilers and relating programming languages. What is the difference between automata and grammars?

1 answer

12


Formal Languages

In the study of formal languages, one language is a set of strings on an alphabet. A string is a sequence of letters obtained from a given alphabet1.

note: String is usually translated as "word" but perhaps "phrase" would be more correct, since spaces are a letter like any other from the point of view of formal languages and "A B" is a single string.

Grammars

A grammar is used to describe the structure of a formal language. Grammars consist of a series of rules describing how to form words from the language. For example, one of the grammars that describes the language of well-balanced parenthesis strings is the following grammar:

regra inicial: S
regras de geração: 
S -> ()     // Um par de parênteses é bem balanceado
S -> ( S )  // Uma string balanceada cercada de parênteses é bem balanceada
S -> S S    // Duas strings balanceadas em sequência são bem balanceadas.

Grammars can be used to generate strings but the most interesting operation is, given a string, to obtain a syntax tree that describes the structure of the string in relation to the grammar. For example, in the language of arithmetic expressions, the string "1 + 2 * 3" will be translated into the following tree:

  soma
  /  \
1   produto
      /  \
    2     3

It is on these syntax trees that the compiler of our programming language or the expression evaluator of our calculator will operate. It’s something much more structured than a flattened text.

There are several questions we can ask about grammars:

  • Is the grammar ambiguous? Is there a string that can be derived using two different derivation trees or does every string have a single derivation? For example, if we do not use rules of precedence for operators, in 1 + 2 * 3 is ambiguous whether the sum or multiplication comes first.

  • Is the grammar regular? Is the grammar context-free? Is the grammar left recursive? The more general the grammar is, the more general the language it can describe, but at the same time the more complex and costly the algorithm to do the Parsing.

Automata

Automata are state machines that receive a word and return YES or NO at the end. In the context of formal languages, automata are associated with the language of the strings they accept with a YES.

Automata are classified with the computational power they can use: The available memory is finite, stack, unlimited...? Processing is deterministic or nondeterministic (parallel)?

Some examples of questions we can ask about automata:

  • Two automata recognize the same language?
  • We can decrease the number of states of an automaton without changing the recognized language?
  • What happens if we decide to combine two automata?
  • What is the difference in recognizable languages of deterministic and nondeterministic automata, with more or less memory? And in terms of execution cost and number of states required?

Classification of grammars vs automata. Where compilers fit?

A very interesting classification is the chomsky hierarchy, which associates some important classes of formal languages with which types of grammars are capable of generating those languages and which automata can be used to recognize them:

       | Linguagens                 | Automato                | Regras de produção da gramática
Tipo 0 | Recursivamente enumeráveis | Máquina de Turing       | (sem restrições)
Tipo 1 | Sensíveis ao contexto      | MT linearmente limitada | a X b -> c Y d
Tipo 2 | Livres de contexto         | Pilha ñ determinístico  | X -> a Y b
Tipo 3 | Regulares                  | Finito                  | X -> a Y 

Of this list, the most important for a compiler are context-free grammars and finite automata.

Context-free grammars are the ones you will normally use to describe your programming language, since programming languages need to be processed efficiently and the absence of context effects leads to more predictable syntax trees. In addition to avoiding ambiguities, you will also have to pay attention if your grammar is compatible with the type of parser you will use. If you are going to write a top down parser, (recursive, handwritten), your grammar will need to be an LL grammar, with no recursion left. If you’re going to use a bottom-up parser like Yacc or Bison, your grammar can be an LR grammar, which has fewer restrictions.

As for the automata, the ones you’ll spend the most time studying are finite automata. They are used to recognize regular expressions, which are used extensively in lexer, which is the first step of compiling.

  • Very good your answer, clarified my doubt, thank you very much!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.