Doubt about Chomsky’s hierarchy

Asked

Viewed 775 times

3

Regarding the 4 types of languages, I doubt to understand about context-dependent or context-sensitive language.

I find very confusing the explanation that I find through the sites, I would like someone to help me understand what is a context-sensitive language and a unrestricted language.

1 answer

6

The hierarchy

The hierarchy of languages is this:

hierarquia

The languages Chomsky considered at the time were regular, context-free, context-sensitive and unrestricted. However, I will address all of them, including the other three.

Finite languages

A finite language is one that corresponds to a finite set of strings. Therefore, a language is not finite if there exists an infinite set of strings in that language.

Regular languages

All finite languages are regular, but not all regular languages are finite.

A regular language is one that can be expressed by a regular expression. A regular expression consists of strings that may have:

  • (to) Repeats of certain subsequences in an undetermined number of times and;

  • (b) Choices between two or more subsequences.

The characteristic b exists in finite languages, and therefore is the to which increases the recognition power of languages and allows regular languages to express an infinite number of strings. Hence, a regular expression that does not make use of the characteristic to expresses a finite language.

other characteristics often attributed to regular expressions such as repetition of subsequences one or more times, repetition of subsequences a certain number of times and subsequences that may or may not appear, are all characteristics derived from combinations of characteristics to and b described above.

Languages free of context

All regular languages are context-free, but not all context-free languages are regular.

A context-free language is one that can be expressed by means of a context-free grammar.

A context-free grammar is a set of nested recursive derivation rules beyond what regular languages are already able to recognize. Each rule, called production, is defined as how a non-terminal symbol can be expanded into a sequence of terminal and non-terminal symbols. The character string to be recognized or generated by a context-free grammar consists only of terminal symbols, the non-terminal symbols being used only in the intermediate generation or recognition steps of the character string.

Even without the use of recursion in rules, the ability to expand non-terminal symbols into endpoint and non-terminal symbol sequences is sufficient to express the characteristics to and b of regular languages. Therefore, it is the possibility of using such a recursion that makes context-free languages more expressive than regular ones. A context-free grammar that does not use this feature is equivalent to a regular expression.

I explain more about context-free languages here.

Languages sensitive to context

All context-free languages are context-sensitive, but not all context-sensitive languages are context-free.

A context-sensitive language is one that can be expressed by some context-sensitive grammar.

A context-sensitive grammar is somewhat similar to a context-free grammar, but with one big difference: Productions do not necessarily map only non-terminal symbols to terminal and non-terminal sequences. They can map terminal and non-terminal sequences to other endpoint and non-terminal sequences, and this feature makes them more expressive than context-free grammars.

An example of context-sensitive grammar, I got from Wikipedia and recognizing the language { tonbncn : n 1 } that is not context-free is this:

S → aBC
S → aSBC
CB → CZ
CZ → WZ
WZ → WC
WC → BC
aB → ab
bB → bb 
bC → bc
cC → cc

Note that unlike a context-free grammar, the left part of the productions is not restricted to a singular nonterminal symbol.

Context-sensitive grammars are often confusing, difficult to understand, and difficult to analyze. The main reason for this is that they define rules that can be used to replace anything in the input with anything else anyway in any order and anywhere. Effectively, they are computationally equivalent to Turing machines with memory limited to input size.

Therefore, using these production rules is not the only way to specify how a context-free language works. A more natural way is to define it as a deterministic Turing machine whose tape on which it operates is limited only to the size of the input. Being a Turing machine, it can perform any type of reading, writing, and symbol manipulation operations on this input tape. However, since its tape is limited and finite, it still has less computational power than a Turing machine with unlimited memory.

By using a deterministic Turing machine with an alphabet much larger than the input alphabet, it is possible to simulate a deterministic Turing machine operating on a memory tape whose maximum size is the size of the input multiplied by some arbitrary constant. Thus, an equivalent definition for a context-sensitive language is one that can be recognized by a deterministic Turing machine operating on a tape whose size maintains a linear relation to the size of the input tape.

A practical example of context-sensitive language that is not context-free is the one previously, { tonbncn : n 1 }. That is, the language that corresponds to a sequence of a’s, followed by a sequence of b’s and then a sequence of c’s, where the number of a’s, b’s and c’s are equal.

Another example of context-sensitive language is to check whether pairs of italic (<i> - </i>), bold (<b> - </b>) and underlined (<u> - </u>) markings open and close correctly, even if they are not properly nested. In that language, the sequence ab<i>cd<b>ef</i>gh<u>ij</b>klm</u>no is a valid string. There is no context-free language capable of recognizing this.

Another simple example of a context-sensitive language is that language consisting of the sequence of digits between 0 and 9 expressing a prime number in the decimal base. Again, there is no context-free language capable of recognizing this.

Another example, which abuses the computational power available, is to determine whether a sequence of symbols representing variables, parentheses and logical operators AND, OR and NOT is satisfiable. That’s the problem of satisfiability which is the best known NP-complete problem. The context-sensitive language class is so comprehensive that it encompasses all problems in NP.

Recursive languages

All context-sensitive languages are recursive, but not all recursive languages are context-sensitive.

The definition of a recursive language is simple: All that for which there is a Turing machine that recognizes or rejects it in finite time.

It’s hard to think of a recursive language that isn’t context-sensitive. Most of the computational problems we encounter in everyday life are computationally equivalent to the recognition of some context-sensitive language.

A recursive language that is not context-sensitive must be impossible to recognize with the use of a memory quantity limited to the size of the input multiplied by any constant, however arbitrarily high it may be. This already implies computational complexity outside of NP and absolutely intractable problems, although still solvable.

Examples of problems that are recursive but not context sensitive:

  • Determine whether two regular expressions are equivalent.

  • Verify or refute the validity of first-order sentences (which include variables, E, OR, NOT, for all ( ) and exists ( )) with real numbers, addition and comparison (but without multiplication).

  • Determine if there is a perfect strategy to win a chess game and disregard any rule that forces the draw after some arbitrary number of moves without capture to a board of arbitrary size.

Since recursive languages are those that can be decided in finite time, then those that are not recursive are those that cannot be decided in finite time for all cases. In other words, languages that are not recursive are undecidable.

Recursively enumerable languages

All recursive languages are recursively enumerable, but not all recursively enumerable languages are recursive.

A recursively enumerable (or semi-decidable) language is one that is accepted by some finite-time Turing machine, but which may require infinite time to be rejected.

For a language to be recursively enumerable without being recursive, it is necessary that there exist strings that do not belong to the language, but that it is impossible to determine this in a finite sequence of steps (after all, it must be necessarily undecidable). However, strings belonging to the language in question can always be decided as such in a finite (even if very large) number of steps.

The classic example of a recursively enumerable language that is not recursive is the halting problem. Given a description of a Turing machine and an input string, determine whether or not the Turing machine accepts such a string. Although this problem seems to be simple, it is actually a very cruel problem, because there is simply no algorithm capable of solving it in finite time for all possible inputs. I talk more about this problem here.

Unrestricted languages

All recursively enumerable languages are unrestricted, but not all unrestricted languages are recursively enumerable.

The unrestricted word already says what language is: anything and everything. This is the set that encompasses all languages, including languages that are not recursively enumerable.

Languages that are not recursively enumerable are insoluble. They contain strings that are impossible to be determined as belonging to language or not.

According to the rice theorem deciding on any nontrivial semantic property of any program is an undecidable problem. Trivial property means those that are valid for all programs or for no program (and therefore, non-trivials are those that are true for some and false for others). By semantic property, it is understood by properties about the behavior of the program. Being undecidable, therefore, these problems are not recursive and many of them are also not recursively enumerable. We can use this to conceive some languages that are not recursively enumerable. For example:

  • The set of strings describing Turing machines that decide whether their inputs are prime numbers.

Another known example that does not depend on Rice’s theorem is this:

  • The set of strings representing pairs of context-free grammars that recognize the same language.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.