In the Chomsky hierarchy, we have the following grammars, from the simplest to the most powerful:
- Regular grammar
- Grammar free of context
- Grammar sensitive to context
- Unrestricted grammar
Curiosity: that’s not Chomsky’s nomenclature. He calls it markov grammar (or similar) what mathematicians and scientists/computer theorists call regular grammar, and also calls it regular grammar what mathematicians and scientists/computer theorists call context-free grammar. I don’t remember what you call it context-sensitive, but the unrestricted keeps the nomenclature in my memory.
A regular grammar can be right-regular or left-regular, depending on how your derivation rules are.
Regular right has the following formats for derivation rules (upper case indicates nonterminals, lower case indicate terminals; S
is the initial nonterminal):
S ==>
A ==> aB
A ==> a
Note that the word grows from left to right.
Regular left has the following formats for derivation rules:
S ==>
A ==> Ba
A ==> a
Context-free grammars have only context-free derivations: a nonterminal generating any lexical form. A lexical form is a mixture of terminals and nonterminals in any order and any quantity (even λ is considered a lexical form). Hence it is easy to see that all regular production is also context-free.
For some jargon in the area of formal languages, see that other answer
One (lazy) way of defining a language as regular is: that language that can be represented by a regular grammar. Analogously: the language X is the one that can be represented by a grammar X.
I can say that a language is strictly context-free if it cannot be represented by a simpler grammar than context-free grammar. And that goes for all other languages.
Let’s take the language L = {a^n b^m |- n inteiro, m inteiro}
. Is it context-free? The answer is yes. Proof of this?
S ==>
S ==> A
S ==> B
S ==> AB
A ==> a
A ==> AA
B ==> b
B ==> BB
Now, she is strictly context-free? The answer is: no.
I’m taking advantage of the fact that regular expressions are equivalent to regular grammars, at least "pure" regular expressions in the mathematical concept
Proof of this: a*b*
.
And the language L = {a^n b^m |- n inteiro, m inteiro, n <= m <= 2n }
? Would she be free of context?
Yes, free of context. And if you want her automaton, see here.
Now, that would be her strictly context-free? Would I have to search for all regular grammars on terminals a
and b
to analyze this?
Fortunately, we don’t need to run a test for running out of options. They are infinitely enumerable, but even so infinite, it would not be possible to do this kind of demonstration in finite time. So, let’s pack up the math and the seatbelts?
Right at the beginning of the text, I put as a curiosity that Chomsky called this type of grammar as being a markov grammar. Do you know why? Because he inadvertently used a Markov process to create a automaton word generator, he created a mealy machine: for each transaction fired on the Markov chain, a new terminal symbol was produced.
I flirt a little more with generative automata and Markov chains in this answer.
So, since a finite state automaton recognizes a regular language and is also a kind of Markov chain, we have then that regular languages are closely related to Markov chains.
If my Markov chain allows for ties, and I am in one of its tie states, nothing prevents me from making that loop endlessly. After all, Markov chains are only dependent on the current state when determining the next state.
To get to the point where the loop occurs, I may need to generate on the Mealy machine the subpalavra x
. The full loop would beget me the subpalavra y
. Then I can finish as the subpalavra z
. This implies that the following word concatenations are recognized by exactly the same automaton:
xz
xyz
xyyz
xyyyyyyyyyyyyyyz
Keep repeating the loop that generates y
is called "pumping". I can pump y
is the will that the word obtained continues to belong to language. Then, it follows the motto of pumping for regular languages: if not available x
, y
and z
which satisfy this condition, so the language is not regular.
In this case, our grammar is context-free, if it is non-regular, then there is no simpler grammar that represents this language, so it would be strictly context-free.
Let us take the word arbitrary a^n b^m
belonging to the language, for some n
and some m
valid. I can take the subpalavra y
in one of the following ways:
- pure-minded: only contain
a
- pure-b: only contain
b
- mixed: begins with some
a
s, then follow some b
s
Whatever the y
mixed, pump the second y
already makes the new word invalid. See: a^n b a b^m
; Bombeei ab
once again. This did not respect the definition of the language of which all a
s are on the left side, and that all b
s are on the right side.
To pure-minded, what happens if I pump m
times over? My least result would be a^(n+m) b^m
. As the amount of a
s is greater than the amount of b
s, this word obtained also does not belong to language.
So will the alternative pure-b can show something regular? Well, the answer is no. Pump 2n
times, then the smallest word would be a^n b^(m+2n)
. The amount of b
s has become strictly larger than twice the amount of a
, also disqualifying this word as belonging to language.
So, since there are no more alternatives left to prove that it is generated by a Markov chain, then I can state that, by exhaustion, it is not generated by a Markov chain; this implies that there is no finite automaton that recognizes; and this implies the absence of regular grammar to recognize this language.
Therefore, L = { a^n b^m |- n inteiro, m inteiro, n <= m <= 2n }
is strictly context-free.
On the possibility of determining, generically, that a context-free grammar G
, that generates the language L(G)
, cannot be simplified to a regular grammar R
such that L(G) = L(R)
... well, in some cases you can decide between yes or nay, but in the general case this is not possible.
You will find some cases where can’t stop with an answer to the question. I do not know the proof of this property, but I know very well this result of the impossibility of determining whether a context-free grammar is not a regular expression disguised.
I got a little confused. You want to prove that not all context-free language is regular?
– Jéf Bueno
@LINQ exactly. But I also wanted to know if I can say the same for languages at other levels of the hierarchy. I’ll try to be clearer: besides proving that not all context-free language is regular I also want to prove that not all context-sensitive is context-free.
– Léo Eduardo Silva
My intention in this question is to know whether or not I can always have an equivalent grammar but at a reduced level in the Chomsky hierarchy. I want to prove that this is undecidable.
– Léo Eduardo Silva
There are decidable cases where the answer is a tacit no. You want an undecidable case or a no for real?
– Jefferson Quesado
@Jeffersonquesado waiting for the answer :)
– Maniero
@Jeffersonquesado I don’t know if I understand your question. Do you mean do I want an example of decidable or undecidable cases? My question is whether or not language can be reduced. I don’t think I’m making myself clear.
– Léo Eduardo Silva
Above I meant "that grammar may or may not be reduced".
– Léo Eduardo Silva
@Léoeduardosilva thanks for the emphasis, I had read language. Ok, grammar really is undecidable. I’m working on the answer
– Jefferson Quesado
@Jeffersonquesado I already thank you for your commitment. Just as Maniero I am waiting :-)
– Léo Eduardo Silva
And so am I (just to put a little pressure on it :p)
– Jéf Bueno