Difference between 'Regular language' for 'irregular language'

Asked

Viewed 1,014 times

5

Reading a question about basic programming concepts, I come across the following statement:

Let A and B be two languages over the binary alphabet, that is, over the alphabet composed only of 0’s and 1’s. Let A be the language in which the quantity of 0’s and 1’s is equal. Let B be the language where no 0 occurs after a character 1, only B is a regular language.

Looking here on the site find the answer: What is a context-free language? but even then, it was not clear.

My questions:

  1. What are the main differences from a regular language to a irregular language? In the above statement why only the language B is regular?
  2. What are the main examples of the two languages?
  • 2

    If you know English, maybe this will help you

  • I’ll take a look yes, thank you very much @Leonardoalvesmachado

  • 1

    Basically the irregular languages are the non-regular ones, that is there is no specific form, Victor quoted several, and regular language referred to there are the ones that can be expressed with regular expressions (tag: [tag:regex]). I believe it was lost at some point in the text, which is quite extensive and informative, but really irregular is not "specific". In addition, the term irregular was not used in the answers, since they only mention regular and "other". I am not understood of the subject, but it was what I understood in a brief reading.

  • hmm, so if you can use regular expressions it’s basically a regular language? is that right? @Guilhermenascimento

  • 1

    I think not, the regex itself within a programming language actually uses an interpreter, so something like in PHP that uses preg_match or preg_replace, uses a sub-interpreter, that is, regular expression is the so-called "regular language", PHP in this case only takes the result of this, as if it were an internal API, it is similar in other languages.

  • 1

    Helped me clarify, because I had this concept in mind, thank you very much @Guilhermenascimento

  • @Luizaugusto wait some more comments, because I admit that is not my area this part of the concept of languages ("off" programming), I mean, I understand about regex (I must say I love xD) and several programming languages. but the concepts of definition to build interpreters and compilers I do not master, there are some people on the site that will probably confirm or explain more correctly ;)

  • Even not being your 'area' was already of great help, I confess that I’m falling in love with regex too, although taking the first steps with the book of Aurelio Marinho Jargas. Thank you so much! @Guilhermenascimento

  • In Chomsky hierarchical classification formal languages are classified as: Type-0 - Recursive / Recursively enumerable Type-1 - Context sensitive Type-2 - Context-free Type-3 - Regular For each type there is a recognizer, in the case of regular language is a finite automaton.

  • 2

    I will reply as soon as possible, but rather just avoid a common mistake: the modern regular expressions (with rearview mirrors and such) do not represent only the regular languages, but a superset that includes some context-sensitive languages (broader than the context-free ones already mentioned in the linked question).

  • @anonymity , the R class is more restricted than the RE class. I dealt with this in the question about search in width vs deep search

Show 6 more comments

1 answer

4


Well, I think the answer here may not be very pleasant, but it is the correct one: a regular language is one that can be generated by a regular grammar.

And then what would a regular grammar be? Well, it would be a grammar that has only regular productions. Well, that I’ve explained in this answer.

Reiterating what was said in the above answer: a regular language is closely related to a Markov chain, where for each regular language, there is a Markov chain that represents it in form and content.


About irregular grammar... well, we’re dealing here with a semantic problem. I’m taking this prefix i- as being the negation of the radical that follows, in this case would be the regular. In that sense, you would have the "non-regular" languages. Which means any language that cannot be described through a regular language.

But let’s go back one step. What is the definition of number irrational? These would be the numbers complementary to the rational ones (i.e., the nay-rational)? Well, the square root of 2 and even pi are not rational numbers, so I can call them irrational, you know? Well, so far quiet, but what about the fourth root of -1? Would that number irrational? We know better, that number is a number complex, and the complex number is neither rational nor irrational. So the definition of irrational number cannot be simply "that number which is not rational", even more so by defining the real numbers as the union of the rational with the irrational.


The language not being regular means that has no specific form? Well, actually, no. She may have a way extremely specific.

For example, the language D formed by a word of a binary alphabet followed by itself? This language is defined more or less like this: D = { WW, para todo W em {a,b}* }. This language has a very specific form, but it is not regular (I will prove it here after proving the languages A and B).


Over regular expressions and regular languages. At first, regex represented yes regular languages. But they have started to add things that are not regular. For example, they put the possibility of having a rearview mirror in the regex query. In some search engines, I can recognize exactly D:

([ab]*)\1

It also has the options of look-Ahead and Egative look-Ahead to make a better way in the state machine and recognize/deny a word. However, this "looking ahead" means that we are no longer dealing with a Markov chain. That is, not regular. The look-Behind means the existence of memory, which means, again, not to be a Markov chain.

You can see more about regex vs regular languages in this question, in your comments and in the reply accepted.


The @anonymo commented on the Chomsky Hierarchy. However, he put at level 0 both recursive languages and recursively enumerable languages. But this has problems.

A decision problem, in the , is equivalent to recognizing whether the input belongs to the given language, and who defines which valid language is the problem. For example, the Parada problem is so equivalent to checking whether, by chance, a word belongs to an unrestricted grammar; that is a problem of class RE.

However, I can define, on top of any language, its complement language. For example, the Lock Problem (which is the negation of the Stop Problem, which must answer yes if the program crashes). It is the complement of the Halting Problem. It is considered a co-RE class problem. That is, the Lock Problem is equivalent to detecting if a word belongs to a co-RE language.

I go into more details of this complexity class in the answer about search in width vs deep search.


I have already answered the difference between one regular language and the others. Everything else is a consequence of this.

One of these consequences is that regular languages obey the pumping motto. There may be non-scheduled languages that also obey this motto, but not obedience necessarily implies not being regular.

Read more in this reply.

Read more (in English)

So how to use to prove that A is not regular?

Consider the operator . as the word concatenation operator (takes as operand two words, returns its concatenation; "abc" . "def" == "abcdef"), and the operator ^ as the word repetition (receives a word and a natural number and returns that word repeated the number times; ""ba"."na"^3 == "ba"."nanana" == "bananana"; "ba"."na"^0 = "ba").

We need to find a word in the format x.y.z belonging to A and a number p such that:

  • |y| >= 1
  • |x.y| <= p

So that when pumping y sometimes, the new word x.y^n.z no longer belong to A.

So let’s take the word formed by all the 0s the left of all 1s, and both with the same amount. How p is arbitrary, so we can take the word "0"^p . "1"^p. What does that entail? How |x.y| <= p, it is not possible that y has the character 1. That is to say, y = "0"^m, 1 <= m <= p.

We pump it, then, y an additional time. That is, of x.y.z we’ll get x.y^2.z, which is identical to x.y.y.z. So, what about the amount of 0s in that string? Well, how x.y has exactly p 0s, it means that x.y.y.z will have p + m 0but the amount of 1s is not changed. That is, if x.y.y.z belong to A, this necessarily implies in p + m = p; however, we know that 0 < 1 <= m <= p, which means that p < 1 + p <= m + p <= 2*p. Therefore, p < p + m. Soon, x.y.y.z does not belong to A. Soon, A is not regular because it does not meet the motto of pumping.


To prove that B is regular, just write a regex (regex pure, without look-arounds, rearview mirrors or anything other than regular). Basically, we are left with lists, denied lists, groupings, choice (usually indicated by the vertical-bar |), quantifiers (optional, Kleene star, Kleene cross).

In B, none 0 may be preceded by 1. That is, if you have the subpalavra 10, no longer belongs to B. It means just words with 0s (words z) or with 1s (words u) are valid. Also, I can make a new word with z.u no problem.

Who is the set of words z? IS 0*. And all the words u? IS 1*. Therefore, z.u is the regular expression 0*1*.

That is to say, B can be written as 0*|1*|0*1*. But this is trivially simplified to 0*1*. How we have a regular expression, B is a regular language.

Your grammar:

S -> '0'.S
S -> '0'
S ->
S -> '1'.U
U -> '1'.U
U -> '1'

Your finite state automaton (with lambda transaction, whereas all states are states of acceptance):

 +---+       +---+
>| q0|>--+-->| q1|>--+
 +---+   |   +---+   |
   ^     |     ^     |
   |     |     |     |
   +- 0 -+     +- 1 -+

Let’s prove that D is not regular? It is the language of the binary word duplicated.

We chose the word "a"^p . "b"^p . "a"^p . "b"^p. She’s exactly the word "a"^p . "b"^p twice, so it belongs to D. This means that necessarily, y will consist purely of as. So what? So what, pump y means to get the following word:

"a"^n . ("a"^p . "b"^p)^2

Therefore, this new word is not a duplication of word. Therefore, it does not belong to D. Soon, D is not regular.

  • Jefferson, today I am traveling, but tomorrow I will read your reply with the utmost attention, but from now on thank you very much!!

  • Sensational your answer!!! I was able to understand very well when you proved that A is not regular, but B is! Just a little knot in the brain when you mentioned 'look-Ahead and Negative look-Ahead', but I looked here on the site and already managed to overcome the doubts! Thank you so much @Jefferson

Browser other questions tagged

You are not signed in. Login or sign up in order to post.