Regular expression for repeated characters

5

Does anyone know if it is possible to check whether a string consists entirely of the same character? For example, a character may appear consecutively or more than once in the string, as long as the characters are not all the same:

This is allowed:

$str = "ttste"; 

But this is not:

$str = "ttttttt";
  • Hello, I just checked the expression above, but it is not what I am looking for. It only recognizes equal characters in a row, and I want it to match only if ALL characters are equal; if even one is different, it should return FALSE.

2 answers

5


Andrei's answer is much simpler, but since a solution with regular expressions was requested, here it is:

if (preg_match('/^(.)\1*$/', $str)) {
    // the string has all characters equal
}

The markers ^ and $ mean, respectively, the beginning and the end of the string (strictly speaking, in PCRE $ also matches right before a trailing line break). This ensures that the string contains only what is specified in the expression.

Next we have (.): the dot means "any character (except line breaks)", and the parentheses form a capture group. This means that the first character of the string (because it comes right after the ^) will be captured by this group.

Next we have \1, which is a backreference, i.e., it means "the same text that was captured in the first group". In this case, the first group is (.) (which is the first character of the string, as it appears just after the ^).

Next we have the quantifier *, meaning "zero or more occurrences". Therefore, \1* means that \1 (the same character that is at the beginning of the string) can be repeated zero or more times.

In short, ^ makes the regex start checking at the beginning of the string, and (.) takes the first character and captures it (which enables the use of \1). Next, \1* checks whether this same character repeats zero or more times, until the end of the string ($).

This ensures that the string will have the same character from start to finish. If it has a different character, the regex fails.
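
To see the pieces described above working together, here is a minimal sketch. The helper name all_same_regex is hypothetical and only wraps the expression from this answer; it is an illustration, not a definitive implementation.

```php
<?php
// Hypothetical helper: wraps the regex explained above.
function all_same_regex(string $str): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^(.)\1*$/', $str) === 1;
}

var_dump(all_same_regex("ttttttt")); // bool(true)
var_dump(all_same_regex("ttste"));   // bool(false)
var_dump(all_same_regex("t"));       // bool(true): \1* allows zero repetitions
var_dump(all_same_regex(""));        // bool(false): (.) needs at least one character
```

Comparing the result with === 1 keeps a regex error (false) from being mistaken for "no match".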


Just one detail: I used *, which means "zero or more occurrences", which means that if the string has only one character, regex considers it valid. But if you want the string to have at least two characters, you can switch to:

preg_match('/^(.)\1+$/', $str)

The + means "one or more occurrences". So the dot takes the first character, and \1+ ensures that there will be at least one more occurrence of it in the string.
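
The difference between the two quantifiers only shows up on single-character strings. A small sketch, with hypothetical function names chosen just for this comparison:

```php
<?php
// Hypothetical names; each wraps one of the two patterns from this answer.
function same_one_or_more(string $str): bool
{
    return preg_match('/^(.)\1*$/', $str) === 1; // one character is enough
}

function same_two_or_more(string $str): bool
{
    return preg_match('/^(.)\1+$/', $str) === 1; // needs at least two characters
}

var_dump(same_one_or_more("t"));  // bool(true)
var_dump(same_two_or_more("t"));  // bool(false)
var_dump(same_two_or_more("tt")); // bool(true)
var_dump(same_two_or_more("ts")); // bool(false)
```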


Another detail is that . really matches any character. So if the string is ------- or ~~~, the regex also considers it valid.

You can limit it to letters by swapping the dot for [a-zA-Z], for example, which will only accept letters from a to z (uppercase or lowercase). Or use \w (letters, digits and underscore) together with the u flag to also accept accented characters (note the u right after the closing slash):

preg_match('/^(\w)\1*$/u', "ççç"); // valid
preg_match('/^(\w)\1*$/', "ççç"); // invalid

In fact, the u flag also makes this difference if we use the dot:

preg_match('/^(.)\1*$/u', "áá"); // 1 (matches)
preg_match('/^(.)\1*$/', "áá"); // 0 (does not match)

Anyway, see the documentation for all the possibilities.
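
The reason the u flag matters is that, without it, the pattern works on bytes rather than characters. The sketch below illustrates this; it builds the string with \u{E1} escapes (PHP 7 or later) so that it does not depend on how the file is saved, and the variable names are only illustrative.

```php
<?php
$str = "\u{E1}\u{E1}"; // "áá" encoded as UTF-8

echo strlen($str), "\n";             // 4: number of bytes
echo mb_strlen($str, 'UTF-8'), "\n"; // 2: number of characters

// Without u, the dot captures only the first byte of "á".
preg_match('/^(.)/', $str, $byteMatch);
echo bin2hex($byteMatch[1]), "\n";   // c3

// With u, the dot captures the whole two-byte character.
preg_match('/^(.)/u', $str, $charMatch);
echo bin2hex($charMatch[1]), "\n";   // c3a1
```

Without u, \1 therefore refers to the single byte c3, which is not followed by another c3, so the match fails.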


Using the multibyte functions

Another option, without using regex, is to use the multibyte functions, which work both for "normal" (ASCII) strings, such as tttt, and for multi-byte strings, which contain accented characters, etc.:

function todos_caracteres_iguais($str) {
    $first = mb_substr($str, 0, 1);
    $len = mb_strlen($str);
    for ($i = 1; $i < $len; $i++) {
        $char = mb_substr($str, $i, 1);
        if ($char != $first) {
            return false;
        }
    }
    return true;
}

Therefore, this function will work for all strings below:

todos_caracteres_iguais("ááá"); // true
todos_caracteres_iguais("ááç"); // false
todos_caracteres_iguais("ttttttt"); // true
todos_caracteres_iguais("tttste"); // false

If you use a plain for loop without the multibyte functions:

$first = $str[0];
for ($i = 1; $i < strlen($str); $i++) {
    $char = $str[$i];
    if ($char != $first) {
        return false;
    }
}
return true;

It works for tttt and ttste, but fails for ááá.
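
To make the failure concrete, here is a sketch of the same byte-by-byte idea wrapped in a function. The name all_same_bytes is hypothetical, and the accented string is written with \u{E1} escapes (PHP 7 or later) so the bytes are unambiguous.

```php
<?php
// Hypothetical name; compares the string byte by byte, like the loop above.
function all_same_bytes(string $str): bool
{
    $len = strlen($str);
    for ($i = 1; $i < $len; $i++) {
        if ($str[$i] !== $str[0]) {
            return false;
        }
    }
    return true;
}

$accented = "\u{E1}\u{E1}\u{E1}"; // "ááá" as UTF-8: c3 a1 c3 a1 c3 a1

var_dump(all_same_bytes("tttt"));    // bool(true)
var_dump(all_same_bytes("ttste"));   // bool(false)
var_dump(all_same_bytes($accented)); // bool(false): byte 0 is c3, byte 1 is a1
echo bin2hex($accented[0]), ' ', bin2hex($accented[1]), "\n"; // c3 a1
```

Each "á" takes two different bytes, so the loop sees a mismatch at index 1 even though every character is the same.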

  • 2

    Great answer! You really like regex, huh! =)

  • 1

    @Andreicoelho Haha! It's just that the question asked for a solution with regex... although I don't think it is the simplest solution :-)

  • 1

    XD ... Yes! I also like to play with regex!

  • 1

    Sensational, young man! And on top of that, it explains everything.

  • If you like regex, do you want to take this one on? (; https://answall.com/questions/365250/expess%C3%a3o-para-treat-url-with-par%C3%a2metro

  • @Caiolourençon I had seen this question, but at the time I was short on time and ended up forgetting about it and leaving it aside. I'll get back to it as soon as I can :-)

  • I figured you had seen it, since it has many views... I would really appreciate it if you could make time for it; it's the last thing I need to finish my project! Seriously, never do a project from scratch (without the help of ready-made frameworks and CMSs); there is a lot of detail to take care of.

3

You can use count_chars to count the characters and then check whether the resulting array has more than 1 element and whether that single element counts more than 1 character.

$str = "tttttt";

$chars = count_chars($str, 1);
unset($chars[195]); // fixes the count for special characters (195 = 0xC3, the UTF-8 lead byte of accented Latin letters)

if(count($chars) > 1 || reset($chars) == 1){
    echo "This string DOES NOT HAVE all characters repeated";
} else {
    echo "This string HAS all characters repeated";
}

This line returns the following:

$chars = count_chars($str, 1);
// Array ( [116] => 6 )

So, if there is only 1 element in the array, all characters are the same, as long as that element counts more than 1 occurrence. But if there is only 1 element in the array and it counts only 1 occurrence, the string is a single character, so it does not have repeated characters.
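
Since count_chars counts byte values, the same "one element, more than one occurrence" check can also be sketched on whole characters. The version below swaps in a different technique from the one in this answer: it splits the string with preg_split and the u flag, then counts with array_count_values. The function name all_repeated is hypothetical.

```php
<?php
// Hypothetical helper: same rule as above, but counting UTF-8 characters.
function all_repeated(string $str): bool
{
    // Split into characters rather than bytes; returns false on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return false;
    }

    // Maps each distinct character to its number of occurrences.
    $counts = array_count_values($chars);

    // Exactly one distinct character, and it occurs more than once.
    return count($counts) === 1 && reset($counts) > 1;
}

var_dump(all_repeated("tttttt")); // bool(true)
var_dump(all_repeated("ttste"));  // bool(false)
var_dump(all_repeated("t"));      // bool(false): only one occurrence
var_dump(all_repeated("\u{E3}\u{E3}\u{E3}")); // bool(true): "ããã"
```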

Note: I know you asked for a regular expression, but I believe that using PHP's own function for this is easier.

  • 1

    Yes, I had managed it another way too, but your approach also gets to the same result.

  • @Caiolourençon yes... There are several ways to do this! =)

  • @Caiolourençon a solution with a for loop isn't bad either!

  • 2

    Just be careful that the solution does not work for multi-byte characters, for example, "ãããããã".

  • @Andersoncarloswoss is right. I made an edit. It now works with special characters of any kind.
