Why doesn’t the regex work?

I’m following the book OOP Programming with PHP5 by Hasin Hayder (2007) and have reached the part on unit tests. In one exercise, he sets up a method countWords() and creates some tests for it:

class WordCount
{
    public function countWords($sentence)
    {
        return count(split(" ",$sentence));
    }
}

which returns the number of words in the variable, i.e. $this->assertEquals(4, $wordcount); passes: if the variable is something like "my name is john", it has four words.

He then creates a test for the case where the variable has extra spaces: "my name is john " (notice the space after john), which returns Failure when run.

To fix it, he modifies the method, adding preg_replace with the regex "~\s+~", and his code works as expected. But I did the same thing and it didn’t work.

He even creates another test, where the variable is "my name is \n\r john", and the same regex handles it.

class WordCount
{
    public function countWords($sentence)
    {
        $newsentence = preg_replace("~\s+~"," ",$sentence);
        return count(split(" ",$newsentence));
    }
}

I have checked my code for possible syntax or structure errors, but everything looks fine, or at least identical to what the exercise shows.

To pass the first two tests, I found preg_replace('/\s*$/','',$sentence); which worked, but the test containing \n\r did not pass.

So I’d like to know:

  1. Why didn’t the regex he used work?
  2. What regex would remove extra spaces and carriage return/newline characters (\n\r)?

The complete codes used are here:

  • What is the connection between the question and phpunit or unit testing? If the question is not directly related to them, it is best to edit it and remove those tags.

  • But did it show any error message containing the word "deprecated"? Something not working does not mean it is deprecated. I’ll try to look into the problem tomorrow.

  • gmsantos, already removed. William, no error mentioning "deprecated" appeared; the test just did not pass in phpunit. In the tutorial it passes, but when I ran it, it did not.

  • If nothing mentions "deprecated", why does the question title say that?

  • Because that is my doubt. Was the regex "~\s+~" deprecated? I believe phpunit would not show errors related to this, so I assumed this regex format is old/deprecated.

  • The regex is fine for PHP. I just tested your code; the problem is something else. I’m writing an answer.

  • @gmsantos is already answering. I was going to say that split is deprecated, see: http://php.net/manual/en/function.split.php

  • Recommended reading: http://answall.com/questions/110701/o-que-significa-o-shortcut-s-nas-regex

1 answer


When I ran your code, I got the following output:

<?php

function countWords($sentence)
{
    $newsentence = preg_replace("/\s+/"," ",$sentence);
    return count(split(" ", $newsentence));
}

echo countWords("my name is john");
echo countWords("my name \n\r is john");
echo countWords("my name \n is john");
echo countWords("my name \n\n\r is john");

Deprecated: Function split() is deprecated in /in/Msmej on line 6

4

Deprecated: Function split() is deprecated in /in/Msmej on line 6

4

Deprecated: Function split() is deprecated in /in/Msmej on line 6

4

Deprecated: Function split() is deprecated in /in/Msmej on line 6

4

This is the first sign that something is wrong: split() is deprecated (since PHP 5.3, and it was removed entirely in PHP 7). That does not mean it will not work here, but it expects a regular expression as the pattern used to break up your string, and you are passing a literal space, which is not the same as the whitespace metacharacter \s.

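The regex ~\s+~ is valid in itself. PHP accepts any non-alphanumeric, non-backslash, non-whitespace character as a regex delimiter; / is the most common choice. Note also that '/\s*$/' is a completely different regex from /\s+/: \s+ matches one or more whitespace characters anywhere in the string, while \s*$ matches zero or more whitespace characters only at the end. That is why it fixed the trailing-space test but not the one with \n\r in the middle. Finally, prefer single quotes for regex patterns. Your example uses double quotes, which is not ideal because PHP processes escape sequences such as \n inside double-quoted strings before the regex engine sees the pattern.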
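As a small illustration of the delimiter and quantifier points (a sketch only; the sample string is my own and not taken from the book), the two delimiters give identical results while the two patterns do not:

```php
<?php
// Illustrative input (an assumption): whitespace run in the middle, space at the end.
$sentence = "my name is \n\r john ";

// Same pattern, different delimiters: both calls produce the same string.
$tilde = preg_replace('~\s+~', ' ', $sentence);
$slash = preg_replace('/\s+/', ' ', $sentence);
var_dump($tilde === $slash); // bool(true)

// \s+ collapses every whitespace run into one space, so the trailing
// space is still there: "my name is john "
var_dump($tilde);

// \s*$ only strips whitespace at the very end; the "\n\r" in the middle stays.
$trimmedEnd = preg_replace('/\s*$/', '', $sentence);
var_dump($trimmedEnd);
```

Neither pattern alone both normalizes the inner whitespace and removes the trailing space.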

You can refactor this method to use preg_split:

function countWords($sentence)
{
    return count(preg_split('~\s+~', $sentence));
}

One last point: try to study with up-to-date materials. The book in question is from 2007, and a lot has changed in PHP in these 9 years. There is a good chance you are learning something that is no longer used.


Looking further, the problem with this test case is the implementation itself. In the trailing-space case, the split still returns an array with 5 elements, the last one being an empty string.

To get the expected result, apply trim() to $sentence first:
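A quick way to see this, as a sketch: explode() is used here as my stand-in for the literal-space split, since split() itself no longer exists from PHP 7 on.

```php
<?php
// Input from the failing test: note the trailing space.
$sentence = "my name is john ";

// explode() splits on a literal string; it stands in for split(" ", ...) here.
$byLiteralSpace = explode(' ', $sentence);

// preg_split() with \s+ has the same issue: the trailing space still
// delimits an empty final element.
$byRegex = preg_split('~\s+~', $sentence);

var_dump($byLiteralSpace);          // five elements, the last one is ""
echo count($byLiteralSpace), "\n";  // 5
echo count($byRegex), "\n";         // 5
```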

return count(preg_split('~\s+~', trim($sentence)));
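An alternative worth considering, offered only as a sketch: preg_split() accepts the PREG_SPLIT_NO_EMPTY flag, which discards empty pieces, so no trim() is needed. The function name countWordsNoEmpty is hypothetical and not from the book.

```php
<?php
// Hypothetical helper name; PREG_SPLIT_NO_EMPTY is a standard preg_split() flag.
function countWordsNoEmpty($sentence)
{
    // -1 means "no limit"; the flag drops empty pieces caused by leading
    // or trailing whitespace.
    $words = preg_split('~\s+~', $sentence, -1, PREG_SPLIT_NO_EMPTY);
    return count($words);
}

echo countWordsNoEmpty("my name is john "), "\n";     // 4
echo countWordsNoEmpty("my name is \n\r john"), "\n"; // 4
echo countWordsNoEmpty(""), "\n";                     // 0
```

One difference from the trim() version: for an empty string, preg_split() without the flag returns an array holding a single empty string, so the count would be 1 rather than 0.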

  • +1 for the thorough answer, but in my case "My name is John " (with the trailing space) still returns 5 when it should return 4, so the test returns Failure. If you don’t mind, could you look at the code on GitHub? Testing and Method

  • @wdarking gmsantos edited the answer with an example using trim.

  • It worked. I believe there is a formatting error around the space in the book, because it only uses the regex and still gets rid of the last space, but I believe trim is the most correct solution. Thank you.

  • @wdarking as gmsantos said, this book is 9 years old; using it will teach you practices that are no longer used today.
