Regex PHP problems

I am trying to extract some information from a web page, and I am using a regex to do it.

I’m using regex101.com to test the pattern, and I arrived at one that suits me perfectly. The pattern works on regex101.com, but when I run the same thing in PHP, it returns no matches.

This is my pattern on regex101.com: https://regex101.com/r/zgZL8W/2

Note that it finds 7 matches there.

This is my PHP code. Although it uses the same text and the same pattern, it finds no match.

$pattern = '/<span\>([0-9\/]{10})( {0,})+([A-zÁ-ÿ ]+) {1,}(&#[0-9]{1,6})*[; ]*([0-9]{2}\:{0,}[0-9]{0,2}) +[A-Z]+ +([A-z]{2,}-{0,}[A-z]*)<br \/> <\/span>([0-9]{1,2}). +([0-9]{1,5}) {0,}[0-9]{1,2} [ A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) {1,}[0-9]{1,2} +[A-zÁ-ÿ]{4,9}<br \/> ([0-9]{1,2}). +([0-9]{1,5}) +[0-9]{1,2} +[A-zÁ-ÿ]{4,9}</';

$mirror = file_get_contents('http://resultadodojogodobicho.deunopostehoje.com/sao-paulo/');

$data = [];

preg_match_all($pattern, $mirror, $data, PREG_SET_ORDER, 0);

var_dump($data);

Does anyone know what might be going on?

1 answer


There are two problems in your pattern that can cause this behavior once it runs in PHP. The first one is very simple.

    • Without the u modifier, PHP treats the pattern and the text as sequences of single bytes, but your text contains special characters such as Á, É, etc., which need to be treated as UTF-8. To fix this, add the u modifier (/u) to your regex.

    • The other error is also just an oversight. To match whitespace in your pattern, you used a literal space character, which may not be recognized correctly depending on your PHP version. This can be solved with \s or \h.
NOTE: Do not use \s in this case! It matches every kind of whitespace character, including line breaks (\n). Use \h, which matches only horizontal whitespace.
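
To illustrate both points, here is a minimal sketch. The sample strings are made up for the example (they are not taken from the real page), and the non-ASCII characters are written as \u{...} escapes, which assumes PHP 7 or later.

```php
<?php
// Made-up sample strings (assumptions, not taken from the real page).
// "\u{00C1}" is Á and "\u{00A0}" is a non-breaking space; both are
// written as escapes so the encoding of this file does not matter.
$day    = "S\u{00C1}BADO";
$nbsp   = "14\u{00A0}HORAS";
$broken = "14\nHORAS";

// 1) Without u, "." consumes one byte, and Á is two bytes in UTF-8.
$dotBytes = preg_match('/^S.BADO$/', $day);   // no match
$dotUtf8  = preg_match('/^S.BADO$/u', $day);  // match

// 2) A literal space matches only U+0020; \h also matches the
//    non-breaking space (and tabs).
$literalSpace = preg_match('/14 HORAS/u', $nbsp);   // no match
$horizontal   = preg_match('/14\hHORAS/u', $nbsp);  // match

// 3) \s crosses line breaks, \h does not.
$sAcrossLines = preg_match('/14\s+HORAS/u', $broken); // match
$hAcrossLines = preg_match('/14\h+HORAS/u', $broken); // no match

var_dump($dotBytes, $dotUtf8, $literalSpace, $horizontal, $sAcrossLines, $hAcrossLines);
```

If the page you download contains tabs or non-breaking spaces where regex101.com showed plain spaces, the second case would explain why the literal space fails; that is only a guess about the page, though.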

New regex:

/<span\>([0-9\/]{10})(\h{0,})+([A-zÁ-ÿ\h]+)\h{1,}(&#[0-9]{1,6})*[;\h]*([0-9]{2}\:{0,}[0-9]{0,2})\h+[A-Z]+\h+([A-z]{2,}-{0,}[A-z]*)<br\h\/>\h<\/span>([0-9]{1,2}).\h+([0-9]{1,5})\h{0,}[0-9]{1,2}\h[\hA-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h{1,}[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}<br\h\/>\h([0-9]{1,2}).\h+([0-9]{1,5})\h+[0-9]{1,2}\h+[A-zÁ-ÿ]{4,9}</u

My result:

C:\wamp64\www\testcode.php:9:
array (size=7)
  0 => 
    array (size=21)
      0 => string '<span>12/08/2017 SÁBADO &#8211; 14 HORAS PT-SP<br /> </span>1° 6319  05 Cachorro<br /> 2° 7792  23 Urso<br /> 3° 0978  20 Peru<br /> 4° 0043  11 Cavalo<br /> 5° 8487  22 Tigre<br /> 6° 3619  05 Cachorro<br /> 7° 237  10 Coelho<' (length=242)
      1 => string '12/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'SÁBADO' (length=7)
      4 => string '&#8211' (length=6)
      5 => string '14' (length=2)
      6 => string 'PT-SP' (length=5)
      7 => string '1' (length=1)
      8 => string '6319' (length=4)
      9 => string '2' (length=1)
      10 => string '7792' (length=4)
      11 => string '3' (length=1)
      12 => string '0978' (length=4)
      13 => string '4' (length=1)
      14 => string '0043' (length=4)
      15 => string '5' (length=1)
      16 => string '8487' (length=4)
      17 => string '6' (length=1)
      18 => string '3619' (length=4)
      19 => string '7' (length=1)
      20 => string '237' (length=3)
  1 => 
    array (size=21)
      0 => string '<span>11/08/2017 SEXTA FEIRA &#8211; 18 HORAS PTN-SP<br /> </span>1° 8116  04 Borboleta<br /> 2° 2115  04 Borboleta<br /> 3° 1720  05 Cachorro<br /> 4° 7308  02 Águia<br /> 5° 2939  10 Coelho<br /> 6° 2198  25 Vaca<br /> 7° 165  17 Macaco<' (length=254)
      1 => string '11/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'SEXTA FEIRA' (length=11)
      4 => string '&#8211' (length=6)
      5 => string '18' (length=2)
      6 => string 'PTN-SP' (length=6)
      7 => string '1' (length=1)
      8 => string '8116' (length=4)
      9 => string '2' (length=1)
      10 => string '2115' (length=4)
      11 => string '3' (length=1)
      12 => string '1720' (length=4)
      13 => string '4' (length=1)
      14 => string '7308' (length=4)
      15 => string '5' (length=1)
      16 => string '2939' (length=4)
      17 => string '6' (length=1)
      18 => string '2198' (length=4)
      19 => string '7' (length=1)
      20 => string '165' (length=3)
  2 => 
    array (size=21)
      0 => string '<span>11/08/2017 SEXTA FEIRA &#8211; 14 HORAS PT-SP<br /> </span>1° 2254  14 Gato<br /> 2° 0696  24 Veado<br /> 3° 8048  12 Elefante<br /> 4° 5440  10 Coelho<br /> 5° 3019  05 Cachorro<br /> 6° 9457  15 Jacaré<br /> 7° 568  18 Porco<' (length=248)
      1 => string '11/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'SEXTA FEIRA' (length=11)
      4 => string '&#8211' (length=6)
      5 => string '14' (length=2)
      6 => string 'PT-SP' (length=5)
      7 => string '1' (length=1)
      8 => string '2254' (length=4)
      9 => string '2' (length=1)
      10 => string '0696' (length=4)
      11 => string '3' (length=1)
      12 => string '8048' (length=4)
      13 => string '4' (length=1)
      14 => string '5440' (length=4)
      15 => string '5' (length=1)
      16 => string '3019' (length=4)
      17 => string '6' (length=1)
      18 => string '9457' (length=4)
      19 => string '7' (length=1)
      20 => string '568' (length=3)
  3 => 
    array (size=21)
      0 => string '<span>10/08/2017 QUINTA FEIRA &#8211; 18 HORAS PTN-SP<br /> </span>1° 7961  16 Leão<br /> 2° 9257  15 Jacaré<br /> 3° 6104  01 Avestruz<br /> 4° 0089  23 Urso<br /> 5° 3311  03 Burro<br /> 6° 6722  06 Cabra<br /> 7° 694  24 Veado<' (length=246)
      1 => string '10/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'QUINTA FEIRA' (length=12)
      4 => string '&#8211' (length=6)
      5 => string '18' (length=2)
      6 => string 'PTN-SP' (length=6)
      7 => string '1' (length=1)
      8 => string '7961' (length=4)
      9 => string '2' (length=1)
      10 => string '9257' (length=4)
      11 => string '3' (length=1)
      12 => string '6104' (length=4)
      13 => string '4' (length=1)
      14 => string '0089' (length=4)
      15 => string '5' (length=1)
      16 => string '3311' (length=4)
      17 => string '6' (length=1)
      18 => string '6722' (length=4)
      19 => string '7' (length=1)
      20 => string '694' (length=3)
  4 => 
    array (size=21)
      0 => string '<span>10/08/2017 QUINTA FEIRA &#8211; 14 HORAS PT-SP<br /> </span>1° 6483  21 Touro<br /> 2° 3411  03 Burro<br /> 3° 8032  08 Camelo<br /> 4° 1259  15 Jacaré<br /> 5° 2156  14 Gato<br /> 6° 1341  11 Cavalo<br /> 7° 113  04 Borboleta<' (length=248)
      1 => string '10/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'QUINTA FEIRA' (length=12)
      4 => string '&#8211' (length=6)
      5 => string '14' (length=2)
      6 => string 'PT-SP' (length=5)
      7 => string '1' (length=1)
      8 => string '6483' (length=4)
      9 => string '2' (length=1)
      10 => string '3411' (length=4)
      11 => string '3' (length=1)
      12 => string '8032' (length=4)
      13 => string '4' (length=1)
      14 => string '1259' (length=4)
      15 => string '5' (length=1)
      16 => string '2156' (length=4)
      17 => string '6' (length=1)
      18 => string '1341' (length=4)
      19 => string '7' (length=1)
      20 => string '113' (length=3)
  5 => 
    array (size=21)
      0 => string '<span>09/08/2017 QUARTA FEIRA EXTRAÇÃO DAS 13:20 HORAS PT-SP<br /> </span>1• 8222  06 Cabra<br /> 2• 9302  01 Avestruz<br /> 3• 1143  11 Cavalo<br /> 4• 0626  07 Carneiro<br /> 5• 7363  16 Leão<br /> 6• 6656  14 Gato<br /> 7• 481  21 Touro<' (length=264)
      1 => string '09/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'QUARTA FEIRA EXTRAÇÃO DAS' (length=27)
      4 => string '' (length=0)
      5 => string '13:20' (length=5)
      6 => string 'PT-SP' (length=5)
      7 => string '1' (length=1)
      8 => string '8222' (length=4)
      9 => string '2' (length=1)
      10 => string '9302' (length=4)
      11 => string '3' (length=1)
      12 => string '1143' (length=4)
      13 => string '4' (length=1)
      14 => string '0626' (length=4)
      15 => string '5' (length=1)
      16 => string '7363' (length=4)
      17 => string '6' (length=1)
      18 => string '6656' (length=4)
      19 => string '7' (length=1)
      20 => string '481' (length=3)
  6 => 
    array (size=21)
      0 => string '<span>08/08/2017 TERÇA FEIRA &#8211; 18 HORAS PTN-SP<br /> </span>1• 3686  22 Tigre<br /> 2• 8315  04 Borboleta<br /> 3• 0928  07  Carneiro<br /> 4• 8461  16 Leão<br /> 5• 6494  24 Veado<br /> 6• 7884  21 Touro<br /> 7• 649  13 Galo<' (length=257)
      1 => string '08/08/2017' (length=10)
      2 => string '' (length=0)
      3 => string 'TERÇA FEIRA' (length=12)
      4 => string '&#8211' (length=6)
      5 => string '18' (length=2)
      6 => string 'PTN-SP' (length=6)
      7 => string '1' (length=1)
      8 => string '3686' (length=4)
      9 => string '2' (length=1)
      10 => string '8315' (length=4)
      11 => string '3' (length=1)
      12 => string '0928' (length=4)
      13 => string '4' (length=1)
      14 => string '8461' (length=4)
      15 => string '5' (length=1)
      16 => string '6494' (length=4)
      17 => string '6' (length=1)
      18 => string '7884' (length=4)
      19 => string '7' (length=1)
      20 => string '649' (length=3)
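
Since PREG_SET_ORDER returns one array per match, the groups can be read by index. The following is only a sketch of one way to do that: the $data literal is a hand-written copy of the first match above (with the full match at index 0 left out), and the key names date, day, time, code and prizes are hypothetical labels, not something the pattern defines.

```php
<?php
// Hand-written copy of the first match shown above (assumption: the real
// $data comes from preg_match_all with PREG_SET_ORDER). Index 0 would hold
// the full match; a placeholder is used here to keep the sample short.
$data = [[
    '(full match omitted)',
    '12/08/2017', '', 'SÁBADO', '&#8211', '14', 'PT-SP',
    '1', '6319', '2', '7792', '3', '0978', '4', '0043',
    '5', '8487', '6', '3619', '7', '237',
]];

$draws = [];
foreach ($data as $m) {
    $prizes = [];
    // Groups 7 to 20 alternate: position, number, position, number, ...
    for ($i = 7; $i + 1 < count($m); $i += 2) {
        $prizes[(int) $m[$i]] = $m[$i + 1];
    }
    // Hypothetical key names chosen for this example.
    $draws[] = [
        'date'   => $m[1],
        'day'    => $m[3],
        'time'   => $m[5],
        'code'   => $m[6],
        'prizes' => $prizes,
    ];
}

print_r($draws[0]);
```

The numbers stay as strings on purpose, so that leading zeros such as 0978 are preserved.
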
  • Cool, Matheus!!! I’ll test and come back here to comment. Thanks! Thanks a lot for the help! :)

  • Matheus, I just had time to test today. That’s great! Thanks for the explanations and the code.

  • Ah, no problem at all.
