PHP regex to capture HTML tags

Viewed 1,441 times

2

I would like to ask for help to make a regex that separates the values of this string:

<table>|<tr>[<td>#VALOR#</td>]</tr>|</table>

I would need the regex to break the values as follows:

match 1: <table></table>, match 2: <tr></tr>, match 3: #VALOR#

I have tried repeatedly and can't get it to work. I was using something like this:

(<\s*?table\b[^>]*>).*(<\/table\b[^>]*>)

Thanks in advance.

3 answers

2

I won't repeat all the explanations about why you should not use regex for HTML.

I think what you want is this:

~<table>.*?(<tr>.*?(<td>(.*?)</td>).*?</tr>).*?</table>~

Explanation

  • <table> - the literal text the match must start with.
  • .*? - as few characters as possible (lazy) before the next literal.
  • <tr> - literal text that must be present.
  • .*? - as few characters as possible (lazy) before the next literal.
  • <td> - literal text that must be present.
  • .*? - as few characters as possible (lazy) until the next literal fits (your value will be here).

This gives you 4 groups:

  • 0 - The whole matched string.
  • 1 - Everything from <tr> to </tr>.
  • 2 - Everything from <td> to </td>.
  • 3 - The value inside <td>.
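
To check the group numbering listed above, here is a minimal sketch that runs the pattern against the string from the question. The variable names ($string, $regex, $groups) are only illustrative, and it assumes the input sits on a single line (without the s modifier, . does not match newlines).

```php
<?php
// Sample input taken from the question.
$string = "<table>|<tr>[<td>#VALOR#</td>]</tr>|</table>";

// Pattern from the answer; ~ is the delimiter, so / needs no escaping.
$regex = '~<table>.*?(<tr>.*?(<td>(.*?)</td>).*?</tr>).*?</table>~';

$groups = [];
if (preg_match($regex, $string, $groups) === 1) {
    // Index 0 is the full match, 1..3 are the capturing groups.
    foreach ($groups as $index => $text) {
        echo $index, ' => ', $text, "\n";
    }
}
// Prints:
// 0 => <table>|<tr>[<td>#VALOR#</td>]</tr>|</table>
// 1 => <tr>[<td>#VALOR#</td>]</tr>
// 2 => <td>#VALOR#</td>
// 3 => #VALOR#
```

Note that groups 1 and 2 still contain everything between their tags, not just the tags themselves.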

Addendum

Example

REGEX101

2

  // String to be processed
  $string = "<table>|<tr>[<td>#VALOR#</td>]</tr>|</table>";

  // Regular expression
  $regex  = "#\<table\>\|\<tr\>\[\<td\>(.*)\<\/td\>\]\<\/tr\>\|\<\/table\>#";

  // Extract the content
  preg_match_all($regex,$string,$retorno,PREG_PATTERN_ORDER);

  // Value #VALOR#
  $valor = $retorno[1][0];

  // Display the value
  echo $valor;
  • Sorry, what I need did not display correctly. It should be: match 1: <table></table>, match 2: <tr></tr>, match 3: <td>#VALOR#</td>. A thousand apologies, and I appreciate the help!

1

There is no way to get what you want as exactly three matches, because a capturing group always captures one contiguous piece of the string. You cannot, for example, capture just WW from the string WAW in a single capturing group, with or without non-capturing groups.
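
A small sketch of that limitation, using the WAW example. The variable names are only illustrative.

```php
<?php
$subject = 'WAW';

// Wrapping the middle part in a non-capturing group does not remove it
// from the enclosing capturing group: group 1 is still the whole slice.
$m = [];
preg_match('/(W(?:A)W)/', $subject, $m);
echo $m[1], "\n"; // WAW

// Asking for 'WW' directly finds nothing, because 'WW' never appears
// contiguously in the subject. preg_match() returns 0 for "no match".
$n = [];
$found = preg_match('/(WW)/', $subject, $n);
echo $found, "\n"; // 0
```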

What you can do, however, is the following:

$string = "<table>|<tr>[<td>#VALOR#</td>]</tr>|</table>";

$regex = "#(<table>)\|(<tr>)\[<td>([^<]*)<\/td>\](<\/tr>)\|(<\/table>)#";

preg_match($regex, $string, $retorno);

$match1 = $retorno[1] . $retorno[5];
$match2 = $retorno[2] . $retorno[4];
$match3 = $retorno[3];

echo $match1 . "\n";
echo $match2 . "\n";
echo $match3 . "\n";

In the end, the variables $match1, $match2 and $match3 will hold the values <table></table>, <tr></tr> and #VALOR#, respectively, which is what you want.

You can see the regex running on regex101.

Considerations:

  • The regex assumes that the only variable part of the string is "#VALOR#", which can be any string that does not contain the character <.

  • The regex does not handle whitespace. If the string started with < table>, the whole match would fail.

  • If you want the example to handle whitespace, let me know. I left it out because I thought it would only make the regex more complicated without any real gain. And a question: if you know that the string has exactly this pattern, and that your first two matches will always be "<table></table>" and "<tr></tr>", why not just hardcode those strings and capture only "#VALOR#"?
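
As a rough sketch of the last two points, the variant below tolerates optional whitespace inside the tags and captures only the value, while the two fixed results are hardcoded. The sample input with stray spaces is made up for illustration, and the \s* placement is an assumption about which whitespace you would want to allow.

```php
<?php
// Hypothetical input with stray whitespace inside the tags.
$string = "< table >|<tr >[< td>#VALOR#</ td >]</tr>|</table >";

// \s* allows optional whitespace around each tag name and after the slash.
// Only the value between the td tags is captured.
$regex = '#<\s*table\s*>\|<\s*tr\s*>\[<\s*td\s*>([^<]*)<\s*/\s*td\s*>\]'
       . '<\s*/\s*tr\s*>\|<\s*/\s*table\s*>#';

$m = [];
if (preg_match($regex, $string, $m) === 1) {
    // The first two results never change, so they are plain constants here.
    $match1 = '<table></table>';
    $match2 = '<tr></tr>';
    $match3 = $m[1];

    echo $match1, "\n", $match2, "\n", $match3, "\n";
}
```

The pattern still expects the exact |, [ and ] separators from the original string; anything looser than that is probably better handled by an HTML parser.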
