How to use preg_match_all in this excerpt?

<div>
  <span class="dark_text">Type:</span>
  <a href="https://myanimelist.net/topanime.php?type=var1">var2</a></div>

I need to use preg_match_all to capture var2, regardless of what var1 contains.

Example:

preg_match_all('!<a href="https://myanimelist.net/topanime.php?type=var1">(.*?)</a></div>',$result,$match);
  • Hello Diogo, can you translate the question to English?

  • I need preg_match_all to capture the variable var2, regardless of what is written in var1.

  • Hello Diogo, use the edit link to translate your question; this is Stack Overflow in English. It will make it easier for people to help you.

2 answers

1

This answer is based on the question as it was before editing.

Just use:

/<a href="https:\/\/myanimelist.net\/topanime.php\?type=var1">(.*?)<\/a><\/div>/

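You need to escape the / (because it is the delimiter here) and the ? (because it is a regex quantifier); otherwise the pattern will not work. The group (.*?) then captures whatever sits between the opening tag and </a></div>.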

preg_match_all('/<a href="https:\/\/myanimelist.net\/topanime.php\?type=var1">(.*?)<\/a><\/div>/',$result,$match);


var_dump($match);

Result:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(71) "<a href="https://myanimelist.net/topanime.php?type=var1">var2</a></div>"
  }
  [1]=>
  array(1) {
    [0]=>
    string(4) "var2"
  }
}
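
The manual escaping above can also be delegated to preg_quote(). The following is only a sketch under assumptions not in the question: it assumes the value of type (var1) never contains a double quote, so [^"]* can stand in for it, and it drops the trailing </div> from the pattern. The variable names $prefix and $pattern are made up for the example.

```php
<?php
// Sample input copied from the question.
$result = '<div>
  <span class="dark_text">Type:</span>
  <a href="https://myanimelist.net/topanime.php?type=var1">var2</a></div>';

// Literal start of the link. preg_quote() escapes ".", "?" and, because it is
// passed as the second argument, the "/" delimiter as well.
$prefix = preg_quote('<a href="https://myanimelist.net/topanime.php?type=', '/');

// Assumption: the value of "type" contains no double quote, so [^"]* matches it.
$pattern = '/' . $prefix . '([^"]*)">(.*?)<\/a>/';

preg_match_all($pattern, $result, $match);

print_r($match[1]); // captured "type" values (var1 here)
print_r($match[2]); // captured link texts (var2 here)
```

With this shape, group 1 holds whatever was in var1 and group 2 holds var2, so the same pattern keeps working when var1 changes.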

Also consider looking into DOMDocument, which gives you other options.

1

First, in PHP a regex must have delimiters (a character marking the beginning and the end of the expression). In your case you used ! at the start but forgot to put another ! at the end.

In addition, several characters have a special meaning in regex, such as the dot and the ? (the question mark makes the preceding item optional, so in your case php? means "ph" followed by an optional "p"). If you want them treated as ordinary characters with no special meaning, escape them with \. The code then becomes:

$texto = '<div>
<span class="dark_text">Type:</span>
<a href="https://myanimelist.net/topanime.php?type=var1">var2</a></div>';
if (preg_match_all('!<a href="https://myanimelist\.net/topanime\.php\?type=var1">(.*?)</a></div>!', $texto, $match)) {
    var_dump($match);
}

If you only want the content inside the a tag (that is, var2), just take $match[1], which is an array with the captured text of every match.
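
To make the two points above concrete, here is a small sketch comparing the original pattern, the same pattern with only the closing delimiter added, and the fully escaped version. The variable names are invented for the example, and the @ operator is used only to silence the warning PHP emits for the missing delimiter.

```php
<?php
$texto = '<a href="https://myanimelist.net/topanime.php?type=var1">var2</a></div>';

// 1) No closing "!": PHP emits a warning (silenced with @) and returns false.
$noDelimiter = @preg_match_all('!<a href="https://myanimelist.net/topanime.php?type=var1">(.*?)</a></div>', $texto, $m1);

// 2) Delimiters fixed, "?" still unescaped: "php?type" means "ph", an optional "p",
//    then "type", so the literal "?" in the URL can never be matched.
$unescaped = preg_match_all('!<a href="https://myanimelist.net/topanime.php?type=var1">(.*?)</a></div>!', $texto, $m2);

// 3) Delimiters fixed and special characters escaped.
$escaped = preg_match_all('!<a href="https://myanimelist\.net/topanime\.php\?type=var1">(.*?)</a></div>!', $texto, $m3);

var_dump($noDelimiter);  // bool(false)
var_dump($unescaped);    // int(0)
var_dump($escaped);      // int(1)
echo $m3[1][0], PHP_EOL; // var2
```

Note that false (an invalid pattern) and 0 (a valid pattern that found nothing) are different outcomes, which is why checking the return value helps when debugging.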


Do not use regex

But in fact, regex is not the ideal tool for manipulating HTML (to learn more, read this famous Stack Overflow answer).

In this case, it would be better to use DOMDocument:

$texto = '<div>
<span class="dark_text">Type:</span>
<a href="https://myanimelist.net/topanime.php?type=var1">var2</a></div>';
$dom = new DOMDocument();
$dom->loadHTML($texto);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[@href="https://myanimelist.net/topanime.php?type=var1"]') as $link) { // find a elements with the desired URL
    echo $link->textContent; // var2
}

Although regex "works" in simpler cases, it may fail as the HTML becomes more complex (for example, if the link is commented out, the regex cannot tell and still matches it, while DOMDocument detects and ignores commented-out snippets). Regex is not always the best solution.
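
The commented-out case can be checked with a short sketch. The input below is hypothetical (the extra commented link with the text "old" is not in the question); it only illustrates the difference in behavior.

```php
<?php
// Hypothetical input: the first link is inside an HTML comment, the second is live.
$html = '<div>
<!-- <a href="https://myanimelist.net/topanime.php?type=var1">old</a> -->
<a href="https://myanimelist.net/topanime.php?type=var1">var2</a></div>';

// The regex knows nothing about HTML comments, so it reports both links.
preg_match_all('!<a href="https://myanimelist\.net/topanime\.php\?type=var1">(.*?)</a>!', $html, $match);
print_r($match[1]); // contains "old" and "var2"

// DOMDocument parses the comment as a comment node, so XPath only sees the live link.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$texts = [];
foreach ($xpath->query('//a[@href="https://myanimelist.net/topanime.php?type=var1"]') as $link) {
    $texts[] = $link->textContent;
}
print_r($texts); // contains only "var2"
```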


As pointed out in the comments, if var1 varies in the URL, an alternative is to change the loop to:

foreach ($xpath->query('//a[starts-with(@href, "https://myanimelist.net/topanime.php?type=")]') as $link) { // find a elements whose URL starts with the desired prefix
    parse_str(parse_url($link->getAttribute('href'), PHP_URL_QUERY), $output);
    if ($output['type'] == 'var1') { // if type is "var1" (or any other value you want)
        echo $link->textContent; // var2
    }
}
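
If type is not guaranteed to be the first query parameter, the predicate can combine starts-with and contains, and the value can be read with parse_url plus parse_str. The following is a sketch only: the sample links, the type values tv, movie and ova, and the $found array are all invented for illustration.

```php
<?php
// Hypothetical input: in the second link "type" is not the first parameter,
// and the third link points to a different page.
$html = '<div>
<a href="https://myanimelist.net/topanime.php?type=tv">TV</a>
<a href="https://myanimelist.net/topanime.php?limit=50&amp;type=movie">Movie</a>
<a href="https://myanimelist.net/anime.php?type=ova">Other page</a></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Keep only links to topanime.php whose query string mentions "type=".
$query = '//a[starts-with(@href, "https://myanimelist.net/topanime.php?") and contains(@href, "type=")]';

$found = [];
foreach ($xpath->query($query) as $link) {
    // Extract the query string and split it into an array of parameters.
    parse_str((string) parse_url($link->getAttribute('href'), PHP_URL_QUERY), $output);
    if (isset($output['type'])) {
        $found[$output['type']] = $link->textContent; // type value => link text
    }
}
print_r($found); // tv => TV, movie => Movie
```

The isset() check is there because contains(@href, "type=") would also accept a parameter such as subtype=, which parse_str would not store under the type key.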
  • I agree with the DOM idea. Regex may do the job, but it is better to use the native tool built for this specific purpose. I also think using XPath here was a good choice, but it could be adjusted, because the author's intention is for var1 to be dynamic. You could just use getElementsByTagName('a') and, inside the for loop, take each link and use parse_str(parse_url($link, PHP_URL_QUERY), $output); var_dump($output); ....

  • ... but it's just a suggestion. You can solve part of it in the XPath itself: instead of [@href=...], use [starts-with(@href, "https://myanimelist.net/topanime.php?type=")], or, if the link has dynamic query strings, use the combination [starts-with(@href, "https://myanimelist.net/topanime.php?") and contains(@href, "type=")] and then extract the value with parse_url + parse_str.

  • @Guilhermenascimento I updated the answer (I had misunderstood and thought that "var1" was fixed). Thanks for the suggestions!
