Get the string inside an <a> tag without attributes

0

I'm using the DOM extension in PHP to get the link from an <a> tag; with getAttribute I can read that link from the href attribute.

Crawler script:

<?php
// load the URL
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile("http://www.linkdosite.com.br");

// get only the links
$links = $dom->getElementsByTagName('a');

// array that stores the links found by the crawler
$getLink = array();

$nlinks = 0;

foreach ($links as $pegalink) {

    // get the href of each link
    $link = $pegalink->getAttribute('href');

    $termo = 'detalhe'; // term that sets these links apart from the others; keep only links containing it

    $pattern = '/' . $termo . '/'; // pattern to look for in the string $link

    if (preg_match($pattern, $link)) {
        $getLink[$nlinks] = $link; // store the link in the $getLink array

        echo $getLink[$nlinks]."<br>"; // print the link on screen

        $nlinks++;
    }

}

Now I also need to get the string that is inside the <a> tag, and I couldn't find any example that helps me solve this.

Block obtained by the crawler:

<a href="link">
  <font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>

  • Which class are you using? It's probably something like jQuery('a font b')->html();

  • There is no class; this is a crawler for another site, so everything needs to happen on the server side.

  • 1

    Yes, but aren't you using a PHP class to access the DOM? For example, I use this class: https://github.com/punkave/phpQuery

  • No, I am using PHP's own DOM extension: http://php.net/manual/en/book.dom.php

3 answers

4


To get the value of an attribute and the string inside a tag, do the following:

Example:

// load the URL
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile("http://google.com");

// get only the links
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    // get the text inside the tag
    echo $link->nodeValue, PHP_EOL;
    // get the value of an attribute
    echo $link->getAttribute('href'), PHP_EOL;
}

The PHP documentation has a user-contributed note with this example.

  • Done! Thank you very much!
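As a rough sketch of how this could be combined with the filter from the question, the loop below keeps only links whose href contains the term and stores both the href and the inner text. The inline HTML, the /detalhe/1 and /contato paths, and the $found array are made up for illustration; textContent is used here and should behave like nodeValue for element nodes.

```php
<?php
// Hypothetical inline HTML so the sketch runs without network access.
$html = '<html><body>'
      . '<a href="/detalhe/1"><font color="black"><b>String que eu quero pegar</b></font></a>'
      . '<a href="/contato">Contato</a>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);

$termo = 'detalhe';   // term taken from the question
$found = array();     // hypothetical name for the collected results

foreach ($dom->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // keep only links whose href contains the term
    if (strpos($href, $termo) !== false) {
        $found[] = array(
            'href' => $href,
            // textContent concatenates the text of all descendant nodes
            'text' => trim($a->textContent),
        );
    }
}

print_r($found);
```

The trim() call only matters when the markup has line breaks or indentation around the nested tags, as in the block from the question.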

2

You can use strip_tags:

<?php
$text = '<a href="link">
  <font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>
';
echo strip_tags($text);

See it on Ideone

There are many ways to get the result you are looking for: with DOMXPath (as @Lacobus said, and as mentioned in the link I sent) or with plain DOM. But this kind of thing (scraping) is very specific, because it depends on the structure of the target page.

The most universal way would be the following:

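<?php
$str = file_get_contents("/questions/229996/pegar-string-dentro-de-tag-a-sem-atributos");
// \b keeps tags such as <abbr> out; the "s" flag lets "." match line breaks, "i" ignores case
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $str, $matches);
// $matches[1] holds the inner HTML of each <a> element
print_r($matches[1]);
?>

If that doesn't work for you, post the link to the site so I can look at its structure.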
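If the elements come from getElementsByTagName rather than from a string, one possible way to still use strip_tags is to serialize each node back to HTML first. This is only a sketch: it assumes PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts a node argument, and the $texts array is a hypothetical name.

```php
<?php
// Sample block from the question, loaded through DOMDocument instead of used as a raw string.
$html = '<a href="link">
  <font style="font-size: 14px;" color="black" face="arial"><b>String que eu quero pegar</b></font>
</a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);

$texts = array(); // hypothetical name for the extracted strings

foreach ($dom->getElementsByTagName('a') as $a) {
    // saveHTML($node) serializes just this node, including its child tags
    $markup = $dom->saveHTML($a);
    // strip_tags removes the markup; trim drops the surrounding line breaks and indentation
    $texts[] = trim(strip_tags($markup));
}

print_r($texts);
```

Note that strip_tags does not decode HTML entities, so reading the node's text directly through the DOM may be simpler when the link text contains characters such as &amp;.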

  • Right, but in my case, because this is a crawler for another site, the tags come back as actual HTML nodes. I would have to find a way to take these specific tags and convert them to a string before using strip_tags. Is it possible to do this through the DOM?

  • https://answall.com/questions/169892/domxpath-query-com-multiplas-classes

  • It turns out that in my case there is no class or attribute in the tags I need.

  • http://php.net/manual/en/domdocument.getelementsbytagname.php

  • Yes, I am using $dom->getElementsByTagName('a'). Now I need to turn the block of tags I get into a string so I can use strip_tags as in your example. That's what I can't do.

  • 1

    I need to step out now; I'll try to edit my answer later. In the meantime, edit your question and add your full crawler code.

  • @Charlesfay I edited...


1

Use the evaluate() method of the DOMXPath class:

<?php

$html = "<a href=\"link\"><font style=\"font-size: 14px;\" color=\"black\" face=\"arial\"><b>String que eu quero pegar</b></font></a>";

$dom = new DOMDocument();

$dom->loadXML($html);

$xp = new DOMXPath($dom);

$str = $xp->evaluate("string(/a)");

echo $str;

  • You declared the HTML block as a string, but in my case I get the HTML block via $dom->getElementsByTagName('a'). With the PHP DOM functions I can do certain things with it, e.g. $pegalink->getAttribute('href') gives me the link. But since this block of HTML is not a string, I can't use your example.
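For nodes that come from getElementsByTagName, a possible adaptation is to pass each node as the context node (the second argument of DOMXPath::evaluate()), so that the expression string(.) is evaluated relative to that element. The wrapping <p> element and the $results array below are assumptions for illustration; $pegalink is the loop variable from the question.

```php
<?php
// Hypothetical page fragment; in the crawler this would come from loadHTMLFile().
$html = '<p><a href="link"><font color="black"><b>String que eu quero pegar</b></font></a></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
$results = array(); // hypothetical name for the extracted strings

foreach ($dom->getElementsByTagName('a') as $pegalink) {
    // "." refers to the context node passed as the second argument
    $results[] = $xp->evaluate('string(.)', $pegalink);
}

print_r($results);
```

Swapping string(.) for normalize-space(.) would also collapse line breaks and indentation inside the link, which may be convenient for markup like the multi-line block in the question.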
