Modify a href URL using PHP HTML DOM

I want to modify the URL of an element that I’m extracting from another site, in this case Exame.com.

Here is the code:

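    // file_get_html() comes from the PHP Simple HTML DOM Parser library
    $exame = file_get_html("http://exame.abril.com.br/");
    $exame_posts = $exame->find("p.content-item-title");
    foreach ($exame_posts as $i => $value) {
        // Print only the first 10 items
        if ($i < 10) {
            echo $value;
        }
    }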

The code is simple: it takes the "p" tags with the "content-item-title" class, limits them to the first 10, and prints them.

The question is: how do I modify the URL of some of the links I get? With some links I can reach the Exame.com website; with a few others I cannot. Here is an example of what some of the links look like:

Links as they are generated in the HTML page:

[Image: the links as they are generated on the page]

How the URL looks when I click on one of them:

[Image not included]

And once the URL is corrected, how do I avoid breaking the other links that already work without any error?

Hugs!

1 answer

I think you can do it by applying str_ireplace to $value before the echo:

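    $site = "http://exame.abril.com.br/";
    $exame = file_get_html($site);
    $exame_posts = $exame->find("p.content-item-title");
    foreach ($exame_posts as $i => $value) {
        if ($i < 10) {
            // Only rewrite items whose href does not already start with http
            if (!preg_match('/(href="http|href=\'http)/i', $value)) {
                // Prefix the site to the href (double or single quotes), then
                // collapse the double slash left when the path starts with "/"
                $value = str_ireplace(
                    array('href="', "href='", $site.'/'),
                    array('href="'.$site, "href='".$site, $site),
                    $value
                );
            }
            echo $value;
        }
    }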
  • Jader, thank you very much, friend! It worked perfectly; I only had to remove "http://" from the second array passed to str_ireplace. Could you explain to me how this code works?

  • Sorry, the http:// was my mistake; I missed it when I edited the answer. As for how it works: it first checks whether the href already starts with http before prepending the site address, and to be safe it tests for both double and single quotes. It also removes a possible double slash after the address; I have improved that part in the answer.
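
To try the steps described above without fetching the page, the same check and replacement can be wrapped in a function and run on literal strings. This is only a sketch: the function name `absolutize_hrefs` and the sample path `/economia/noticia` are made up for illustration and do not come from the original answer or from Exame.com.

```php
<?php
// Hypothetical helper (not part of the original answer): prefixes the site
// address to href values in an HTML fragment that has no absolute links.
function absolutize_hrefs(string $html, string $site): string
{
    // Leave the fragment untouched if any href already starts with http.
    if (preg_match('/(href="http|href=\'http)/i', $html)) {
        return $html;
    }
    // Prefix the site for both quote styles, then collapse the double slash
    // that appears when the site ends with "/" and the path starts with "/".
    return str_ireplace(
        array('href="', "href='", $site . '/'),
        array('href="' . $site, "href='" . $site, $site),
        $html
    );
}

// Sample fragments (assumed markup, for illustration only).
$site = "http://exame.abril.com.br/";
$relative = '<p class="content-item-title"><a href="/economia/noticia">Title</a></p>';
$absolute = '<p class="content-item-title"><a href="http://exame.abril.com.br/economia/noticia">Title</a></p>';

echo absolutize_hrefs($relative, $site), "\n"; // relative link gets the site prefix
echo absolutize_hrefs($absolute, $site), "\n"; // already absolute, printed unchanged
```

Note that the check looks at the whole fragment, so an item containing both an absolute and a relative link would be skipped entirely; for the single-link titles in the question this should not matter.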
