file_get_contents: extracting specific parts of a sitemap's content

Guys, can someone help me:

I have the following code:

 <?php

 $url = file_get_contents('https://www.site.com.br/sitemap.xml');
 echo $url;

 ?>

I need the following:

The sitemap contains several URLs with the following structure: www.site.com.br/numero/123/ (I need to get all the numbers between /numero/ and the next /).

The links are listed together

Ex: www.site.com.br/numero/123/www.site.com.br/numero/124/www.site.com.br/numero/125/

I need to list them as follows:

123
124
125 
etc...

3 answers

2


You can use preg_match_all like this:

<?php

$dados = file_get_contents('https://www.site.com.br/sitemap.xml');

if (preg_match_all('#www\.site\.com\.br/numero/([^/]+)/#', $dados, $matches)) {
    $matches = $matches[1];

    foreach ($matches as $value) {
        echo $value, '<br>', PHP_EOL;
    }
}

In the regex #www\.site\.com\.br/numero/([^/]+)/#, the dots are preceded by \ to escape them, because an unescaped dot matches any character (except line breaks). Whatever matches the part in parentheses, ([^/]+), is captured; [^/] matches any character except /, so preg_match_all extracts everything that comes after www.site.com.br/numero/ and before the next slash.
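
As a quick check of the pattern, here is a minimal sketch run against the sample string from the question instead of a live sitemap. The $dados value is hard-coded for illustration, and the stricter \d+ capture is an assumption that the identifiers are always numeric:

```php
<?php

// Sample input copied from the question; a real script would use file_get_contents().
$dados = 'www.site.com.br/numero/123/www.site.com.br/numero/124/www.site.com.br/numero/125/';

// \d+ only accepts digits, unlike [^/]+ which accepts anything up to the next slash.
$numeros = [];
if (preg_match_all('#www\.site\.com\.br/numero/(\d+)/#', $dados, $matches)) {
    $numeros = $matches[1];
}

// One number per line
echo implode(PHP_EOL, $numeros), PHP_EOL;
```

If the identifiers can contain letters, the original [^/]+ capture is the safer choice.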

XML

Now, if you are using XML and this:

www.site.com.br/numero/123/www.site.com.br/numero/124/www.site.com.br/numero/125/

is actually just your browser's preview, which did not render the "XML", then preg_match and substr may not work as expected. Assuming your XML (if it really is XML) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.site.com.br/numero/123/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.site.com.br/numero/124/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.site.com.br/numero/125/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.site.com.br/numero/126/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>

Then you can use DOM or simplexml_load_file (or simplexml_load_string). For example, using SimpleXML:

<?php

$urlset = simplexml_load_file('sitemap.xml');
$numeros = [];

foreach ($urlset as $url) {
    // Cast to string: $url->loc is a SimpleXMLElement
    if (preg_match('#www\.site\.com\.br/numero/([^/]+)/#', (string) $url->loc, $match)) {
        $numeros[] = $match[1];
    }
}

// Print the collected numbers
foreach ($numeros as $value) {
    echo $value, '<br>', PHP_EOL;
}

$url->loc reads the value of the <loc> tag. If your XML has a different format, just replace ->loc with the name of the tag you use.
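
PHP's DOM extension is another option for reading the <loc> values. The following is only a sketch: the inline $xml sample is an assumption standing in for the real sitemap, and the shorter pattern /numero/([^/]+)/ is used for brevity:

```php
<?php

// Inline sample standing in for the real sitemap (assumed to follow the sitemaps.org schema).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url><loc>http://www.site.com.br/numero/123/</loc></url>
   <url><loc>http://www.site.com.br/numero/124/</loc></url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$numeros = [];
// Visit every <loc> element in the document, wherever it appears.
foreach ($doc->getElementsByTagName('loc') as $loc) {
    if (preg_match('#/numero/([^/]+)/#', $loc->textContent, $match)) {
        $numeros[] = $match[1];
    }
}

print_r($numeros);
```

To read the sitemap from a file or URL instead of a string, $doc->load() can replace $doc->loadXML().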

0

This will only work if the sitemap's content really has the structure shown in your example. If it is different, you will have to adapt the code.

<?php
$sitemap = file_get_contents('https://www.site.com.br/sitemap.xml');
$lista = array();
$key = 0;
while (strpos($sitemap, '/numero/') !== false) { // compare with false: a match at position 0 is valid
    $sitemap = substr($sitemap,strpos($sitemap,'/numero/')+8);
    $lista[$key] = substr($sitemap,0,strpos($sitemap,'/'));
    $key++;
}
/* At this point the $lista array has the following structure:
array(3) {
  [0]=>
  string(3) "123"
  [1]=>
  string(3) "124"
  [2]=>
  string(3) "125"
}
*/

// Loop through the array to print each value...
foreach($lista as $key => $value) {
    echo $value.'<br/>';
}
?>
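
The same strpos/substr idea can also be sketched without repeatedly truncating the string, by passing an offset to strpos. The function name extrairNumeros and the $exemplo string are hypothetical, introduced only for this illustration:

```php
<?php

/**
 * Hypothetical helper: collect every value found between $marker and the next slash.
 */
function extrairNumeros(string $texto, string $marker = '/numero/'): array
{
    $lista = [];
    $offset = 0;

    // strpos returns false when there are no more occurrences; 0 is a valid position.
    while (($inicio = strpos($texto, $marker, $offset)) !== false) {
        $inicio += strlen($marker);
        $fim = strpos($texto, '/', $inicio);
        if ($fim === false) {
            break; // no closing slash, so the last entry is incomplete
        }
        $lista[] = substr($texto, $inicio, $fim - $inicio);
        $offset = $fim;
    }

    return $lista;
}

$exemplo = 'www.site.com.br/numero/123/www.site.com.br/numero/124/www.site.com.br/numero/125/';
print_r(extrairNumeros($exemplo));
```

Keeping the original string intact makes it easier to reuse the same text for other markers.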

0

We can use a helper function that extracts 3 characters, starting at the position right after /numero/. Note that this assumes every number has exactly 3 digits.

function esquerda($str, $length) {
   return substr($str, 0, $length);
}

$url = file_get_contents('https://www.site.com.br/sitemap.xml');

while (strpos($url, '/numero/') !== false) {
    $url = substr($url,strpos($url,'/numero/')+8);
    echo esquerda($url, 3);
    echo "<br>";
}
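
A fixed length of 3 stops working as soon as a number has more or fewer digits. One possible adjustment, sketched here, is to measure the length up to the next slash with strcspn. The esquerda function is the one from this answer; the $url sample value is an assumption standing in for the downloaded sitemap:

```php
<?php

function esquerda($str, $length) {
    return substr($str, 0, $length);
}

// Sample input with numbers of different lengths (hard-coded for illustration).
$url = 'www.site.com.br/numero/7/www.site.com.br/numero/1234/';

$numeros = [];
while (($pos = strpos($url, '/numero/')) !== false) {
    $url = substr($url, $pos + 8);
    // strcspn counts the characters before the first "/", i.e. the length of the number.
    $numeros[] = esquerda($url, strcspn($url, '/'));
}

echo implode('<br>', $numeros);
```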
