You can use a regex with preg_match_all, like this:
<?php
$dados = file_get_contents('https://www.site.com.br/sitemap.xml');

// Capture whatever sits between "/numero/" and the next "/"
if ($dados !== false && preg_match_all('#www\.site\.com\.br/numero/([^/]+)/#', $dados, $matches)) {
    // $matches[1] holds the text captured by the first group
    foreach ($matches[1] as $value) {
        echo $value, '<br>', PHP_EOL;
    }
}
The regex is #www\.site\.com\.br/numero/([^/]+)/# (the # characters are only the delimiters). The dots are preceded by \ to escape them, because an unescaped dot matches any character (except line breaks). Whatever is inside the parentheses, ([^/]+), is captured, and the [^/] tells preg_match_all to accept any character except /. This way it extracts everything that comes after www.site.com.br/numero/ and before the next slash.
Example in IDEONE
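To make the escaping and the capture group concrete, here is a minimal sketch that runs the same pattern against an inline string. The sample text in $dados is made up for illustration (including the deliberately wrong "wwwXsite" entry); it is not taken from a real sitemap.

```php
<?php
// Hypothetical sample input standing in for the downloaded sitemap text.
// The last URL uses "wwwXsite" to show that the escaped dot only matches a literal ".".
$dados = 'http://www.site.com.br/numero/123/ '
       . 'http://www.site.com.br/numero/124/ '
       . 'http://wwwXsite.com.br/numero/999/';

// ([^/]+) captures one or more characters up to, but not including, the next "/".
preg_match_all('#www\.site\.com\.br/numero/([^/]+)/#', $dados, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
print_r($matches[1]);
```

Only 123 and 124 end up in $matches[1]; without the backslashes, the "wwwXsite" entry would have matched as well.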
XML
Now, if you are dealing with XML and what you see is this:
www.site.com.br/numero/123/www.site.com.br/numero/124/www.site.com.br/numero/125/
then that is just your browser's preview, which did not render the "XML", so neither preg_match nor substr will work on it as you expect. Assuming your XML (if it really is XML) looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.site.com.br/numero/123/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.site.com.br/numero/124/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.site.com.br/numero/125/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.site.com.br/numero/126/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
Then you can use DOM or simplexml_load_file (or simplexml_load_string). Using SimpleXML, for example:
<?php
$urlset = simplexml_load_file('sitemap.xml');
$numeros = [];

foreach ($urlset as $url) {
    // $url->loc is the text of the <loc> tag
    if (preg_match('#www\.site\.com\.br/numero/([^/]+)/#', (string) $url->loc, $match)) {
        $numeros[] = $match[1];
    }
}

// Print the collected numbers
foreach ($numeros as $value) {
    echo $value, '<br>', PHP_EOL;
}
$url->loc reads the value of the <loc> tag. If your XML has a different format, just replace ->loc with the name of the tag you use.
Example in IDEONE
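Since DOM was mentioned as an alternative, here is a rough sketch of the same extraction with DOMDocument. It assumes the dom extension is enabled and uses a shortened, inline copy of the sitemap (the $xml string is just for illustration); with a real file you would call $doc->load('sitemap.xml') instead.

```php
<?php
// Hypothetical inline sitemap, shortened from the example above.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.site.com.br/numero/123/</loc></url>
<url><loc>http://www.site.com.br/numero/124/</loc></url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$numeros = [];
// The sitemap uses a default namespace with no prefix, so the tag name is plain "loc".
foreach ($doc->getElementsByTagName('loc') as $loc) {
    if (preg_match('#www\.site\.com\.br/numero/([^/]+)/#', $loc->nodeValue, $match)) {
        $numeros[] = $match[1];
    }
}

print_r($numeros);
```

As with SimpleXML, if your XML uses a different tag, swap 'loc' for that tag name.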