This happens because of the parentheses in this part of the regex: `([^<\/]{1,})`.
The parentheses form a capture group, and according to the `preg_match_all` documentation, each group is placed separately in the array of matches (this describes the default `PREG_PATTERN_ORDER` flag):

> Orders results so that `$matches[0]` is an array of full pattern matches, `$matches[1]` is an array of strings matched by the first parenthesized subpattern, and so on.
That is, `$matches[0]` holds an array with every full match of the regex, `$matches[1]` holds what the first capture group captured, and so on (groups are numbered in the order they appear in the regex; yours has only one pair of parentheses, so there is only one capture group).
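To make the difference concrete, here is a small sketch using the title string from this answer and a pattern that still has the capture group (assumed to be the shape of the regex in the question). It shows what lands in each index of `$matches`:

```php
<?php
// Sample input: the same title string used later in this answer
$file_contents = '<title>Fastly error: unknown domain 151.101.1.69</title>';

// Pattern WITH a capture group around [^<\/]{1,}
preg_match_all('#<title>([^<\/]{1,})<\/title>#i', $file_contents, $matches);

// $matches[0]: the full matches (the whole <title>...</title>)
echo $matches[0][0], PHP_EOL;
// $matches[1]: only what the first capture group captured (the text inside the tag)
echo $matches[1][0], PHP_EOL;
```

So `print_r($matches)` shows two inner arrays: one for the full match and one for the group.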
So you can either ignore `$matches[1]` or remove the capture group from the regex:
```php
$file_contents = '<title>Fastly error: unknown domain 151.101.1.69</title>';
if (preg_match_all('#<title>[^<\/]+<\/title>#i', $file_contents, $matches)) {
    print_r($matches);
}
```
I removed the parentheses and also replaced the quantifier `{1,}` with `+`; the two are equivalent (both mean "one or more occurrences"). The output is:
```
Array
(
    [0] => Array
        (
            [0] => <title>Fastly error: unknown domain 151.101.1.69</title>
        )
)
```
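If you want to check the `{1,}` / `+` equivalence yourself, a quick sketch is to run both patterns over a few sample strings (the samples are made up for illustration). An empty title matches neither, since both require at least one character, and a title containing `/` matches neither, because the character class `[^<\/]` excludes `/`:

```php
<?php
// Made-up sample strings; each one is tested with both quantifiers
$samples = ['<title>abc</title>', '<title></title>', '<title>a/b</title>'];
$results = [];
foreach ($samples as $s) {
    $results[$s] = [
        preg_match('#<title>[^<\/]{1,}<\/title>#i', $s), // {1,}: one or more
        preg_match('#<title>[^<\/]+<\/title>#i', $s),    // +: also one or more
    ];
}
print_r($results);
```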
But if you are manipulating HTML, you are better off using `DOMDocument`:
```php
$file_contents = '<title>Fastly error: unknown domain 151.101.1.69</title>';
$dom = new DOMDocument();
$dom->loadHTML($file_contents);
$list = $dom->getElementsByTagName('title');
if ($list->length > 0) {
    $title = $list->item(0);
    // print the whole tag
    echo $dom->saveHTML($title), PHP_EOL; // <title>Fastly error: unknown domain 151.101.1.69</title>
    // get only the tag's content
    echo $title->textContent; // Fastly error: unknown domain 151.101.1.69
}
```
That's because regex is not the best tool for manipulating HTML (in simpler cases it may even "work", but terrible things can also happen). In short, use the most suitable tool for each case; regex is not always the best solution.
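As one illustration of where the regex breaks, here is a sketch comparing it with `DOMDocument` on two made-up inputs: a title that contains `/` (excluded by `[^<\/]`) and a `<title>` tag with an attribute (the literal `<title>` no longer matches). The helper names `titleByRegex` and `titleByDom` are hypothetical, and the DOM version assumes PHP's `dom` extension is enabled:

```php
<?php
// Collect libxml parsing warnings internally instead of printing them
libxml_use_internal_errors(true);

// Hypothetical helper: title text via the regex from this answer (with a group)
function titleByRegex(string $html): ?string
{
    return preg_match('#<title>([^<\/]+)<\/title>#i', $html, $m) ? $m[1] : null;
}

// Hypothetical helper: title text via DOMDocument
function titleByDom(string $html): ?string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $list = $dom->getElementsByTagName('title');
    return $list->length > 0 ? $list->item(0)->textContent : null;
}

foreach (['<title>A/B test</title>', '<title lang="en">Hello</title>'] as $html) {
    var_dump(titleByRegex($html), titleByDom($html));
}
```

In both cases the regex returns `null` while the DOM parser finds the title.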
Also, you are calling `preg_match_all` inside a `foreach` loop, but printing the result outside the loop, so only the last result gets printed. If you want to print the result of every call, put the `print_r` inside the loop:
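```php
foreach ($linhas as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the page as a string instead of echoing it into the output buffer:
    // ob_get_contents() does not clear the buffer, so unless it is emptied
    // between iterations, each page would be appended to the previous ones
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $file_contents = curl_exec($ch);
    if ($file_contents !== false && preg_match_all('#<title>[^<\/]+<\/title>#i', $file_contents, $matches)) {
        print_r($matches);
    }
}
```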
Also note the `if` that checks whether `preg_match_all` found something (if nothing is found, it doesn't enter the `if`, since there would be nothing to print).
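Putting both points together, here is a sketch of the loop that uses fixed strings in place of the `curl` responses (the URLs and HTML are made up). It records the matches only when the `if` succeeds and keeps the result of every iteration, so nothing is overwritten by the last page:

```php
<?php
// Made-up pages standing in for what curl would return for each URL
$pages = [
    'http://example.com/a' => '<html><head><title>Page A</title></head></html>',
    'http://example.com/b' => '<html><head></head><body>no title here</body></html>',
    'http://example.com/c' => '<html><head><title>Page C</title></head></html>',
];

$titles = [];
foreach ($pages as $url => $file_contents) {
    // Only record something when preg_match_all actually found a <title>
    if (preg_match_all('#<title>[^<\/]+<\/title>#i', $file_contents, $matches)) {
        $titles[$url] = $matches[0];
    }
}
print_r($titles); // one entry per page that had a title
```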
Thank you for your reply. Regarding the regex, I did have an error, but it still prints only the title for the last value in the .txt file.
– Antônio Fagundes
@Antôniofagundes Please edit the question and add the contents of the file. If it is too large, reduce it, but make sure the error still occurs.
– hkotsubo
Edited as requested.
– Antônio Fagundes
Well, I think the problem is that the `print_r` is outside the `foreach`. I updated the answer.
– hkotsubo
That way it doesn't recognize any input; as far as I can tell, it never enters the `if`. How is that possible?
– Antônio Fagundes
@Antôniofagundes I don't know, I can't reproduce the same error here. Maybe it's some other detail. Have you tried calling it directly with fixed values instead of reading them from the file?
– hkotsubo
The same thing happens.
– Antônio Fagundes
@Antôniofagundes Perhaps `curl` is not returning the whole file (it's a guess, because I can't tell what is at each IP: if there is a `title`, the regex finds it, so if it doesn't enter the `if`, it's because there is no `title` in the text).
– hkotsubo
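Following that guess, one way to narrow it down is to inspect what `curl_exec` actually returned before applying the regex. The sketch below uses a hypothetical helper, `diagnoseTitle`, that takes that return value (a string, or `false` on failure) and reports which case applies; in the real loop, `curl_error($ch)` gives the reason when the request fails:

```php
<?php
// Hypothetical diagnostic helper: $body is whatever curl_exec() returned
function diagnoseTitle($body): string
{
    if ($body === false) {
        return 'request failed: check curl_error($ch)';
    }
    if (trim($body) === '') {
        return 'empty response';
    }
    if (stripos($body, '<title') === false) {
        return 'no <title> tag in ' . strlen($body) . ' bytes';
    }
    if (!preg_match('#<title>[^<\/]+<\/title>#i', $body)) {
        return '<title> tag present, but the regex does not match it';
    }
    return 'ok';
}

// Made-up sample responses covering each case
foreach ([false, '', '<html></html>', '<title lang="en">Hi</title>', '<title>Hi</title>'] as $sample) {
    echo diagnoseTitle($sample), PHP_EOL;
}
```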
I posted the IP addresses at the end of the question. Both have the title tag, but it returns blank for me. I think it's my network's firewall. Funny that I can access them from the browser.
– Antônio Fagundes
So that must be the problem (something related to the network).
– hkotsubo