I recommend the PHP Simple HTML DOM Parser. It works well and is very easy to use; I use it in several scripts to analyze HTML from other sites.
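For comparison, the parser-based approach can be sketched without any extra dependency using PHP's built-in DOMDocument class. This is not Simple HTML DOM itself, only an illustration of the same idea; the sample markup and the $hrefs variable are assumptions made for the example.

```php
<?php
// Hypothetical sample markup: a LINK tag broken across several lines.
$html = <<<EOF
<link
rel="shortcut icon"
href="http://localhost/teste/icon.png"
/>
EOF;

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is often fragmentary or malformed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the href attribute of every LINK element found by the parser.
$hrefs = [];
foreach ($doc->getElementsByTagName('link') as $link) {
    if ($link->hasAttribute('href')) {
        $hrefs[] = $link->getAttribute('href');
    }
}

var_dump($hrefs);
```

Because the parser works on the element tree, line breaks, attribute order, and the optional closing slash need no special handling here.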
Bruno Augusto's answer is very good; I just want to complement it with a few details that I think are important to take into account. When I have to analyze HTML with regular expressions, I try to write a more complete pattern, because HTML is very irregular: attributes have no defined order, and the markup may contain line breaks. In your case I would use this regular expression:
/<link.*?href=\"([^\"]*?)\".*?\/?>/si
Basically the improvements are 3 replacements:
1 - of (.*?) for ([^\"]*?), because an attribute value delimited by " cannot itself contain a " character. The same applies when the delimiter is the ' character.
2 - of > for \/?>, because there may or may not be a / character before the closing > character.
3 - of /i for /si, because there may be line breaks between attributes, values, etc. HTML tags on real sites are not always written on a single line; part of a tag may be on one line and the rest on the next. The s modifier lets . match those line breaks.
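The remark about the ' delimiter can be illustrated with a small variation of the pattern. This is only a sketch, not part of the original suggestion: it captures the opening quote in a group and requires the same character to close the value, so the href ends up in group 2 instead of group 1. The sample markup and the $pattern and $hrefs names are assumptions made for the example.

```php
<?php
// Hypothetical sample markup: one multi-line tag with double quotes,
// one self-closing tag with single quotes.
$html = <<<EOF
<link
rel="shortcut icon"
href="http://localhost/teste/icon.png"
>
<link rel='stylesheet' href='style.css' />
EOF;

// (["']) captures the opening quote; \1 requires the same quote to close.
// The s modifier lets . match line breaks, the i modifier ignores case.
$pattern = '/<link.*?href=(["\'])(.*?)\1.*?\/?>/si';

$hrefs = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $hrefs[] = $match[2]; // group 2 holds the attribute value
    }
}

var_dump($hrefs);
```

Like the pattern above, this still assumes that every LINK tag carries a quoted href; unquoted values are not handled.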
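If you use the original regular expression suggested by Bruno Augusto, it will not find LINK tags that are broken across several lines, because without the s modifier the . does not match line breaks. The optional / (slash, which closes the tag) is already matched by .*?, so \/? mainly makes the self-closing form explicit. Example: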
$string = <<<EOF
<link
rel="shortcut icon"
href="http://localhost/teste/icon.png"
>
EOF;
if ( preg_match_all( '/<link.*?href="(.*?)".*?>/i', $string, $matches, PREG_SET_ORDER ) ) {
    var_dump( $matches );
    die();
} else {
    echo 'No tags found.';
    /* This branch runs: the tag spans several lines and, without the s modifier, . does not match line breaks, so nothing is found. */
}
Now, using the same example with the more complete regular expression I suggested, the tag is found:
$string = <<<EOF
<link
rel="shortcut icon"
href="http://localhost/teste/icon.png"
>
EOF;
if ( preg_match_all( '/<link.*?href=\"([^\"]*?)\".*?\/?>/si', $string, $matches, PREG_SET_ORDER ) ) {
    /* Tags found successfully */
    var_dump( $matches );
    die();
}