Capture multiple link addresses with regex in PHP

Question

Asked 6 years, 5 months ago

I’m using file_get_contents() on a page whose body contains several link addresses:

#EXTINF:-1 tvg-logo="http://www.brandemia.org/sites/default/files/sites/default/files/axn_logo_antiguo.jpg" group-title="Cine",AXN (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.key

#EXTINF:-1 tvg-logo="https://i.imgur.com/Wrgs4X2.png" group-title="Cine",AMC (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AMC_HD)/index.key

I want to use preg_match_all() to find the addresses, specifically those ending in .key.

P.S.: Addresses can start with either http or https.

Example: http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.key

But all my attempts failed badly; I couldn’t get anywhere near the expected result...

Have you tried anything yourself? Would a regular expression alone be enough to solve your problem?

– LipESprY

2019/01/22 at 02:04
Yes, I did look into regular expressions, but I couldn’t get anything to work at all.

– Paulo Vitor

2019/01/22 at 02:08
Right. Regular expressions are very useful when the page content follows a pattern... Besides, it is not hard to do... Would an answer based on the link in the question work for you?

– LipESprY

2019/01/22 at 02:09
Yes, any help is welcome.

– Paulo Vitor

2019/01/22 at 02:10
Is it necessary to use file_get_contents + preg_match_all? If so, please give more details: Are the links in <a> elements? Do they have any identifying classes? Do you have an HTML excerpt from the site you are extracting from? What is your code? If you do not have to use the functions mentioned above, I recommend using DOMDocument + DOMXPath.

– Valdeir Psr

2019/01/22 at 02:16

1 answer

by LipESprY • **4,525** points · Answer 1 · 2019-01-22T02:28:12+00:00

"- [...] a page with several types of links [...]".

Since you said it is a page with several links, I assume these links are in <a> elements with an href attribute and so on...

As I mentioned in the comments, regular expressions work best when the links follow a pattern. Since these links are on a page, I wrote this regular expression using the quotation marks that normally delimit the href attribute, which is where the link address lives, as the boundaries of the link:

\"((http|https)\:\/\/([^\"]+)\.key)\"

The pattern can be tested on Regex101.com.

Now just read capture group 1 (index 1 of the matches array) in your script:

<?php

$conteudo_da_pagina = '<a href="http://site.com/palavra.key"></a>
<a href="https://site.com/palavra.key"></a>
<a href="http://subdominio.site.com/palavra.key"></a>
<a href="https://subdominio.site.com/palavra.chave"></a>
<a href="http://subdominio.site.com/uma_coisa_qualquer.keykey"></a>
<a href="https://subdominio.site.com/outra_coisa_qualquer.key"></a>
<a href="https://subdominio.site.com/outra_coisa_qualquer.keys"></a>';

// Group 1 is the full address between the quotes; only addresses that end exactly in .key match.
preg_match_all('/\"((http|https)\:\/\/([^\"]+)\.key)\"/', $conteudo_da_pagina, $ocorrencias);

print_r($ocorrencias[1]);

/*
Array
(
    [0] => http://site.com/palavra.key
    [1] => https://site.com/palavra.key
    [2] => http://subdominio.site.com/palavra.key
    [3] => https://subdominio.site.com/outra_coisa_qualquer.key
)
*********************************************************************/
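
For HTML input like the sample above, the comments suggested DOMDocument + DOMXPath instead of a regex over the raw markup. The following is only a minimal sketch of that idea, assuming PHP's DOM extension is available; the variable names and the trimmed-down sample markup are illustrative, not part of the original answer. The parser extracts each href value, and a small anchored regex then checks the scheme and the .key ending.

```php
<?php

// Illustrative sample markup, a subset of the test string used above.
$html = '<a href="http://site.com/palavra.key"></a>
<a href="https://subdominio.site.com/palavra.chave"></a>
<a href="https://subdominio.site.com/outra_coisa_qualquer.key"></a>
<a href="https://subdominio.site.com/outra_coisa_qualquer.keys"></a>';

// Collect libxml parse warnings internally instead of printing them,
// since the sample is a fragment rather than a full HTML document.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$links = [];
foreach ($xpath->query('//a[@href]') as $a) {
    $href = $a->getAttribute('href');
    // Keep only http(s) addresses that end in ".key".
    if (preg_match('~^https?://.+\.key$~', $href)) {
        $links[] = $href;
    }
}

print_r($links);
```

With this sample, only the two addresses ending in .key should remain in $links. The parser does not care which quote style the markup uses around href, which is one reason to prefer it when the input really is HTML.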

Edit (for the text shown in the question):

<?php

// $texto_da_busca = file_get_contents('...');
$texto_da_busca = '#EXTINF:-1 tvg-logo="http://www.brandemia.org/sites/default/files/sites/default/files/axn_logo_antiguo.jpg" group-title="Cine",AXN (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.key

#EXTINF:-1 tvg-logo="https://i.imgur.com/Wrgs4X2.png" group-title="Cine",AMC (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AMC_HD)/index.key';

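// No surrounding quotes here: each address sits on its own line, and "." does not cross line breaks by default.
preg_match_all('/((http|https)\:\/\/(.*?)\.key)/', $texto_da_busca, $matches);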

print_r($matches[1]);
/* Returns:
Array
(
    [0] => http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.key
    [1] => http://live.izzitv.mx/Content/HLS/Live/Channel(AMC_HD)/index.key
)
***************************************************************************/
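
One caveat with the lazy pattern above: because nothing is required after \.key, an address such as index.keys would be matched up to index.key. A slightly stricter variant is sketched below, assuming the addresses never contain whitespace; the example.com entry is hypothetical and was added only to show the rejected case.

```php
<?php

// Sample playlist text. The last entry is a hypothetical ".keys" address,
// included to show what the stricter pattern rejects.
$playlist = '#EXTINF:-1 tvg-logo="https://i.imgur.com/Wrgs4X2.png" group-title="Cine",AMC (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AMC_HD)/index.key

#EXTINF:-1 tvg-logo="https://example.com/logo.png" group-title="Cine",Example
https://example.com/Channel(EXAMPLE)/index.keys';

// "~" as delimiter avoids escaping "/".
// https?   covers both http and https.
// \S+      cannot cross whitespace, so a match never spans two lines.
// (?!\S)   requires whitespace or end of input right after ".key".
preg_match_all('~https?://\S+\.key(?!\S)~', $playlist, $matches);

print_r($matches[0]);
```

Here $matches[0] should contain only the izzitv address. The logo URLs are skipped because they do not end in .key, and the .keys address is skipped because of the lookahead.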