Capture multiple link addresses with regex in PHP

I'm calling file_get_contents() on a page whose body contains multiple link addresses:

#EXTINF:-1 tvg-logo="http://www.brandemia.org/sites/default/files/sites/default/files/axn_logo_antiguo.jpg" group-title="Cine",AXN (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.key

#EXTINF:-1 tvg-logo="https://i.imgur.com/Wrgs4X2.png" group-title="Cine",AMC (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AMC_HD)/index.key

I want to use preg_match_all() to find the addresses, specifically those that end with the word key.

P.S.: Addresses can start with either http or https.

Example: http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.key

But all my attempts failed; I couldn't get anywhere near the expected result...

  • Have you tried anything so far? Would a regular expression alone be enough to solve your problem?

  • Yes, I did research regular expressions, but I couldn't get anything to work at all.

  • Right. Regular expressions are very useful when the page content follows a pattern, and they are not hard to write. Would an answer based on the link in the question work for you?

  • Yes, all help is welcome.

  • Is it necessary to use file_get_contents + preg_match_all? If so, give more details: are the links in <a> elements? Do they have any identifying classes? Do you have an HTML excerpt from the site you are extracting from? What is your code? If you don't have to use those functions, I recommend DOMDocument + DOMXPath.

1 answer

"- [...] a page with several types of links [...]".

Since you said it is a page with several links, I assume these links are in the href attribute of <a> elements.

As I mentioned in the comments, regular expressions work best when the links follow a pattern. Assuming the links are on an HTML page, I wrote this regular expression so that the link is delimited by the quotation marks that normally surround the href attribute, which is where the link address lives:

\"((http|https)\:\/\/([^\"]+)\.key)\"

Now just capture the group $1 in your script:

<?php

$conteudo_da_pagina = '<a href="http://site.com/palavra.key"></a>
<a href="https://site.com/palavra.key"></a>
<a href="http://subdominio.site.com/palavra.key"></a>
<a href="https://subdominio.site.com/palavra.chave"></a>
<a href="http://subdominio.site.com/uma_coisa_qualquer.keykey"></a>
<a href="https://subdominio.site.com/outra_coisa_qualquer.key"></a>
<a href="https://subdominio.site.com/outra_coisa_qualquer.keys"></a>';

preg_match_all('/\"((http|https)\:\/\/([^\"]+)\.key)\"/', $conteudo_da_pagina, $ocorrencias);

print_r($ocorrencias[1]);

/*
Array
(
    [0] => http://site.com/palavra.key
    [1] => https://site.com/palavra.key
    [2] => http://subdominio.site.com/palavra.key
    [3] => https://subdominio.site.com/outra_coisa_qualquer.key
)
*********************************************************************/
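
If the links really are in <a> elements, the DOMDocument + DOMXPath route suggested in the comments is an alternative to matching on quotation marks. The following is only a minimal sketch of that idea: the helper name extract_key_links and the sample HTML are made up for illustration, and it assumes PHP's dom extension is enabled.

```php
<?php

// Hypothetical helper: collects href values that end in ".key" from an HTML string.
function extract_key_links(string $html): array
{
    // Collect libxml parse warnings internally instead of printing them,
    // since real-world markup is rarely perfectly valid.
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $links = [];
    // Visit every <a> element that has an href attribute.
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Keep only http/https addresses whose last four characters are ".key".
        if (preg_match('~^https?://.+\.key$~', $href)) {
            $links[] = $href;
        }
    }

    libxml_clear_errors();
    return $links;
}

// Sample markup (an assumption; replace with the real page content).
$html = '<a href="http://site.com/palavra.key"></a>
<a href="https://subdominio.site.com/palavra.chave"></a>
<a href="https://subdominio.site.com/outra_coisa_qualquer.keys"></a>
<a href="https://subdominio.site.com/outra_coisa_qualquer.key"></a>';

$key_links = extract_key_links($html);
print_r($key_links);
```

Here the parser finds the attribute, so the regular expression only has to check the shape of a single address rather than locate it inside the markup.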

@Edit (according to the question):

<?php

// $texto_da_busca = file_get_contents('...');
$texto_da_busca = '#EXTINF:-1 tvg-logo="http://www.brandemia.org/sites/default/files/sites/default/files/axn_logo_antiguo.jpg" group-title="Cine",AXN (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.key

#EXTINF:-1 tvg-logo="https://i.imgur.com/Wrgs4X2.png" group-title="Cine",AMC (MX)
http://live.izzitv.mx/Content/HLS/Live/Channel(AMC_HD)/index.key';

preg_match_all('/((http|https)\:\/\/(.*?)\.key)/', $texto_da_busca, $matches);

print_r($matches[1]);
/* Returns:
Array
(
    [0] => http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.key
    [1] => http://live.izzitv.mx/Content/HLS/Live/Channel(AMC_HD)/index.key
)
***************************************************************************/
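
The search above can also be written a little more tightly, and the resulting array can be walked one URL at a time instead of being dumped whole. This is a sketch under assumptions: the helper name find_key_urls and the third sample line (ending in .keys) are invented for illustration, and the pattern assumes URLs contain no whitespace.

```php
<?php

// Hypothetical helper: returns every http(s) URL in $text that ends in ".key".
function find_key_urls(string $text): array
{
    // https?  -> "http" or "https"
    // \S+?    -> the rest of the URL, which cannot contain whitespace
    // \.key\b -> ".key" not followed by another word character,
    //            so ".keys" and ".keykey" are rejected
    preg_match_all('~https?://\S+?\.key\b~', $text, $matches);
    return $matches[0];
}

// Sample text: the first two lines come from the question,
// the last line is an invented negative case.
$playlist = "#EXTINF:-1 tvg-logo=\"https://i.imgur.com/Wrgs4X2.png\" group-title=\"Cine\",AMC (MX)\n"
    . "http://live.izzitv.mx/Content/HLS/Live/Channel(AMC_HD)/index.key\n"
    . "http://live.izzitv.mx/Content/HLS/Live/Channel(AXN)/index.keys\n";

$urls = find_key_urls($playlist);

// Handle the URLs one at a time.
foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

Using ~ as the delimiter avoids escaping every slash. Note that \b only rules out a trailing letter, digit or underscore, so an address such as index.key.bak would still match up to index.key.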
  • Lipe, the site I query displays a text file, but I don't have access to the file itself. In it, the links are not inside href="" attributes; they are practically loose, so I would have to find them without relying on attributes. Is this possible?

  • Lipe, this code worked for one situation, but it returns a list with all the URLs. Is there a modification that would display only one at a time? Array ( [0] => link 1, [1] => link 2
