I cannot extract data from a site or a .txt file

Viewed 781 times

-3

Hello, I would like to extract some data from a website, or even from a text file (Notepad), for example:

#EXTINF:-1 tvg-logo="https://i.imgur.com/rq9vXKI.jpg" group-title="FILMES",Mulher-Maravilha (2017)
http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/MLHRMRVLHA.mp4
#EXTINF:-1 tvg-logo="https://i.imgur.com/ftEGyMy.jpg" group-title="FILMES",Guardiões da Galáxia Vol. 2 (2017)
http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/GRDOESDGLXIAVL2.mp4

I would like to extract:

"https://i.imgur.com/rq9vXKI.jpg", Mulher-Maravilha and http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/MLHRMRVLHA.mp4

"https://i.imgur.com/ftEGyMy.jpg", Guardiões da Galáxia and http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/GRDOESDGLXIAVL2.mp4

And then send this to a separate DB, in order. I tried to use a regular expression, but I couldn't get it to work. The idea is to make it easier to add movies to the site: I send a list and it is split automatically into each movie's link, image and title. Please, can someone help me? I don't care whether the data comes from a file or a site, as long as everything is separated correctly and inserted into the DB. Thanks.

1 answer

0

You can use a regular expression; here is an example that would work with your .txt:

#tvg-logo="(https?://[^\s]+)"(\s+|)group-title="\w+",([\s\S]+?)(https?://[^\s]+)#

The explanation of the regex:

  • tvg-logo="(https?://[^\s]+)" captures the photo/image/thumbnail; the (\s+|) right after it (before group-title) allows for the separator, which can be one or more whitespace characters or none at all

  • group-title="\w+", will pick up anything like group-title="FILMES", or group-title="SERIES",

  • ([\s\S]+?) will capture everything that comes after the comma, up to the http link

  • https? will search for occurrences with http or https

  • (https?://[^\s]+) will catch the whole link
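To see which group holds which value, the pattern can be tried on a single entry first. This is only a small sketch using the first entry from the question; the variable names $entry, $m, $logo, $gap, $title and $url are made up for the illustration.

```php
<?php
// One entry copied from the question's sample list.
$entry = '#EXTINF:-1 tvg-logo="https://i.imgur.com/rq9vXKI.jpg" group-title="FILMES",Mulher-Maravilha (2017)' . "\n"
       . 'http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/MLHRMRVLHA.mp4';

$regex = '#tvg-logo="(https?://[^\s]+)"(\s+|)group-title="\w+",([\s\S]+?)(https?://[^\s]+)#';

if (preg_match($regex, $entry, $m)) {
    // $m[0] is the whole match; the capture groups start at index 1.
    $logo  = $m[1];       // image inside tvg-logo="..."
    $gap   = $m[2];       // optional whitespace before group-title
    $title = trim($m[3]); // text after the comma, up to the link
    $url   = $m[4];       // the video link

    echo $logo, PHP_EOL, $title, PHP_EOL, $url, PHP_EOL;
}
```

Group 2 only exists because (\s+|) is written with capturing parentheses, which is why the title ends up in group 3 and the link in group 4.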

Then the PHP script would look like this:

$txt = file_get_contents('arquivo.txt');

$regex = '#tvg-logo="(https?://[^\s]+)"(\s+|)group-title="\w+",([\s\S]+?)(https?://[^\s]+)#';

if (preg_match_all($regex, $txt, $output)) {

    array_shift($output); // Drop the full matches, so index 0 becomes the first capture group

    $j = count($output[0]);

    echo '------------------', PHP_EOL;

    for ($i = 0; $i < $j; $i++) {

        $titulo = trim($output[2][$i]); // Get the title

        $imagem = $output[0][$i]; // Get the image

        $url = $output[3][$i]; // Get the URL

        echo 'Titulo: ', $titulo, '<br>';
        echo 'imagem: ', $imagem, '<br>';
        echo 'url: ', $url, '<hr>';
    }
}
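If keeping track of the indexes 0, 2 and 3 feels fragile, one possible variation (not the pattern from the answer itself) is to use named groups together with PREG_SET_ORDER, so each match comes back as its own array. The function name parseEntries is hypothetical, and the pattern is slightly adjusted: [^"\s]+ for the image and a non-capturing \s* for the optional space.

```php
<?php
// Hypothetical helper: returns one associative array per entry found
// in an M3U-style text.
function parseEntries(string $txt): array
{
    // Variation of the pattern above, with named groups.
    $regex = '#tvg-logo="(?<imagem>https?://[^"\s]+)"\s*group-title="\w+",(?<titulo>[\s\S]+?)(?<url>https?://[^\s]+)#';

    $entries = [];
    // PREG_SET_ORDER groups the results per match instead of per capture group.
    if (preg_match_all($regex, $txt, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $entries[] = [
                'titulo' => trim($m['titulo']),
                'imagem' => $m['imagem'],
                'url'    => $m['url'],
            ];
        }
    }
    return $entries;
}

$txt = <<<'TXT'
#EXTINF:-1 tvg-logo="https://i.imgur.com/rq9vXKI.jpg" group-title="FILMES",Mulher-Maravilha (2017)
http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/MLHRMRVLHA.mp4
TXT;

foreach (parseEntries($txt) as $e) {
    // Escape the values before printing them into HTML.
    echo 'Titulo: ', htmlspecialchars($e['titulo']), '<br>';
    echo 'imagem: ', htmlspecialchars($e['imagem']), '<br>';
    echo 'url: ', htmlspecialchars($e['url']), '<hr>';
}
```

The htmlspecialchars() calls are there because the titles come from an external list and are echoed into a page.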

An example for testing (online test: https://repl.it/@inphinit/Pegar-videos-no-txt):

$txt = '
#EXTINF:-1 tvg-logo="https://i.imgur.com/rq9vXKI.jpg" group-title="FILMES",Mulher-Maravilha (2017)
http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/MLHRMRVLHA.mp4
#EXTINF:-1 tvg-logo="https://i.imgur.com/ftEGyMy.jpg" group-title="FILMES",Guardiões da Galáxia Vol. 2 (2017)
http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/GRDOESDGLXIAVL2.mp4
';

$regex = '#tvg-logo="(https?://[^\s]+)"(\s+|)group-title="\w+",([\s\S]+?)(https?://[^\s]+)#';

if (preg_match_all($regex, $txt, $output)) {

    array_shift($output); // Drop the full matches, so index 0 becomes the first capture group

    $j = count($output[0]);

    echo '------------------', PHP_EOL;

    for ($i = 0; $i < $j; $i++) {

        $titulo = trim($output[2][$i]); // Get the title

        $imagem = $output[0][$i]; // Get the image

        $url = $output[3][$i]; // Get the URL

        echo 'Titulo: ', $titulo, PHP_EOL;
        echo 'imagem: ', $imagem, PHP_EOL;
        echo 'url: ', $url, PHP_EOL;
        echo '------------------', PHP_EOL;
    }
}
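The question also asks for the values to go into a DB, which the script above does not cover. A rough sketch of that step with PDO prepared statements might look like the following. Everything about the storage is an assumption here: the table name movies, its columns, the helper name saveMovies, and the in-memory SQLite database (which needs the pdo_sqlite extension) are only for demonstration, and a real site would use its own DSN and schema.

```php
<?php
// Hypothetical helper: inserts parsed entries into a table named "movies".
// The table and column names are assumptions, not taken from the question.
function saveMovies(PDO $pdo, array $movies): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO movies (title, image, url) VALUES (:title, :image, :url)'
    );

    $saved = 0;
    $pdo->beginTransaction();
    foreach ($movies as $movie) {
        // Bound parameters keep quotes in titles from breaking the query.
        $stmt->execute([
            ':title' => $movie['title'],
            ':image' => $movie['image'],
            ':url'   => $movie['url'],
        ]);
        $saved++;
    }
    $pdo->commit();

    return $saved;
}

// In-memory SQLite only for demonstration; swap the DSN for the real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, image TEXT, url TEXT)');

// In the real script, this array would be filled inside the for loop
// from $titulo, $imagem and $url.
$movies = [
    [
        'title' => 'Mulher-Maravilha (2017)',
        'image' => 'https://i.imgur.com/rq9vXKI.jpg',
        'url'   => 'http://cdnv4.ec.cx/RedeCanais/RedeCanais/RCFServer1/ondemand/MLHRMRVLHA.mp4',
    ],
];

$saved = saveMovies($pdo, $movies);
echo $saved, ' row(s) saved', PHP_EOL;
```

Because the rows are inserted in the same order as they appear in the list, the auto-incremented id column preserves the original order.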
  • Hello friend, thank you so much for your help. Could you extend the code so it also gets the image, e.g. https://i.imgur.com/rq9vXKI.jpg, which is inside tvg-logo="https://i.imgur.com/rq9vXKI.jpg"? I tried to follow your approach, e.g. using tvg-logo="\w+" to get everything from tvg-logo together with ([\s\S]+?), but I couldn't get it to work; I already tried it in regexpal and nothing seems to match. Besides helping me finish this code, could you point me to some site or videos for understanding this part of PHP? It's tough, but this part seems very interesting.

  • @Caiosalchesttes sorry, my fault, I forgot about the thumbnail. One minute!

  • @Caiosalchesttes I updated the answer. Test it and let me know; you can also test it at https://repl.it/@inphinit/Pegar-videos-no-txt.
