How to extract this information from a string?

Below is an output string generated by ffmpeg for a video file:

Stream #0:0(Jpn): Video: H264 ...

Stream #0:1(Jpn): Audio: mp3 ...

Stream #0:2(por): Subtitle ...

In most cases ffmpeg lets me select the streams I want to convert using -map with a language, but for some video files mapping by language is not possible. That is why I want to use PHP to identify the stream I want by its number rather than its language.

How can I use PHP to get the stream keys for the video (0:0), the audio (0:1) and the subtitle (0:2), knowing that they can change position and even language?

2 answers

A solution very similar to the one presented by Guilherme, but with a little less code, can be implemented as:

if (preg_match_all('/Stream\s[#](\d+:\d+)(\(\w+\))?:(\s\w+|)[ :]+(\w+|)/i', $resposta, $output) == 0) {
    echo 'No information found';
} else {
    print_r(array_map(null, ...$output));
}

In fact, the approach is exactly the same: use a regular expression to extract data from the text. The only difference is how the data is grouped, using the array_map function.

The array_map function takes a callback as its first parameter; when that callback is null, the array values are returned unchanged. When several arrays are passed, they are zipped together, so to speak, similar to Python's built-in zip function.
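The zipping behaviour can be seen in isolation with a minimal sketch. The three arrays below are hypothetical sample data, shaped like the capture groups that preg_match_all returns:

```php
<?php
// Hypothetical sample data, one array per "column".
$indices = ['0:0', '0:1', '0:2'];
$types   = ['Video', 'Audio', 'Subtitle'];
$codecs  = ['h264', 'mp3', ''];

// With a null callback and several arrays, array_map groups the n-th
// element of each array into one row, much like Python's zip().
$rows = array_map(null, $indices, $types, $codecs);

print_r($rows[1]); // the row describing the second stream
```

Each element of $rows is then an array holding the index, type and codec of one stream.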

The ... operator used in the call to array_map passes every value of $output as a separate parameter. The equivalent code would be: array_map(null, $output[0], $output[1], $output[2], $output[3], $output[4]). This operator is known as splat; it supports arrays and Traversable objects and is available since PHP 5.6.
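To check that equivalence, here is a small sketch. The $output array is a shortened, hypothetical stand-in for what preg_match_all fills in, with only three entries to keep it brief:

```php
<?php
// Hypothetical stand-in for the $output array filled by preg_match_all.
$output = [
    ['Stream #0:0(und): Video: mpeg4', 'Stream #0:1(jpn): Audio: mp3'], // full matches
    ['0:0', '0:1'],                                                       // group 1
    ['(und)', '(jpn)'],                                                   // group 2
];

// Unpacking with ... passes each element of $output as its own argument.
$withSplat = array_map(null, ...$output);

// Writing the arguments out by hand gives the same result.
$byHand = array_map(null, $output[0], $output[1], $output[2]);

var_dump($withSplat === $byHand); // bool(true)
```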

For the input:

Stream #0:0(und): Video: mpeg4 ...
Stream #0:1(jpn): Audio: mp3 ...
Stream #0:1(por): Subtitle:

The output produced would be:

Array
(
    [0] => Array
        (
            [0] => Stream #0:0(und): Video: mpeg4
            [1] => 0:0
            [2] => (und)
            [3] =>  Video
            [4] => mpeg4
        )

    [1] => Array
        (
            [0] => Stream #0:1(jpn): Audio: mp3
            [1] => 0:1
            [2] => (jpn)
            [3] =>  Audio
            [4] => mp3
        )

    [2] => Array
        (
            [0] => Stream #0:1(por): Subtitle:
            [1] => 0:1
            [2] => (por)
            [3] =>  Subtitle
            [4] => 
        )

)

Very similar to the result produced by Guilherme's code, but with a few fewer lines.

You can see the code working on Repl.it or on Ideone.
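Once the rows are grouped this way, finding a stream by type rather than by language could look like the sketch below. The helper firstStreamOfType is hypothetical (it is not part of the answer), and the nullable return type assumes PHP 7.1 or later:

```php
<?php
// Sample text copied from the input shown above.
$resposta = 'Stream #0:0(und): Video: mpeg4 ...
Stream #0:1(jpn): Audio: mp3 ...
Stream #0:1(por): Subtitle:';

preg_match_all('/Stream\s[#](\d+:\d+)(\(\w+\))?:(\s\w+|)[ :]+(\w+|)/i', $resposta, $output);

// Hypothetical helper: returns the index of the first stream whose type
// matches $type (case-insensitive), or null when there is none.
function firstStreamOfType(array $rows, string $type): ?string {
    foreach ($rows as $row) {
        // $row[1] is the stream index, $row[3] the type with a leading space.
        if (strcasecmp(trim($row[3]), $type) === 0) {
            return $row[1];
        }
    }
    return null;
}

$rows = array_map(null, ...$output);

echo firstStreamOfType($rows, 'Audio'), PHP_EOL; // 0:1
```

The returned value is the kind of stream specifier that can then be handed to ffmpeg's -map option.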


You can use preg_match_all, like this:

<?php

$resposta = 'Stream #0:0(jpn): Video: h264 ...
Stream #0:1(jpn): Audio: mp3 ...
Stream #0:2(por): Subtitle ...';

if (preg_match_all('/Stream\s[#](\d+:\d+)\((\w+)\):\s(\w+)[ :]+(\w+|)/i', $resposta, $output) == 0) {
    echo 'No information found';
} else {
    print_r($output);
}

Then just manipulate the array; see the result on ideone: https://ideone.com/T89v4K.
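As one possible way of manipulating that array, the sketch below builds a type => index lookup table with array_combine. The variable name $byType is an assumption for illustration, and the approach assumes each stream type occurs only once:

```php
<?php
$resposta = 'Stream #0:0(jpn): Video: h264 ...
Stream #0:1(jpn): Audio: mp3 ...
Stream #0:2(por): Subtitle ...';

preg_match_all('/Stream\s[#](\d+:\d+)\((\w+)\):\s(\w+)[ :]+(\w+|)/i', $resposta, $output);

// $output[3] holds the stream types and $output[1] the stream indices,
// so combining them yields a type => index lookup table.
// Assumption: each type occurs only once; with duplicates the last one wins.
$byType = array_combine($output[3], $output[1]);

echo $byType['Audio'], PHP_EOL; // 0:1
```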

Edited

I put together an easier-to-use example; read the comments:

<?php
function extrairDados($dados) {
    if (preg_match_all('/Stream\s[#](\d+:\d+)(\(\w+\))?:(\s\w+|)[ :]+(\w+|)/i', $dados, $output) == 0) {
        echo 'No information found', PHP_EOL;
    } else {
        $reorganizado = array(); // Array that will hold the final result

        // Keys used to make it clearer what each item is
        $chaves = array(
            'tempo',
            'idioma',
            'formato',
            'codec'
        );

        // Remove the first item of the array generated by preg_match_all; it is not needed
        array_shift($output);

        // Count the total number of items
        $y = count($output);

        for ($x = 0; $x < $y; $x++) {
            $item = $output[$x]; // Get the current item
            $chave = $chaves[$x]; // Get the current key used to label it in the array
            $j = count($item); // Count the "properties" of the item

            for ($i = 0; $i < $j; $i++) {

                // Create the sub-array if it does not exist yet
                if (isset($reorganizado[$i]) === false) {
                    $reorganizado[$i] = array();
                }

                $str = trim($item[$i]); // Strip surrounding whitespace
                $str = trim($str, '('); // Strip ( from the ends
                $str = trim($str, ')'); // Strip ) from the ends

                // Store the item in the array under the corresponding key
                $reorganizado[$i][$chave] = $str;
            }
        }

        // Return the array
        return $reorganizado;
    }

    return false;
}

To use it, just do this:

$resposta = 'Stream #0:0(jpn): Video: h264 ...
Stream #0:1(jpn): Audio: mp3 ...
Stream #0:2(por): Subtitle ...';

$dados = extrairDados($resposta);

if ($dados) {
    foreach ($dados as $item) {
        echo 'Tempo: ', $item['tempo'], PHP_EOL;
        echo 'Idioma: ', $item['idioma'], PHP_EOL;
        echo 'Formato: ', $item['formato'], PHP_EOL;
        echo 'Codec: ', $item['codec'], PHP_EOL, PHP_EOL;
    }
}

Example with different results:

$resposta1 = '    Stream #0:0(und): Video: h264 (High) ...
    Stream #0:1(und): Audio: aac (LC) ...';

$resposta2 = '    Stream #0:0: Video: mpeg4 ...
    Stream #0:1: Audio: mp3 ...';

$resposta3 = '    Stream #0:0(und): Video: mpeg4 ...
    Stream #0:1(jpn): Audio: mp3 ...
    Stream #0:1(por): Subtitle:';

print_r(extrairDados($resposta1));
print_r(extrairDados($resposta2));
print_r(extrairDados($resposta3));

Example in ideone
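As a variation on the function above (a different technique, not the code from this answer), named capture groups together with the PREG_SET_ORDER flag give one labeled array per stream without the reorganizing loops. The group names index, lang, type and codec are assumptions chosen for this sketch:

```php
<?php
$resposta = '    Stream #0:0: Video: mpeg4 ...
    Stream #0:1(jpn): Audio: mp3 ...';

// Named groups label each capture; the language group is optional.
$regex = '/Stream\s#(?<index>\d+:\d+)(?:\((?<lang>\w+)\))?:\s(?<type>\w+)[ :]+(?<codec>\w*)/i';

// PREG_SET_ORDER returns one array per matched line instead of one per group.
preg_match_all($regex, $resposta, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // An optional group that did not take part in the match comes back as ''.
    $lang = $m['lang'] !== '' ? $m['lang'] : 'n/a';
    echo $m['index'], ' ', $m['type'], ' ', $m['codec'], ' (', $lang, ')', PHP_EOL;
}
```

Here the parentheses around the language are left out of the capture, so no trim calls are needed afterwards.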

  • https://pastebin.com/9aG5qc7H

  • Apparently, the error happens when the language is not specified, so just make it optional in the regex: '/Stream\s[#](\d+:\d+)(\((\w+)\))?:\s(\w+)[ :]+(\w+|)/i'

  • @Andersoncarloswoss Thank you, I followed your suggestion and corrected it: http://answall.com/revisions/195629/3

  • @Leoletto I corrected the code, you can test it; see the example on ideone: https://ideone.com/PsvoOk

  • @Guilhermenascimento thanks for the help, it seems to work well for my use case, but before applying it for real I will study what you did a little, because I do not understand much about regular expressions, so I want to learn more first.

  • @Leoletto that is the idea. Regex really is very hard to understand at first, but later you will notice how useful it can be in many situations, as long as you do not use it unnecessarily. See you.
