How do I parse a string that uses a double space as the column separator?

I’m having a hard time parsing a string and turning it into an associative array. Here is an example of the string:

Titulo  Valor  Desc
item1   10     Descrição aqui
novo aqui     AB  Mensagem
Label  text_1      Descrição...

Note: each line of the string above is processed horizontally, because the process turns each line into an array and handles it individually.

The first line holds the column headers, and the following lines hold the values, which may have any number of spaces to the left of the next column. Looking closely, though, every column is preceded by 2 blank spaces, which effectively acts as the column delimiter (separator).

I couldn’t come up with a regex to extract the values. My goal is an array like this:

Array
(
    [0] => Array
        (
          'Titulo' => 'item1',
          'Valor' => '10',
          'Desc' => 'Descrição aqui',
        )
    [1] => Array
        (
        //...
  • Are you sure the file uses blank spaces? If so, what should happen when a value itself contains 2 or more consecutive spaces, something like "Anderson  Woss"? That is just one value.

  • Yes, I’m sure they’re blank spaces. When values are inserted they cannot contain multiple consecutive spaces between characters, but for some reason they can have any number of spaces on the right. Still, before each new column there will always be 2 blanks.

2 answers

Regex was never my strong suit, but after researching several examples, I came up with a possible solution:

preg_split('/\s\s+/', $string_aqui);

It worked in my initial tests.
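The one-liner above is meant to be applied to one line at a time. A minimal sketch of doing that, assuming PHP 7 or later; `splitColumns` is a hypothetical helper name, and the `PREG_SPLIT_NO_EMPTY` flag is an addition not in the original answer, used because trailing spaces at the end of a line otherwise produce an empty last column:

```php
<?php
// Sketch only: splitColumns is a hypothetical helper, not the answer's code.
// It splits one line on runs of two or more whitespace characters.
// PREG_SPLIT_NO_EMPTY drops empty pieces, such as the one produced by
// trailing spaces at the end of the line.
function splitColumns(string $line): array
{
    return preg_split('/\s\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
}

// Without the flag, trailing spaces leave an empty last element.
$plain = preg_split('/\s\s+/', "Label  text_1      Descrição...   ");

// With the flag, only the three real columns remain.
$clean = splitColumns("Label  text_1      Descrição...   ");

print_r($plain);
print_r($clean);
```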

  • It could just be \s{2,}, which also matches 2 or more whitespace characters (including line breaks, tabs, etc.).
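To illustrate the comment: `\s\s+` and `\s{2,}` are equivalent, and because `\s` also matches line breaks, the split should be done line by line rather than on the whole string. A sketch using two lines from the question's sample; the literal-space pattern `/ {2,}/` is an alternative that does not appear in the original answer:

```php
<?php
// Two sample lines from the question, joined by a line break.
$data = "Titulo  Valor  Desc\nitem1   10     Descrição aqui";

// Splitting the whole string at once: the single "\n" is not a run of
// two whitespace characters, so "Desc" and "item1" stay glued together.
$whole = preg_split('/\s{2,}/', $data);

// \s\s+ and \s{2,} describe the same thing and give the same result.
$sameResult = preg_split('/\s\s+/', $data) === $whole;

// Splitting line by line, on literal spaces only, keeps the rows apart.
$perLine = array_map(
    function ($line) {
        return preg_split('/ {2,}/', $line);
    },
    explode("\n", $data)
);

print_r($whole);
var_dump($sameResult);
print_r($perLine);
```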

//The initial string
$data = "Titulo  Valor  Desc
item1   10     Descrição aqui
novo aqui     AB  Mensagem
Label  text_1      Descrição...";

//Split the string on line breaks
$rows = explode("\n", $data);

//Array with the header
$header = array_map(
    //Run trim on every resulting string
    "trim",
    //Split on two spaces
    explode(
        "  ",
        //Take the first line (this also removes it from the $rows array)
        array_shift($rows)
    )
);

//Create a variable to hold the organized data
$parsed = array();

//Loop over the lines
foreach ($rows as $row) {
    $row = array_filter(
        //Run trim on every resulting string first, so pieces made only of
        //leftover spaces (e.g. trailing spaces at the end of the line)
        //become "" and are filtered out below
        array_map(
            "trim",
            //Split on two spaces
            explode("  ", $row)
        ),
        //Remove blank entries; the strict comparison keeps values such as "0"
        function($v) {
            return $v !== "";
        }
    );

    //Build a new array from two arrays:
    //the first is used for the keys
    //and the second for the values
    $parsed[] = array_combine($header, $row);
}

//Show the result
print_r($parsed);

The result of print_r:

Array
(
    [0] => Array
        (
            [Titulo] => item1
            [Valor] => 10
            [Desc] => Descrição aqui
        )

    [1] => Array
        (
            [Titulo] => novo aqui
            [Valor] => AB
            [Desc] => Mensagem
        )

    [2] => Array
        (
            [Titulo] => Label
            [Valor] => text_1
            [Desc] => Descrição...
        )

)

As you can see, there is no need for regular expressions.

The trim function removes any leftover spaces before and after each column; for example, " text_1 " becomes "text_1".

The array_filter function removes any empty entries; for example, ["Label", "", "", "text_1", "Descrição..."] becomes ["Label", "text_1", "Descrição..."].
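Both cleanup steps can be seen on a single line of the sample. This is a sketch: `array_values` is added only to renumber the keys for readability (`array_combine` uses the values and ignores the keys anyway), and trimming before filtering is one possible ordering:

```php
<?php
$line = "item1   10     Descrição aqui";

// Splitting on exactly two spaces leaves stray spaces and empty pieces.
$pieces = explode("  ", $line);

// trim strips the leftover spaces; the strict !== "" check then drops
// the empty pieces without touching values such as "0".
$cleaned = array_values(array_filter(
    array_map("trim", $pieces),
    function ($v) {
        return $v !== "";
    }
));

print_r($pieces);
print_r($cleaned);
```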

If you want to use regular expressions...

$data = "Titulo  Valor  Desc
item1   0     Descrição aqui
novo aqui     AB  Mensagem
Label  text_1      Descrição...";

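//Split the string on line breaks
$rows = explode("\n", $data);

//Split the header on runs of 2 or more whitespace characters
$header = preg_split(
    "/\s{2,}/",
    trim(
        array_shift($rows)
    )
);

$parsed = array();

foreach ($rows as $row) {
    //trim first, so trailing spaces at the end of a line
    //do not produce an extra empty column
    $parsed[] = array_combine(
        $header,
        preg_split(
            "/\s{2,}/",
            trim($row)
        )
    );
}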

print_r($parsed);

The idea is more or less the same, except that preg_split does the work of explode, array_filter and array_map + trim in one step. The end result is the same.
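The regex version can also be wrapped in a reusable function. A minimal sketch, assuming PHP 7 or later; `parseTable` is a hypothetical name, and the blank-line skip and column-count check are additions not in the original answer. A row with an empty column cannot be repaired this way, but the count check at least detects it instead of letting `array_combine` fail:

```php
<?php
// Sketch only: parseTable is a hypothetical name, not the answer's code.
function parseTable(string $data): array
{
    // One entry per line; trim() on each row later also removes any "\r".
    $rows = explode("\n", trim($data));

    // The first line holds the column names.
    $header = preg_split('/\s{2,}/', trim(array_shift($rows)));

    $parsed = array();
    foreach ($rows as $row) {
        $row = trim($row);
        if ($row === "") {
            continue; // skip blank lines
        }

        $values = preg_split('/\s{2,}/', $row);

        // array_combine needs arrays of equal length (PHP 8 throws a
        // ValueError otherwise), so report mismatched rows explicitly
        // instead of guessing where a missing column belongs.
        if (count($values) !== count($header)) {
            throw new UnexpectedValueException("Unexpected column count: $row");
        }

        $parsed[] = array_combine($header, $values);
    }

    return $parsed;
}

$result = parseTable("Titulo  Valor  Desc
item1   0     Descrição aqui
novo aqui     AB  Mensagem
Label  text_1      Descrição...   ");

print_r($result);
```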

  • Just be very careful with array_filter, because array_filter(['0']) returns []. If the string contains the line foo  0  bar, the code will fail, because after the filter only ['foo', 'bar'] would remain: https://ideone.com/B39sFi.

  • Good that you warned me, @Andersoncarloswoss. I found that it doesn't remove it as a string, only as an integer.

  • There is another complication: I found that there may be cases where a column contains an empty value. Now I think the parsing has become even more complicated.

  • I would say it's impossible. If the number of spaces is variable (from 2 to infinity), there is no way to know whether a given run of spaces is an empty column or not. You can tell from the number of items in the array that a column is missing somewhere, but where? I suggest changing the method that generates this data.
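The array_filter pitfall raised in the comments can be reproduced directly. A small sketch using a hypothetical row foo, 0, bar; the strict callback is the same one the answer's code passes to array_filter:

```php
<?php
// Without a callback, array_filter drops every value PHP considers falsy,
// and the string "0" is falsy.
$pieces = array("foo", "0", "bar");

$loose = array_values(array_filter($pieces));

// With a strict callback, only empty strings are removed.
$strict = array_values(array_filter($pieces, function ($v) {
    return $v !== "";
}));

print_r($loose);
print_r($strict);
```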
