Read a portion of the file contents


I have a large .txt file that basically looks like this:

1000#
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.#

1001#
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.#

I want to read only the record for the requested ID, and only the text between the # delimiters.

For example, if I want to fetch the text for ID 1001, this is what should be returned:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

Here is the code I tried to implement:

$file = "id.txt";
$f = fopen($file, 'rb');
$found = false;
while ($line = fgets($f, 1000)) {
    if ($found) {
       echo $line;
       continue;
    }
    if (strpos($line, "1000") !== FALSE) {
      $found = true;
    }
}

With it I can find the ID, but it then prints everything from that point down! I want it to stop at the closing #, i.e., read only the text between #TEXT#.

  • Add to your question the attempts you have already made.

  • Sorry about that, I completely forgot to include my attempts.

2 answers

0

To solve this problem you can use regular expressions.

Since you apparently already have the ID you want to read, and the pattern this text file follows is quite clear (provided all the content is formatted that way), you can write a regular expression like this:

/[valor_do_id]\$\n(.*)\$/

An example using PHP's preg_match function would look more or less like this:


  $conteudo_do_arquivo = '....';
  $id = 1000;

  $matches = []; // will hold the regex results
  preg_match("/" . $id . "\$\n(.*)\$/", $conteudo_do_arquivo, $matches);

  // [0] will contain the whole string that matched the regex
  // [1] will contain only the value of the 1st capture group, everything between '()'
  print_r($matches);

You can read more about the preg_match function on php.net

You can also use RegExr to test your expressions beforehand.
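For the # delimiter shown in the question, the same idea can be sketched as below. This is only a sketch under assumptions: the function name fetchById is hypothetical, each ID is assumed to sit alone on its line followed by #, and the text itself is assumed not to contain #. Note that inside a double-quoted PHP string, "\$" yields a bare $, which the regex engine treats as an end-of-line anchor rather than a literal character, so a single-quoted pattern is used here.

```php
<?php
// Hypothetical helper: returns the text stored under $id, or null if not found.
// Assumes records look like "ID#" on one line, then text ending in "#".
function fetchById(string $content, string $id): ?string
{
    // preg_quote escapes any regex metacharacters that might be in the ID.
    // \R matches any line break (LF or CRLF).
    // [^#]* stops at the first closing "#", so it cannot run into the next record.
    $pattern = '/^' . preg_quote($id, '/') . '#\R([^#]*)#/m';

    return preg_match($pattern, $content, $matches) === 1 ? $matches[1] : null;
}

$content = "1000#\nFirst text.#\n\n1001#\nSecond text.#\n";
echo fetchById($content, '1001'), PHP_EOL;
```

Using a negated character class instead of (.*) keeps the match from extending past the first closing delimiter, which is what the attempt in the question's code was missing.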

  • That way, the browser shows only: Array ( ). I made some changes, replacing $ with #, and now I get: #1000# Lorem ipsum dolor sit Amet, consectetur adipiscing Elit, sed do eiusmod tempor incididunt. # 1001# Lorem ipsum dolor sit Amet, consectetur adipiscing Elit, sed do eiusmod tempor incididunt.#

0


There are several ways to do this, such as:

With regular expressions:

Search for a single record

$data = '1000#
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.#

1001#
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.#';

function buscarTexto($id, $data) {
    // \K drops the ID line from the match; \r? tolerates Windows (CRLF) line endings
    $re = '/^'.preg_quote($id, '/').'#\r?\n\K.+?(?=#\r?$)/sm';

    return preg_match($re, $data, $match) ? $match[0] : null;
}

Brief explanation of the regular expression

/
^ # Anchors the match at the start of the ID line
    $id # Record ID
#\r?\n # Matches the ID delimiter and the line break (LF or CRLF)
\K # Discards what was matched so far, so the ID line is not returned
.+? # Matches the text, as little of it as possible
(?=#\r?$) # Requires the closing delimiter at the end of the line without including it
/smx
# The s flag makes the dot match line breaks as well
# The m flag makes ^ and $ match at the start and end of each line
# The x flag is not used in the code, but it makes whitespace in the expression be ignored.

Load all records into an array keyed by ID.

function buscarTextos($data) {
    // \r? before $ tolerates Windows (CRLF) line endings
    $re = '/(?<id>^\d+)(?:#\r?\n)(?<text>.+?)(?:#\r?$)/sm';
    $result = [];

    if(preg_match_all($re, $data, $matches, PREG_SET_ORDER))
        // Converts the regex result into an id => text array
        foreach($matches as $m) 
            $result[$m['id']] = $m['text'];

    return $result;
}

Brief explanation of the regular expression

/
(?<id>^\d+) # Stores the record ID in its own named group
(?:#\r?\n) # Matches the ID delimiter and the line break (LF or CRLF)
(?<text>.+?) # Stores the text in another named group
(?:#\r?$) # Matches the closing delimiter of the text at the end of the line
/smx
# PHP has no g flag; preg_match_all is what collects every record
# The s flag makes the dot match line breaks as well
# The m flag makes ^ and $ match at the start and end of each line
# The x flag is not used in the code, but it makes whitespace in the expression be ignored.

Without regular expressions

Read row by row and return the desired record

function buscarTexto2($id, $data) {
    $rows = explode("\n", $data);
    $id = $id.'#';
    $text = '';
    $found = false;

    foreach($rows as $r) {
        // Remove surrounding whitespace (including a trailing \r)
        $r = trim($r);

        // Check whether the line is the ID line of the requested record
        if($r === $id) 
            $found = true;
        // Once the record has been found
        elseif($found) {
            // The text is assumed to possibly span several lines
            $text .= $text == '' ? $r : PHP_EOL.$r;

            // So if the line just read ends with #
            if(substr($text, -1, 1) == '#')
                // Return the text without the closing #
                return substr($text, 0, -1);
        }
    }

    // The ID was not found
    return null;
}

Read row by row and store all records in an array where the key is id.

function buscarTextos2($data) {
    $rows = explode("\n", $data);
    $id = null;
    $text = '';
    $result = [];

    foreach($rows as $r) {
        // Remove surrounding whitespace (including a trailing \r)
        $r = trim($r);

        // Check whether a record is currently being processed
        if($id === null) {

            // Skip blank lines while no record is being processed.
            if($r === '')
                continue;

            // Store the ID, dropping the last character, which is the #
            $id = substr($r, 0, -1);
        } else {
            // The text is assumed to possibly span several lines
            $text .= $text == '' ? $r : PHP_EOL.$r;

            // So if the line just read ends with #
            if(substr($text, -1, 1) == '#') {
                // Add the record to the array
                $result[$id] = substr($text, 0, -1);

                // And get ready to process a new record
                $id = null;
                $text = '';
            }
        }
    }

    return $result;
}

All of these functions assume that # (the hash sign) is the last character of the line.
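That assumption can fail when the file uses Windows line endings or has stray spaces after the #. A minimal sketch of cleaning the content before searching it follows; the function name normalizeRecords is hypothetical, and it assumes trailing spaces and tabs carry no meaning in the file.

```php
<?php
// Hypothetical helper: makes every line end in "\n" with nothing after the "#".
function normalizeRecords(string $data): string
{
    // Convert CRLF and lone CR line endings to LF.
    $data = str_replace(["\r\n", "\r"], "\n", $data);

    // Remove spaces and tabs left at the end of each line.
    // preg_replace returns null on error, in which case the LF version is kept.
    return preg_replace('/[ \t]+$/m', '', $data) ?? $data;
}

$raw = "1000#\r\nFirst text.# \r\n\r\n1001#\r\nSecond text.#\r\n";
var_dump(normalizeRecords($raw) === "1000#\nFirst text.#\n\n1001#\nSecond text.#\n");
```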

You can test the code on the following link http://phpfiddle.org/main/code/y1kq-jg7w

To use them with the contents of the file, do the following:

$dados = file_get_contents('file path');

$texto = buscarTexto(1000, $dados);
// or
$texto = buscarTexto2(1000, $dados);
// or
$textos = buscarTextos($dados);
// or
$textos = buscarTextos2($dados);
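Since the question mentions a large file, reading it line by line and stopping at the closing # avoids loading everything into memory. The following is a sketch only: buscarTextoStream is a hypothetical name that is not part of the functions above, and it assumes the same layout (an "ID#" line followed by text whose last line ends in #).

```php
<?php
// Hypothetical helper: scans an open stream line by line and returns the text
// of the record whose ID line is "$id#", or null if the ID is not found.
function buscarTextoStream($handle, string $id): ?string
{
    $found = false;
    $text = '';

    while (($line = fgets($handle)) !== false) {
        // Drop the line break (LF or CRLF) and any trailing spaces.
        $line = rtrim($line);

        if (!$found) {
            // Keep skipping lines until the requested ID line shows up.
            $found = ($line === $id . '#');
            continue;
        }

        // The text may span several lines.
        $text .= ($text === '' ? '' : PHP_EOL) . $line;

        // A line ending in "#" closes the record, so stop reading here.
        if (substr($line, -1) === '#') {
            return substr($text, 0, -1);
        }
    }

    return null;
}

// Demo with an in-memory stream; for a real file use fopen('id.txt', 'rb').
$handle = fopen('php://memory', 'w+b');
fwrite($handle, "1000#\r\nFirst text.#\r\n\r\n1001#\r\nSecond text.#\r\n");
rewind($handle);
echo buscarTextoStream($handle, '1001'), PHP_EOL;
fclose($handle);
```

Returning as soon as the closing # is seen is what makes the loop stop instead of printing the rest of the file, which was the problem with the loop in the question.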

  • Hello Hwapx, thanks for your help. I think I was not very clear in my question. I noticed that using the $ (dollar) character can cause conflicts, since it may be interpreted differently by the command. I preferred to change from $ to #, like this: 1000# Lorem ipsum dolor sit Amet, consectetur adipiscing Elit, sed do eiusmod tempor incididunt. # 1001# Lorem ipsum dolor sit Amet, consectetur adipiscing Elit, sed do eiusmod tempor incididunt. # I also forgot to mention that this text is inside a .txt file, and I am looking for a way to read only the text, without the ID.

  • I changed the functions to use #. The functions buscarTexto and buscarTexto2 return only the text of the record; at the link http://phpfiddle.org/main/code/y1kq-jg7w you can test them and see their results. I wrote them to receive the text directly, which you can get from the file with $conteudo = file_get_contents('file path').

  • There is no line with the code $conteudo = file_get_contents('file path') there. Could you send it again?

  • I added an example of reading the file in the reply.

  • It is still not being read; nothing appears. Remember that it should read what is between the two # characters: 1001#TEXT HERE# 1002#TEXT HERE 2#

  • Did you set the file path? Based on your question, it should look like $file = 'id.txt'; $dados = file_get_contents($file); Try checking whether the file is being read by placing var_dump($dados); on the line right after file_get_contents.

  • When I add var_dump($data); it returns multiple IDs, but not the one requested.

  • The strange thing is that when I put all the text from the id.txt file directly into $data =' '; it returns without problems, but it does not do the same when reading from the file.

  • It may be due to the file's line-ending format. I updated the functions in the answer; please run a new test with all of them.

  • My dear friend Hwapx, I have now realized the reason for the errors: the text probably contains special characters that are causing problems. Here is the file I want to work with: http://brasilro.com/id.txt

  • There were indeed problems with the functions that use regular expressions; however, I was successful with the ones that do not use them (buscarTexto2 and buscarTextos2).

  • All settled! Thank you very much! Can I have you as a friend?

  • @Marcelocordeiro Sure
