Filter words in a text with PHP

Viewed 1,556 times

5

I would like to extract some words from a text with PHP. They are not fixed words: the values always change, but they always appear next to fixed keywords. Example:

ID : 123123 Name : Elvis Address : Totis bla

I want to extract the values of "ID", "Name" and "Address". There is no fixed separator such as ID:123, Name: Elvis, Address. The fields may be separated by a space, a dash, and so on, because the text is extracted from a PDF file and saved in a variable, for example: ID: 123, Name: Elvis - Address: toest.

  • Question: where does this data come from? Some JSON?

  • @Marceloaymone does it really matter? In the end, the information I need to filter will be in a variable.

  • It matters, because if it comes in JSON or a similar format, it might be simpler to turn it into an array.

  • It comes from plain text. I read a standard PDF file, and that file contains the specific information I need to extract...

  • I get it, so we need to work with the string as it is.

  • Can the value of each item contain spaces? For example, can "Name" come as "Elvis da Silva"?

  • This is what it returns, which I save in a variable. I want to get the information after labels such as Lawyer, Publication Code, Publication, and so on: Lawyer LAWYER (OAB: 0000 SC) Publication Code 309437294 Newspaper Availability 02/04/2014 Newspaper Publication 03/04/2014

  • I think I'm getting it now, haha.

  • @Marceloaymone sorry, yes. It will be a string :D

  • Does it at least have a fixed order: first the ID, then the Name, then the Address? And are they separated by ":"?

  • Yes, the labels will always be the standard ones with the same names, and they are separated by spaces, because the text comes from a table.

  • @Douglasbernardino, I was looking at this again and thinking of another solution to help you, but for that I would need a more complete example with the real labels. Do you have one?

3 answers

10


Try it this way:

$string = "ID : 123123 Nome : Elvis Costelo da silva Endereço : Totis bla florianópolis";

$id = preg_split('#(?<!\\\)ID :|Nome :|Endereço :#', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

$id = array_map('trim', $id); // Added to strip the surrounding whitespace

var_dump($id);

Prints:

array (size=3)
   0 => string '123123' (length=6)
   1 => string 'Elvis Costelo da silva' (length=22)
   2 => string 'Totis bla florianópolis' (length=24)
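The split above returns only the values, in order. As a hedged sketch of how the labels could be kept alongside their values, the variant below wraps the labels in a capture group so that PREG_SPLIT_DELIM_CAPTURE actually returns them. The function name extractFields, the tolerance for missing spaces around ":" and the stripping of dashes and commas around values are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns label => value pairs.
function extractFields(string $text, array $labels): array
{
    if (count($labels) === 0) {
        return [];
    }

    // Escape each label so it is matched literally inside the pattern.
    $alternatives = implode('|', array_map(function ($label) {
        return preg_quote($label, '#');
    }, $labels));

    // The capture group makes PREG_SPLIT_DELIM_CAPTURE return the labels too.
    // \s*:\s* accepts "ID:123" as well as "ID : 123"; the u modifier enables UTF-8.
    $parts = preg_split(
        '#\b(' . $alternatives . ')\s*:\s*#u',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($parts === false) {
        return [];
    }

    // $parts is [text before first label, label, value, label, value, ...].
    $fields = [];
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        // Assumption: whitespace, dashes and commas around a value are separators.
        $fields[$parts[$i]] = trim($parts[$i + 1], " \t\n\r-,");
    }
    return $fields;
}

$fields = extractFields(
    "ID: 123, Nome: Elvis - Endereço: toest",
    ["ID", "Nome", "Endereço"]
);
var_dump($fields);
```

Because the separators are stripped from both ends of every value, a value that legitimately starts or ends with a dash or comma would lose it; adjust the character list if that matters for your data.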

Update: solution without regex (based on the new information from the user).

$string = "Advogado ADVOGADO (OAB: 0000 SC) Código da Publicação 309437294 Disponibilização do Jornal 02/04/2014 Publicação do Jornal 03/04/2014";

$indice = array("Advogado", "Código da Publicação", "Disponibilização do Jornal", "Publicação do Jornal");

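$valores = array(); // initialized so array_combine() always receives an array

for ($i = 0; $i < sizeof($indice); $i++) {
    // Position and length of the current label
    $a = strpos($string, $indice[$i]);
    $a_size = strlen($indice[$i]);
    // The value ends where the next label starts, or at the end of the string
    if (isset($indice[$i + 1])) {
        $b = strpos($string, $indice[$i + 1]);
    } else {
        $b = strlen($string);
    }
    // Everything between the end of this label and the start of the next one
    $valores[] = substr($string, $a + $a_size, $b - $a - $a_size);
}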
$resultado = array_combine($indice, $valores);
$resultado = array_map('trim', $resultado);
var_dump($resultado);

Result:

array (size=4)
    'Advogado' => string 'ADVOGADO (OAB: 0000 SC)' (length=23)
    'Código da Publicação' => string '309437294' (length=9)
    'Disponibilização do Jornal' => string '02/04/2014' (length=10)
    'Publicação do Jornal' => string '03/04/2014' (length=10)
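The loop above assumes every label is present and that no label text appears earlier in the string. A minimal sketch of a more defensive variant follows, assuming the labels are listed in the order they appear in the text: it searches forward from the previous match using the third argument of strpos and skips labels that are not found. The function name sliceByLabels and the extra label "Vara" are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper: slices $text between consecutive labels, in order.
function sliceByLabels(string $text, array $labels): array
{
    // First pass: locate each label, searching forward from the previous one.
    $found = [];
    $offset = 0;
    foreach ($labels as $label) {
        $pos = strpos($text, $label, $offset);
        if ($pos === false) {
            continue; // label absent from the text: skip it
        }
        $found[] = ['label' => $label, 'pos' => $pos];
        $offset = $pos + strlen($label);
    }

    // Second pass: each value runs from the end of its label
    // to the start of the next found label (or the end of the text).
    $result = [];
    foreach ($found as $i => $item) {
        $start = $item['pos'] + strlen($item['label']);
        $end = isset($found[$i + 1]) ? $found[$i + 1]['pos'] : strlen($text);
        $result[$item['label']] = trim(substr($text, $start, $end - $start));
    }
    return $result;
}

$texto = "Advogado ADVOGADO (OAB: 0000 SC) Código da Publicação 309437294 Disponibilização do Jornal 02/04/2014 Publicação do Jornal 03/04/2014";
$resultado = sliceByLabels(
    $texto,
    ["Advogado", "Vara", "Código da Publicação", "Disponibilização do Jornal", "Publicação do Jornal"]
);
var_dump($resultado);
```

The label "Vara" does not occur in the sample text, so it is simply left out of the result instead of corrupting the offsets of the other fields.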
  • I added surnames and a longer address, and it worked here.

  • Yeah, I'll have to study the regular expression, because that information was just an example. Now I have to write the expression for my actual need. I'll test it and report the result.

  • Great. Take a look at the functions I used; they all have examples in the PHP manual.

  • 3

    Great solution. To make it even better, it would only take an array_map('trim', $id); on $id to strip the surrounding whitespace. As for the regex itself, for anyone interested, I leave as a reference the modern metacharacter (?<!ER), a negative lookbehind, where ER stands for a regular expression.

  • True, I'll add that.

  • I didn't know this expression, and I wasn't in the habit of using preg_split. Very good solution.

  • @Marceloaymone Very good! And thank you very much for your attention!

  • @Marceloaymone: This comment is unrelated to this topic, but when possible, please take a look at that topic on META, which may interest you. To anyone who thinks this may be spam: this approach was suggested on META itself by users more experienced than me.

2

I'm not sure how you receive these values, because the question isn't clear, but let's say it is like this (with "," as the separator).

$dados = "ID : 123123, Nome : Elvis, Endereço : Totis";

Then just do the following to get the keys.

$dados = "ID : 123123, Nome : Elvis, Endereço : Totis";

$dadosArray = explode(',', $dados);

$dadosCorretos = array();
foreach ($dadosArray as $da) {
    $temp = explode(':', $da);
    $dadosCorretos[$temp[0]] = $temp[1];
}

$dadosKeys = array_keys($dadosCorretos);

The return will be:

dadosCorretos:

array(3) {
  ["ID "]=>
  string(7) " 123123"
  [" Nome "]=>
  string(6) " Elvis"
  [" Endereço "]=>
  string(6) " Totis"
}

dadosKeys:

array(3) {
  [0]=>
  string(3) "ID "
  [1]=>
  string(6) " Nome "
  [2]=>
  string(11) " Endereço "
}
  • The comments on the question explain how I get these values... And there is no fixed separator. For example: ID: 123431 - Address : Tide Number: 123144, Name: Doug

  • 1

    In that case it needs a more elaborate regex. I will test it here.

1

In this case you can use regular expressions (function: preg_match; the legacy ereg function is deprecated and was removed in PHP 7).

To get the ID value, for example, you can use:

$ereg_pattern = '/ID : (\d*)/';
$string = "ID : 123123 Nome : Elvis Endereço : Totis bla";
preg_match($ereg_pattern, $string, $matches);

You will have the ID code in the $matches array:

array (size=2)
  0 => string 'ID : 123123' (length=11)
  1 => string '123123' (length=6)

For other cases just change the $ereg_pattern.
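As a hedged illustration of what changing the pattern could look like, the sketch below captures all three fields in one preg_match call using named groups. It assumes the labels always appear in the order ID, Nome, Endereço, and that only whitespace, commas or dashes separate one field from the next; the variable name $pattern and the group names are made up for this example.

```php
<?php
$string = "ID : 123123 Nome : Elvis Endereço : Totis bla";

// (?<name>...) defines a named group; .+? is lazy, so "nome" stops
// as soon as the "Endereço" label (after optional separators) can match.
$pattern = '/ID\s*:\s*(?<id>\d+)[\s,-]*'
         . 'Nome\s*:\s*(?<nome>.+?)[\s,-]*'
         . 'Endereço\s*:\s*(?<endereco>.+)$/u';

$id = $nome = $endereco = null;
if (preg_match($pattern, $string, $matches)) {
    $id       = $matches['id'];
    $nome     = trim($matches['nome']);
    $endereco = trim($matches['endereco']);
}

var_dump($id, $nome, $endereco);
```

If the text does not match (for example, a label is missing or the order differs), the three variables stay null, which makes the failure easy to detect.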

  • Dude, I didn't really understand your function. Could you explain it?

  • What about Name values? Address?

  • Regular expressions are an extensive subject. In short: it is a text search method based on "metacharacters". In this case, the search "metacharacters" are between the / in $ereg_pattern. Reading the expression: it says the match must start with 'ID : ', and (\d*) means any number of digits (\d represents a digit). Then, when the regular expression function runs, it searches for this pattern, ID : (DIGITS), and puts what it found into an array. I recommend you read Aurélio's guide: http://aurelio.net/regex/guia and even buy the book, because it's worth it. I hope I've helped.

  • All right, I get it. But how do I read a text where this information appears in the middle of it?

  • If the pattern is always the same, as you mentioned, the regular expression itself will do the job for you. In my example, the $string variable would contain your text. But it seems that Marcelo, who knows regular expressions well, has already given a more complete example. ;)
