Remove words split by a hyphen at a line break - php

Good afternoon folks,

The code below extracts the text from a PDF and prints it on the screen. However, the hyphenation at line breaks is not removed, so words that were split across two lines appear like this: 'sem - pre'.

I tried $DCM_conteudo = str_replace(" - ", "", $DCM_conteudo);, but it only removes the hyphens, not the line breaks.

Does anyone know how to detect and remove these line breaks?

<?php

// Include the database connection and the Composer autoloader.
include_once 'conexaoBD/conexao.php';
include 'vendor/autoload.php';


ini_set('default_charset', 'UTF-8');
set_time_limit(6000);


$file_tmp = $_FILES['file']['tmp_name'];
$file_fnl = $_FILES['file']['name'];

// Parse pdf file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile($file_tmp); // parse the temporary upload path; 'name' is only the client's original file name

$DCM_conteudo = $pdf->getText();
$DCM_conteudo = str_replace(" - ", "", $DCM_conteudo);
$DCM_nome = $file_fnl;

echo $DCM_conteudo;

?>

### Update: excerpt of the text extracted from the PDF

In this con - text, featured the Law n° 6.458/2019, authored by councilors, which requires restaurants, snack bars and similar, beach stalls and street vendors to use and provide their customers only straws manufactured exclusively with biodegradable material or recyclable - .
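In the excerpt above, the split shows up as "con - text", a hyphen with spaces around it; elsewhere the line break may sit right after the hyphen. A minimal sketch, assuming that a hyphen between a letter and a lowercase letter is a line-break split whenever it has whitespace before it or is followed directly by a line break, is a single preg_replace with the u (UTF-8) modifier. joinHyphenatedWords is a hypothetical helper name, not part of PdfParser.

```php
<?php
// Hypothetical helper. Assumption: a hyphen that has whitespace before it
// ("sem - pre", "sem -\npre") or that is directly followed by a line break
// ("sem-\r\npre") is hyphenation from the PDF layout, not a real dash.
function joinHyphenatedWords(string $text): string
{
    // (\p{L})       a letter ending the first fragment (captured as $1)
    // \h+-\s*       spaces/tabs, the hyphen, then any whitespace incl. \r, \n
    // -\h*\R\s*     or: the hyphen followed by a line break (\R covers \r\n, \n, \r)
    // (?=\p{Ll})    the next fragment starts with a lowercase letter (not consumed)
    $joined = preg_replace('/(\p{L})(?:\h+-\s*|-\h*\R\s*)(?=\p{Ll})/u', '$1', $text);

    // preg_replace returns null on error (e.g. invalid UTF-8); keep the input then.
    return $joined ?? $text;
}
```

Usage: $DCM_conteudo = joinHyphenatedWords($pdf->getText()); in place of the str_replace call. Hyphenated words with no surrounding whitespace, such as "guarda-chuva", are left alone, and "recyclable - ." is untouched because no letter follows. The heuristic cannot, however, tell a split word from a real dash written as " - " between two lowercase words ("bars - beach" would become "barsbeach").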

  • Probably the line break is after the hyphen, so simply remove " - \n"?

  • preg_replace("/\r|\n/", "", $yourString); <- Try this to remove line breaks

  • @Andersoncarloswoss I tried that, and also " - \n\r", but neither worked.

  • @Edwardramos that didn't solve it either.

  • Renata, please add to the question an excerpt of the text that contains the hyphen and the line break, so we can see exactly what it looks like.

  • @Andersoncarloswoss I added to the question an excerpt of the text extracted from the PDF, with the word split by the line break highlighted as it appears.
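Before choosing a pattern, it can help to see exactly which characters sit next to each hyphen, since the break may be "\r\n" rather than "\n\r", or a non-ASCII character instead of a plain space. This diagnostic sketch (describeHyphenContexts is a hypothetical name) lists each hyphen with up to three characters on either side, JSON-encoded so that invisible characters show up as escapes.

```php
<?php
// Hypothetical diagnostic helper: returns every hyphen with up to three
// characters on each side, JSON-encoded so "\n", "\r", "\t" and non-ASCII
// characters (e.g. a non-breaking space, shown as \u00a0) become visible.
function describeHyphenContexts(string $text): array
{
    // /s lets "." match line breaks; /u treats the text as UTF-8 so a
    // multibyte character is never cut in half.
    if (preg_match_all('/.{0,3}-.{0,3}/su', $text, $matches) === false) {
        return []; // e.g. the text is not valid UTF-8
    }

    // json_encode without JSON_UNESCAPED_UNICODE escapes non-ASCII characters.
    return array_map('json_encode', $matches[0]);
}
```

Usage: print_r(describeHyphenContexts($DCM_conteudo)); right after getText(). An entry such as "em - \np" would mean a space, the hyphen, a space and then "\n".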

1 answer

In this case the text is read from an external Adobe (PDF) format, so you can try the following in addition to the replacement you already use:

$DCM_conteudo = str_replace("\n", "", $DCM_conteudo); 
$DCM_conteudo = str_replace("\r", "", $DCM_conteudo);
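The two calls above delete every line break, which also glues the last word of one line to the first word of the next. A variant, assuming that a line break not preceded by a hyphen separates two different words, replaces each break with a space and collapses the leftover runs of spaces; removeLineBreaks is a hypothetical helper.

```php
<?php
// Hypothetical helper. Assumption: a line break separates two words,
// so it becomes a space instead of being deleted.
function removeLineBreaks(string $text): string
{
    // One call handles Windows (\r\n), old Mac (\r) and Unix (\n) endings;
    // "\r\n" is listed first so it becomes one space, not two.
    $text = str_replace(["\r\n", "\r", "\n"], ' ', $text);

    // Collapse runs of two or more spaces/tabs left behind into one space.
    return preg_replace('/\h{2,}/u', ' ', $text) ?? $text;
}
```

If the extractor places the break right after " - ", running this first turns "sem - \npre" into "sem - pre", which the question's existing str_replace(" - ", "", ...) then joins into "sempre".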
