Good afternoon, folks,
The code below extracts the text from a PDF and prints it on the screen.
The only problem is that hyphenation at line breaks is not removed, so words that were split across a line break appear like this: 'sem - pre'.
I tried `$DCM_conteudo = str_replace(" - ", "", $DCM_conteudo);`, but that only removes the hyphens, not the line breaks.
Does anyone know how to detect and remove these line breaks?
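```php
<?php
// Include the database connection and the Composer autoloader.
include_once 'conexaoBD/conexao.php';
require 'vendor/autoload.php';

ini_set('default_charset', 'UTF-8');
set_time_limit(6000);

// Temporary path of the uploaded file on the server, and its original name.
$file_tmp = $_FILES['file']['tmp_name'];
$file_fnl = $_FILES['file']['name'];

// Parse the PDF and extract its text. parseFile() needs the path where the
// upload actually lives (the temporary file), not the client-side file name.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile($file_tmp);
$DCM_conteudo = $pdf->getText();

// Attempt to rejoin hyphenated words: removes " - " but leaves the line break.
$DCM_conteudo = str_replace(" - ", "", $DCM_conteudo);

$DCM_nome = $file_fnl;
echo $DCM_conteudo;
?>
```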
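One possible way to rejoin the split words is a regular expression instead of a fixed string. This is a minimal sketch, assuming the extracted text contains a hyphen (optionally surrounded by spaces or tabs) immediately followed by a line break; `dehyphenate()` is a hypothetical helper name. In PCRE, `\h` matches horizontal whitespace and `\R` matches any line-break sequence (`\r\n`, `\n`, `\r`), so the same pattern covers Unix and Windows line endings.

```php
<?php
// Sketch: join words that the PDF split with a hyphen at a line break.
// Assumption: the break looks like "con -\ntext" or "con - \r\ntext".
function dehyphenate(string $text): string
{
    // \h* = spaces/tabs around the hyphen, \R = any line-break sequence,
    // trailing \h* = indentation at the start of the next line.
    // The /u modifier makes PCRE treat the text as UTF-8; preg_replace()
    // returns null on invalid UTF-8, so fall back to the original text.
    return preg_replace('/\h*-\h*\R\h*/u', '', $text) ?? $text;
}

echo dehyphenate("In this con -\ntext, featured the Law"), PHP_EOL;
// In this context, featured the Law
```

In the script above it would replace the `str_replace()` line, e.g. `$DCM_conteudo = dehyphenate($pdf->getText());`. Note that a genuine dash that happens to end a line would also be removed.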
### Update: excerpt of the text extracted from the PDF
In this con - text, featured the Law n° 6.458/2019, authored by councilors, which requires restaurants, snack bars and similar, beach stalls and street vendors to use and provide their customers only straws manufactured exclusively with biodegradable material or recyclable - .
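Part of the confusion may come from how the output is displayed: when the `echo` output is rendered by a browser as HTML, line breaks are collapsed into ordinary spaces, so `con -` at the end of one line followed by `text` on the next can appear as `con - text`. A minimal sketch, assuming the script is viewed in a browser; `showAsPre()` is a hypothetical name. Wrapping the text in `<pre>` keeps the real line breaks visible.

```php
<?php
// Sketch: render extracted text so the browser keeps its line breaks.
// htmlspecialchars() escapes <, >, & and quotes; <pre> preserves whitespace.
function showAsPre(string $text): string
{
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';
}

echo showAsPre("In this con -\ntext, featured the Law");
```

An alternative is to send `header('Content-Type: text/plain; charset=UTF-8');` before any output, so the browser shows the text as-is instead of parsing it as HTML.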
Probably the line break comes right after the hyphen, so wouldn't it be enough to remove `" - \n"`?
– Woss
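Two details matter when trying the suggestion above with `str_replace()`. In PHP, `\n` and `\r` only become real line-break characters inside double quotes; in single quotes they stay a literal backslash plus a letter. Also, Windows-style breaks are `\r\n` (carriage return first), so a search string written as `" - \n\r"` would not match them. A minimal sketch, assuming the PDF text uses one of these line-ending styles; `removeHyphenBreaks()` is a hypothetical name and the list of variants is an assumption, not something the library guarantees.

```php
<?php
// In PHP, \n and \r are only turned into real line breaks inside double quotes.
// ' - \n' (single quotes) is space, hyphen, space, backslash, letter n.
var_dump(" - \n" === ' - \n'); // bool(false)

// Sketch: strip the hyphen together with whichever line-break style is present.
// The "\r\n" variants come before the "\r" ones so a Windows break is removed
// whole instead of leaving a stray "\n" behind.
function removeHyphenBreaks(string $text): string
{
    $search = [" - \r\n", " -\r\n", " - \n", " -\n", " - \r", " -\r"];
    return str_replace($search, '', $text);
}
```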
`preg_replace("/\r|\n/", "", $yourString);` <- Try this to remove line breaks.
– Edward Ramos
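A variant of the regex approach: `\R` matches any line-break sequence as a single unit, whereas `\r|\n` treats the two characters of a Windows `\r\n` break separately. Inside a PCRE pattern single quotes are fine, because the regex engine itself interprets `\R`. This sketch assumes you want the remaining line breaks turned into spaces rather than deleted, so the last word of one line does not fuse with the first word of the next; `flattenLines()` is a hypothetical name. On its own it does not rejoin hyphenated words, so it would run only after the hyphens have been dealt with.

```php
<?php
// Sketch: replace every line break with a single space.
// \R matches "\r\n" as one unit, so a Windows break yields one space, not two.
function flattenLines(string $text): string
{
    return preg_replace('/\R/u', ' ', $text) ?? $text;
}

echo flattenLines("end of one line\r\nstart of the next"), PHP_EOL;
// end of one line start of the next
```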
@Andersoncarloswoss I tried that, and I also tried `" - \n\r"`, but neither worked.
– user145547
@Edwardramos that didn't solve it either.
– user145547
Renata, please add to the question an excerpt of the text containing the hyphen and the line break, so we can see what it actually looks like.
– Woss
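To see exactly which characters surround each hyphen (the browser will not show them), a small sketch can print a few bytes on either side with control characters escaped via `addcslashes()`, so a line feed appears as `\n` and a carriage return as `\r`. `hyphenContext()` and its `$radius` parameter are hypothetical names chosen for illustration.

```php
<?php
// Sketch: list the raw bytes around every hyphen, with control characters
// (ASCII 0-31) escaped so line breaks become visible as \n, \r, \t, ...
function hyphenContext(string $text, int $radius = 5): array
{
    $snippets = [];
    $offset = 0;
    while (($pos = strpos($text, '-', $offset)) !== false) {
        $start = max(0, $pos - $radius);
        // Byte-based substr(): may cut a multi-byte UTF-8 character at the edges.
        $snippet = substr($text, $start, $pos - $start + $radius + 1);
        $snippets[] = addcslashes($snippet, "\0..\37");
        $offset = $pos + 1;
    }
    return $snippets;
}

foreach (hyphenContext("In this con -\ntext, featured") as $s) {
    echo $s, PHP_EOL;
}
```

Running it on `$pdf->getText()` and pasting the output would show whether the break is `\n`, `\r\n` or some other character.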
@Andersoncarloswoss I added to the question an excerpt of the text extracted from the PDF, highlighting how a word split by a line break appears.
– user145547