How to read a PDF containing accented characters using iText?

I am trying to read a PDF using the iText library, but accented characters are dropped from the extracted text. I have already checked the project encoding, and it is set to UTF-8.

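import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

PdfReader reader = new PdfReader("arquivo.pdf");
String conteudo = PdfTextExtractor.getTextFromPage(reader, 1); // text of page 1
System.out.println(conteudo);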

Example:

  • Text in the PDF: Exercícios
  • Output: Exerc cios
  • That's strange. I put together an example here that also reads by filename, with the project and all its resources set to UTF-8, and it works fine. Try passing an InputStream of your file instead of the filename and see whether that works. If it still fails, try forcing the InputStream to UTF-8.

  • I don't know iText, so this is just a theory, but did you import a standard font (one that supports accents)?

  • It worked, Bruno. I don't know why the PDF file I was testing with had this problem. I tested with another file and it worked perfectly! Thank you.
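
For anyone who hits the same symptom, one way to tell whether the accent is lost during extraction or only when printing is to dump the code points of the extracted string. This is a minimal sketch using only the JDK, not part of iText. The class name `CodePointDump` and the two sample strings are made up for illustration; they stand in for the value returned by `PdfTextExtractor.getTextFromPage`.

```java
import java.io.PrintStream;

public class CodePointDump {
    // Renders each code point of the text as U+XXXX, separated by spaces.
    public static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-ins for the text in the PDF and the extracted text.
        String expected = "Exerc\u00EDcios";
        String extracted = "Exerc cios";

        // Print through an explicitly UTF-8 stream so the console charset is ruled out.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(dump(expected));
        out.println(dump(extracted));
    }
}
```

If the sixth entry for the extracted text is U+0020 (a space) rather than U+00ED (í), the character was already missing from the string before it was printed, which points at the PDF or its font rather than at the project or console encoding.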

1 answer

Maybe the PDF was using a font that is not installed on Windows. I recently had a problem opening a PDF that used a font that does not exist on Windows. After installing the font and registering the Windows font directory, the problem was solved.

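FontProgramFactory.RegisterSystemFontDirectories()

Note: The solution above used iText 7. The call is written with the .NET casing; in the Java API the method is registerSystemFontDirectories().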
