Is there a way in Java to convert a .pdf file to a .txt file?
You can try the iText library (the code below uses the iText 5 API), which has ready-made features for extracting text from PDF files. One way to do it:
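import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
// iText 5.x classes
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public void parsePdf(String pdf, String txt) throws IOException {
    PdfReader reader = new PdfReader(pdf);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    // Write UTF-8 text; try-with-resources closes the writer even if extraction fails
    try (PrintWriter out = new PrintWriter(new OutputStreamWriter(
            new FileOutputStream(txt), StandardCharsets.UTF_8))) {
        // Page numbers in iText start at 1
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            out.println(strategy.getResultantText());
        }
    } finally {
        reader.close();
    }
}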
Here, the parameter pdf is the path of the PDF file to extract text from, and the parameter txt is the path of the target TXT file.
This code was taken from a ready-made example written by the iText developer. That example, along with the resulting TXT file, can be found at this link.
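If you convert many files, the txt argument can be derived from the PDF path instead of being typed by hand. Below is a small sketch using only java.nio.file; the class PdfPaths and the method txtPathFor are hypothetical names, and it assumes the .txt file should be written next to the original PDF.

```java
import java.nio.file.Path;

// Hypothetical helper: derives the target .txt path for a given PDF path
final class PdfPaths {
    private PdfPaths() {}

    static Path txtPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        // Drop a trailing ".pdf" in any case; regionMatches returns false for names shorter than 4 chars
        boolean isPdf = name.regionMatches(true, name.length() - 4, ".pdf", 0, 4);
        String base = isPdf ? name.substring(0, name.length() - 4) : name;
        // resolveSibling keeps the parent directory (or returns just the new name if there is none)
        return pdf.resolveSibling(base + ".txt");
    }
}
```

With it, a call could look like parsePdf(pdf, PdfPaths.txtPathFor(Paths.get(pdf)).toString()), turning docs/report.pdf into docs/report.txt.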
The Apache PDFBox library can also help.
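A minimal sketch of the same conversion with PDFBox, assuming Apache PDFBox 2.x is on the classpath (in PDFBox 3.x, PDDocument.load was replaced by org.apache.pdfbox.Loader.loadPDF). The class PdfBoxToText and its convert method are hypothetical names; PDFTextStripper.writeText walks every page of the document and writes the extracted text to the given Writer.

```java
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical wrapper; assumes the PDFBox 2.x API
final class PdfBoxToText {
    private PdfBoxToText() {}

    static void convert(String pdf, String txt) throws IOException {
        // Both resources are closed automatically, in reverse order
        try (PDDocument document = PDDocument.load(new File(pdf));
             Writer out = Files.newBufferedWriter(Paths.get(txt), StandardCharsets.UTF_8)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Order text by its position on the page rather than by content-stream order
            stripper.setSortByPosition(true);
            // Extract text from all pages and write it to the output file
            stripper.writeText(document, out);
        }
    }
}
```

Calling setSortByPosition(true) is optional; it orders text by where it appears on the page, which can help with some layouts, but the output still depends on how each PDF was produced.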
PDF content can vary a lot, so there is no way to extract it in a fully standardized form; many PDF documents, for example, were generated from .doc files. There should be a way, yes, but it won't be easy. This is just a hint of what lies ahead of you; I'll search and see if there is a library for it.
– Guilherme Nascimento
I'm already aware of this obstacle, but an example showing one way to do it would help, and I'd be grateful. @Guilhermenascimento
– Tiago Ferezin