I need code that converts a PDF document to Word without losing its styles.
I have this class that converts to Word, but it doesn't keep the document's styles.
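import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public class teste {
    public static void main(String[] args) throws IOException {
        System.out.println("Document conversion started");
        XWPFDocument doc = new XWPFDocument();
        String pdf = "C:\\Users\\eder\\Downloads\\teste1111.pdf";
        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // iText page numbers start at 1
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            // Extracts plain text only; font and layout information is discarded
            TextExtractionStrategy strategy = parser.processContent(i,
                    new SimpleTextExtractionStrategy());
            String text = strategy.getResultantText();
            // One paragraph with a single run per PDF page, followed by a page break
            XWPFParagraph p = doc.createParagraph();
            XWPFRun run = p.createRun();
            run.setText(text);
            run.addBreak(BreakType.PAGE);
        }
        FileOutputStream out = new FileOutputStream("C:\\Users\\eder\\Downloads\\testandoWord.docx");
        doc.write(out);
        out.close();
        reader.close();
        System.out.println("Document converted successfully");
    }
}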
I'm using iText and POI. I've already looked at the documentation, but I haven't found anything that does what I need.
PDF example:
Does anyone know how to do this?
Are you using iText? If so, the SimpleTextExtractionStrategy strategy does not preserve styles. You would have to study the documentation and check whether there is a strategy that keeps the style while parsing.
– StatelessDev
Yes, I am using iText 5.4.4 and POI. I will take another look at the documentation.
– Eder Aparecido
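One possible direction, following the comment above: since SimpleTextExtractionStrategy returns only plain text, the style would have to be recovered some other way. A rough sketch of one piece of that is below. It assumes you can get the PostScript font name of each text chunk (in iText 5 this should be available from a custom RenderListener through TextRenderInfo.getFont().getPostscriptFontName(), but check this against your version) and then guesses bold/italic from that name. The class FontStyleGuess and its method fromPostscriptName are hypothetical names made up for this illustration, and the name-based guess is only a heuristic, not something iText or POI provides.

```java
import java.util.Locale;

/** Hypothetical helper: guesses family/bold/italic from a PDF PostScript font name. */
final class FontStyleGuess {
    final String family;
    final boolean bold;
    final boolean italic;

    private FontStyleGuess(String family, boolean bold, boolean italic) {
        this.family = family;
        this.bold = bold;
        this.italic = italic;
    }

    /**
     * Heuristic only: many fonts encode the style in the name,
     * e.g. "Arial-BoldMT" or "Helvetica-Oblique", but not all do.
     */
    static FontStyleGuess fromPostscriptName(String psName) {
        String name = psName;
        // Embedded subset fonts carry a six-letter tag: "ABCDEF+Arial-BoldMT"
        if (name.indexOf('+') == 6) {
            name = name.substring(7);
        }
        String lower = name.toLowerCase(Locale.ROOT);
        boolean bold = lower.contains("bold");
        boolean italic = lower.contains("italic") || lower.contains("oblique");

        // Treat everything before the first '-' or ',' as the family name
        int cut = name.length();
        int dash = name.indexOf('-');
        int comma = name.indexOf(',');
        if (dash >= 0) {
            cut = Math.min(cut, dash);
        }
        if (comma >= 0) {
            cut = Math.min(cut, comma);
        }
        return new FontStyleGuess(name.substring(0, cut), bold, italic);
    }
}
```

With a guess like this per chunk, you could create one XWPFRun per chunk instead of one per page and call run.setBold(guess.bold), run.setItalic(guess.italic) and run.setFontFamily(guess.family) on it. Font size, colors, tables and images would still need separate handling, so this would only approximate the original styles.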