Convert PDF to Word without losing styles

I'm looking for code to convert a PDF document to Word without losing its styles.

I have this class that converts to Word, but it doesn't keep the document's styles.

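import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class teste {
    public static void main(String[] args) throws IOException {
        System.out.println("Document conversion started");
        XWPFDocument doc = new XWPFDocument();
        String pdf = "C:\\Users\\eder\\Downloads\\teste1111.pdf";
        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // One Word paragraph per PDF page; only the plain text is extracted
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy = parser.processContent(i,
                    new SimpleTextExtractionStrategy());
            String text = strategy.getResultantText();
            XWPFParagraph p = doc.createParagraph();
            XWPFRun run = p.createRun();
            run.setText(text);
            run.addBreak(BreakType.PAGE);
        }
        FileOutputStream out = new FileOutputStream("C:\\Users\\eder\\Downloads\\testandoWord.docx");
        doc.write(out);
        out.close();
        reader.close();
        System.out.println("Document converted successfully");
    }
}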

I'm using iText and POI. I've already looked at the documentation, but I haven't found anything that does what I need. PDF example: [image]

Does anyone know how to do this?

  • Are you using iText? If so, the SimpleTextExtractionStrategy strategy does not preserve styles. You would have to study the documentation to see whether there is a strategy that keeps the style information while parsing.

  • Yes, I am using iText 5.4.4 and POI. I will take another look at the documentation.
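
One possible direction for the style problem raised in the comments: in iText 5, a custom RenderListener receives a TextRenderInfo for each chunk of text, and the PostScript name of its font (TextRenderInfo.getFont().getPostscriptFontName()) often encodes the style, for example "ABCDEF+Arial-BoldItalicMT". The sketch below only covers the step of guessing bold and italic from such a name. It assumes the PDF's fonts follow that naming convention, which is not guaranteed, and the class FontStyleGuess and its method fromPostscriptName are hypothetical names that are not part of iText or POI.

```java
import java.util.Locale;

// Hypothetical helper (not part of iText or POI): guesses family, bold and
// italic from a PDF font's PostScript name. This is a heuristic and will
// misread fonts whose names do not follow the usual conventions.
final class FontStyleGuess {
    final String family;
    final boolean bold;
    final boolean italic;

    private FontStyleGuess(String family, boolean bold, boolean italic) {
        this.family = family;
        this.bold = bold;
        this.italic = italic;
    }

    static FontStyleGuess fromPostscriptName(String psName) {
        String name = (psName == null) ? "" : psName;

        // Subset fonts are prefixed with a tag such as "ABCDEF+"; drop it.
        int plus = name.indexOf('+');
        if (plus >= 0) {
            name = name.substring(plus + 1);
        }

        String lower = name.toLowerCase(Locale.ROOT);
        boolean bold = lower.contains("bold");
        boolean italic = lower.contains("italic") || lower.contains("oblique");

        // Treat everything before the first '-' or ',' as the family name.
        String family = name.split("[-,]", 2)[0];

        return new FontStyleGuess(family, bold, italic);
    }
}
```

The result could then be applied to each POI run with XWPFRun.setBold, XWPFRun.setItalic and XWPFRun.setFontFamily, creating a new run whenever the guessed style changes. Colors, font sizes, tables and images would need separate handling that this sketch does not attempt.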

No answers
