Convert a DOCX file to PDF without losing formatting?


I am converting a DOCX file to PDF using the docx4j API, but the converted PDF does not keep the original formatting of the text.


<!-- docx4j -->


The method that performs the variable replacement and the conversion:

    import java.io.File;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.HashMap;

    import javax.servlet.ServletContext;
    import javax.ws.rs.WebApplicationException;
    import javax.ws.rs.core.Context;
    import javax.ws.rs.core.Response;
    import javax.ws.rs.core.Response.ResponseBuilder;
    import javax.ws.rs.core.StreamingOutput;

    import org.docx4j.Docx4J;
    import org.docx4j.convert.out.FOSettings;
    import org.docx4j.fonts.IdentityPlusMapper;
    import org.docx4j.fonts.Mapper;
    import org.docx4j.fonts.PhysicalFonts;
    import org.docx4j.model.fields.FieldUpdater;
    import org.docx4j.openpackaging.exceptions.Docx4JException;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

    public Response fichaCaptacao(@Context ServletContext servletContext) throws Exception {
        // Exclude context init from timing
        org.docx4j.wml.ObjectFactory foo = org.docx4j.jaxb.Context.getWmlObjectFactory();

        // Not used by the PDF response below
        String outputFile = "/home/desenvolvimento/qimob.git/qimob-web/src/main/webapp/resources/templates/contratos/OUT_VariableReplace.docx";

        // Font regex (optional)
        // Set regex if you want to restrict to some defined subset of fonts
        // Here we have to do this before calling createContent,
        // since that discovers fonts
        String regex = ".*(Courier New|Arial|Times New Roman|Comic Sans|Georgia|Impact|Lucida Console|Lucida Sans Unicode|Palatino Linotype|Tahoma|Trebuchet|Verdana|Symbol|Webdings|Wingdings|MS Sans Serif|MS Serif).*";
        PhysicalFonts.setRegex(regex);

        // Document loading (required)
        String inputPath = servletContext.getRealPath("/resources/templates/contratos/CONTRATO_LOCACAO_IMOVEL_RESIDENCIAL.docx");
        final WordprocessingMLPackage tmpPkg = WordprocessingMLPackage.load(new File(inputPath));
        MainDocumentPart documentPart = tmpPkg.getMainDocumentPart();

        // Replace the ${...} placeholders in the template
        HashMap<String, String> mappings = new HashMap<>();
        mappings.put("contratante", "Omar Mota");
        mappings.put("naturalidade", "Goiás-GO");
        mappings.put("nacionalidade", "Brasileiro");
        documentPart.variableReplace(mappings);

        // Refresh the values of DOCPROPERTY fields
        FieldUpdater updater = new FieldUpdater(tmpPkg);
        updater.update(true);

        // Set up font mapper (optional)
        Mapper fontMapper = new IdentityPlusMapper();
        tmpPkg.setFontMapper(fontMapper);

        // FO exporter setup (required)
        // .. the FOSettings object
        final FOSettings foSettings = Docx4J.createFOSettings();
        foSettings.setWmlPackage(tmpPkg);

        // Document format:
        // The default implementation of the FORenderer that uses Apache Fop will output
        // a PDF document if nothing is passed via
        // foSettings.setApacheFopMime(apacheFopMime)
        // apacheFopMime can be any of the output formats defined in org.apache.fop.apps.MimeConstants eg org.apache.fop.apps.MimeConstants.MIME_FOP_IF or
        // FOSettings.INTERNAL_FO_MIME if you want the fo document as the result.

        // Specify whether PDF export uses XSLT or not to create the FO
        // (XSLT takes longer, but is more complete).

    //      // Save it
    //      if (true) {
    //          SaveToZipFile saver = new SaveToZipFile(tmpPkg);
    //      } else {
    //          System.out.println(XmlUtils.marshaltoString(documentPart.getJaxbElement(), true,
    //                  true));
    //      }

    //      PdfSettings pdfSettings = new PdfSettings();
    //      OutputStream out = new FileOutputStream(new File("/home/desenvolvimento/Documents/conversao.pdf"));
    //      PdfConversion converter = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(tmpPkg);
    //      converter.output(out,pdfSettings);

        ResponseBuilder builder = Response.ok(
                new StreamingOutput() {
                    @Override
                    public void write(OutputStream output) throws IOException, WebApplicationException {
                        try {
                            // Prefer the exporter that uses an XSL transformation
                            Docx4J.toFO(foSettings, output, Docx4J.FLAG_EXPORT_PREFER_XSL);
                            // Prefer the exporter that doesn't use an XSL transformation (= uses a visitor)
                            // .. faster, but not yet at feature parity
                            // Docx4J.toFO(foSettings, output, Docx4J.FLAG_EXPORT_PREFER_NONXSL);
                        } catch (Docx4JException e) {
                            throw new WebApplicationException(e);
                        } finally {
                            // Clean up, so any ObfuscatedFontPart temp files can be deleted
                            if (tmpPkg.getMainDocumentPart().getFontTablePart() != null) {
                                tmpPkg.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
                            }
                            // Dropping the package reference would also do it, via finalize() methods
                            foSettings.setWmlPackage(null);
                        }
                    }
                });

        // PDF is the default output format of the FOP renderer
        builder.type("application/pdf");
        return builder.build();
    }



    15:24:27,217 INFO  [org.docx4j.openpackaging.contenttype.ContentTypeManager] (default task-41) Detected WordProcessingML package
    15:24:27,217 INFO  [org.docx4j.openpackaging.io3.Load3] (default task-41) Instantiated package of type org.docx4j.openpackaging.packages.WordprocessingMLPackage
    15:24:27,218 INFO  [org.docx4j.openpackaging.io3.Load3] (default task-41) package read;  elapsed time: 3 ms
    15:24:27,218 INFO  [] (default task-41) Lazily unmarshalling /word/document.xml
    15:24:27,224 INFO  [] (default task-41) unmarshalling
    15:24:27,224 INFO  [] (default task-41) unmarshalling
    15:24:27,225 INFO  [org.docx4j.model.fields.FieldUpdater] (default task-41) 

    Simple Fields in /word/document.xml
    Found 0 simple fields 

     Complex Fields in /word/document.xml
    Found 0 fields 

    15:24:27,225 WARN  [org.docx4j.fonts.IdentityPlusMapper] (default task-41) WARNING! SubstituterWindowsPlatformImpl works best on Windows.  To get good results on other platforms, you'll probably  need to have installed Windows fonts.
    15:24:27,227 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,227 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,236 INFO  [] (default task-41) Writing temp embedded fonts 1463077467236
    15:24:27,236 WARN  [org.docx4j.fonts.IdentityPlusMapper] (default task-41) - - No physical font for: Calibri
    15:24:27,236 WARN  [org.docx4j.fonts.Mapper] (default task-41) Overwriting existing fontMapping: arial
    15:24:27,236 WARN  [org.docx4j.fonts.IdentityPlusMapper] (default task-41) - - No physical font for: Times New Roman
    15:24:27,244 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,244 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,252 INFO  [] (default task-41) Writing temp embedded fonts 1463077467252
    15:24:27,254 INFO  [org.docx4j.convert.out.common.preprocess.FieldsCombiner] (default task-41) starting
    15:24:27,255 INFO  [org.docx4j.convert.out.common.preprocess.CoverPageSectPrMover] (default task-41) No need to move sectPr 
    15:24:27,261 WARN  [] (default task-41) No w:settings/w:compat element
    15:24:27,265 INFO  [org.docx4j.model.structure.PageDimensions] (default task-41) No cols in this section; defaulting.
    15:24:27,266 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,266 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,266 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Calibri is not mapped!
    15:24:27,280 INFO  [org.docx4j.XmlUtils] (default task-41) Using org.apache.xalan.transformer.TransformerImpl
    15:24:27,280 INFO  [org.docx4j.convert.out.common.AbstractConversionContext] (default task-41) /pkg:package
    15:24:27,286 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,286 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,294 INFO  [] (default task-41) Writing temp embedded fonts 1463077467294
    15:24:27,294 INFO  [org.docx4j.convert.out.common.preprocess.FieldsCombiner] (default task-41) starting
    15:24:27,294 INFO  [org.docx4j.convert.out.common.preprocess.CoverPageSectPrMover] (default task-41) No need to move sectPr 
    15:24:27,296 INFO  [org.docx4j.model.structure.PageDimensions] (default task-41) No cols in this section; defaulting.
    15:24:27,296 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,296 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,296 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Calibri is not mapped!
    15:24:27,299 INFO  [org.docx4j.XmlUtils] (default task-41) Using org.apache.xalan.transformer.TransformerImpl
    15:24:27,299 INFO  [org.docx4j.convert.out.common.AbstractConversionContext] (default task-41) /pkg:package
    15:24:27,303 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,307 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,310 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,313 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,315 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,315 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,317 WARN  [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Calibri is not mapped to a physical font!
    15:24:27,317 WARN  [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Times New Roman is not mapped to a physical font!
    15:24:27,322 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Calibri,normal,400" not found. Substituting with "any,normal,400".
    15:24:27,327 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 4 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 3 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 2 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 1 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,331 INFO  [] (default task-41) Using repackaged ToXMLStream
    15:24:27,331 INFO  [] (default task-41) Using repackaged ToXMLStream
    15:24:27,340 INFO  [org.docx4j.model.images.AbstractConversionImageHandler] (default task-41) Wrote @src='file:/tmp/6ccc1fe4-53c9-4661-b078-78c79a9a95d8image1.jpeg
    15:24:27,350 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,481 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,481 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,489 WARN  [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Calibri is not mapped to a physical font!
    15:24:27,489 WARN  [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Times New Roman is not mapped to a physical font!
    15:24:27,509 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".
    15:24:27,509 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "ZapfDingbats,normal,700" not found. Substituting with "ZapfDingbats,normal,400".
    15:24:27,510 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Arial,normal,700" not found. Substituting with "Arial,normal,400".
    15:24:27,521 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Calibri,normal,400" not found. Substituting with "any,normal,400".
    15:24:27,535 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:inline line 1 exceed the available area in the inline-progression direction by 23379 millipoints. (See position 3:11147)
    15:24:27,561 INFO  [org.apache.fop.apps.FOUserAgent] (default task-41) Rendered page #1.
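
Most of the warnings in the log above repeat the same message about fonts that are not mapped to a physical font. To make that easier to read, here is a minimal sketch that collects the distinct font names from those `RunFontSelector` warnings. It only parses log text and does not call docx4j; the class name `UnmappedFonts` and the method `find` are hypothetical names chosen for this example, and it assumes the log is available as a list of lines.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnmappedFonts {

    // Matches the warning text that appears in the log above.
    private static final Pattern NOT_MAPPED =
            Pattern.compile("Font '([^']+)' is not mapped to a physical font");

    /** Returns the distinct font names reported as unmapped, in first-seen order. */
    public static Set<String> find(List<String> logLines) {
        Set<String> fonts = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = NOT_MAPPED.matcher(line);
            if (m.find()) {
                fonts.add(m.group(1));
            }
        }
        return fonts;
    }

    public static void main(String[] args) {
        // Three lines copied from the log above
        List<String> sample = Arrays.asList(
            "15:24:27,227 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. ",
            "15:24:27,303 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. ",
            "15:24:27,561 INFO  [org.apache.fop.apps.FOUserAgent] (default task-41) Rendered page #1.");
        System.out.println(find(sample)); // prints [Calibri, Times New Roman]
    }
}
```

For this log, the two fonts reported this way are Calibri and Times New Roman, which are the fonts the original DOCX uses.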

The process files are available here

The PDF file is the conversion result and the DOCX file is the original.

If anyone can help me with this, I'd be grateful!

