Apache POI is separating the Runs in the wrong place

-2

I’m learning to use Apache POI to work with docx documents, and I’m trying to run some checks on an existing document:

XWPFDocument doc = new XWPFDocument(OPCPackage.open("conf/templates/relatorio_modelo_laudo/modelo_laudo.docx"));
this.nPag = doc.getProperties().getExtendedProperties().getUnderlyingProperties().getPages();

for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();

    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);

            if (text != null) {
                while (text.contains("<<")) {
                    String x = text.substring(text.indexOf("<<"), text.indexOf(">>") +2);
                    String result;

                    if (x.contains("{}")) {
                        result = this.getExpressao(x.substring(x.indexOf('{') +1, x.indexOf("}")));
                    } else {
                        String str = x.substring(x.indexOf("<<") +2, x.indexOf(">>"));
                        if (str.equals("nPag")) {
                            result = Integer.toString(nPag);
                        } else {
                            result = params.get(str).isNull() ? "UNDEFINED" : params.get(str).toString();
                        }
                    }

                    text = text.replace(x, result);
                    r.setText(text,0);
                }
            }
        }
    }
}

This is the method that opens the document and reads it. In the document I put some fields to be replaced, for example "<< nPag >>". The system should read each field, interpret it, and replace it with the value of the variable of the same name. For some reason, though, the field, which sits in the middle of a paragraph, is being split into three parts: "... <<" / "nPag" / ">> ...". In other words, my variable arrives in three different runs.

This book contains << nPag >> pages, electronically numbered...

This is how it appears in the document. It is the only variable with this problem. Renaming it might solve it, but I would like to know the reason. Also, if someone can explain why the statement on the second line no longer returns the page count correctly, I would appreciate it (it used to work, but after I updated the document it started returning only 0).

Hugs.

Edit: I tried renaming the variable and the error persists. I still don’t know why POI behaves this way only with this variable.

2 answers

0

Before blaming POI, you should take a look at the structure of the XML files that make up your Word document. I was similarly surprised to discover that PowerPoint splits something like {{marcador}} into three distinct blocks (I chose a Handlebars-like syntax for the placeholders in my project).
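To look at that XML without renaming and unzipping the file by hand, a minimal sketch like the one below may help. It uses only the JDK (Java 9 or later) and assumes the main document part sits at word/document.xml, which is where Word normally puts it. The class name DocxXmlDump and the method readDocumentXml are made up for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of POI: dumps the raw XML of a .docx body.
class DocxXmlDump {

    /** Returns the XML of the main document part of a .docx file. */
    static String readDocumentXml(String docxPath) throws IOException {
        // A .docx file is a ZIP archive; the body is assumed to be in word/document.xml.
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Each <w:r> element in the output is one run. A placeholder whose
        // characters are spread over several <w:r> elements reaches POI
        // as several XWPFRun objects.
        System.out.println(readDocumentXml(args[0]));
    }
}
```

In the output, check whether "<<", "nPag" and ">>" sit in separate <w:r> elements. If they do, the split is already in the file that Word saved, and POI is only reporting it.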

Since POI’s handling of Word documents is not yet very mature, you can retrieve the entire text of a paragraph with texto = p.getText(), but you cannot yet replace it with p.setText(texto) as you can for a PowerPoint paragraph.

I would recommend removing the runs from last to first until only the first one remains, and then putting the whole paragraph text into it. So I would rewrite your solution as:

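String texto = p.getText();
if (1 < runs.size()) {
    // Remove every run except the first, from last to first so the indices stay valid.
    for (int i = runs.size() - 1; 0 < i; --i) {
        p.removeRun(i);
    }
    // Put the whole paragraph text into the single remaining run.
    p.getRuns().get(0).setText(texto, 0);
}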
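One side effect of collapsing the paragraph into one run is that any formatting carried by the removed runs is lost. If that matters, a possible alternative is to replace the placeholder across run boundaries and leave the number of runs unchanged. The sketch below shows the idea on plain strings, one string per run, so it runs without POI. The class CrossRunReplace and its method are hypothetical, and the run texts are assumed to be non-null (with POI, a null getText(0) would have to be mapped to "" first).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper working on run texts only, to keep the example POI-free.
class CrossRunReplace {

    /**
     * Replaces the first occurrence of target, which may span several runs.
     * The returned list has the same size as the input: the replacement goes
     * into the first run the target touches, and the matching characters are
     * cut out of the following runs.
     */
    static List<String> replaceAcrossRuns(List<String> runTexts, String target, String replacement) {
        List<String> result = new ArrayList<>(runTexts);
        String joined = String.join("", runTexts);
        int start = joined.indexOf(target);
        if (start < 0) {
            return result; // target not present, nothing to change
        }
        int end = start + target.length(); // exclusive

        int offset = 0;
        boolean inserted = false;
        for (int i = 0; i < runTexts.size(); i++) {
            String t = runTexts.get(i);
            int runStart = offset;
            int runEnd = offset + t.length();
            offset = runEnd;
            if (runEnd <= start || runStart >= end) {
                continue; // this run does not overlap the target
            }
            // Keep what lies before and after the target inside this run.
            String before = t.substring(0, Math.max(0, start - runStart));
            String after = t.substring(Math.min(t.length(), end - runStart));
            result.set(i, before + (inserted ? "" : replacement) + after);
            inserted = true;
        }
        return result;
    }
}
```

With POI, each returned string would then be written back to the run at the same index with setText(text, 0).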

0

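The way I solved it was to join the runs into a single run:

if (runs.size() > 1) {
    StringBuilder texto = new StringBuilder();

    // Consume runs from the front until only one is left.
    // This relies on runs being the live list returned by p.getRuns().
    do {
        String parte = runs.get(0).getText(0);
        if (parte != null) {    // getText(0) returns null for a run without text
            texto.append(parte);
        }
        if (runs.size() == 1)
            break;
        p.removeRun(0);
    } while (runs.size() > 0);

    // The surviving run (originally the last one) receives the joined text.
    runs.get(0).setText(texto.toString(), 0);
}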

As soon as I get the list of runs, I append the text of each run to the texto variable and remove that run, until only one run is left. Then I set the accumulated text on the remaining run, and that’s it.
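Once the whole paragraph text sits in one string, the placeholders can be resolved in a single pass. The following is only a sketch under a few assumptions: the class PlaceholderFiller and the values map are hypothetical stand-ins for the params lookup in the question, the pattern tolerates spaces inside the delimiters (as in "<< nPag >>"), and unknown names fall back to "UNDEFINED" as in the original code. It needs Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: resolves << name >> placeholders in an already joined text.
class PlaceholderFiller {

    // Matches << name >> with optional whitespace around the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("<<\\s*(.+?)\\s*>>");

    static String fill(String text, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // group(1) is the name without the surrounding spaces.
            String value = values.getOrDefault(m.group(1), "UNDEFINED");
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The result of fill(texto.toString(), values) could then be passed to setText(..., 0) on the remaining run instead of the raw joined text.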
