How can I split a PDF file into lines in Java?


3

I need a way to read a PDF file line by line.

I currently read the text all at once with this statement:

String conteudo = PdfTextExtractor.getTextFromPage(reader, 1);

However, I need to read it line by line, because I need to know which line the occurrence I am looking for is on.

Any idea?

  • 3

    How about splitting the string you already get in your code wherever you find the line break characters (usually \n and \r in some combination)?

1 answer

3


As @Renan suggested, you already have all of your text in a String variable. Instead of looking for the text you want in the PDF, look for it in your String:

//here you have the whole page in a single String
String pagina = PdfTextExtractor.getTextFromPage(reader, 1);

//split it into an array of Strings, one per line; the end of a line will always be \n
String[] linhas = pagina.split("\n");

//now walk through the array of Strings looking for the text you want
int numLinha = 1;
boolean achou = false;
for(String s: linhas) {
    //when you find it, set a flag and leave the loop
    if(s.contains("the text I want")) {
        achou = true;
        break;
    }
    numLinha++;
}

//done, now you have the number of the line that contains the text you are looking for
if(achou) {
    System.out.println("Your text is on line: " + numLinha);
}
else {
    System.out.println("text not found");
}
  • If \n does not work, try \r ;) +1

  • @Renan Indeed, it is always good to point this out because it may vary from case to case, but in the tests I ran with the iText library (the one the author is using), it always put \n at the end of each line. Thanks.
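
Since the comments point out that the line terminator may vary from case to case, here is a minimal sketch of a split that does not depend on which one the text uses. It assumes Java 8 or later, where the regex `\R` matches `\r\n`, `\n`, `\r` and the other Unicode line breaks. The class name `LineFinder`, the method `findLines` and the sample text are hypothetical, made up for illustration; in real use the input would be the String returned by `PdfTextExtractor.getTextFromPage`. Unlike the answer's loop, which stops at the first match, this sketch collects every matching line number.

```java
import java.util.ArrayList;
import java.util.List;

public class LineFinder {

    /** Returns the 1-based numbers of all lines in pageText that contain needle. */
    public static List<Integer> findLines(String pageText, String needle) {
        List<Integer> hits = new ArrayList<>();
        // "\\R" is the Java 8+ linebreak matcher: \r\n counts as one break,
        // and a lone \n or \r is also treated as a line break
        String[] lines = pageText.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(needle)) {
                hits.add(i + 1); // line numbers start at 1, array indexes at 0
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample mixing the three common terminators
        String sample = "first line\r\nsecond line with target\nthird line\rtarget again";
        System.out.println(findLines(sample, "target")); // prints [2, 4]
    }
}
```

If only the first occurrence matters, the loop can return as soon as it finds a match, which mirrors the `break` in the answer above.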
