Jsoup does not return the full HTML document


When I fetch the page and print it to the console, the HTML is incomplete. While the program runs I can see many elements being printed, but once it finishes, the console holds less than a tenth of the content that appeared during execution.

The actual problem is that I cannot select an element that exists on the page (the call returns null), which I believe is related to the situation described above.

Can anyone suggest something that might help me solve this?

Code:

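import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class WebCaptura {

    public static void main(String[] args) {

        String url = "https://g1.globo.com/";
        Document doc;
        try {
            // Download and parse the page
            doc = Jsoup.connect(url).userAgent("Mozilla").get();
            // first() returns null when nothing matches
            Element body = doc.getElementsByTag("main").first();

            // Print every element of the document, then the <main> element
            System.out.println(doc.getAllElements());
            System.out.println("----- END -----");
            System.out.println(body);

            // The element of interest; this is the lookup that prints null
            Element news = doc.getElementsByClass("bstn-hl-wrapper").first();
            System.out.println("--- Conteudo de interesse ---");
            System.out.println(news);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}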

Console while running:

[screenshot]

Console after execution:

[screenshot]

The element I am interested in, as it appears on the page:

[screenshot]

1 answer

This is probably caused by the output limit of the IDE console itself. You can adjust that limit as follows:

Console option -> Right-click in the console -> Preferences

After opening the settings you should disable the console output limitation:

Example setting -> Limit console output

That said, removing this limit is a bad idea, because it overloads the console buffer and slows down the IDE.

A better way to solve your problem is to write all the downloaded content to a file. That way each piece of content is appended to the end of the file, and when the run finishes you have the complete document.
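As a rough sketch of the file approach, something like the following could work. The class name HtmlDump, the save helper and the page.html file name are made up for illustration, and the hard-coded string only stands in for the downloaded page so that the example runs without Jsoup. Since Jsoup hands back the whole document at once, a single write is assumed to be enough here instead of appending piece by piece.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlDump {

    // Hypothetical helper: writes the HTML text to a file as UTF-8,
    // creating the file or replacing its previous content.
    static void save(String html, Path target) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real page. With Jsoup on the classpath you would
        // pass doc.outerHtml() from the question's Document instead.
        String html = "<html><body><main>example</main></body></html>";

        Path target = Paths.get("page.html"); // assumed output file name
        save(html, target);

        System.out.println("Wrote " + Files.size(target) + " bytes to "
                + target.toAbsolutePath());
    }
}
```

In the question's code, the call would be HtmlDump.save(doc.outerHtml(), Paths.get("page.html")) right after the get() call. The file can then be opened in an editor and searched for the element, without depending on the console limit.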

  • I removed the console limit and found the element, but its identifiers had different values, so Jsoup could not find it using the ones from the page.
