Jsoup not returning any value

I have a problem with the Jsoup library: when I try to connect to a certain page, the connection simply does not return any value.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public static void main(String[] args) {
    try {
        Document doc = Jsoup.connect("/").get();

        // Select the question link elements
        Elements elements = doc.select("a.question-hyperlink");

        System.out.println("O  titulo da página é: " + doc.title());

        // Print the title of each question
        for (int i = 0; i < elements.size(); i++) {
            System.out.println(elements.get(i).text());
        }
    } catch (Exception e) {
        System.out.println("Erro " + e);
    }
}

Coincidentally, I also tested it against Stack Overflow and got the same problem.

Output from the NetBeans IDE:

Erro: org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=/

------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 1.282s
Finished at: Sun May 15 13:26:41 BRT 2016
Final Memory: 5M/109M
------------------------------------------------------------------------

I don’t know whether it matters, but this is a Maven project.

@EDIT

I was able to solve the problem by adding the following method call to the connection.

Document doc = Jsoup.connect("/")
                .userAgent("Mozilla").get();

1 answer

Important note: Before accessing any page or service with a web crawler, check whether the provider allows its content to be read this way. In some cases where it is allowed, the provider itself offers an API so that this access can be done properly.


The HTTP status code 403 means Forbidden. It can happen for several reasons, but the most common is that the server noticed the request was not made by a browser. In that case, use the Connection#userAgent method to set a user agent string that matches a browser the server accepts. For example, Chrome 55 on 64-bit Windows 8.1:

Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36

Add it between the calls to Jsoup#connect and Connection#get, as follows:

...
.userAgent("Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")
...
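
Putting the pieces together, a minimal sketch of the complete call might look like the one below. It assumes Jsoup is on the classpath. The class name UserAgentExample, the helper browserLike, and the https://example.com/ fallback address are placeholders chosen for illustration; they do not come from the question. Catching HttpStatusException separately makes the status code visible if the server still refuses the request.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class UserAgentExample {

    // Browser-like user agent string, the same one quoted in this answer.
    static final String BROWSER_USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36";

    // Hypothetical helper: prepares the connection but does not send the request yet.
    static Connection browserLike(String url) {
        return Jsoup.connect(url).userAgent(BROWSER_USER_AGENT);
    }

    public static void main(String[] args) {
        // Placeholder address (an assumption); pass the real page as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        try {
            Document doc = browserLike(url).get();
            System.out.println("Title: " + doc.title());
            // Same selector as in the question; it only matches pages that use this class.
            for (Element link : doc.select("a.question-hyperlink")) {
                System.out.println(link.text());
            }
        } catch (HttpStatusException e) {
            // Jsoup throws this for HTTP error responses such as 403.
            System.out.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
        } catch (IOException e) {
            // Any other I/O failure (DNS, timeout, and so on).
            System.out.println("Connection failed: " + e);
        }
    }
}
```

If the server still answers 403 with a browser-like user agent, the block is probably based on something other than this header, and the note about checking whether the provider allows crawling applies.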
