Error accessing an HTTPS site with Apache HttpClient

Hello, everyone. I really need your help. I am working on a system at my university that logs in to the university's academic system through HTTP requests. I'm using the Apache HttpClient classes, and although I know the library works, I was unsuccessful with this particular website.

I know it is possible, because an acquaintance from the computing department here built an application that lets students access the system to check grades, schedules, etc.

I followed the usual process: install HttpFox (or a similar tool) and find out which form parameters to pass as BasicNameValuePair entries.

Here is what I am trying to do:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public class NavegadorSite {

    public void x () {

        final HttpClient client = new DefaultHttpClient();
        final HttpPost post = new HttpPost(
                "https://www.sigaa.ufs.br/sigaa/logar.do?dispatch=logOn");
        try {
            final List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(1);
            nameValuePairs.add(new BasicNameValuePair("width", "1140"));
            nameValuePairs.add(new BasicNameValuePair("height", "900"));
            nameValuePairs.add(new BasicNameValuePair("urlRedirect", ""));
            nameValuePairs.add(new BasicNameValuePair("acao", ""));
            nameValuePairs.add(new BasicNameValuePair("acessibilidade", ""));
            nameValuePairs.add(new BasicNameValuePair("user.login", "unknown"));
            nameValuePairs.add(new BasicNameValuePair("user.senha", ""));

            post.setEntity(new UrlEncodedFormEntity(nameValuePairs));
            final HttpResponse response = client.execute(post);
            // Print the response body line by line; try-with-resources closes the reader.
            try (BufferedReader rd = new BufferedReader(new InputStreamReader(
                    response.getEntity().getContent()))) {
                String line;
                while ((line = rd.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (final IOException e) {
            e.printStackTrace();
        }
    }

    public static void main (String[] args) {

        NavegadorSite ns = new NavegadorSite();
        ns.x();
    }
}

I hope you can help, because I really need to build this bot to access the site and capture some information!

Below is the output from the Eclipse console. Of all the sites I tested, this is the only one that produces this error, which has been hugely frustrating.

javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.ssl.Alerts.getSSLException(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.fatal(Unknown Source)
    at sun.security.ssl.Handshaker.fatalSE(Unknown Source)
    at sun.security.ssl.Handshaker.fatalSE(Unknown Source)
    at sun.security.ssl.ClientHandshaker.serverCertificate(Unknown Source)
    at sun.security.ssl.ClientHandshaker.processMessage(Unknown Source)
    at sun.security.ssl.Handshaker.processLoop(Unknown Source)
    at sun.security.ssl.Handshaker.process_record(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:553)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:412)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:179)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:328)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:612)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:447)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at br.ufs.httpcomponents.NavegadorSite.x(NavegadorSite.java:35)
    at br.ufs.httpcomponents.NavegadorSite.main(NavegadorSite.java:50)
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.validator.PKIXValidator.doBuild(Unknown Source)
    at sun.security.validator.PKIXValidator.engineValidate(Unknown Source)
    at sun.security.validator.Validator.validate(Unknown Source)
    at sun.security.ssl.X509TrustManagerImpl.validate(Unknown Source)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(Unknown Source)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(Unknown Source)
    ... 20 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.provider.certpath.SunCertPathBuilder.build(Unknown Source)
    at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(Unknown Source)
    at java.security.cert.CertPathBuilder.build(Unknown Source)
    ... 26 more

One last thing that I think is important: is there anything illegal about getting information from websites this way?

I will be here to answer any further questions. Thank you all in advance.

1 answer

The failure happens during the SSL handshake, before your POST is ever sent. You are accessing a secure (https) site, and the exception in your stack trace says that Java could not find a valid certification path for the certificate the server presented:

new HttpPost("https://www.sigaa.ufs.br/sigaa/logar.do?dispatch=logOn");

A good framework for this kind of problem is Crawler4j.

Take a look at this usage example. With it you can capture exactly the data you need from the site, and the implementation is extremely simple.

And most importantly: it works well with secure (HTTPS) websites.
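
If you would rather keep your current code, the `PKIX path building failed` message in the question means the JVM's default truststore does not contain a certificate authority that vouches for the server. One common way around that is to import the site's certificate chain into a truststore of your own (for example with the JDK's `keytool -importcert`) and have Java use it. Below is a minimal sketch using only JDK classes; the class name `TrustStoreSketch`, the method names, and the file path and password passed to `main` are all hypothetical, and I have not tested it against the SIGAA server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper: builds an SSLContext that trusts only the
// certificates stored in a truststore you created yourself.
class TrustStoreSketch {

    // Loads a truststore file (e.g. one created with keytool) from disk.
    static KeyStore loadTrustStore(Path file, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) {
            trustStore.load(in, password);
        }
        return trustStore;
    }

    // Creates a TLS context whose trust managers are backed by the given store.
    static SSLContext contextFor(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext context = SSLContext.getInstance("TLS");
        // No client certificates (null key managers), default SecureRandom.
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the truststore file, args[1]: its password (both assumptions).
        KeyStore trustStore = loadTrustStore(Paths.get(args[0]), args[1].toCharArray());
        SSLContext context = contextFor(trustStore);
        System.out.println("Built SSL context for protocol " + context.getProtocol());
    }
}
```

The resulting `SSLContext` still has to be handed to whatever HTTP client you use. A simpler route that needs no code is to start the JVM with the standard `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword` system properties pointing at the same file, since the default trust managers read them. Either way, prefer this over disabling certificate validation altogether.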

  • Hello, andreycleme. Before capturing the information I need to log in to the site, because what I need is restricted to registered users. The example you provided does not seem to address this requirement.

  • Murilo, Crawler4j can authenticate. Please see this example in the documentation: https://github.com/yasserg/crawler4j/blob/master/src/main/java/edu/ucis/crawler4j/crawler/authentication/BasicAuthInfo.java
