Crawler to handle pagination

I need a crawler that paginates through a website.

I'm reading the page source and writing it to a txt file like this:

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;

public class CodFonte {

    public static void crawler(String str) throws IOException {

        URL url = new URL(str);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setReadTimeout(15 * 1000);
        connection.connect();

        // read the response from the server
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                connection.getInputStream()));

        String linha;

        String path = System.getProperty("user.home") + "\\Desktop\\";
        String fileName = "Fonte Code.txt"; // output file name

        FileWriter file = new FileWriter(path + fileName);
        PrintWriter gravarArq = new PrintWriter(file);
        gravarArq.println("SITE -------- " + url);

        while ((linha = reader.readLine()) != null) {
            gravarArq.println(linha);
        }
        // close the PrintWriter (not the FileWriter directly) so buffered
        // output is flushed before the file is closed
        gravarArq.close();
        reader.close();
    }
}
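
For reference, a minimal sketch of how the method can be invoked (the URL here is just a placeholder):

// Minimal usage sketch; the URL is a placeholder, any page works.
public class Main {
    public static void main(String[] args) throws java.io.IOException {
        CodFonte.crawler("http://www.example.com/");
    }
}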

But I need to move on to the next page, and the URL is friendly: it does not change, because the request is submitted through a form via POST.

  • What is the structure of the paging URL? Is it a number?

  • It is a number; the input sits inside a table: <input type="text" maxlength="4" maxsize="4" value="3" name="pag">

  • And can you already read this input with your Java code above?

  • Yes, I can get at it using Document doc = Jsoup.parse(html);

  • Well, I once wrote a crawler like this in PHP, and the idea is the same. The difference was that the page had a link to the next page at the end, so I "clicked" it and the current page became the next one. You will have to use a sleep (to wait between one page and the next) and a loop to move to the next page, plus a stop condition (e.g. maxPaginasToCrawl). First, find out how many pages there are by accessing them all and checking the HTTP status (code 200), and save the total number of pages in a variable to use in the loop. There is a sketch of this loop right after this comment thread.

  • The part about getting the total number of pages I have already solved: there is a div in which the page itself states the total, so I check for it and recover the number. Now, how do I submit the POST with the parameters?

  • So your problem is only moving to the next page? You can already get the source of the current page?

  • Yes, I can even generate a txt of the current page. What I need to know is how to submit the form via POST with this crawler; once I have the next page I can scan it.
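
Putting the ideas from these comments together, a minimal sketch of the loop might look like this. The div selector ("div.total") and the fetchPage() helper are hypothetical placeholders; the actual fetching is the POST request covered in the answer below.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PaginationLoop {

    public static void main(String[] args) throws Exception {
        // Read the total number of pages from the div the site exposes.
        // "div.total" is a hypothetical selector; adapt it to the real page.
        Document doc = Jsoup.parse(fetchPage(1));
        int totalPaginas = Integer.parseInt(doc.selectFirst("div.total").text().trim());

        int maxPaginasToCrawl = Math.min(totalPaginas, 50); // stop condition

        for (int pag = 2; pag <= maxPaginasToCrawl; pag++) {
            Thread.sleep(2000);           // wait between one page and the next
            String html = fetchPage(pag); // POST request; see the answer below
            // ... parse/save html here ...
        }
    }

    // Placeholder: fetch page "pag" and return its HTML (to be implemented
    // with the POST code from the answer below).
    private static String fetchPage(int pag) throws Exception {
        return "";
    }
}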

1 answer

Does making a POST request and reading the response back help?

HttpURLConnectionExample.java

package com.meupacote.app;

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpURLConnectionExample {

    private final String USER_AGENT = "Mozilla/5.0";

    public static void main(String[] args) throws Exception {

        HttpURLConnectionExample http = new HttpURLConnectionExample();

        System.out.println("\nTesting 1 - send request via POST");
        http.sendPost();

    }

    // HTTP POST request
    private void sendPost() throws Exception {

        String url = "http://www.url.com/";
        URL obj = new URL(url);
        // the example URL is plain HTTP, so cast to HttpURLConnection
        // (an HttpsURLConnection cast would fail on an http:// URL)
        HttpURLConnection con = (HttpURLConnection) obj.openConnection();

        // add request headers
        con.setRequestMethod("POST");
        con.setRequestProperty("User-Agent", USER_AGENT);
        con.setRequestProperty("Accept-Language", "en-US,en;q=0.5");

        String urlParameters = "param1=valor1&param2=valor2";

        // send the POST request
        con.setDoOutput(true);
        DataOutputStream wr = new DataOutputStream(con.getOutputStream());
        wr.writeBytes(urlParameters);
        wr.flush();
        wr.close();

        int responseCode = con.getResponseCode();
        System.out.println("\nSending 'POST' request to URL: " + url);
        System.out.println("POST parameters: " + urlParameters);
        System.out.println("Response Code: " + responseCode);

        BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream()));
        String inputLine;
        StringBuffer response = new StringBuffer();

        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();

        // print the result
        System.out.println(response.toString());

    }

}
  • Right, but the URL is friendly and does not change, so to get to the next page I need to send the request via POST and retrieve the new page that way.

  • I updated the answer. See if I got it right this time.

  • Good answer! But is there any special reason to use StringBuffer instead of StringBuilder? I also think you could use try-with-resources syntax to simplify the code and make it more robust.

  • Hold on, I believe this code will help a lot, since it shows how the parameters must be passed in the request. I think I'm passing the wrong parameters.

  • So you think the parameters are wrong, and that is why the response doesn't help you?

  • Yes, because when I execute my code it always returns the same page. I made a loop calling the POST request for the total number of pages. The site I need to crawl is this one: http://www.apinfo.com/apinfo/inc/list4.cfm. I need to store the job listings from this site, but I can't get past the first page.

  • Got it! So the POST URL is http://www.apinfo.com/apinfo/inc/list4.cfm and the parameters are:
    <input type="hidden" name="estado" value="99">
    <input type="hidden" name="tv" value="351">
    <input type="hidden" name="ddmmaa1" value="27/06/15">
    <input type="hidden" name="ddmmaa2" value="">
    <input type="hidden" name="onde" value="1">
    <input type="hidden" name="andor" value="1">
    <input type="hidden" name="keyw" value="">
    <input type="hidden" name="pag" value="4">

  • Just put those parameters into String urlParameters = "param1=valor1&param2=valor2"; and see what comes back... it will probably be the whole HTML of the page, ready to parse using part of your crawler's code (there is a sketch of this wiring after these comments).

  • Now it's clear, I'm going to implement it here. Thanks, man.

  • Don't mention it! I think I'll build an Apinfo crawler in C# as well! Cheers.

  • Man, I tested it here and it worked. I was doing something very complicated with Jsoup; the way you did it is simple and direct. Thanks.

  • Hey! Haha, perfect! That's great!

  • Can you give me a hand with another crawler? http://answall.com/questions/72459/crawler-para-fazer-login-no-site-da-nota-fiscal-paulista
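
Putting everything together, the final wiring might look like the sketch below. The parameter values come from the hidden inputs quoted above; the page total of 10 is a made-up stand-in for the value read from the page, and the helper uses StringBuilder and try-with-resources, as suggested earlier in the comments.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ApinfoCrawler {

    public static void main(String[] args) throws Exception {
        int totalPaginas = 10; // hypothetical; recover the real total from the page's div

        for (int pag = 1; pag <= totalPaginas; pag++) {
            // Hidden-input values quoted in the comments above; only "pag" changes.
            String urlParameters = "estado=99&tv=351&ddmmaa1=27/06/15&ddmmaa2="
                    + "&onde=1&andor=1&keyw=&pag=" + pag;
            String html = post("http://www.apinfo.com/apinfo/inc/list4.cfm", urlParameters);
            // ... parse the job listings out of html (e.g. with Jsoup) and store them ...
            Thread.sleep(2000); // pause between requests
        }
    }

    // Same idea as the answer's sendPost(), but returning the response body;
    // StringBuilder and try-with-resources per the comments above.
    private static String post(String url, String params) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("User-Agent", "Mozilla/5.0");
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(params.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
        }
        return response.toString();
    }
}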
