JSONException being thrown in Twitter search - Java

I have an algorithm that searches for old tweets on Twitter. The application worked normally for a few days, but then it suddenly started throwing an exception.

Code:

public static List<Tweet> getTweets(String username, String since, String until, String querySearch) {
    List<Tweet> results = new ArrayList<Tweet>();

    try {
        String refreshCursor = null;
        while (true) {              
            JSONObject json = new JSONObject(getURLResponse(username, since, until, querySearch, refreshCursor));
            refreshCursor = json.getString("scroll_cursor");   // <<--------
            System.out.println("while");
            Document doc = Jsoup.parse((String) json.get("items_html"));
            Elements tweets = doc.select("div.js-stream-tweet");

            if (tweets.size() == 0) {
                break;
            }

            for (Element tweet : tweets) {
                String usernameTweet = tweet.select("span.username.js-action-profile-name b").text();
                String txt = tweet.select("p.js-tweet-text").text().replaceAll("[^\\u0000-\\uFFFF]", "");
                int retweets = Integer.valueOf(tweet.select("span.ProfileTweet-action--retweet span.ProfileTweet-actionCount").attr("data-tweet-stat-count").replaceAll(",", ""));
                int favorites = Integer.valueOf(tweet.select("span.ProfileTweet-action--favorite span.ProfileTweet-actionCount").attr("data-tweet-stat-count").replaceAll(",", ""));
                long dateMs = Long.valueOf(tweet.select("small.time span.js-short-timestamp").attr("data-time-ms"));
                Date date = new Date(dateMs);

                Tweet t = new Tweet(usernameTweet, txt, date, retweets, favorites);
                results.add(t);
            }
        }
    } catch (Exception e) {
        System.out.println("Error!");
    }

    return results;
}

The exception is thrown on the line marked "<<--------". The json object contains the contents of the returned page, so I don't know what's going on.

This is the method that requests the page:

private static String getURLResponse(String from, String since, String until, String querySearch, String scrollCursor) throws Exception {
    String appendQuery = "";
    if (from != null) {
        appendQuery += "from:"+from;
    }
    if (since != null) {
        appendQuery += " since:"+since;
    }
    if (until != null) {
        appendQuery += " until:"+until;
    }
    if (querySearch != null) {
        appendQuery += " "+querySearch;
    }

    String url = String.format("https://twitter.com/i/search/timeline?f=realtime&q=%s&src=typd&scroll_cursor=%s", URLEncoder.encode(appendQuery, "UTF-8"), scrollCursor);

    URL obj = new URL(url);
    HttpURLConnection con = (HttpURLConnection) obj.openConnection();

    con.setRequestMethod("GET");

    BufferedReader in = new BufferedReader(
            new InputStreamReader(con.getInputStream()));
    String inputLine;
    StringBuffer response = new StringBuffer();

    while ((inputLine = in.readLine()) != null) {
        response.append(inputLine);
    }
    in.close();

    return response.toString();
}

Exception stack trace:

twitter4j.JSONException: JSONObject["scroll_cursor"] not found.
    at twitter4j.JSONObject.get(JSONObject.java:390)
    at twitter4j.JSONObject.getString(JSONObject.java:504)
    at Manager.TweetManager.getTweets(TweetManager.java:83)
    at Main.Main.main(Main.java:52)

JSON 1, JSON 2, JSON 3,

  • Put an e.printStackTrace(); inside your catch and post the output here. If possible, also post the String returned by getURLResponse(...).

  • Basically, there is nothing called "scroll_cursor" in the JSON you downloaded. I think it would be important to add to the question an example of the JSON downloaded by getURLResponse(...).

  • I can't print the JSON object as a string; I can only see its content in debug mode. json.getString() requires a String parameter. Do you know how to do this?

  • Try something like this: String sj = getURLResponse(username, since, until, querySearch, refreshCursor); System.out.println(sj); JSONObject json = new JSONObject(sj); and then add to the question whatever the println outputs.

  • It worked. It's a giant string, so I don't think I can post it here. Is there a way to attach a .txt file here on Stack Overflow?

  • You can post it on Pastebin.com and link here.

  • I added a Dropbox link to the question anyway.

1 answer

The JSON that causes the problem basically looks like this:

{
    "has_more_items": false,
    "items_html": "<a bunch of html...>",
    "focused_refresh_interval": 30000
}

In other words, your code fails when the search reaches the end of the results, because in that case the response has no item called scroll_cursor.

Here is what to do. Instead of this:

JSONObject json = new JSONObject(getURLResponse(username, since, until, querySearch, refreshCursor));
refreshCursor = json.getString("scroll_cursor");   // <<--------

Use this:

JSONObject json = new JSONObject(getURLResponse(username, since, until, querySearch, refreshCursor));
boolean hasMore = json.getBoolean("has_more_items"); 
refreshCursor = hasMore ? json.getString("scroll_cursor") : null;

And at the end of the while loop, add this:

if (!hasMore) break;

And as Bruno Céssar observed in a comment below:

Instead of break, hasMore itself could be used: initialize it to true outside the loop and use while (hasMore) to fetch the next tweets, since hasMore is updated on every iteration anyway.
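
Putting the two suggestions together, the loop control could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the original code: Page and PageFetcher are hypothetical stand-ins for the parsed JSON response and for getURLResponse(...), so that the example runs without a network call or a JSON library. Only the handling of has_more_items and scroll_cursor mirrors the fix above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CursorPaging {

    // Hypothetical stand-in for one parsed JSON response.
    // cursor is null when the response has no "scroll_cursor" key.
    static final class Page {
        final boolean hasMore;
        final String cursor;
        final List<String> items;

        Page(boolean hasMore, String cursor, List<String> items) {
            this.hasMore = hasMore;
            this.cursor = cursor;
            this.items = items;
        }
    }

    // Hypothetical stand-in for getURLResponse(...) plus JSON parsing.
    // A real implementation would also have to deal with I/O errors.
    interface PageFetcher {
        Page fetch(String cursor);
    }

    static List<String> fetchAll(PageFetcher fetcher) {
        List<String> results = new ArrayList<>();
        String cursor = null;   // no cursor before the first request
        boolean hasMore = true; // assume there is at least one page

        while (hasMore) {
            Page page = fetcher.fetch(cursor);
            if (page.items.isEmpty()) {
                break; // same early exit as the original tweets.size() == 0 check
            }
            results.addAll(page.items);

            // Read the flag first; only touch the cursor when more pages exist.
            hasMore = page.hasMore;
            cursor = hasMore ? page.cursor : null;
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake two-page result: the last page carries no cursor at all.
        PageFetcher fake = cursor -> cursor == null
                ? new Page(true, "c1", Arrays.asList("tweet 1", "tweet 2"))
                : new Page(false, null, Arrays.asList("tweet 3"));

        System.out.println(fetchAll(fake)); // prints [tweet 1, tweet 2, tweet 3]
    }
}
```

Because the flag is read before the cursor, the last page (the one without scroll_cursor) is still processed, and the loop then ends without any lookup of the missing key.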

  • Instead of break, hasMore itself could be used: initialize it to true outside the loop and use while (hasMore) to fetch the next tweets, since hasMore is updated on every iteration anyway.

  • @Brunocésar Yes, I agree.

  • It worked for a while, but then it stopped working. In the short time it worked with your solution, I also noticed that it was not returning as many tweets as before: it was limited to a maximum of 18. Before this problem appeared, depending on the search, I could retrieve up to 25 thousand tweets.

  • @Rogerrubens So let's try again... How many and what kinds of JSON responses do you get? Edit the question and add the different kinds of JSON you see.

  • @Rogerrubens Can you post a case where "scroll_cursor" is present? Or did Twitter remove it? Also, I would be suspicious of "focused_refresh_interval".

  • @Victorstafusa Yeah, man, ever since the problem first appeared it has been complaining that "scroll_cursor" does not exist. Regarding "focused_refresh_interval", do you think it could be a way of blocking requests?

  • @Rogerrubens Maybe. In that case, this would be the time in milliseconds you have to wait before trying again.

  • @Victorstafusa I added a Thread.sleep(5000) before each page request, but it keeps giving the error. I edited the question with the method I use to request the page. Take a look.

  • @Victorstafusa I made some changes and now I'm having a new problem, if you can help me: http://answall.com/questions/77532/filenotfoundexception-sendo-lan%C3%A7ada-em-pesquisa-no-twitter-java
