IndexOutOfBoundsException on get(0) in a jsoup crawler

I would like to get the names of the companies that appear in a search like "Farmacias em Santo Andre" on Google Maps.

Error: Exception in thread "main" java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) 
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) 
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) 
at java.base/java.util.Objects.checkIndex(Objects.java:372) 
at java.base/java.util.ArrayList.get(ArrayList.java:458) 
at crawlergooglemap.CrawlerGoogleMap.main(CrawlerGoogleMap.java:48)
 C:\Users\Eugene\AppData\Local\NetBeans\Cache\11.1\executor-snippets\run.xml:111: The following error occurred while executing this line:
 C:\Users\Eugene\AppData\Local\NetBeans\Cache\11.1\executor-snippets\run.xml:68: Java returned: 1 BUILD FAILED (total time: 7 seconds)
public static void main(String[] args) throws IOException, XmlPullParserException {

    String url;
    Scanner s = new Scanner(System.in); 
    System.out.println("Cole a URL do maps:"); 
    url = s.next(); 
    Document page = Jsoup.connect(url).get(); 
    for(int i = 0; i < 20; i++){ 
        String empresa = page.getElementsByAttribute("section-result-title").get(0).selectFirst("h3").text(); 
        if(empresa.length() != 0){ 
            System.out.println(empresa); 
        }
        else{ 
            System.out.println("0"); 
        }
    } 
}

Part of the website’s HTML:

<div class="section-result-title-container">
  <h3 class="section-result-title">
    <span jstcache="134">Farmácia Nazaré</span>
    <button jstcache="135" style="display:none"></button>
  </h3>
  <span jstcache="136" class="section-ads-placecard" style="display:none">Anúncio</span>
  <span jstcache="137" class="section-hotel-ads-url" style="display:none">Anúncio</span> 
  • If you got that error, it’s because getElementsByAttribute did not find anything and returned an empty list. In that case, simply check beforehand whether the list is empty (using isEmpty(), for example)

  • OK, that "solved" it, but now I’ve realized the program doesn’t pick up any information; isEmpty() is always true

  • If it is empty, then there is no element with the attribute section-result-title. But if you want the results, maybe you should use the Google Maps API instead of fetching the HTML and trying to extract data from it (see the sketch after these comments). Note: I did not test this link, it was one that appeared in the search results, but you can look for others if you want. Calling the API directly seems simpler than picking apart the HTML (even though the API has usage limits and charges beyond a certain volume)

  • In this case I was looking up a class with that name using getElementsByAttribute, but I already changed it to get the class and it still doesn’t work, and the class is there; here is a part of the site (and thanks for the API tip): <div class="section-result-title-container"> <h3 class="section-result-title"> <span jstcache="134">Farmácia Nazaré</span> <button jstcache="135" style="display:none"></button> </h3> <span jstcache="136" class="section-ads-placecard" style="display:none">Anúncio</span> <span jstcache="137" class="section-hotel-ads-url" style="display:none">Anúncio</span>
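
For reference, here is a minimal sketch of the API route mentioned in the comments, assuming the Places API Text Search endpoint and a placeholder YOUR_API_KEY; the regex over the JSON response is only to keep the example dependency-free, and a real JSON parser would be preferable:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlacesSearchSketch {
    public static void main(String[] args) throws Exception {
        String key = "YOUR_API_KEY"; // placeholder: requires a Google Cloud key with the Places API enabled
        String query = URLEncoder.encode("Farmacias em Santo Andre", StandardCharsets.UTF_8);
        String url = "https://maps.googleapis.com/maps/api/place/textsearch/json?query="
                + query + "&key=" + key;

        // fetch the JSON response instead of scraping the Maps HTML
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).build();
        String json = http.send(req, HttpResponse.BodyHandlers.ofString()).body();

        // quick-and-dirty extraction of the "name" fields from the results
        Matcher m = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}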

1 answer

According to the Jsoup documentation, the method getElementsByAttribute takes the name of the attribute to be searched for. In the case of your HTML, you seem to be looking for this element:

<h3 class="section-result-title"> 

Note that the name of the attribute is class, and the value is section-result-title. The method getElementsByAttribute takes the name of the attribute, but you were passing the value, so it finds nothing (and the list of elements is empty, so get(0) gives IndexOutOfBoundsException, because you’re trying to access an element that doesn’t exist).
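
To see the difference concretely, here is a minimal, self-contained sketch that parses only the h3 from the HTML fragment above (the class name AttributeVsValue is just for the example):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class AttributeVsValue {
    public static void main(String[] args) {
        // fragment from the question, reduced to the relevant part
        String html = "<h3 class=\"section-result-title\">"
                + "<span jstcache=\"134\">Farmácia Nazaré</span></h3>";
        Document doc = Jsoup.parse(html);

        // looks for an attribute NAMED "section-result-title": no element has one
        System.out.println(doc.getElementsByAttribute("section-result-title").size()); // prints 0

        // looks for elements that HAVE a "class" attribute: finds the h3
        System.out.println(doc.getElementsByAttribute("class").size()); // prints 1
    }
}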

What you need in this case is the method getElementsByAttributeValue:

// looks for the elements that have class="section-result-title"
Elements results = page.getElementsByAttributeValue("class", "section-result-title");
if (!results.isEmpty()) {
    String empresa = results.get(0).text();
    if (empresa.length() != 0) {
        System.out.println(empresa);
    } else {
        System.out.println("0");
    }
}

Also note that I check whether the returned list of elements is empty, because otherwise the get(0) error would just happen again. Only if the list is not empty do I proceed with the rest of the code.


Another alternative is to use the method select:

Elements results = page.select("h3[class=section-result-title]");
if (!results.isEmpty()) {
    ....
}

In this case, I’m looking for h3 elements whose class attribute has the value section-result-title (there is more about this selector syntax in the documentation).
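
As a side note, the same element can also be matched with the CSS class shorthand; a small sketch assuming the same page variable as above (the variable names are just for the example):

// matches only if the class attribute is exactly "section-result-title"
Elements byAttribute = page.select("h3[class=section-result-title]");

// matches any h3 that has "section-result-title" among its classes,
// even if the element also has other classes
Elements byClass = page.select("h3.section-result-title");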


Another detail: I don’t understand why you’re getting only the first element of the list. If you want to go through all the elements found, just use a simple for loop:

Elements results = page.select("h3[class=section-result-title]");
for (Element el : results) {
    String empresa = el.text();
    if (empresa.length() != 0) {
        System.out.println(empresa);
    } else {
        System.out.println("0");
    }
}

That way, if the list is empty, the for loop is simply never entered, and you don’t need to keep checking isEmpty(). But if the intention is to get only the first element, use the previous snippets.

  • I tried it the way you showed and results is still empty. Could Google have some protection against crawlers?

  • Do a System.out.println(page) and see what the HTML looks like (there is a sketch of this kind of check after these comments)

  • Yes, the HTML it fetches is very different from the site, full of parts with null. I believe they use some protection, especially since they have an API that they sell. Thanks for the help
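
For completeness, a minimal sketch of the check suggested above: fetch the page with jsoup, optionally with a browser-like User-Agent, and dump what was actually downloaded to a file so it can be compared with what the browser shows. The URL and the file name pagina.html are just examples:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DumpHtml {
    public static void main(String[] args) throws IOException {
        // example Maps search URL like the one pasted in the question
        String url = "https://www.google.com/maps/search/Farmacias+em+Santo+Andre";

        // fetch with a browser-like User-Agent; Maps may still serve
        // JavaScript-rendered content that jsoup cannot execute
        Document page = Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .get();

        // save the HTML jsoup actually received, to compare with the browser's view
        Files.writeString(Path.of("pagina.html"), page.outerHtml());
        System.out.println("HTML saved to pagina.html");
    }
}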
