IndexOutOfBoundsException on get(0) in a jsoup crawler

Question

Asked 5 years, 11 months ago

I would like to get the names of the companies that appear in a search like "Farmacias em Santo Andre" on Google Maps.

Error: Exception in thread "main" java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) 
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) 
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) 
at java.base/java.util.Objects.checkIndex(Objects.java:372) 
at java.base/java.util.ArrayList.get(ArrayList.java:458) 
at crawlergooglemap.CrawlerGoogleMap.main(CrawlerGoogleMap.java:48)
 C:\Users\Eugene\AppData\Local\NetBeans\Cache\11.1\executor-snippets\run.xml:111: The following error occurred while executing this line:
 C:\Users\Eugene\AppData\Local\NetBeans\Cache\11.1\executor-snippets\run.xml:68: Java returned: 1 BUILD FAILED (total time: 7 seconds)

public static void main(String[] args) throws IOException, XmlPullParserException {

    String url;
    Scanner s = new Scanner(System.in); 
    System.out.println("Cole a URL do maps:"); 
    url = s.next(); 
    Document page = Jsoup.connect(url).get(); 
    for(int i = 0; i < 20; i++){ 
        String empresa = page.getElementsByAttribute("section-result-title").get(0).selectFirst("h3").text(); 
        if(empresa.length() != 0){ 
            System.out.println(empresa); 
        } else {
            System.out.println("0");
        }
    } 
}

Part of the website’s HTML:

<div class="section-result-title-container">
  <h3 class="section-result-title">
    <span jstcache="134">Farmácia Nazaré</span>
    <button jstcache="135" style="display:none"></button>
  </h3>
  <span jstcache="136" class="section-ads-placecard" style="display:none">Anúncio</span>
  <span jstcache="137" class="section-hotel-ads-url" style="display:none">Anúncio</span>

If you got that error, it's because getElementsByAttribute found nothing and returned an empty list. In that case, simply check first whether the list is empty (using isEmpty(), for example).

– hkotsubo

2019/08/15 at 11:41
OK, that "solved" it, but now I realize the program doesn't pick up any information: isEmpty() always comes back true.

– Eugenio Maria

2019/08/15 at 14:19
If it is empty, then there is no element with the attribute section-result-title. But if you want to get the results, maybe you should use the Google Maps API instead of fetching the HTML and trying to extract data from it. Note: I did not test this link; it was just one that appeared in the search results, and you can look for others if you want. Accessing the API directly seems simpler than picking the HTML apart (even though the API has usage limits and charges beyond a certain volume).

– hkotsubo

2019/08/15 at 14:34
In this case I was looking for a class with that name while using the get-by-attribute method, but I've already switched to the get-by-class one and it doesn't work either, even though the class really is there. Here is part of the site (and thanks for the API tip): <div class="section-result-title-container"> <h3 class="section-result-title"> <span jstcache="134">Farmácia Nazaré</span> <button jstcache="135" style="display:none"></button> </h3> <span jstcache="136" class="section-ads-placecard" style="display:none">Anúncio</span> <span jstcache="137" class="section-hotel-ads-url" style="display:none">Anúncio</span>

– Eugenio Maria

2019/08/15 at 14:57

1 answer

by hkotsubo • **55,826** points · Answer 1 · 2019-08-15T16:59:35+00:00

According to the Jsoup documentation, the method getElementsByAttribute takes the name of the attribute to be searched for. In the case of your HTML, you seem to be looking for this element:

<h3 class="section-result-title">

Note that the name of the attribute is class, and the value is section-result-title. The method getElementsByAttribute takes the name of the attribute, but you were passing the value, so it finds nothing (and the list of elements is empty, so get(0) gives IndexOutOfBoundsException, because you’re trying to access an element that doesn’t exist).
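
The name-versus-value distinction can be checked offline. The following is only a sketch, assuming jsoup is on the classpath; it parses a trimmed copy of the fragment from the question with Jsoup.parse instead of fetching a live page, and the class and method names are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Sketch only: assumes jsoup is on the classpath.
// The HTML is a trimmed copy of the fragment shown in the question.
class AttributeNameLookup {

    static final String HTML =
        "<div class=\"section-result-title-container\">"
      + "<h3 class=\"section-result-title\">"
      + "<span jstcache=\"134\">Farmácia Nazaré</span>"
      + "<button jstcache=\"135\" style=\"display:none\"></button>"
      + "</h3></div>";

    // Counts the elements that carry an attribute with the given *name*.
    static int countByAttributeName(String attributeName) {
        Document page = Jsoup.parse(HTML);
        return page.getElementsByAttribute(attributeName).size();
    }

    public static void main(String[] args) {
        // "section-result-title" is a value, not an attribute name: no match.
        System.out.println(countByAttributeName("section-result-title"));
        // "class" is an attribute name: the div and the h3 both have it.
        System.out.println(countByAttributeName("class"));
    }
}
```

With this fragment the first call finds no elements, which is exactly the empty list that made get(0) fail in the question.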

What you need in this case is the method getElementsByAttributeValue:

// find the elements that have class="section-result-title"
Elements results = page.getElementsByAttributeValue("class", "section-result-title");
if (!results.isEmpty()) {
    String empresa = results.get(0).text();
    if (empresa.length() != 0) {
        System.out.println(empresa);
    } else {
        System.out.println("0");
    }
}

Also note that I check whether the returned list of elements is empty, because if it is, get(0) would still throw the same exception. If the list is not empty, I proceed with the rest of the code.
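
The same guard can be illustrated with the standard library alone, since jsoup's Elements is itself a java.util.List of Element. This is a minimal sketch: firstOrDefault is a hypothetical helper, not part of jsoup, and the sample strings are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class FirstElementGuard {

    // Hypothetical helper (not a jsoup API): returns the first item, or the
    // fallback when the list is empty, so get(0) never runs on an empty list.
    static <T> T firstOrDefault(List<T> items, T fallback) {
        return items.isEmpty() ? fallback : items.get(0);
    }

    public static void main(String[] args) {
        List<String> noMatches = new ArrayList<>();
        try {
            noMatches.get(0); // same failure mode as in the question
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Unguarded get(0) threw " + e.getClass().getSimpleName());
        }
        // Guarded access falls back instead of throwing.
        System.out.println(firstOrDefault(noMatches, "0"));
        System.out.println(firstOrDefault(List.of("Farmacia A", "Farmacia B"), "0"));
    }
}
```

jsoup's Elements also has a first() method that returns null for an empty result, which is another way to avoid calling get(0) blindly, at the cost of a null check.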

Another alternative is to use the method select:

Elements results = page.select("h3[class=section-result-title]");
if (!results.isEmpty()) {
    ....
}

In this case, I'm selecting the h3 elements whose class attribute has the value section-result-title (more about this syntax in the documentation).
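
One thing worth knowing about this selector: [class=...] compares the whole attribute value, whereas the CSS class selector (h3.section-result-title) matches an element that carries that class among others. The sketch below assumes jsoup is on the classpath; the second h3 and its extra class "highlighted" are invented purely to show the difference and do not come from the Google Maps page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Sketch only: assumes jsoup is on the classpath. The second <h3> and the
// class name "highlighted" are made up for illustration.
class SelectorComparison {

    static final String HTML =
        "<h3 class=\"section-result-title\">A</h3>"
      + "<h3 class=\"section-result-title highlighted\">B</h3>";

    // [class=...] matches only when the entire attribute value is equal.
    static int countExactAttribute() {
        Document page = Jsoup.parse(HTML);
        return page.select("h3[class=section-result-title]").size();
    }

    // .section-result-title matches any h3 that has that class, even with others.
    static int countClassSelector() {
        Document page = Jsoup.parse(HTML);
        return page.select("h3.section-result-title").size();
    }

    public static void main(String[] args) {
        System.out.println(countExactAttribute());
        System.out.println(countClassSelector());
    }
}
```

For the fragment in the question both selectors behave the same, because the h3 has a single class; the class selector is simply the more forgiving choice if the markup ever gains extra classes.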

Another detail: I don't understand why you're only getting the first element of the list. If you want to go through all the elements found, just use a simple for loop:

Elements results = page.select("h3[class=section-result-title]");
for (Element el : results) {
    String empresa = el.text();
    if (empresa.length() != 0) {
        System.out.println(empresa);
    } else {
        System.out.println("0");
    }
}

So if the list is empty, it doesn't even enter the for, and you don't need to check whether it is empty. But if the intention was to get only the first element, use the previous code.
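
The empty-list behaviour of the loop can be seen without jsoup. This is a small sketch using plain strings in place of el.text(); linesToPrint is a hypothetical helper that mirrors the loop body above, and the sample names are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class ResultLoop {

    // Hypothetical helper mirroring the loop above: each text is kept as is,
    // and an empty text is replaced by "0". Plain strings stand in for el.text().
    static List<String> linesToPrint(List<String> texts) {
        List<String> lines = new ArrayList<>();
        for (String text : texts) { // the body never runs when texts is empty
            lines.add(text.isEmpty() ? "0" : text);
        }
        return lines;
    }

    public static void main(String[] args) {
        // No matches: the loop is skipped and nothing is printed, with no exception.
        linesToPrint(new ArrayList<>()).forEach(System.out::println);
        // Some matches, one of them with empty text.
        linesToPrint(List.of("Farmacia A", "", "Farmacia B")).forEach(System.out::println);
    }
}
```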