I can’t get elements from a page with Puppeteer

-1

I’m building an application that downloads an Instagram image from its URL, and I’m using the puppeteer package to do it.

Inside the call to the evaluate method (which runs JS code in the page), I try to get and return an image element.

The problem is that whenever I try to get the image, I end up getting null as a result. See my code below:

const puppeteer = require("puppeteer");
const url = "https://www.instagram.com/p/<image_code>/";

async function getImageFrom(url) {

    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto(url);

    const image = await page.evaluate(function() {
        return document.querySelector(".KL4Bh > img");
    });

    await browser.close();
    return image;
}

getImageFrom(url).then((image) => {
    console.log(image);
});

What am I doing wrong? I’ve also tried using querySelectorAll to search for every <img> element, but that only returns an empty array.

Edit: I don’t know what happened, but the method is returning undefined now.

  • This selector (.KL4Bh > img) doesn’t look very reliable or long-lived... Facebook typically changes this kind of selector fairly often. Even so, I don’t think that’s the problem... Besides, what would None be? :P

  • Oops, my mistake, I confused null with None, haha (Python habit xD). I don’t know whether Instagram does that, but I tested selecting the element directly in the browser console and it worked. I also tested getting only the <img> elements, as I had already mentioned, and Puppeteer still returned only an empty array.

  • From what I can see here, this doesn’t seem to be a problem with Puppeteer. Instagram inserts the image via JavaScript. If you download the HTML with a tool like curl, you’ll see that the element your selector targets isn’t there. I don’t know whether the evaluate method waits for all of the page’s scripts to finish running, but it seems to me that it doesn’t. Worth investigating.

  • But the goal of Puppeteer, as far as I understand it, is to be something like Selenium, rendering the page so it can be manipulated. So even if the images are added via JavaScript, Puppeteer should render them and return the elements to me.

2 answers

3

You are probably NOT waiting for the load to complete, or it may be that Instagram loads its content dynamically, so the element still isn’t there after the load event; each site does this its own way.

Change this:

const image = await page.evaluate(function() {
    return document.querySelector(".KL4Bh > img");
});

To:

page.waitForSelector(selector[, options])

This waits for the element to exist and is already a native function (evaluate would run JS inside the rendered page, which is unnecessary extra work). Example:

const element = await page.waitForSelector('.KL4Bh > img');

Or else change the value of the waitUntil property in the options of:

page.goto(url[, options])

One value you can apply there is networkidle2 (which waits until there have been no more than 2 network connections for at least 500 ms). Example:

await page.goto('https://site.com', { waitUntil: 'load' });
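
A minimal sketch of how the two suggestions above could be combined: navigate with waitUntil set to networkidle2, then wait for the selector with an explicit timeout. The helper name openAndWaitFor, the 10-second default, and the choice to return null on a timeout are assumptions made for illustration, and page is assumed to be a Puppeteer Page created elsewhere. The check on err.name assumes the rejection is Puppeteer’s TimeoutError.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page; the helper name,
// default timeout, and null-on-timeout behavior are illustrative choices.
async function openAndWaitFor(page, url, selector, timeoutMs = 10000) {
  // Wait for the network to go mostly quiet instead of only the load event.
  await page.goto(url, { waitUntil: "networkidle2" });

  try {
    // Resolves with an ElementHandle once the selector matches an element.
    return await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (err) {
    // Assumption: a timeout rejects with an error named "TimeoutError".
    if (err.name === "TimeoutError") return null;
    throw err;
  }
}
```

A null result here only means the selector did not match within the timeout, which can also happen when the page serves different markup (for example a login wall) to an automated browser.
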
  • What do you mean by changing goto? Isn’t that the method that opens a page from a URL? I tried waitForSelector, and it waited for a long time until the method threw an error saying the 30000ms timeout had been exceeded. I also tried the option waitUntil: 'load', but it still didn’t work.

  • William, one thing I’m noticing now is that the result of evaluate is no longer null but undefined. I don’t know when that changed, but that’s what it returns now.

  • @Jeanextreme002 the point was not to change goto; I’ve corrected the answer. In any case, try replacing evaluate with page.waitForSelector ... because evaluate is for running JavaScript, and there’s no reason to do that when a native method already exists.

  • This method worked, but how can I get the element’s attributes, such as src? And if possible, could you explain in the answer why the waitUntil option doesn’t work? I’d like to know in case I need to use evaluate in the future.

  • @Jeanextreme002 it may have been something you did wrong, I don’t know; it may be that Instagram still has to load something even after the load event... after all, each site works in a way and invents its own fashion, even if it sometimes makes little sense (I can’t test whether that’s the case). To get the src, I think it should be something like const propertyHandle = await element.getProperty('src');, where element is the value returned by page.waitForSelector

  • Thank you William for your help, your reply was quite enlightening. I was able to discover the real problem of the code and put the solution in my answer.

  • @Jeanextreme002 and how does that differ from what I said? You practically used the page.waitForSelector I mentioned. Your answer implies that I got something wrong about getting the element, as if I had only explained how to get the src, when I didn’t even say that in the answer. In fact, that isn’t even in your question; it’s something you brought up later in the comments, DECONTEXTUALIZING the whole question. How to get the "whole element" (the handle) is what I answered.

  • No, I included page.waitForSelector as an alternative because the question was more about the problem with evaluate. The only part of my answer that relates to yours is where I talk about the waitUntil option; at no point did I say or mean that your answer was wrong for getting an element in way X or Y. And the question really is about getting an entire element and not just the src; in my answer I explain that getting an entire element with evaluate is wrong and that waitForSelector should be used for that.

  • Anyway, I apologize if it sounded like I was saying your answer was wrong. In my answer the final solution for getting a whole element is the same as in yours, but in mine I explain the root of the problem and give details about other things. I spent hours and hours scouring the Internet until I found a simple comment on GitHub that made me understand the whole problem.

  • The question didn’t mention any of this, @Jeanextreme002; it went beyond that, almost like a chameleon question. I answered what you were asking and fixed the part that was wrong about the load. The answer is basically in the part where I said "You are probably NOT waiting for the load to complete" and in the comment "after all, each site works in a way and invents its own fashion". There is no way to tell when dynamic items will load; that is what waitForSelector solves.

0


The evaluate method is probably returning null because the element doesn’t exist on the page, either because it hasn’t loaded yet or because it really isn’t on the current page even after all the JS code has run.

The other answer says to set the waitUntil option to "load", but that is redundant because, according to the documentation of this API, the option defaults to "load".

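Still, this is not the main problem in your code. Even if the element has loaded on the page, the evaluate method will return undefined when everything else goes well. That’s because you’re trying to return a non-serializable object: the return value has to be sent from the page back to your Node.js script, and a DOM element cannot be sent that way.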

If your goal is just to get the src attribute, or another property of that element that is a string or other serializable value, read it inside the method and return it, as in the code below:

const image = await page.evaluate(function() {
    return document.querySelector(".KL4Bh > img").src;
});
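
The snippet above throws inside the page if the selector matches nothing, because querySelector then returns null. A slightly more defensive sketch, assuming page is a Puppeteer Page created elsewhere, passes the selector in as an argument and returns only plain values. The helper name readImageInfo and the returned field names are illustrative, not part of Puppeteer.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page; `readImageInfo`
// and the returned field names are illustrative.
async function readImageInfo(page, selector) {
  // The callback runs inside the page, so the selector is passed as an argument.
  return page.evaluate((sel) => {
    const img = document.querySelector(sel);
    // Return only plain values; the DOM node itself is not serializable.
    return img ? { src: img.src, alt: img.alt } : null;
  }, selector);
}
```

Called as readImageInfo(page, ".KL4Bh > img"), this should resolve to a small object with the image’s src and alt, or to null when the element is not on the page yet.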

If you need the whole element rather than just an attribute or simple value, use the waitForSelector method as shown in the other answer. This method waits for an element matching the selector you provide and returns it. See the code below:

const image_element = await page.waitForSelector('.KL4Bh > img');

This method returns an ElementHandle object. To get this element’s attributes, use the getProperty(attr) method, which returns a JSHandle object.

After obtaining this object, use the jsonValue() method to get the attribute value.

const image_element = await page.waitForSelector('.KL4Bh > img');
const image_property = await image_element.getProperty("src");

const src = await image_property.jsonValue();

Remember that all of these actions must be performed while the browser is open. You cannot close the browser before reading the image property, or you will get an error.
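
One way to keep that ordering safe is to do all the page work inside a try block and close the browser in finally. The sketch below is only an illustration: withPage is a hypothetical helper, and the puppeteer module is passed in as a parameter (an assumption made so the helper does not depend on how the library is imported).

```javascript
// Sketch only: `withPage` is a hypothetical helper; `puppeteer` is the
// module object (for example, the result of require("puppeteer")).
async function withPage(puppeteer, url, work) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Everything that touches the page or its handles happens in here,
    // while the browser is still open.
    return await work(page);
  } finally {
    // Runs after `work` has finished (or thrown), never before.
    await browser.close();
  }
}
```

For example, withPage(require("puppeteer"), url, async (page) => { const el = await page.waitForSelector(".KL4Bh > img"); return (await el.getProperty("src")).jsonValue(); }) would read the src before the browser is closed.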
