Good morning,
I'm building a crawler that accesses a specific page and extracts some specific data from it, but I'm having problems.
Right now I'm running a test against Instagram. My code is as follows:
$client = new Client();
$request = $client->request('GET', 'https://www.instagram.com/user/');
return response()->json( $request->getBody() );
However, when I print the result of getBody(), it comes back as an empty {}.
I also tried calling getContents() on the body to get the data, as follows:
return response()->json( $request->getBody()->getContents() );
With getContents() I get back very little HTML and the rest is JavaScript, so I believe the error may be in the way I am making the request.
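The empty {} is probably down to how json_encode() treats objects: it only serializes public properties, and the stream object returned by getBody() most likely exposes none. The sketch below illustrates this without any dependencies. FakeBody is a hypothetical stand-in for the Guzzle stream, not part of the library.

```php
<?php
// Hypothetical stand-in for the stream object returned by getBody():
// it keeps its data in a private property and can be cast to a string.
final class FakeBody
{
    private string $contents;

    public function __construct(string $contents)
    {
        $this->contents = $contents;
    }

    public function __toString(): string
    {
        return $this->contents;
    }
}

$body = new FakeBody('<html><body>Hello</body></html>');

// json_encode() only sees public properties, so the object looks empty: {}
$encodedObject = json_encode($body);

// Casting to string first exposes the actual contents.
$encodedString = json_encode((string) $body);

echo $encodedObject, PHP_EOL;
echo $encodedString, PHP_EOL;
```

Under that assumption, casting the real body with (string) $request->getBody() before passing it to json() should return the raw markup instead of {}.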
Thanks for the reply, Wallace. Let me explain my situation a bit: I would like to follow several specific profiles and, whenever one of them posts an image, bring it into my system. I used the cast, but the result was wrong. I then noticed that it is possible to get the HTML after the JS loads by using the "timeout" setting, but since Instagram uses React, the returned HTML does not contain the published images, and that is the only information I want. From a quick read of the API docs, I noticed that a token has to be passed to get the data.
– Henrique Souza Goncalves
Yes, you need to pass the token. It would be the easiest way for you to get "legitimate" access to Instagram. The problem with writing a crawler is that they can change the page at any time and you are out of luck, whereas the API is versioned. But if you still insist on writing a crawler, maybe you can use DOMDocument to access the src attribute of the <img> elements.
– Wallace Maxters
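The DOMDocument suggestion could look roughly like the sketch below. The sample markup and the extractImageSources helper are made up for illustration. Note that this only finds images present in the HTML the server actually sends, not ones rendered later by JavaScript.

```php
<?php
// Sample markup standing in for a fetched page (an assumption; a real
// Instagram response is mostly JavaScript and may contain no <img> tags).
$html = <<<'HTML'
<html>
  <body>
    <img src="https://example.com/a.jpg" alt="first">
    <p>Some text</p>
    <img src="https://example.com/b.jpg" alt="second">
    <img alt="no source">
  </body>
</html>
HTML;

/**
 * Collect the src attribute of every <img> element in an HTML string.
 *
 * @return string[]
 */
function extractImageSources(string $html): array
{
    $document = new DOMDocument();

    // Real-world HTML is rarely valid; buffer parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($document->getElementsByTagName('img') as $image) {
        // getAttribute() returns an empty string when the attribute is missing.
        $src = $image->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }

    return $sources;
}

$sources = extractImageSources($html);
print_r($sources);
```

With a real response, the string passed in would be the body cast to a string, for example (string) $request->getBody().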
Got it, Wallace. I'm aware of the risk of the page changing. As for DOMDocument, the problem is that even with Guzzle set to a 5-second timeout I'm not getting the images: the response contains some unknown numbers where the <img> elements should be. Would you know how to solve this?
– Henrique Souza Goncalves
@Henriquesouzagoncalves this is because Instagram is most likely returning HTML that relies on JavaScript to render the page. There is no easy solution to this. I've heard that Ghost JS solves it, but I can't say for sure because I've never used it.
– Wallace Maxters
All right, Wallace, I'll look into it. If it doesn't work I'll have to go with the API after all, but I'll wait for other answers. Thanks for your help!
– Henrique Souza Goncalves