Making a cURL request 100% identical to my request from a regular browser

There are some routine tasks on a site that I always perform manually, and I would like to automate them. I tried using an iframe, but without success: I cannot read the document's nodes.

So I thought I'd use cURL to fetch the page's contents (it could also be file_get_contents()), but I realized that both options land on the site's login page (even while logged in, it only asks me to proceed). Yet if I open the same page in my browser, I am not redirected to the login screen.

So here is the question: what do I have to do for the cURL request to be 100% identical to the one my ordinary browser makes? If it were, it would reach the page I requested instead of going back to the login.

Note: in cURL I set CURLOPT_COOKIEFILE to a .txt file containing my cookies (with which I am sure I am logged in), which is why I don't understand why cURL still goes back to the login page (already logged in, just asking for permission to proceed).
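For reference, CURLOPT_COOKIEFILE expects either the Netscape cookie file format (one tab-separated cookie per line: domain, include-subdomains flag, path, secure flag, expiry, name, value) or plain Set-Cookie header lines; cookies copied in any other layout are silently ignored. An illustrative entry (domain and value are made up):

```text
# Netscape HTTP Cookie File
.aliexpress.com	TRUE	/	FALSE	1735689600	ali_apache_id	EXAMPLEVALUE
```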

I believe the site somehow detects the difference. Could it be some option I failed to set in cURL?

Here is how I currently set up cURL:

$url = "http://trade.aliexpress.com/order_detail.htm?orderId=71064520859834";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, FALSE);   // FALSE: the response is echoed directly
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
// Note: CURLOPT_COOKIESESSION = TRUE tells cURL to ignore session cookies
// loaded from the cookie file, which can itself force the login screen.
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);

curl_exec($ch);
  • You will only be able to stay logged in according to the rules of the page that requires authentication; it is not enough to just drop cookies there. There are cases where the session cookie ID expires from time to time. Reading your question, I understand you want to build a webproxy, is that it? Can you post the URL you are trying to stay authenticated on?

  • I don't really know what a webproxy is. You say the cookie expires from time to time, but if I log in now, copy the cookies, and update my cookie file (set via CURLOPT_COOKIEFILE), I don't see where that "expiry" would happen. Thanks for the comment, @Guilhermenascimento! Authentication does persist (at least it seems to); the real problem is the intermediate page that asks me to continue (requiring no password) instead of showing the page I actually requested. I will post the URL.

  • That is why I asked you to post the link (the value of $url); maybe then we can get to the root of the problem, because it can come down to a number of factors. A webproxy is similar to a proxy, but the access is entirely via the web: one page loads the content of another desired page using server-side languages.

  • @Guilhermenascimento, I updated the question with the URL value. All I want to do there is check whether there is a tracking code and, if there is, perform some action (save it to the database), nothing more. The AliExpress API doesn't offer that (I've searched a lot; it is aimed more at sellers than buyers).

  • Don't they provide an API? If they do, that's better than suffering with a webcrawler.

  • @Williamokano, I thought they had an API, and they do, but their API is focused on sellers; for buyers I found nothing, for example getting the tracking code of a completed order, or finding out how many orders in the account are still unpaid, etc. I swear I searched hard for these things and found nothing.


1 answer


You are always being redirected to the login screen because you are creating a cookie jar that is empty. You need to populate cURL's cookies with your current browser cookies:

curl_setopt($ch, CURLOPT_COOKIE, $_SERVER["HTTP_COOKIE"]);

But note that you may still be redirected to the login page, since the site may rely on other authentication mechanisms.
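A fuller sketch of that idea, assuming the script runs in a web context where the visitor's own User-Agent and Cookie headers are available. The browserHeaders() helper and its header values are illustrative choices, not something the site is known to require:

```php
<?php
// Sketch: forward the visitor's own cookies and browser-like headers so the
// cURL request resembles what the browser sends.

// Hypothetical helper that builds a browser-like header list.
function browserHeaders(string $userAgent): array {
    return [
        "User-Agent: {$userAgent}",
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language: pt-BR,pt;q=0.9,en;q=0.8",
        "Connection: keep-alive",
    ];
}

$url = "http://trade.aliexpress.com/order_detail.htm?orderId=71064520859834";
$ch  = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);            // capture instead of echoing
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HTTPHEADER,
    browserHeaders($_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0'));
curl_setopt($ch, CURLOPT_COOKIE, $_SERVER['HTTP_COOKIE'] ?? '');  // the visitor's cookies
$html = curl_exec($ch);
```

Even so, the caveat above stands: header parity does not guarantee the session survives on the remote side.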

  • The cookies were already populated, because CURLOPT_COOKIEFILE pointed to a .txt file into which I copied all my cookies. But thanks for the tip; now I no longer need to keep copying the browser cookies into the .txt file.

  • In any case, HTTP_COOKIE will only pick up cookies from the current domain. I think the simplest thing for you would be to create a standalone robot and expose an API of your own to read from it.

  • What do you mean by robot? William, could you explain that idea a bit better?

  • By robot I mean an automated application that runs indefinitely, making your requests (according to certain rules) and populating a database, for example. You could then query that database to read the data. A webcrawler, like the one you want to build, but one that runs forever and is standalone (outside of your application, as a separate application).
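The robot described above could be sketched roughly like this, assuming a PDO/SQLite store; the regex and the table schema are hypothetical placeholders, since the real page markup would have to be inspected first:

```php
<?php
// Sketch of the "standalone robot": a CLI script run periodically (e.g. via
// cron) that fetches the order page and records the tracking code it finds.

function extractTrackingCode(string $html): ?string {
    // Hypothetical pattern; the real page structure must be checked.
    if (preg_match('/tracking[\s_-]*(?:code|number)["\s:>]+([A-Z0-9]{8,})/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

function saveTracking(PDO $db, string $orderId, string $code): void {
    // "INSERT OR IGNORE" is SQLite syntax; adapt for another database.
    $stmt = $db->prepare("INSERT OR IGNORE INTO tracking (order_id, code) VALUES (?, ?)");
    $stmt->execute([$orderId, $code]);
}

// Main loop (illustrative):
// $html = curl_exec($ch);              // fetch as in the question's setup
// $code = extractTrackingCode($html);
// if ($code !== null) { saveTracking($db, $orderId, $code); }
// sleep(3600);                         // poll hourly
```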
