Can a domain block requests made via cURL?

Viewed 234 times

3

When I make requests via cURL to a domain that does not return JSON/XML or a similar format, can that domain "block" my IP from querying via cURL, or something like that?

Currently, the request is made via cURL and the page's HTML is returned. That's all.

  • Please also add the response you get to your question, so I can help you better.

  • Do you mean the response returned by the cURL request?

2 answers

4


A service will rarely block only cURL. That would probably happen only if you send incorrect headers or an incorrect body, which lets the receiver distinguish your cURL requests from requests made by other clients.

More likely, the service detects that you are making more requests than permitted (i.e., hitting a rate limit), or that your behavior is abnormal (not making the requests the app/site would typically make under normal conditions).


When I make requests via cURL to a domain that does not return JSON/XML or the like, can that domain "block" my IP from querying via cURL, or something like that?

Yes, it can block cURL the same way it can block a browser. Any server can refuse requests, regardless of how they are made (via cURL, a browser, wget...). The target site typically will not block "only cURL"; more likely it will refuse any request from your IP (or your user account...), or it may require a CAPTCHA (or additional checks) and the like.

To mitigate blocks caused by excessive use, you can look for an alternative service, use multiple proxies (preferably residential and mobile) and balance requests between them, simulate usage closer to that of a real user, and make fewer requests per second.
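To tell a refusal apart from a normal page, the client can inspect the status code and the body it gets back. The following is a minimal sketch, not a definitive check: the function name `looksBlocked`, the choice of status codes, and the "captcha" marker are all assumptions for illustration, and a real site may signal a block differently.

```php
<?php
// Hypothetical helper: guess whether an HTTP response indicates a block.
// The status codes and the "captcha" marker are illustrative assumptions.
function looksBlocked(int $status, string $body): bool
{
    // 403 (Forbidden) and 429 (Too Many Requests) are common refusal codes.
    if ($status === 403 || $status === 429) {
        return true;
    }
    // Some sites answer 200 but serve a challenge page instead of content.
    return stripos($body, 'captcha') !== false;
}
```

With PHP's cURL extension, the status code can be read after `curl_exec()` with `curl_getinfo($ch, CURLINFO_HTTP_CODE)` and passed in together with the returned HTML.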
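Making fewer requests per second can be as simple as waiting between attempts and waiting longer after each refusal. Below is a rough sketch of that idea; the function name `retryDelay`, the 1-second base and the 60-second cap are assumptions, and only the numeric form of the `Retry-After` header is handled (the date form is ignored here).

```php
<?php
// Hypothetical helper: seconds to wait before retry number $attempt (0-based).
// If the server sent a numeric Retry-After value, that value is used as-is;
// otherwise a base delay is doubled per attempt and capped at $maxDelay.
function retryDelay(int $attempt, ?string $retryAfter = null, int $base = 1, int $maxDelay = 60): int
{
    if ($retryAfter !== null && preg_match('/^\d+\z/', $retryAfter) === 1) {
        return (int) $retryAfter;
    }
    $attempt = max(0, $attempt); // guard against negative attempt numbers
    return min($base * (2 ** $attempt), $maxDelay);
}
```

A request loop would then call something like `sleep(retryDelay($attempt, $retryAfterHeader))` before trying again, where `$retryAfterHeader` is whatever value (if any) the previous response carried.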

3

With the Access-Control-Allow-Methods header you can declare which request methods are allowed for a file (or route) on your origin server. This is a CORS header, so it is browsers that enforce it.

So if you want to allow only GET requests for your page, you must set

header("Access-Control-Allow-Methods: GET");

in your source file (or route).

To allow multiple methods, separate them with a , (comma), like this:

header("Access-Control-Allow-Methods: GET, POST, DELETE, PUT, HEAD, OPTIONS");

You can combine any number of methods in the list.

So if you make a GET request to a route that accepts both GET and POST, the response can be HTML, because HTML is what the GET request returns.

Each of these requests is received by the server and handled individually. So, to answer your question: Google may put your domain on a blacklist for excessive requests, preventing you from making requests to the site for a certain time or even permanently. (I don't know the service's terms of use well enough to say for sure.)
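To reject a method for non-browser clients such as cURL as well, the server itself has to check the request method. The following is a minimal sketch of such a check, assuming a route that should accept only GET and POST; the helper name `isMethodAllowed` and the allowed list are hypothetical.

```php
<?php
// Hypothetical helper: true when $method is in the allowed list.
// The comparison is strict, so lowercase "get" does not match "GET".
function isMethodAllowed(string $method, array $allowed): bool
{
    return in_array($method, $allowed, true);
}

// Sketch of use at the top of a route; the allowed list is an assumption.
$allowed = ['GET', 'POST'];
$method  = $_SERVER['REQUEST_METHOD'] ?? 'GET'; // falls back to GET under the CLI
if (!isMethodAllowed($method, $allowed)) {
    http_response_code(405);                     // Method Not Allowed
    header('Allow: ' . implode(', ', $allowed)); // tells the client what is accepted
    exit;
}
```

With this in place, a DELETE sent to the route gets a 405 response regardless of which client sent it.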

  • Perfect, Erlon. Thank you very much for the answer. In this case, the request is made to Google Images, where the query parameter is passed via GET. Example: https://www.google.com/search?sout=2&biw=1024&bih=548&tbm=isch&q=ferrari+2019 My fear is that Google will block my domain, but since they use GET for these searches, from what you said my domain will certainly not be blocked. Correct?

  • Now I understand the kind of block you mean. In this specific case you are better off using the Google APIs for these searches. https://developers.google.com/photos/

  • The one you linked is for Google Photos. I need to get images from Google Images (public image search). The API I know of for images is paid. I need a free one.

  • @Rayanmarcus True, the ones I found are all paid, although they are very good. But to answer your question: it is possible that Google will blacklist your domain for excessive requests. I will edit the answer to add this comment, so it is more consistent with what you were asking.

  • Thank you very much, Erlon!
