How to detect a bot


I’m helping a friend develop a paid-visit system, like Rede Grana (e.g. Social Money). We will pay for real visits to a page, and we know there are malicious people who will try to game the system and inflate their views, for example by using fake users or a bot (HitLeap).

I need to know how to tell a real view from a bot view. I already looked for a solution using HTTP_USER_AGENT but got nothing; I also compared bot requests against real views and found nothing I could use.

What would be the best way to protect against this kind of abuse? Something like what YouTube can already do: distinguish real accesses from fake ones.

Thanks in advance...

P.S.: I know how to detect common indexers, so don’t point me to articles about Googlebot.

  • Like I said, Daniel, you can check whether it’s a robot with reCAPTCHA.

  • That would be unpleasant for the user, because the goal is not to interact with the page, just to view it.

  • See if this helps you.

  • How do you know I’m not a bot commenting here to you on SOPT?

  • @Ivanferrer, that was not it...

  • And @Luizvieira, I think a bot wouldn’t ask me that.

  • 1

    A good enough bot could ask that. There are actually other indications that I’m not a bot that are better than my previous message or even that. You can look, for example, my history of participation on the site. My point with this joke (sorry for her, by the way, it was just a joke) is that without analyzing some interaction history will be difficult to detect something. Unless you do as the answers you have already suggest, and ignore known bots origins. They are very good solutions, but not necessarily infallible. :)


2 answers



I think the only effective way is to use a CAPTCHA; other approaches are easy to circumvent.

There are good ways to estimate the number of real visitors; one example is SO’s own view counter. But even that method can be circumvented with distributed bots or proxies.
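For reference, here is a minimal sketch of the server-side CAPTCHA check, assuming reCAPTCHA v2 and PHP. The siteverify endpoint and the g-recaptcha-response field are part of Google’s documented flow; YOUR_SECRET_KEY is a placeholder, and the rest is illustrative, not the asker’s code:

<?php
// Hypothetical server-side validation of a reCAPTCHA v2 token.
// 'g-recaptcha-response' is the field the reCAPTCHA widget posts;
// replace YOUR_SECRET_KEY with the secret key from your reCAPTCHA account.
$secret = 'YOUR_SECRET_KEY';
$token  = isset($_POST['g-recaptcha-response']) ? $_POST['g-recaptcha-response'] : '';

// Google's verification endpoint expects a POST
$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query(array(
            'secret'   => $secret,
            'response' => $token,
            'remoteip' => $_SERVER['REMOTE_ADDR'],
        )),
    ),
));
$result = file_get_contents('https://www.google.com/recaptcha/api/siteverify', false, $context);
$data   = json_decode($result, true);

if (!empty($data['success'])) {
    echo "valid access!";   // a human solved the challenge
} else {
    echo "invalid access!"; // failed or missing CAPTCHA
}

Note that this still requires the visitor to interact with the widget, which is exactly the friction the asker wants to avoid; it fits sign-ups and form posts better than passive page views.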

  • This answer started as a comment, but if there is nothing better, I think it is good enough to point out a direction for further research.


One way to do this is to create rules in the .htaccess that block some known robot user agents; for that you would need a complete list, or to hunt down a comprehensive list of these agents:

RewriteEngine on
# match any of the known crawler user agents (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mediawords [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlipboardProxy [NC]
# do not redirect the target page itself (avoids a loop)
RewriteCond %{REQUEST_URI} !\/sem_crawler.htm
RewriteRule .* http://seusite.com.br/sem_crawler.htm [L]

Another way is by using PHP:

<?php
class CrawlerDetect
{
    // list of known robot user-agent substrings
    private $agentsInvalids = array(
        'Google' => 'Google',
        'MSN' => 'msnbot',
        'Rambler' => 'Rambler',
        'Yahoo' => 'Yahoo',
        'AbachoBOT' => 'AbachoBOT',
        'accoona' => 'Accoona',
        'AcoiRobot' => 'AcoiRobot',
        'ASPSeek' => 'ASPSeek',
        'CrocCrawler' => 'CrocCrawler',
        'Dumbot' => 'Dumbot',
        'FAST-WebCrawler' => 'FAST-WebCrawler',
        'GeonaBot' => 'GeonaBot',
        'Gigabot' => 'Gigabot',
        'Lycos spider' => 'Lycos',
        'MSRBOT' => 'MSRBOT',
        'Altavista robot' => 'Scooter',
        'AltaVista robot' => 'Altavista',
        'ID-Search Bot' => 'IDBot',
        'eStyle Bot' => 'eStyle',
        'Scrubby robot' => 'Scrubby',
        // ...
    );

    // list of valid browser substrings
    private $agentsValids = array(
        'Mozilla' => 'Mozilla',
        'Chrome'  => 'Chrome',
        'Safari'  => 'Safari',
        'Opera'   => 'Opera',
        // ...
    );

    // set to true when the user agent looks like a robot
    public $isCrawler = false;

    public function __construct($userAgent)
    {
        // build one regex alternation per list
        $invalids = implode('|', array_map('preg_quote', $this->agentsInvalids));
        $valids   = implode('|', array_map('preg_quote', $this->agentsValids));
        /* here you choose whichever you prefer;
           I believe testing a single list is enough */
        if (preg_match('#' . $invalids . '#i', $userAgent) ||
            !preg_match('#' . $valids . '#i', $userAgent)) {
            $this->isCrawler = true;
        }
    }
}

// check the browser
$crawler = new CrawlerDetect($_SERVER['HTTP_USER_AGENT']);

// if it is a robot
if ($crawler->isCrawler) {
    echo "invalid access!";
} else {
    echo "valid access!";
}

This website has a near-complete list of browsers and crawlers.

  • My problem is with other types of bot. Google, Yahoo and Facebook each have a unique HTTP_USER_AGENT, but a bot running through HitLeap sends an HTTP_USER_AGENT just like a common user’s. Google => Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html). A common user, or a user via HitLeap => Mozilla/5.0 (Linux; Android 4.4.4; SM-G530BT Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/33.0.0 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/77.0.0.20.66;]. It is very easy to tell an indexer bot apart; the ones created by users themselves I no longer know how to detect.

  • So why don’t you just allow the known browsers and block all the others? Note that I posted the address of a site that lists the common browsers.

  • One way to avoid bots is to ask for personal information; there is no better way than that. Another is to demand some logical reasoning from the user, that is, to create something that only a real user could get past (see the sketch below).
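Building on that last idea, one approach is to only credit a view after the page shows some sign of a real user: a one-time token plus a minimum time on page. The sketch below is illustrative only, assuming a hypothetical count_view.php endpoint that a small script on the page calls after an interaction such as a scroll or mouse movement; none of these names come from the question:

<?php
// count_view.php (hypothetical) - called by a beacon the page fires
// after real interaction (mousemove/scroll); simple bots never fire it.
session_start();

// the page that rendered the view stored a one-time token and a timestamp
$token = isset($_POST['token']) ? $_POST['token'] : '';
if (!isset($_SESSION['view_token']) ||
    !hash_equals($_SESSION['view_token'], $token)) {
    http_response_code(403);
    exit('invalid access!');
}

// require a minimum time on page; many bots "view" and leave instantly
$started = isset($_SESSION['view_started']) ? $_SESSION['view_started'] : 0;
if (time() - $started < 5) {
    http_response_code(403);
    exit('invalid access!');
}

unset($_SESSION['view_token']); // one-time use, so replays don't count twice
// ... credit the view here (e.g. increment a counter in the database)
echo 'valid access!';

A determined bot can still script the interaction, so this only raises the cost; combined with per-IP rate limits and the agent lists above, it filters out the lazy cases.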
