How to create a robot with PHP?


What is the best way to create a robot in PHP?

The goal of the robot is to access a URL with a login and password, enter data into certain fields, submit that data, and interpret the result on screen.

Any suggestions?

  • Take a look here; I think you’ll find everything you need: https://php.net/manual/en/book.curl.php

  • The best language for this is C# Windows Forms (or any other .NET), because you can use the WebBrowser control to navigate; doing this in PHP would be a nightmare.

  • Do you want to build a web crawler?

1 answer


Robots that search for and interpret information on other pages are also called web crawlers or spiders.

These are scripts that perform the following process:

  1. Request a URL.
  2. Store the response in a variable.
  3. Interpret the response, that is, parse the HTML.
  4. Find the relevant information.
  5. Process the information obtained.

The process in steps 1 to 3 is easily handled as follows:

$url = 'http://www.exemplo.com';
$dom = new DOMDocument('1.0');
// Fetches the URL and parses its HTML into a DOM tree
// (@ suppresses warnings triggered by malformed real-world HTML)
@$dom->loadHTMLFile($url);

This gives you an object that lets you navigate through the HTML however you need.

For example, collecting all the links on a page and displaying their addresses would look like this:

$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
    $href = $element->getAttribute('href');
    echo $href . '<br>';
}

An interesting class that can help with handling HTML and save thousands of lines of code is Simple HTML DOM; a tutorial on how to use it can be found on the Make Use Of site.

To simulate that a form has been filled in and submitted, simply make a request to the URL the form points to using the expected request method: that is, request the URL in the form's action attribute using the HTTP method given in its method attribute.

To simulate this situation, we change the previous request code to:

$curl = curl_init();
// Set some options - we are passing in a user agent too here
curl_setopt_array($curl, array(
    // Return the content as a string instead of printing it
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_URL => 'http://www.exemplo.com',
    // Identifying name of your robot
    CURLOPT_USERAGENT => 'Your crawler name',
    // Indicates that the request uses the POST method
    CURLOPT_POST => 1,
    // Parameters that will be sent via POST
    CURLOPT_POSTFIELDS => array(
        'item1' => 'value',
        'item2' => 'value2'
    )
));

// Execute the request and store the result in $response
$response = curl_exec($curl);

// Close the request handle
curl_close($curl);

$dom = new DOMDocument('1.0');

// Parse the HTML string returned by the request
// Note that the method changed from loadHTMLFile to loadHTML
@$dom->loadHTML($response);
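
Once the response is parsed, steps 4 and 5 (finding and processing the relevant information) can be done with DOMXPath, which queries the tree with XPath expressions. A minimal sketch, where the `resultado` class name and the sample markup are hypothetical stand-ins for whatever the real page returns:

```php
<?php
// Stand-in for the HTML returned by curl_exec(); in practice you
// would use the $response variable from the request above
$response = '<html><body>
    <table class="resultado">
        <tr><td>Saldo</td><td>R$ 150,00</td></tr>
    </table>
</body></html>';

$dom = new DOMDocument('1.0');
// @ suppresses warnings triggered by malformed real-world HTML
@$dom->loadHTML($response);

// DOMXPath lets you locate nodes with XPath expressions
$xpath = new DOMXPath($dom);
$cells = $xpath->query('//table[@class="resultado"]//td');

$values = array();
foreach ($cells as $cell) {
    $values[] = trim($cell->textContent);
}
// $values now contains array('Saldo', 'R$ 150,00')
print_r($values);
```

From here, step 5 is whatever your robot needs to do with the extracted values.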

Learn more about cURL in the PHP manual.

  • I managed to create the robot using Selenium Server + WebDriver; it is quite complete and lets you control all of a browser's features.
