"Tracking" updates with PHP

-2

I would like some tips on how to track changes, or generate notifications, when a web page is modified or gets new entries. Note that I want to track third-party websites, i.e., I don't have access to their database.

  • The most basic approach, independent of the structure of the page in question, is to save the initial HTML, keep requesting the current HTML, and check whether the two are the same or different. The problem is that this tends to be quite costly and does not scale very well. Other than that, your question is not clear enough and may be an XY problem.

  • An example would be websites that publish notices. When a new notice is published, a notification could be sent, or a page updated, to announce the new entry.

1 answer

0

You can create a routine that periodically fetches the desired page and checks whether a given tag has changed, using this script: https://github.com/tj/php-selector

But it's not all roses, and there is a catch: if the page uses JavaScript to render its elements, capturing the data becomes harder, so the approach I describe here only works for pages rendered on the server. To make sure, open the page's source code (Ctrl + U) and check whether the tag you want to watch appears in it.
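The manual Ctrl + U check can also be approximated in code. This is only a minimal sketch, assuming PHP's bundled DOM extension is available; the helper name `existsInRawHtml` and the XPath expressions are made up for illustration and are not part of php-selector.

```php
<?php
// Hypothetical helper (not part of php-selector): reports whether an XPath
// expression matches anything in the HTML exactly as the server delivered it,
// i.e. before any JavaScript has run.
function existsInRawHtml(string $html, string $xpath): bool
{
    if (trim($html) === '') {
        return false; // nothing to parse
    }

    // Real-world markup is often sloppy; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpath);
    return $nodes !== false && $nodes->length > 0;
}

// Inline document for the demo; in practice $html would be the string
// downloaded from the page being monitored.
$html = '<html><body><div id="app"></div>'
      . '<ul class="notices"><li>First</li></ul></body></html>';

// The list items are present in the delivered source.
var_dump(existsInRawHtml($html, '//ul[@class="notices"]/li'));
// The #app container is empty in the source (it would be filled by JavaScript).
var_dump(existsInRawHtml($html, '//div[@id="app"]/p'));
```

If the helper returns false for the element you care about, the page is probably rendered in the browser and the approach in this answer will not see it.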

I won't go into how you should store the data for comparison, but I recommend saving it in a database.

<?php
// Include the php-selector script: https://github.com/tj/php-selector
include 'selector.inc';

// Download the raw HTML (requires allow_url_fopen to be enabled).
$contents = file_get_contents('http://www.example.com/');
if ($contents === false) {
    exit("Could not fetch the page\n");
}

// Select the contents of the tag (works like a CSS selector).
$tag = select_elements('div p + ul', $contents);

// Placeholder: replace null with the checksum saved on the previous run
// (fetched from your database).
$oldChecksum = null;
$newChecksum = md5(serialize($tag));

if ($newChecksum !== $oldChecksum) {
    // Something changed (or this is the first run).
    // Placeholder: do whatever you need here and update the database.
    // Remember to save the data serialized: serialize($tag)
}
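The script above leaves the storage step open. One possible way to fill it in is sketched below, assuming the `pdo_sqlite` extension is installed. The table name `page_state` and the helpers `openStore`, `loadChecksum`, `saveChecksum`, and `hasChanged` are hypothetical names chosen for this example, and any other database would work just as well.

```php
<?php
// Hypothetical storage layer: one row per monitored URL holding the last checksum.
function openStore(string $dsn = 'sqlite::memory:'): PDO
{
    $pdo = new PDO($dsn);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS page_state (
            url TEXT PRIMARY KEY,
            checksum TEXT NOT NULL
        )'
    );
    return $pdo;
}

// Returns the stored checksum for $url, or null if the page was never seen.
function loadChecksum(PDO $pdo, string $url): ?string
{
    $stmt = $pdo->prepare('SELECT checksum FROM page_state WHERE url = ?');
    $stmt->execute([$url]);
    $value = $stmt->fetchColumn();
    return $value === false ? null : (string) $value;
}

// Inserts or overwrites the checksum for $url.
function saveChecksum(PDO $pdo, string $url, string $checksum): void
{
    $stmt = $pdo->prepare('REPLACE INTO page_state (url, checksum) VALUES (?, ?)');
    $stmt->execute([$url, $checksum]);
}

// Returns true when $data differs from what was stored on the previous call,
// and records the new checksum in that case.
function hasChanged(PDO $pdo, string $url, $data): bool
{
    $new = md5(serialize($data));
    if ($new === loadChecksum($pdo, $url)) {
        return false;
    }
    saveChecksum($pdo, $url, $new);
    return true;
}

// Demo with an in-memory database; pass a file-based DSN to openStore()
// so the state survives between runs.
$pdo = openStore();
$url = 'http://www.example.com/';
var_dump(hasChanged($pdo, $url, ['Notice 1']));             // bool(true): first run
var_dump(hasChanged($pdo, $url, ['Notice 1']));             // bool(false): same data
var_dump(hasChanged($pdo, $url, ['Notice 1', 'Notice 2'])); // bool(true): new entry
```

In the script above, the value of `$tag` would be passed as `$data`. Note that the very first run always reports a change, because nothing has been stored yet.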

To run the routine periodically you have several options. Here is an article on scheduling a PHP script with crontab on Linux: https://e-tinet.com/linux/agendar-script-php-crontab-no-linux/

This script is adapted from: https://www.sitepoint.com/community/t/can-i-monitor-php-web-page-changes/2907/3
