Simply converting binary files to .txt will not make the search work, because each file type has its own format.
For each file type you will have to use a suitable method to extract the data and save it to a .txt file.
Some examples:
XML: use DOMDocument, http://php.net/manual/en/class.domdocument.php
Example:
// Path where your XML file was saved
$xml = file_get_contents('arquivo.xml');
$frases = array();
$dom = new DOMDocument;
$dom->loadXML($xml);
$books = $dom->getElementsByTagName('*');
foreach ($books as $book) {
    $frases[] = $book->nodeValue . PHP_EOL;
}
// Save the $frases array to a .txt file, like this:
file_put_contents('arquivo.xml.txt', implode(' ', $frases));
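Note that getElementsByTagName('*') returns every element, and a parent's nodeValue already contains the text of its children, so the same words can be written more than once. A possible variant, only a sketch, collects the text nodes with DOMXPath instead. The function name xmlToText is made up for this illustration.

```php
<?php
// Hypothetical helper: returns the text of an XML string,
// visiting each text node exactly once.
function xmlToText(string $xml): string
{
    $dom = new DOMDocument;
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);

    $frases = array();
    // Select text nodes only, so nested elements are not counted twice
    foreach ($xpath->query('//text()') as $node) {
        $texto = trim($node->nodeValue);
        if ($texto !== '') {
            $frases[] = $texto;
        }
    }
    return implode(' ', $frases);
}
```

It could be called as file_put_contents('arquivo.xml.txt', xmlToText(file_get_contents('arquivo.xml'))).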
CSV: use fgetcsv, http://php.net/manual/en/function.fgetcsv.php
// Path where your CSV file was saved
$handle = fopen("arquivo.csv", "r");
$frases = array();
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $frases = array_merge($frases, $data);
}
fclose($handle);
file_put_contents('arquivo.csv.txt', implode(' ', $frases));
XLS: you will probably have to use a library to make this easier, such as https://code.google.com/p/php-excel-reader/ or http://sourceforge.net/projects/phpexcelreader/
These are just a few examples; for each format you support in your application you will have to write a separate script.
I believe there is no ready-made "magic" solution for this; you will have to take what exists and build an application on top of it.
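One way to organize the per-format scripts is a single function that picks the extraction routine from the file extension. This is only a sketch: the name extractText and the set of supported extensions are assumptions, and formats such as XLS would need a case that calls whichever library you choose.

```php
<?php
// Sketch: choose an extraction routine from the file extension.
// extractText() and the supported formats are assumptions for illustration.
function extractText(string $path): string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'xml':
            $dom = new DOMDocument;
            $dom->load($path);
            // textContent of the root element is the concatenated text of the document
            return $dom->documentElement->textContent;
        case 'csv':
            $frases = array();
            $handle = fopen($path, 'r');
            // Length 0 means no line-length limit
            while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
                $frases = array_merge($frases, $data);
            }
            fclose($handle);
            return implode(' ', $frases);
        case 'txt':
            return file_get_contents($path);
        default:
            // Unknown format: a new case has to be written for it
            throw new InvalidArgumentException("No extractor for .$ext files");
    }
}
```

Each file could then be converted with file_put_contents($path . '.txt', extractText($path)), which matches the arquivo.xml.txt and arquivo.csv.txt naming used above.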
To run the search, let's assume that you saved all the .txt files in one folder; you would then do a search similar to this:
$consulta = 'Palavra'; // search term
$dir = 'textos/'; // assumed folder holding the .txt files; keep the trailing slash
$arquivos = array();
if ($dh = opendir($dir)) {
    while (($file = readdir($dh)) !== false) {
        if (is_file($dir . $file)) {
            $data = file_get_contents($dir . $file);
            if (stripos($data, $consulta) !== false) {
                $arquivos[] = $file;
            }
        }
    }
    closedir($dh);
}
echo 'The query "', $consulta, '" found ', count($arquivos), ': ', implode(', ', $arquivos);
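If the search is needed in more than one place, the same idea can be wrapped in a function. The following is a minimal sketch, assuming the extracted files all end in .txt; the name searchTxtFiles is hypothetical, and glob is used here in place of opendir/readdir.

```php
<?php
// Hypothetical helper: names of the .txt files in $dir whose
// contents contain $consulta (case-insensitive, via stripos).
function searchTxtFiles(string $dir, string $consulta): array
{
    $arquivos = array();
    $dir = rtrim($dir, '/') . '/';
    // glob() can return false on error, so fall back to an empty list
    $paths = glob($dir . '*.txt') ?: array();
    foreach ($paths as $path) {
        if (stripos(file_get_contents($path), $consulta) !== false) {
            $arquivos[] = basename($path);
        }
    }
    return $arquivos;
}
```

Reading every file on each query gets slow as the folder grows, so this only suits small collections.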
And if it is an image you will need to do OCR. And if it is PDF you will need (...). And if it is (...) you will need OCR (...).
– bfavaretto