How to check for duplicate files?


How can I check whether two files are identical, even if their names are different, in a JS application?

  • Compare the files' hashes.

1 answer



One way is to compare a cryptographic hash of each file: if the hashes are the same, the files are identical (a collision with a hash like SHA-512 is astronomically unlikely).

For this you can use the crypto module to compute the hash and the fs module to read the files.

I created a function:

function CriarHash(Texto){
    // Hash is require('crypto'); returns the SHA-512 digest as a hex string
    return Hash.createHash('sha512').update(Texto).digest('hex');
}

This creates the hash, in this case with SHA-512 (SHA-2 with a 512-bit digest). You can also use other SHA-2 variants, the SHA-3 family, or BLAKE2.
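A minimal sketch of swapping the algorithm. It assumes a Node.js build whose OpenSSL exposes the names 'sha3-512' and 'blake2b512'; availability varies by build, so the sketch checks crypto.getHashes() first. The helper name hashWith is hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash a Buffer or string with the given algorithm name
function hashWith(algorithm, data) {
  return crypto.createHash(algorithm).update(data).digest('hex');
}

// Names as OpenSSL usually exposes them; whether they exist depends on the build
const candidates = ['sha512', 'sha3-512', 'blake2b512'];
const available = crypto.getHashes();

for (const algorithm of candidates) {
  if (available.includes(algorithm)) {
    console.log(algorithm, hashWith(algorithm, 'example contents'));
  } else {
    console.log(algorithm, 'is not available in this Node.js build');
  }
}
```

Whichever algorithm you pick, use the same one for every file, otherwise the digests can never match.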

I also created a function that reads each file, hashes it, and returns the list of hashes:

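function CriarHashArquivos(NomeArquivos){
    // Arquivo is require('fs'); declared with var so it is not an implicit global
    var ArquivosHashes = [];
    NomeArquivos.forEach(function(Nome) {
        // Read the whole file and store its hash, in the same order as the names
        ArquivosHashes.push(CriarHash(Arquivo.readFileSync(Nome)));
    });
    return ArquivosHashes;
}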
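readFileSync loads each whole file into memory. For large files, a minimal sketch that feeds the file to the hash in chunks with fs.createReadStream, assuming an asynchronous (Promise-based) API is acceptable in your application. The names hashFileStream and hashFilesStream are hypothetical.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: hash one file chunk by chunk instead of reading it whole
function hashFileStream(fileName, algorithm = 'sha512') {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(algorithm);
    fs.createReadStream(fileName)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}

// Hypothetical helper: hash several files; resolves to the hashes in input order
function hashFilesStream(fileNames) {
  return Promise.all(fileNames.map((name) => hashFileStream(name)));
}
```

Usage would look like hashFilesStream(['arquivo1.txt', 'arquivo2.txt']).then((hashes) => console.log(hashes)); the resulting array can be passed to the same equality check.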

Then all you need to do is check whether all the hashes are equal. For that I created this:

function HashIgual(Hashes){
    // True when every hash equals the first one
    return Hashes.every(v => v === Hashes[0]);
}
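HashIgual only answers whether all the files are equal. If you instead want to know which files duplicate each other, one option is to group the file names by hash in a Map. This is a sketch with a hypothetical helper, groupByHash, and it assumes hashes[i] belongs to names[i], which is the order CriarHashArquivos returns.

```javascript
// Hypothetical helper: group file names that share the same hash.
// names[i] must correspond to hashes[i].
function groupByHash(names, hashes) {
  const groups = new Map();
  names.forEach((name, i) => {
    const list = groups.get(hashes[i]) || [];
    list.push(name);
    groups.set(hashes[i], list);
  });
  // Keep only hashes shared by two or more files, i.e. the duplicates
  return [...groups.values()].filter((list) => list.length > 1);
}

// Example with placeholder hash strings
console.log(groupByHash(['a.txt', 'b.txt', 'c.txt'], ['h1', 'h2', 'h1']));
// prints [ [ 'a.txt', 'c.txt' ] ]
```

With the functions from this answer, the call would be groupByHash(NomesDosArquivos, CriarHashArquivos(NomesDosArquivos)).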

/!\ This is not safe against timing attacks: === may return as soon as it finds the first differing character, so the time the comparison takes can reveal how much of the hash matched.

In the end you’ll have this:

var Hash = require('crypto');
var Arquivo = require('fs');

// Reads each file and returns the list of their hashes, in order
function CriarHashArquivos(NomeArquivos){
    var ArquivosHashes = [];
    NomeArquivos.forEach(function(Nome) {
        ArquivosHashes.push(CriarHash(Arquivo.readFileSync(Nome)));
    });
    return ArquivosHashes;
}

// Returns the SHA-512 digest of the content as a hex string
function CriarHash(Texto){
    return Hash.createHash('sha512').update(Texto).digest('hex');
}

// True when every hash equals the first one
function HashIgual(Hashes){
    return Hashes.every(v => v === Hashes[0]);
}

To use it, just pass the file names, for example:

var NomesDosArquivos = ['arquivo1.txt', 'arquivo2.txt'];

console.log( HashIgual( CriarHashArquivos(NomesDosArquivos) ) );

It returns true if all the files are identical, or false if at least one of them differs.
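When you only need to compare two files and read both anyway, a hash is not strictly required: comparing the bytes directly gives an exact answer. This is a different technique from the hash approach above, sketched with Buffer.prototype.equals and a cheap size check via fs.statSync first. The name sameContents is hypothetical, and like the code above it reads each file fully into memory.

```javascript
const fs = require('fs');

// Hypothetical helper: true if both files have byte-for-byte identical contents
function sameContents(fileA, fileB) {
  // Files of different sizes cannot be identical, so skip reading them
  if (fs.statSync(fileA).size !== fs.statSync(fileB).size) {
    return false;
  }
  return fs.readFileSync(fileA).equals(fs.readFileSync(fileB));
}
```

For example, sameContents('arquivo1.txt', 'arquivo2.txt'). Hashing remains the better fit when you compare many files or want to store a fingerprint for later.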


Node.js does provide a constant-time comparison, crypto.timingSafeEqual (available since v6.6.0). If you prefer a hand-written comparison, you can use, for example:

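function HashIgual(Hashes){
    var eIgual = 0;
    Hashes.forEach(function(HashArquivo){
        // A length difference also counts as a mismatch
        eIgual |= HashArquivo.length ^ Hashes[0].length;
        // XOR each character pair and OR the result in, never exiting early
        for (var i = 0; i < HashArquivo.length; ++i) {
            eIgual |= (HashArquivo.charCodeAt(i) ^ Hashes[0].charCodeAt(i));
        }
    });
    // 0 means every hash matched Hashes[0]; return a boolean like the other version
    return eIgual === 0;
}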

This XORs every pair of characters and ORs the results instead of stopping at the first difference, so the running time should not depend on where the hashes differ, unlike == and ===. In your specific case I don't see the need for it, but when comparing secrets such as passwords, always use constant-time functions. ;)
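A minimal sketch of the same all-equal check built on crypto.timingSafeEqual(a, b), which compares two Buffers (or TypedArrays) of equal length in constant time and throws if their lengths differ. It assumes the hashes are hex strings produced by the same algorithm; the name hashesEqualSafe is hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: constant-time check that every hex hash equals the first
function hashesEqualSafe(hashes) {
  if (hashes.length === 0) {
    return true; // same result as Hashes.every(...) on an empty array
  }
  const first = Buffer.from(hashes[0], 'hex');
  let allEqual = true;
  for (const hash of hashes) {
    const current = Buffer.from(hash, 'hex');
    // timingSafeEqual throws on a length mismatch, so compare lengths first
    // (this only reveals the length, which is fixed for a given algorithm)
    const equal = current.length === first.length && crypto.timingSafeEqual(current, first);
    if (!equal) {
      allEqual = false;
    }
  }
  return allEqual;
}
```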

PS: I don't know much about Node.js.
