Compare Hash of two files in Python

0

I need to compare the hashes of the files a.txt and b.txt using only the Python 3 standard library.

This is what I tried:

import hashlib

file_0 = 'a.txt'
file_1 = 'b.txt'

hash_0 = hashlib.new('ripemd160')
hash_0.update(file_0)
hash_1 = hashlib.new('ripemd160')
hash_1.update(file_1)

assert hash_0 != hash_1, f'File {file_0} is different from file {file_1}'

But the following error occurred:

TypeError: Unicode-objects must be encoded before hashing


Note: before this edit, the question mistakenly used the variable name file_0 twice. The code in the actual project was correct.

  • 2

    Do you need to compare the contents of the files or only their names? The way you are doing it, you will only compare the names. The update method accepts only bytes-like objects as a parameter, so you would need to encode your string with the encode method.

  • I need to compare the contents of the files. I will keep searching to find out how to compare the contents. Thank you.

2 answers

7


First, you are hashing the file names, not the files. The update method of hashlib hash objects does not open files on its own; it expects objects of type bytes, which is the reason for the error message: you are passing text. But even if you add the b'' prefix to these name strings, or call .encode() on them, you will still be hashing only the names of the files. (Another error: you used the same variable name twice, so if you were opening the files, you would be comparing the file "b.txt" with itself.)
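As a small illustration of the point above, the sketch below shows update() rejecting a str and then accepting the encoded name while still hashing nothing but the name. It uses sha256 rather than ripemd160 purely as an assumption for portability, since ripemd160 may not be available in every Python/OpenSSL build; the variable names are made up for the example.

```python
import hashlib

# sha256 is an assumption made for portability; the behaviour shown
# is the same for any hashlib algorithm.
h = hashlib.sha256()

error_message = None
try:
    h.update('a.txt')  # a str, not bytes
except TypeError as exc:
    # The exact wording of the message varies between Python versions.
    error_message = str(exc)

# Encoding the name makes update() succeed, but only the name is hashed:
# the file a.txt is never opened.
name_digest = hashlib.sha256('a.txt'.encode('utf-8')).hexdigest()
same_name_digest = hashlib.sha256(b'a.txt').hexdigest()
other_name_digest = hashlib.sha256(b'b.txt').hexdigest()
```

Here name_digest equals same_name_digest regardless of what a.txt contains, and other_name_digest differs only because the name differs.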

To hash the contents of the files, do this:

import hashlib

file_0 = 'a.txt'
file_1 = 'b.txt'

# Open in binary mode so the hash sees the raw bytes of each file,
# and use "with" so the files are closed after reading.
hash_0 = hashlib.new('ripemd160')
with open(file_0, 'rb') as f:
    hash_0.update(f.read())

hash_1 = hashlib.new('ripemd160')
with open(file_1, 'rb') as f:
    hash_1.update(f.read())

if hash_0.digest() != hash_1.digest():
    print(f'File {file_0} is different from file {file_1}')
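The snippet above reads each whole file into memory before hashing. For large files, one possible variation is to feed the hash object in chunks. This is only a sketch: the helper names hash_file and same_content, the 64 KiB chunk size, and the sha256 default are assumptions for illustration, not part of the original answer.

```python
import hashlib

def hash_file(path, algorithm='sha256', chunk_size=65536):
    """Return the hex digest of the file at `path`, read in chunks."""
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        # iter() with a sentinel stops when read() returns b'' at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a, path_b, algorithm='sha256'):
    """Return True if both files produce the same digest."""
    return hash_file(path_a, algorithm) == hash_file(path_b, algorithm)
```

Calling same_content('a.txt', 'b.txt') would then return a boolean that can be used in an ordinary if statement.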

(You were also using assert the wrong way. Avoid assert in application code and reserve it for tests. Although it looks like shorthand for an if followed by a raise, it is a check that is disabled depending on the options the Python runtime is started with, for example the -O flag. Many developers leave assert in production code and can get into trouble sooner or later, when a check silently stops running because of an apparently innocuous configuration change made elsewhere.)
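To make the point about assert concrete, the sketch below compiles the same assert statement with and without optimization, using the optimize argument of the built-in compile(); optimize=1 corresponds to running Python with -O. The helper names raises_assertion and ensure_equal are hypothetical, and ensure_equal is just one way to write a check that always runs.

```python
# The same failing assertion, compiled at two optimization levels.
source = "assert 1 == 2, 'values differ'"

def raises_assertion(optimize):
    """Run `source` compiled at the given level; report if assert fired."""
    code = compile(source, '<demo>', 'exec', optimize=optimize)
    try:
        exec(code, {})
    except AssertionError:
        return True
    return False

checked_normally = raises_assertion(0)   # the assert is executed
checked_optimized = raises_assertion(1)  # the assert is compiled away, as under -O

def ensure_equal(digest_a, digest_b):
    """Explicit check that still runs when asserts are disabled."""
    if digest_a != digest_b:
        raise ValueError('digests differ')
```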

  • Forgive my ignorance, but wouldn't it be better to open the files for reading rather than writing, using rb instead of wb?

  • It should have been "rb". You're right, I'll change it.

5

A comparison using hashlib can be done like this:

import hashlib

def open_txt(file):
    # f.read() already returns the whole file as a single string.
    with open(file) as f:
        return f.read()

file_1 = 'a.txt'
file_2 = 'b.txt'
text_1 = open_txt(file_1)
text_2 = open_txt(file_2)

def compare(text_1, text_2):
    # Hash the UTF-8 bytes of each text and compare the hex digests.
    hash_1 = hashlib.md5(text_1.encode('utf-8')).hexdigest()
    hash_2 = hashlib.md5(text_2.encode('utf-8')).hexdigest()

    assert hash_1 == hash_2, f'File {file_1} is different from file {file_2}'


compare(text_1, text_2)

But I find it more practical and faster to use filecmp, which can compare the files byte by byte.

import filecmp
filecmp.cmp('a.txt', 'b.txt')
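One caveat worth noting: with its default shallow=True, filecmp.cmp treats two files as equal when their os.stat() signatures (file type, size, and modification time) match, without reading the contents. If a guaranteed content comparison is wanted, a sketch along these lines may help; the wrapper name same_bytes is hypothetical.

```python
import filecmp

def same_bytes(path_a, path_b):
    """Compare two files by content rather than by os.stat() signature."""
    # shallow=False makes filecmp read and compare the file contents
    # even when type, size, and modification time are identical.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For example, same_bytes('a.txt', 'b.txt') returns True only if both files hold the same bytes.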
