What’s wrong with these hash algorithms?

I’ve researched several hash algorithms and found some examples on Stack Overflow (the English site), but they return different hashes for the same file:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ObterHash {

    private static String algoritmo = "SHA-256";

    public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
        File arq = new File("C:\\img.jpg");
        System.out.println(toHex(gerarHash1(arq)));
        System.out.println(toHex(gerarHash2(arq)));
        System.out.println(toHex(gerarHash3(arq)));
    }

    // Adapted from: https://stackoverflow.com/a/19304310/7711016
    public static byte[] gerarHash1(File arq) throws NoSuchAlgorithmException, IOException {
        DigestInputStream shaStream = new DigestInputStream(new FileInputStream(arq),
                MessageDigest.getInstance(algoritmo));
        // VERY IMPORTANT: read from final stream since it's FilterInputStream
        byte[] shaDigest = shaStream.getMessageDigest().digest();
        shaStream.close();
        return shaDigest;
    }

    // Adapted from: https://stackoverflow.com/a/26231444/7711016
    public static byte[] gerarHash2(File arq) throws IOException, NoSuchAlgorithmException {
        byte[] b = Files.readAllBytes(arq.toPath());
        byte[] hash = MessageDigest.getInstance(algoritmo).digest(b);
        return hash;
    }

    // Adapted from: https://stackoverflow.com/a/304275/7711016
    public static byte[] gerarHash3(File arq) throws NoSuchAlgorithmException, IOException {
        InputStream fis = new FileInputStream(arq);

        byte[] buffer = new byte[1024];
        MessageDigest complete = MessageDigest.getInstance(algoritmo);
        int numRead;

        do {
            numRead = fis.read(buffer);
            if (numRead > 0) {
                complete.update(buffer, 0, numRead);
            }
        } while (numRead != -1);

        fis.close();
        return complete.digest();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder ret = new StringBuilder();
        for (int i = 0; i < bytes.length; ++i) {
            ret.append(String.format("%02X", (bytes[i] & 0xFF)));
        }
        return ret.toString();
    }
}

When I ran it, I got this output (one hash per line):

E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
010F60D2927A35D0235490136EF9F4953B7E453073794BCAF153D20A64544EA
010F60D2927A35D0235490136EF9F4953B7E453073794BCAF153D20A64544EA

Note that the hashes generated by gerarHash2() and gerarHash3() are equal to each other, but different from the one generated by gerarHash1(). Why? Is the algorithm in gerarHash1() the one that is wrong? If so, what is wrong with it?

  • Your question has +1/-1. +1 is mine. I don’t know why -1.

1 answer

Let’s look at the code of the DigestInputStream constructor:

    /**
     * Creates a digest input stream, using the specified input stream
     * and message digest.
     *
     * @param stream the input stream.
     *
     * @param digest the message digest to associate with this stream.
     */
    public DigestInputStream(InputStream stream, MessageDigest digest) {
        super(stream);
        setMessageDigest(digest);
    }

It calls the superclass constructor:

    /**
     * Creates a <code>FilterInputStream</code>
     * by assigning the  argument <code>in</code>
     * to the field <code>this.in</code> so as
     * to remember it for later use.
     *
     * @param   in   the underlying input stream, or <code>null</code> if
     *          this instance is to be created without an underlying stream.
     */
    protected FilterInputStream(InputStream in) {
        this.in = in;
    }

It also calls the setter:

    /**
     * Associates the specified message digest with this stream.
     *
     * @param digest the message digest to be associated with this stream.
     * @see #getMessageDigest()
     */
    public void setMessageDigest(MessageDigest digest) {
        this.digest = digest;
    }

Then you call the method getMessageDigest():

    /**
     * Returns the message digest associated with this stream.
     *
     * @return the message digest associated with this stream.
     * @see #setMessageDigest(java.security.MessageDigest)
     */
    public MessageDigest getMessageDigest() {
        return digest;
    }

Note that nowhere is the DigestInputStream actually read, so it never pulls any bytes from the FileInputStream passed to it. When you call digest(), the MessageDigest has not seen any of the file's contents and is still empty. The hash generated is therefore the same as the one produced by this:

System.out.println(toHex(MessageDigest.getInstance(algoritmo).digest()));
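To make this concrete, here is a minimal sketch that compares a DigestInputStream that is never read with one that is drained first. The class and method names (UnreadStreamDemo, digestWithoutReading, digestAfterReading) are made up for illustration, and a ByteArrayInputStream stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question.
class UnreadStreamDemo {

    // Takes the digest from a DigestInputStream that was never read.
    static byte[] digestWithoutReading(byte[] data) throws NoSuchAlgorithmException, IOException {
        try (DigestInputStream in = new DigestInputStream(new ByteArrayInputStream(data),
                MessageDigest.getInstance("SHA-256"))) {
            return in.getMessageDigest().digest();
        }
    }

    // Drains the stream first, so every byte passes through the MessageDigest.
    static byte[] digestAfterReading(byte[] data) throws NoSuchAlgorithmException, IOException {
        try (DigestInputStream in = new DigestInputStream(new ByteArrayInputStream(data),
                MessageDigest.getInstance("SHA-256"))) {
            byte[] buffer = new byte[1024];
            while (in.read(buffer) != -1) {
                // Nothing to do: reading is what updates the digest.
            }
            return in.getMessageDigest().digest();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = {1, 2, 3};
        byte[] emptyHash = MessageDigest.getInstance("SHA-256").digest();
        byte[] dataHash = MessageDigest.getInstance("SHA-256").digest(data);

        // Unread stream: same hash as empty input.
        System.out.println(Arrays.equals(digestWithoutReading(data), emptyHash));
        // Drained stream: same hash as hashing the bytes directly.
        System.out.println(Arrays.equals(digestAfterReading(data), dataHash));
    }
}
```

Both comparisons should print true, which matches the first line of your output: it is simply the SHA-256 of zero bytes.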

So what went wrong? Look at this comment:

        // VERY IMPORTANT: read from final stream since it's FilterInputStream

It is only saying that you have to read from the outermost stream. That made perfect sense in the Stack Overflow answer, since there the streams are wrapped in one another consecutively, which is not your case. So just replace this comment with this:

shaStream.readAllBytes();

With this change, the generated hash matches the one produced by the other two methods. Your gerarHash1 method can then be rewritten like this:

    // Adapted from: https://stackoverflow.com/a/19304310/7711016
    public static byte[] gerarHash1(File arq) throws NoSuchAlgorithmException, IOException {
        try (DigestInputStream shaStream = new DigestInputStream(new FileInputStream(arq),
                MessageDigest.getInstance(algoritmo))) {
            shaStream.readAllBytes();
            return shaStream.getMessageDigest().digest();
        }
    }

Notice that this version uses try-with-resources. I recommend using try-with-resources in gerarHash3 as well. I also recommend making the algoritmo field final and renaming it to ALGORITMO, in accordance with Java naming conventions.
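As a rough sketch of those two recommendations applied to gerarHash3: the class name ObterHashRevisado is invented here, and the do/while loop is rewritten as a plain while loop, which is a stylistic choice rather than something the fix requires.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class name; only the field and method come from the question.
class ObterHashRevisado {

    // Constant: final, with an upper-case name.
    private static final String ALGORITMO = "SHA-256";

    static byte[] gerarHash3(File arq) throws NoSuchAlgorithmException, IOException {
        MessageDigest complete = MessageDigest.getInstance(ALGORITMO);

        // try-with-resources closes the stream even if read() throws.
        try (InputStream fis = new FileInputStream(arq)) {
            byte[] buffer = new byte[1024];
            int numRead;
            while ((numRead = fis.read(buffer)) != -1) {
                complete.update(buffer, 0, numRead);
            }
        }
        return complete.digest();
    }
}
```

The practical difference from the original is that the stream is closed even when an exception is thrown in the middle of the loop.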

Note: the readAllBytes() method is only available from Java 9 onward. In earlier versions, you will need something else (probably a while loop) to get the same behavior.
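For Java 8 and earlier, one possible replacement is to drain the DigestInputStream through a buffer. This is only a sketch: the class name ObterHashJava8 and the 8192-byte buffer size are arbitrary choices, not something the question prescribes.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class name, used only for this example.
class ObterHashJava8 {

    private static final String ALGORITMO = "SHA-256";

    static byte[] gerarHash1(File arq) throws NoSuchAlgorithmException, IOException {
        try (DigestInputStream shaStream = new DigestInputStream(new FileInputStream(arq),
                MessageDigest.getInstance(ALGORITMO))) {
            byte[] buffer = new byte[8192]; // arbitrary buffer size
            // Each read() feeds the bytes it returns into the MessageDigest,
            // so the loop body has nothing left to do.
            while (shaStream.read(buffer) != -1) {
                // intentionally empty
            }
            return shaStream.getMessageDigest().digest();
        }
    }
}
```

Unlike readAllBytes(), this loop never holds the whole file in memory, so it may be preferable for large files even on Java 9 and later.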
