What is a checksum for?

15

I was reading a question here on the website and came across the term checksum.

Several other times I have seen this term being used when talking about file transfer or some important data.

So what is a checksum, and what is it used for? I would like a simple example (I prefer C#, but any language will do) of how to compute one.

3 answers

19


In a nutshell, a checksum is used to check, for example, whether a file is exactly the same after a transfer: that it has not been altered by a third party and has not been corrupted.

The idea is, for example, to take all the bytes of a file, add them up one by one, and obtain a value: the checksum. After a transfer, this value should be the same for the file sent by the sender and for the file received by the recipient. Even then, a matching value does not guarantee that the file is exactly the same, which is why there are several ways of computing this sum.
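To make the "add the bytes one by one" idea concrete, here is a minimal C++ sketch. The names byteSum and matchesChecksum are made up for illustration, and letting the total wrap around at 32 bits is an assumption of this sketch, not something a particular tool does.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: adds up every byte of the data, one by one.
// The total simply wraps around modulo 2^32 (an assumption of this sketch).
std::uint32_t byteSum(const std::string& data)
{
    std::uint32_t sum = 0;
    for (unsigned char byte : data)
        sum += byte;
    return sum;
}

// Receiver side: recompute the sum and compare it with the value
// announced by the sender.
bool matchesChecksum(const std::string& received, std::uint32_t expected)
{
    return byteSum(received) == expected;
}
```

With ASCII input, byteSum("ABC") is 65 + 66 + 67 = 198 and byteSum("AbC") is 230, so the changed byte is noticed. byteSum("BCA") is 198 again, though: a plain sum cannot see reordering, which is one reason other ways of computing the value exist.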

At my company, for example, we use md5sum, which computes the checksum with the MD5 algorithm, so that our customers can make sure they are using the correct version of our software: after transferring it, they just check the file with md5sum.

Example:

[Image: diagram demonstrating the checksum]

  • 1

    @Diegosantosseabra Your edit caused a bit of controversy: http://meta.pt.stackoverflow.com/q/5629/132

13

As colleagues have already explained in the other answers, a checksum essentially serves to verify the integrity of a sequence of data (whether it is in a transmission by radio, Internet, smoke signals, etc., or in a file on disk, sent to someone by email, or made available for download).

I decided to answer only because I would like to offer some intuition on the subject. The English word checksum literally means a "sum" for "checking" (check + sum), because the principle behind the algorithms is as follows:

  1. The goal is to produce a numeric value, easily calculated on both sides of a transmission (i.e., by both the sender and the receiver), that represents not only the content of the file but also the order in which that content appears.
  2. Once the sender has calculated this value, the file (or data packet) is transmitted together with the checksum. The receiver recalculates the value for the received file/packet and compares it with the original value sent by the sender. If they differ, something went wrong in the transmission (for example, some byte was changed, perhaps by noise in the transmission medium or even by a third party acting in bad faith).
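The two steps above can be sketched in C++ roughly as follows. The Packet layout and the names ChecksumFn, makePacket and arrivedIntact are hypothetical; the checksum calculation itself is passed in as a parameter, since the only requirement at this point is that both sides use the same one.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Any calculation both sides agree on; the concrete algorithm is left open.
using ChecksumFn = std::function<std::uint64_t(const std::string&)>;

// Hypothetical packet layout: the data plus the checksum computed by the sender.
struct Packet {
    std::string data;
    std::uint64_t checksum;
};

// Sender: compute the value and ship it together with the data.
Packet makePacket(const std::string& data, const ChecksumFn& checksumOf)
{
    return Packet{data, checksumOf(data)};
}

// Receiver: recompute the value from what arrived and compare it
// with the value that was sent along.
bool arrivedIntact(const Packet& packet, const ChecksumFn& checksumOf)
{
    return checksumOf(packet.data) == packet.checksum;
}
```

If a byte of packet.data is changed in transit, arrivedIntact will report a mismatch whenever the chosen calculation yields a different value for the changed data.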

And how does a checksum value represent both the content and the order of a data set? There are several ways. A very naive one, which in practice serves only as a didactic illustration, is the following:

You walk through the characters of the file/packet from start to finish, multiplying the value of each character (its ASCII value, for example) by its index (the position of the character in the sequence). Each result is accumulated into a total (the checksum itself). It should be easy to see that if any character is changed (an A becomes an X, for example) or changes position (the sequence ABC becomes BCA, for example), the product [character value] * [index] will be different at that position, so the checksum of the whole packet will, as a rule, be different as well.

Example of C++ code that performs this calculation for a data string s (without necessarily treating \0 as the string terminator, which is why the function simpleCheckSum expects the length in the parameter size):

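#include <stdio.h>

// Naive position-weighted checksum: adds (character value * index)
// over the whole string. The length is passed explicitly, so '\0'
// is not treated as a terminator.
long long simpleCheckSum(const char *s, int size)
{
    long long chkSum = 0;
    for (int i = 0; i < size; i++)
        // Read the byte as unsigned so values above 127 are not negative.
        // Note that index 0 contributes nothing to the total.
        chkSum += (long long)(unsigned char)s[i] * i;
    return chkSum;
}

int main()
{
    printf("CheckSum of 'ABC': %lld\n", simpleCheckSum("ABC", 3));
    printf("CheckSum of 'AbC': %lld\n", simpleCheckSum("AbC", 3));
    printf("CheckSum of 'BCA': %lld\n", simpleCheckSum("BCA", 3));

    return 0;
}

Result of this code:

CheckSum of 'ABC': 200
CheckSum of 'AbC': 232
CheckSum of 'BCA': 197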

Note that this algorithm, although functional, is quite naive and may not be suitable for very long files/packets (because the resulting number tends to grow a lot). It was used only to illustrate the principle. The algorithms most commonly used in practice are described on Wikipedia and in the other answers you have already received.
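As one illustration of how a practical algorithm keeps the value from growing, here is a sketch of Fletcher-16. This is a different technique from the naive multiplication above, chosen only as an example: it still reflects both content and order, but it keeps two running sums modulo 255, so the result always fits in 16 bits.

```cpp
#include <cstddef>
#include <cstdint>

// Fletcher-16: sum1 accumulates the byte values; sum2 accumulates sum1,
// which makes the result depend on the order of the bytes as well.
// Both sums are kept modulo 255, so the result never exceeds 16 bits.
std::uint16_t fletcher16(const unsigned char* data, std::size_t size)
{
    std::uint16_t sum1 = 0;
    std::uint16_t sum2 = 0;
    for (std::size_t i = 0; i < size; i++) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return static_cast<std::uint16_t>((sum2 << 8) | sum1);
}
```

For the five bytes of "abcde" this returns 0xC8F0, and "ABC" and "BCA" produce different values, so reordering is detected here as well.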

  • A question: is the signature of the first function really long long simpleCheckSum(...), or should it be just long?

  • 1

    @jbueno It really is long long, since the returned values can be really large (it is an integer type with at least 64 bits: http://stackoverflow.com/a/18971808/2896619).

  • 1

    Cool, @Luizvieira. Since I don't know C++, I wasn't sure. By the way, thank you very much for the answer! It was great.

8

A checksum helps ensure the integrity of communication packets, or helps verify that a file has not been corrupted.

In the header, a pre-agreed calculation is performed over all the significant bits of the packet, and the result is also sent in the communication so that the comparison can be made on the other side.

For example, if in a serial communication protocol we send a command byte, two bytes of payload (the command data), and a checksum byte that sums all the values, every command would have to be sent following this rule.

To send the command 0x01 with a payload of 0x00, 0x00, the checksum would be 0x01, and it would be sent at the end. When reading the bytes on the other side, you can be more confident that all the bits are correct, because your own sum also gives 0x01. If any single bit changes, the checksum will no longer match.
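A small C++ sketch of that frame might look like this. The 4-byte layout follows the description above; the names Frame, frameChecksum, buildFrame and frameIsValid are invented for the example, and truncating the sum to 8 bits so that it fits in one byte is an assumption.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-byte frame: command, two payload bytes, checksum byte.
using Frame = std::array<std::uint8_t, 4>;

// Checksum byte: sum of command and payload, truncated to 8 bits (assumption).
std::uint8_t frameChecksum(std::uint8_t command, std::uint8_t data0, std::uint8_t data1)
{
    return static_cast<std::uint8_t>(command + data0 + data1);
}

// Sender: the checksum goes at the end of the frame.
Frame buildFrame(std::uint8_t command, std::uint8_t data0, std::uint8_t data1)
{
    return Frame{{command, data0, data1, frameChecksum(command, data0, data1)}};
}

// Receiver: recompute the sum over the first three bytes and compare it
// with the checksum byte that arrived.
bool frameIsValid(const Frame& frame)
{
    return frameChecksum(frame[0], frame[1], frame[2]) == frame[3];
}
```

buildFrame(0x01, 0x00, 0x00) yields the bytes 0x01, 0x00, 0x00, 0x01, and flipping any single bit of that frame makes frameIsValid return false.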
