As colleagues have already explained in other answers, a checksum essentially serves to verify the integrity of a sequence of data (whether in a transmission by radio, Internet, smoke signal, etc., or in a file on disk, sent to someone by email, or made available for download).
I only decided to answer because I would like to offer some intuition behind the subject. The English word checksum literally means a "sum" used for "checking" (verification), because the principle behind these algorithms is as follows:
- The goal is to produce a numerical value, easily calculated on both sides of a transmission (i.e., by both the sender and the receiver), that represents not only the content of the file but also the order in which that content appears.
- Once the sender has calculated this value, the file (or data packet) is transmitted together with the checksum value. The receiver recalculates the value for the received file/packet and compares it with the original value sent by the sender. If they differ, something went wrong in the transmission (for example, some byte was changed, perhaps by noise in the transmission medium or even by a third party acting in bad faith).
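The two steps above can be sketched roughly as follows. This is only an illustration: the Packet struct and the names placeholderChecksum, makePacket and isIntact are hypothetical, and the checksum function is a stand-in for whatever algorithm the sender and the receiver have agreed on.

```cpp
#include <cstddef>
#include <string>

// Hypothetical packet layout: the data plus the checksum computed by the sender.
struct Packet {
    std::string payload;
    long long checksum;
};

// Stand-in checksum (an assumption for this sketch): each byte value is
// multiplied by its one-based position and the products are summed.
long long placeholderChecksum(const std::string& data)
{
    long long sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i)
        sum += static_cast<long long>(static_cast<unsigned char>(data[i])) *
               static_cast<long long>(i + 1);
    return sum;
}

// Sender side: compute the checksum and ship it together with the data.
Packet makePacket(const std::string& data)
{
    return Packet{data, placeholderChecksum(data)};
}

// Receiver side: recompute the checksum and compare it with the value received.
bool isIntact(const Packet& p)
{
    return placeholderChecksum(p.payload) == p.checksum;
}
```

If a byte of payload is altered after makePacket has run, isIntact should return false for that packet, which is the signal that something went wrong along the way.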
And how does the value of a checksum represent the content and the order of a data set? There are several ways. A naive one, which in practice serves only as a didactic illustration, is the following:
You go through the characters of the file/packet from start to finish, multiplying the value of each character (its ASCII value, for example) by its index (the position of the character in the sequence being traversed). That result is then accumulated into a running total (the checksum itself). It should be easy to see that if any character is changed (an A turned into an X, for example) or changes position (the sequence ABC turned into BCA, for example), the product [character value] * [index] will be different at that position, so the checksum of the whole packet will, as a rule, be different as well.
Example of C++ code that performs this calculation for a data string s (without necessarily treating \0 as the string terminator, which is why the function simpleCheckSum expects the length to be passed in the parameter size):
#include <cstdio>

// Naive checksum: sums each byte value multiplied by its zero-based position.
// Note that the byte at index 0 is multiplied by 0, so it does not contribute.
long long simpleCheckSum(const char *s, int size)
{
    long long chkSum = 0;
    for (int i = 0; i < size; i++)
        // The cast keeps byte values non-negative where char is signed.
        chkSum += (long long)(unsigned char)s[i] * i;
    return chkSum;
}

int main()
{
    printf("CheckSum of 'ABC': %lld\n", simpleCheckSum("ABC", 3));
    printf("CheckSum of 'AbC': %lld\n", simpleCheckSum("AbC", 3));
    printf("CheckSum of 'BCA': %lld\n", simpleCheckSum("BCA", 3));
    return 0;
}
Result of this code:
CheckSum of 'ABC': 200
CheckSum of 'AbC': 232
CheckSum of 'BCA': 197
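To see where these three numbers come from, the sums can be written out by hand, using the ASCII values A = 65, B = 66, C = 67 and b = 98 and the zero-based index from the loop:

```latex
\begin{align*}
\text{ABC}: &\quad 65 \cdot 0 + 66 \cdot 1 + 67 \cdot 2 = 0 + 66 + 134 = 200 \\
\text{AbC}: &\quad 65 \cdot 0 + 98 \cdot 1 + 67 \cdot 2 = 0 + 98 + 134 = 232 \\
\text{BCA}: &\quad 66 \cdot 0 + 67 \cdot 1 + 65 \cdot 2 = 0 + 67 + 130 = 197
\end{align*}
```

Since the index starts at 0, the first character is always multiplied by 0, so a change in that single position would go unnoticed by this particular version.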
Note that this algorithm, although functional, is quite naive and may not be suitable for very long files/packets (because the resulting number tends to grow very large). It was used only to illustrate the principle. The algorithms most commonly used in practice are described on Wikipedia and in the other answers you have already received.
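As an illustration of how a checksum can stay sensitive to order while keeping the result bounded, here is a minimal sketch of Fletcher-16. The choice of this algorithm is mine, made only for the sake of example, and the function name fletcher16 is hypothetical. It keeps two running sums, both reduced modulo 255, so the result always fits in 16 bits no matter how long the input is.

```cpp
#include <cstddef>
#include <cstdint>

// Fletcher-16 over `size` bytes: two running sums, each reduced modulo 255.
std::uint16_t fletcher16(const unsigned char *data, std::size_t size)
{
    std::uint16_t sum1 = 0;  // plain sum of the byte values
    std::uint16_t sum2 = 0;  // sum of the successive values of sum1 (makes order matter)
    for (std::size_t i = 0; i < size; ++i) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    // sum2 goes in the high byte, sum1 in the low byte.
    return static_cast<std::uint16_t>((sum2 << 8) | sum1);
}
```

Calling it on the five bytes of "abcde" should give 0xC8F0, and "ABC" and "BCA" produce different values because sum2 depends on the order in which the bytes arrive.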
@Diegosantosseabra Your edit stirred up a bit of controversy: http://meta.pt.stackoverflow.com/q/5629/132
– Victor Stafusa