Your notion is correct. You already know how registrars handle numerical data varies. I will base myself on in this IBM document to answer.
The problem
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[]) {
FILE* fp;
/* Our example data structure */
struct {
char one[4];
int two;
char three[4];
} data;
/* Fill our structure with data */
strcpy(data.one, "foo");
data.two = 0x01234567;
strcpy(data.three, "bar");
/* Write it to a file */
fp = fopen("output", "wb");
if (fp) {
fwrite(&data, sizeof (data), 1, fp);
fclose(fp);
}
}
I put in the Github for future reference.
This code has problems of endianess. Depending on the platform it presents a different result because each platform organizes the bytes of a numerical type with the byte more significant first or last. On some platforms the structure would be mounted like this:
foo..4Vxbar.
In others, like this:
foo.xV4.bar.
Note that the characters are not affected, only the numbers. So if you write a file with data in numerical format it can be read erroneously on another platform. If it is known that a file may circulate between platforms, it will have to ensure that the number format in the file is identified and converted properly before its use.
How to solve the problem
It is not always possible to identify the significance of bytes. The file must inform (itself or by documentation) what significance is used and the programs that access it must understand this information. In network communication there must be a protocol that passes this information or documents it. There is not always control over these possibilities.
Even for ease it is common to solve the problem by recording/transmitting the numerical data in a text format. When using a text you are sure that each byte will be in the order you expect. When I speak in text, there can be two solutions:
transform a number into a textual representation of the number, that is, an integer that is worth 12359
is represented by a string "12359"
. This is not very efficient in terms of size and processing but solves the problem
treat the bytes as characters and how to mount the number
This doesn’t mean you can just make one cast of the text in plain numbers. A cast is precisely what causes the problem because he considers the significance of bytes are in the expected order. What is not true if the data came from a platform with different significance.
How the platform works
It is easy to know what is the significance of the platform:
const int i = 1;
#define is_bigendian() ( (*(char*)&i) == 0 )
int main(void) {
int val;
char *ptr;
ptr = (char*) &val;
val = 0x12345678;
if (is_bigendian()) {
printf(“%X.%X.%X.%X\n", u.c[0], u.c[1], u.c[2], u.c[3]);
} else {
printf(“%X.%X.%X.%X\n", u.c[3], u.c[2], u.c[1], u.c[0]);
}
exit(0);
}
I put in the Github for future reference.
How to know the significance of the data to be manipulated
If by documentation or by information provided on a data received you know the significance of what will be worked and you know the significance of the platform, you know if you need to do some transformation, possibly reverse the bytes before using if the option is to use the second form listed above.
If you do not know and have no way to verify what is the significance of the information, there is no way to solve the problem. There is an architectural problem of the application there.
In the latter case it is clear that it is possible to ask for the user’s help. In some cases the user may be able to identify some values whether they are correct or not. Then you have to present the data obtained in both ways and ask which one makes the most sense. This may be flawed but it is an extreme solution if nothing else is possible. The degree of reliability will be given by the user’s ability to identify correctly. I’m not saying this is recommended, but it’s the only solution.
Of course, data that assuredly came from the platform itself is not a concern.
So the lesson is that data that has the potential to be passed to other platforms - and this is very common nowadays - needs to have significance information somewhere along with the transmitted data or should it be well documented.
Example of reordering
The document example shows the use of TCP that is documented that works with big endian. Porting platforms with this significance can be slightly more efficient by not having to reorder bytes.
A way to reorder bytes:
const int i = 1;
#define is_bigendian() ( (*(char*)&i) == 0 )
short reverseShort(short s) {
unsigned char c1, c2;
if (is_bigendian()) {
return s;
} else {
c1 = s & 255;
c2 = (s >> 8) & 255;
return (c1 << 8) + c2;
}
}
int main (int argc, char* argv[]) {
printf("%d", reverseShort((short)1234));
}
I put in the Github for future reference.
The problem does not happen with any number.
Evidently numbers with only one byte do not have this problem. This usually affects short
, int
, long
, float
and double
, with or without signal (the signal obviously produces even more wrong result, if this is possible :) )
Completion
To facilitate everyone uses ready-made solutions, formats and protocols that treat this and leave transparent to the application. You only have to worry about this if you’re doing something low-level.
Wikipedia article.
https://arqufs2008.wordpress.com/2008/05/26/little-endian-vs-big-endian/
– Gabriel Rodrigues
Okay, but what does this entail in portability?
– jlHertel
Suppose I had to send you a phone number. But it has to be through sheets of paper, in which each sheet only takes 1 digit. If you don’t know the order in which I wrote my papers, when you get them you can’t redo the number.
– pmg
20150212
data "big endian" (with the "big end" in the beginning);12022015
data "little endian" (with the "little end" in the beginning)– pmg