What is big-endian and how does it differ from little-endian?


I have a basic notion of what big-endian and little-endian are, but I cannot see exactly how the different storage orders can complicate portability.

What portability problems arise from big-endian vs. little-endian, and what are the common ways to solve them?

Note that I am tagging the question with both C and C++ because I believe this problem affects both languages. An answer addressing either language will be accepted.

  • https://arqufs2008.wordpress.com/2008/05/26/little-endian-vs-big-endian/

  • Okay, but what does this entail for portability?

  • Suppose I had to send you a phone number, but only via sheets of paper, each holding a single digit. If you don’t know the order in which I wrote the sheets, you can’t reconstruct the number when you receive them.

  • The date 2015-02-12 stored as 20150212 is "big endian" (the "big end", the year, comes first); stored as 12022015 it is "little endian" (the "little end", the day, comes first).



Your notion is correct. As you already know, how registers handle numerical data varies between processors. I will base my answer on this IBM document.

The problem

#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
    FILE* fp;

    /* Our example data structure */
    struct {
        char one[4];
        int  two;
        char three[4];
    } data;

    /* Fill our structure with data */
    strcpy(data.one, "foo");
    data.two = 0x12345678;
    strcpy(data.three, "bar");

    /* Write it to a file */
    fp = fopen("output", "wb");
    if (fp) {
        fwrite(&data, sizeof (data), 1, fp);
        fclose(fp);
    }
}

I put it on GitHub for future reference.

This code has an endianness problem. Depending on the platform, it produces a different result, because each platform stores the bytes of a numeric type with the most significant byte either first or last. On some platforms the structure is laid out like this:

foo..4Vxbar.

On others, like this:

foo.xV4.bar.

Note that the characters are not affected, only the numbers. So if you write numeric data to a file, it can be read incorrectly on another platform. If a file may circulate between platforms, you must ensure that the number format in the file is identified and converted properly before use.

(The IBM document illustrates the two layouts with figures labeled "Big Endian" and "Little Endian".)

How to solve the problem

It is not always possible to identify the byte order automatically. The file must state (in itself or in its documentation) which byte order is used, and the programs that access it must honor that information. In network communication, the protocol must either carry this information or document it. There is not always control over these aspects.

For simplicity, it is common to solve the problem by recording/transmitting the numerical data in a text format: with text, you can be sure that each byte arrives in the order you expect. By text, I mean one of two solutions:

  • transform the number into its textual representation, i.e. the integer 12359 is represented by the string "12359". This is not very efficient in terms of size and processing, but it solves the problem;

  • treat the bytes as characters and assemble the number yourself in a defined order.

    This does not mean you can simply cast the text to a number. A cast is precisely what causes the problem, because it assumes the bytes are already in the order the platform expects, which is not true if the data came from a platform with a different byte order.

How to know the platform's byte order

It is easy to find out which byte order the platform uses:

#include <stdio.h>

const int i = 1;
#define is_bigendian() ( (*(char*)&i) == 0 )

int main(void) {
    int val = 0x12345678;
    /* Look at the integer byte by byte */
    unsigned char *c = (unsigned char *) &val;
    /* Print from most to least significant byte on any platform */
    if (is_bigendian()) {
        printf("%X.%X.%X.%X\n", c[0], c[1], c[2], c[3]);
    } else {
        printf("%X.%X.%X.%X\n", c[3], c[2], c[1], c[0]);
    }
    return 0;
}

I put it on GitHub for future reference.

How to know the byte order of the data being manipulated

If the documentation, or information supplied with the received data, tells you the byte order of what you will be working with, and you know the byte order of the platform, then you know whether you need to do some transformation - possibly reversing the bytes before use, if you chose the second option listed above.

If you do not know, and have no way to verify, the byte order of the information, there is no way to solve the problem. That is an architectural problem in the application.

In the latter case, of course, it is possible to ask for the user's help. In some cases the user may be able to tell whether certain values are correct. You then present the data interpreted both ways and ask which one makes more sense. This may be flawed, but it is an extreme measure when nothing else is possible. Its reliability depends on the user's ability to identify the data correctly. I am not saying this is recommended, but sometimes it is the only solution.

Of course, data that is guaranteed to have come from the platform itself is not a concern.

So the lesson is: data that can potentially be passed to other platforms - and this is very common nowadays - needs to carry byte-order information somewhere along with the transmitted data, or it must be well documented.

Example of reordering

The document's example uses TCP, which is documented to work in big endian (network byte order). Platforms with this byte order can handle it slightly more efficiently, since they do not have to reorder bytes.

A way to reorder bytes:

#include <stdio.h>

const int i = 1;
#define is_bigendian() ( (*(char*)&i) == 0 )

/* Swap the two bytes of a 16-bit value when the platform is little
   endian; unsigned avoids implementation-defined behavior on the shift. */
unsigned short reverseShort(unsigned short s) {
    unsigned char c1, c2;
    if (is_bigendian()) {
        return s;
    } else {
        c1 = s & 255;        /* least significant byte */
        c2 = (s >> 8) & 255; /* most significant byte  */
        return (unsigned short)((c1 << 8) | c2);
    }
}

int main(int argc, char* argv[]) {
    printf("%u\n", reverseShort(1234));
    return 0;
}

I put it on GitHub for future reference.

The problem does not occur with every number.

Evidently, types of a single byte do not have this problem. It usually affects short, int, long, float and double, signed or unsigned (the sign obviously makes the result even more wrong, if that is possible :) )

Conclusion

For convenience, everyone uses ready-made solutions, formats and protocols that handle this and keep it transparent to the application. You only have to worry about it if you are doing something low-level.

Wikipedia article.


  • An interesting article I read a while ago, which shows how we sometimes overcomplicate byte-order handling, is this one (in English). It presents a solution that is independent of your machine's byte order and is simpler than the one normally used, like the one you presented.

  • @OP, the essence of the answer (the code) is correct, but the "is_bigendian" macro is dangerous, even though it compiles to valid assembly (don’t ask me why).

  • Use a union of char[2] and a short set to 0x1234, and check whether char[0] == 0x12. This code worked for me when I was studying the subject in C++.

  • @Lucashenrique Why? :) I included it mostly to illustrate the approach from the article I was using. I have never had to deal with this, but if needed I would certainly research the best way to get an acceptable result.

  • @mustache Right. I researched it so you can use it in the best way :P.
