Int Size, Endianness and Serialisation

3 comments, last by swiftcoder 9 years, 10 months ago

I'm trying to write a function that takes an std::vector of game objects and saves them all as an image.

The game I'm making revolves around placing game objects in a specific way, and I wanted users to be able to share different placements easily. Since there won't be much data, I decided that for every game object, each of its applicable data members would be converted to a colour and saved in the image as a pixel. To load game states, my game would open the image, read each pixel from left to right, top to bottom, and reconstruct each game object.

The bit that I'm a little stuck on is how to convert a number to a colour and back again. For example, the x coordinate of a specific game object could be 42; I need a way to take 42 and convert it to 4 individual bytes (r, g, b, a).

I've realised that this, by itself, isn't that difficult. But I was wondering if I need to take the platform's integer size and endianness into consideration when I'm building this. And if so, what would be a good way to deal with all that?

Thanks.

Well, it depends on how you read and write the color value. If you write out the color as an array of bytes, with one byte for red followed by one byte each for green, blue and alpha, then you don't have to concern yourself with endianness, since you're only dealing with bytes. If you encode the RGBA value as a 4-byte integer and write that integer's raw memory to the file (or reinterpret it as an array of bytes), then you do have to think about the endianness of the machine, because which of the four bytes in memory is the least significant one differs between platforms. Extracting the 8-bit channel values with bit masking and shifting is safe by itself: shifts operate on the integer's value, not on its layout in memory.
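To make the difference concrete, here is a minimal sketch contrasting the two ways of getting four bytes out of a number. The function names are made up for illustration, and it assumes a fixed 32-bit value via `std::uint32_t` rather than a plain `int`, which also sidesteps the integer size question.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Split a 32-bit value into four bytes using shifts only.
// Shifts act on the numeric value, so element 0 is always the most
// significant byte, whatever the host's endianness.
std::array<std::uint8_t, 4> to_bytes_by_shift(std::uint32_t v) {
    return { static_cast<std::uint8_t>(v >> 24),
             static_cast<std::uint8_t>(v >> 16),
             static_cast<std::uint8_t>(v >> 8),
             static_cast<std::uint8_t>(v) };
}

// Copy the value's in-memory representation instead.
// The byte order of the result is whatever the host machine uses,
// so this is the variant that is sensitive to endianness.
std::array<std::uint8_t, 4> to_bytes_by_memcpy(std::uint32_t v) {
    std::array<std::uint8_t, 4> out{};
    std::memcpy(out.data(), &v, sizeof v);
    return out;
}

// Rebuild the value from bytes produced by to_bytes_by_shift.
std::uint32_t from_bytes_by_shift(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8)  |  std::uint32_t{b[3]};
}
```

With the shift-based pair, 42 always ends up in the last byte (the alpha channel, if the bytes are used as r, g, b, a), and reading it back gives 42 on any machine. Only the `memcpy` variant would need a byte order check.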

Typically, when you need to deal with unknown endianness you would write a byte order marking at the top of the file, which when read back by another machine can indicate if the content of the file needs to be endian swapped. A common byte order marking is the 16 bit integer 0xfeff. If you read a byte order mark from a file and it comes out as 0xfffe, then you have to reverse the endianness of the contents of that file. If it gets read in as 0xfeff, then you won't need to perform any endian swapping.
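As a rough sketch of that check (the names `classify_mark`, `ByteOrder` and `swap16` are invented for this example, not part of any library), the reader could classify the first 16-bit value it loads like this:

```cpp
#include <cstdint>

// The mark as the saving machine wrote it, and how it looks when the
// reading machine has the opposite byte order.
constexpr std::uint16_t kByteOrderMark = 0xFEFF;
constexpr std::uint16_t kSwappedMark   = 0xFFFE;

enum class ByteOrder { Same, Swapped, Invalid };

// Decide what to do with the rest of the file based on the mark read.
constexpr ByteOrder classify_mark(std::uint16_t mark) {
    return mark == kByteOrderMark ? ByteOrder::Same
         : mark == kSwappedMark   ? ByteOrder::Swapped
                                  : ByteOrder::Invalid;
}

// Reverse the two bytes of a 16-bit value.
constexpr std::uint16_t swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}
```

If `classify_mark` reports `Swapped`, every multi-byte value read afterwards needs its bytes reversed; `Invalid` suggests the file is not in the expected format at all.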

EDIT: Note that endian problems don't come up much unless you're saving data on one machine and reading that data back on a separate machine with a different endianness. If you're only targeting one platform, then you won't have endian problems. A four byte integer written to disk and read from the disk will come out unchanged. But if you write a four byte integer to disk on a little endian machine and then read it from disk on a big endian machine, it'll come out backwards on the big endian machine.
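One way to avoid the problem entirely, sketched below under the assumption that you control both the writer and the reader, is to pick a byte order for the file and always write integers one byte at a time in that order. The helper names here are hypothetical, and little endian is an arbitrary choice for the example.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write v as four bytes, least significant first, regardless of host order.
void write_u32_le(std::ostream& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.put(static_cast<char>((v >> shift) & 0xFFu));
    }
}

// Read four bytes written by write_u32_le and rebuild the value.
// Returns false if the stream ran out of data.
bool read_u32_le(std::istream& in, std::uint32_t& v) {
    v = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const int c = in.get();
        if (c == std::char_traits<char>::eof()) return false;
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << shift;
    }
    return true;
}

// Usage: serialise to an in-memory buffer and return the raw bytes.
std::string encode_u32_le(std::uint32_t v) {
    std::ostringstream buf;
    write_u32_le(buf, v);
    return buf.str();
}
```

Because the file's byte order is fixed by the code rather than by the machine, a file written on a little endian machine reads back correctly on a big endian one without any swapping step.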

You don't need to worry about endian-ness unless you are switching machine families or using Java.

The Java comment is because quite a lot of its serialization and networking code automatically translates things to big endian for transmission and storage.

If Java is not part of the equation...

If you are saving and loading it all on x86-based PCs, everything will be the same; the x86 family is little endian. You can move the data between Intel, AMD and other chips; they are all based on the x86 family and use the same endian-ness.

But if you are writing on a PC and moving over to an Xbox 360, it will be different; the PPC family uses big endian.


So if you are doing everything on your x86 machine running Windows and it will only be run on other similar machines, don't worry about reordering your data.

Thanks for your speedy reply Samith, lots of good info.

Typically, when you need to deal with unknown endianness you would write a byte order marking at the top of the file, which when read back by another machine can indicate if the content of the file needs to be endian swapped. A common byte order marking is the 16 bit integer 0xfeff. If you read a byte order mark from a file and it comes out as 0xfffe, then you have to reverse the endianness of the contents of that file. If it gets read in as 0xfeff, then you won't need to perform any endian swapping.

This in particular was very enlightening. I'll use a byte order marking to check whether to swap all other numbers read.

You don't need to worry about endian-ness unless you are switching machine families or using Java.

It's nice to know that pretty much all conventional PCs use little endian. But I did a bit of searching and found that VC++ has some built-in functions for swapping byte order (I think they're quite well optimised as well). With all this info I found that it really wasn't much effort to support big and little endian (unless I've overlooked something), even if it's not likely to be necessary. Good practice, I guess.
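The VC++ byte-swap functions are compiler-specific: `_byteswap_ulong` is MSVC only, while GCC and Clang offer `__builtin_bswap32` instead. Where portability matters, the same swap can be written with plain masks and shifts. A minimal sketch, with `byteswap32` being a name made up for this example:

```cpp
#include <cstdint>

// Reverse the order of the four bytes in a 32-bit value.
// Uses only masks and shifts, so it compiles on any C++11 compiler.
constexpr std::uint32_t byteswap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Optimising compilers typically recognise this pattern, so it is unlikely to be meaningfully slower than the built-in versions, though that is worth measuring if it ever matters.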

This is what I managed to come up with so far:

It still needs to be cleaned up, but I think I've not overlooked anything important (other than a load function..)


bool save_as_image( std::string argFilename, std::vector<ggCell>& argCells )
{
    // saves a vector of cells as an image.

    // could pack more into each pixel but it's not necessary

    // calculate the image size needed
    unsigned int dimWidth = 128;
    unsigned int hlpBytesNeeded = GG_BYTES_PER_CELL*argCells.size();
    // rows 0-15 hold the header; add enough rows for the data (4 bytes per pixel)
    unsigned int dimHeight = 17 + (hlpBytesNeeded/4)/dimWidth;
    out("height is " + uint_to_string(dimHeight));

    // load template header
    sf::Image imgHeader;
    if ( !imgHeader.loadFromFile("data/pattern_header.png") ) {
        return false;
    }

    // create new image with header
    sf::Image imgPattern;
    imgPattern.create( dimWidth, dimHeight, sf::Color(0,0,0));
    imgPattern.copy( imgHeader, 0, 0 );

    // write all pixels
    unsigned int x = 0;
    unsigned int y = 16;
    for ( unsigned int index=0; index<argCells.size(); ++index ) {
        // gen colours
        sf::Color colX, colY, colType;
        colX = int_to_pixel( argCells[index].x );
        colY = int_to_pixel( argCells[index].y );
        colType = int_to_pixel( argCells[index].type );
        // write to image
        imgPattern.setPixel( x, y, colX );
        increment(x,y);
        imgPattern.setPixel( x, y, colY );
        increment(x,y);
        imgPattern.setPixel( x, y, colType );
        increment(x,y);
        // output colours used
        out("[" + std::to_string(index) + "] ");
        out("X: ( " + std::to_string(colX.r) + ", " + std::to_string(colX.g) + ", " + std::to_string(colX.b) + ", " + std::to_string(colX.a) + " )\n");
        out("[" + std::to_string(index) + "] ");
        out("Y: ( " + std::to_string(colY.r) + ", " + std::to_string(colY.g) + ", " + std::to_string(colY.b) + ", " + std::to_string(colY.a) + " )\n");
        out("[" + std::to_string(index) + "] ");
        out("Type: ( " + std::to_string(colType.r) + ", " + std::to_string(colType.g) + ", " + std::to_string(colType.b) + ", " + std::to_string(colType.a) + " )\n");
    }

    // save image and report whether writing succeeded
    return imgPattern.saveToFile( argFilename );

}

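sf::Color int_to_pixel( int number ) {
    // code number to look better as an image
    number += GG_INT_CODE;
    // work on an unsigned copy so the shifts are well defined for
    // negative values (this assumes int is 32 bits wide)
    unsigned int bits = static_cast<unsigned int>( number );
    // convert to pixel: most significant byte goes to red, least
    // significant to alpha. Shifts act on the value, not on memory,
    // so this order is the same on every platform.
    unsigned char r = static_cast<unsigned char>( (bits >> 24) & 0xff );
    unsigned char g = static_cast<unsigned char>( (bits >> 16) & 0xff );
    unsigned char b = static_cast<unsigned char>( (bits >> 8) & 0xff );
    unsigned char a = static_cast<unsigned char>( bits & 0xff );
    return sf::Color( r, g, b, a );
}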

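int pixel_to_int( sf::Color colour, bool swap ) {
    // rebuild the value in unsigned arithmetic; shifting a signed int
    // into its sign bit (r >= 128) is undefined behaviour
    unsigned int number = static_cast<unsigned int>( colour.r ) << 24;
    number = number | static_cast<unsigned int>( colour.g ) << 16;
    number = number | static_cast<unsigned int>( colour.b ) << 8;
    number = number | static_cast<unsigned int>( colour.a );
    // int_to_pixel always stores the most significant byte in red, so
    // swap should stay false for images it wrote, on any platform
    if ( swap ) {
        // MSVC-specific, declared in <stdlib.h>
        number = _byteswap_ulong( number );
    }
    // decode
    return static_cast<int>( number ) - GG_INT_CODE;
}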

void increment( unsigned int& x, unsigned int& y ) {
    ++x;
    if ( x >= 128 ) {
        x = 0;
        ++y;
    }
}


it will be different; the PPC family uses big endian.

Be (somewhat) careful: not *all* PowerPCs use big endian.

The chips themselves can flip between big and little endian mode, in some cases even on a per-page basis. And, at least in theory, they can run little endian programs while the OS itself is big endian.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]
