
Fun with ANSI C and Binary Files


Hello Everyone,

After that last rant post I felt obligated to actually post something useful. I feel horrible when I rant like that, but sometimes it just feels necessary.
On a side note, yes, I still hate VS 2012 Express. After all these years you would think Microsoft would update their damn C compiler. Ugh.

OK, so on to the meat of the post. Through my various browsings of the forums I have seen people with an interest in pure C programming. It makes me feel good inside because it really is a nice language. So many people say it is ugly, hackish and very error prone. I tend to disagree: I actually feel it is much less error prone than C++. We will get into why in a few moments. First, before I get into code, let me explain a bit why I love pure C despite its age.

The first thing I really like about C is the simplicity. It is a procedural language, which makes you think in steps instead of abstractions and objects. In other words, it pushes you to think more like the computer actually works, in a general sense. I think this is great for beginners because it forces you to think in algorithms, which are nothing but a series of steps.

The next part I like about it is the very tiny standard library. It is so small you can actually wrap your head around it without a reference manual. This does come with some downsides, as you don't get the robust containers and other things C++ comes with. Essentially, in C you have to write your own (not as bad as it sounds).

Lastly, raw memory management. No worrying about whether or not you are using the right smart pointer. Now I know what people are going to say: C is more prone to memory leaks than C++ because of the lack of smart pointers. Sure, you can leak memory, but it is a lot harder to do so in C, IMHO. Again, C is procedural without OOP, so when programming in a procedural way you are not going to be accidentally copying your raw pointers. The only real way to leak is to forget to free the memory, which under the standard C idiom is rather hard to do. In C the motto goes: what creates the memory frees the memory. What this means is that if you have a module, say a storage module that dynamically allocates with malloc, that module must be responsible for cleaning up the memory it created. You will see this in action next.
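As a minimal sketch of that ownership rule, consider a tiny storage module. The names BUFFER, buffer_create and buffer_destroy are hypothetical and exist only for illustration; they are not part of the TGA loader.

```c
#include <stdlib.h>

/* Hypothetical storage module: all names here are illustrative only. */
typedef struct buffer {
    size_t size;
    unsigned char *bytes;
} BUFFER;

/* The module allocates the memory... */
BUFFER *buffer_create(size_t size)
{
    BUFFER *buf;

    if (size == 0) {
        return NULL;
    }

    buf = malloc(sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }

    /* calloc gives us zero-filled storage */
    buf->bytes = calloc(size, 1);
    if (buf->bytes == NULL) {
        free(buf);              /* don't leak the outer struct on failure */
        return NULL;
    }

    buf->size = size;
    return buf;
}

/* ...so the same module provides the function that frees it. */
void buffer_destroy(BUFFER *buf)
{
    if (buf != NULL) {
        free(buf->bytes);       /* inner allocation first */
        free(buf);              /* then the struct itself */
    }
}
```

The caller never calls malloc or free directly on the module's data; every buffer_create is simply paired with one buffer_destroy.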

As I said, ANSI C allows you to think in terms of algorithms without feeling that you have to abstract everything.
To provide an example, I created a very basic .tga image loader based on nothing but the specification.

Keep in mind this is simple and meant particularly for loading a texture. I skipped a bunch of unneeded header and extension fields: since I am not saving a new copy of the file, I just grab the useful bits.

So from a design perspective this is what we need:
A structure that will store our image data.
A function to load the data.
Finally, a function to clean up our dynamically allocated memory (due to the above best practice).

From this we get the following header file.
tgaimage.h
#ifndef TGAIMAGE_H
#define TGAIMAGE_H

/*
 * Useful data macros for the TGA image data.
 * The data format is laid out by the number of bytes
 * each entry takes up in memory where
 * 1 BYTE takes up 8 bits.
 */
#define BYTE unsigned char /* 1 BYTE 8 bits */
#define SHORT short int /* 2 BYTES 16 bits */

/*
 * TGA image data structure
 * This structure contains the .tga file header
 * as well as the actual image data.
 * You can find out more about the data this contains
 * from the TGA 2.0 image specification at
 * http://www.ludorg.net/amnesia/TGA_File_Format_Spec.html
 */
typedef struct _tgadata {
    SHORT width;
    SHORT height;
    BYTE depth;
    BYTE *imgData;
} TGADATA;

/*
 * Load .tga data into structure
 * params: Location of TGA image to load
 * return: pointer to TGADATA structure
 */
TGADATA* load_tga_data(char *file);

/*
 * Free allocated TGADATA structure
 * return 0 on success return -1 on error
 */
int free_tga_data(TGADATA *tgadata);

#endif
The above should be self-explanatory thanks to the comments provided.
I created two #define macros to make the types easier to manage. The specification defines the size of the data at each offset, and every field we use is either 8 or 16 bits. (This assumes short int is 16 bits wide, which holds on common platforms.)

Now we have the implementation of our functions. Here is that file.
tgaimage.c
#include <stdio.h>
#include <stdlib.h>
#include "tgaimage.h"

TGADATA* load_tga_data(char *file)
{
    TGADATA *data = NULL;
    FILE *handle = NULL;
    int mode = 0;
    int size = 0;

    handle = fopen(file, "rb");

    if (handle == NULL) {
        fprintf(stderr, "Error: Cannot find file %s\n", file);
        return NULL;
    } else {
        data = malloc(sizeof(TGADATA));

        /* load header data */
        fseek(handle, 12, SEEK_SET);
        fread(&data->width, sizeof(SHORT), 1, handle);
        fread(&data->height, sizeof(SHORT), 1, handle);
        fread(&data->depth, sizeof(BYTE), 1, handle);

        /* set mode variable = components per pixel */
        mode = data->depth / 8;

        /* set size variable = total bytes */
        size = data->width * data->height * mode;

        /* allocate space for the image data */
        data->imgData = malloc(sizeof(BYTE) * size);

        /* load image data */
        fseek(handle, 18, SEEK_SET);
        fread(data->imgData, sizeof(BYTE), size, handle);
        fclose(handle);

        /*
         * check mode 3 = RGB, 4 = RGBA
         * RGB and RGBA data is stored as BGR
         * or BGRA so the red and blue bytes need
         * to be flipped.
         */
        if (mode >= 3) {
            BYTE tmp = 0;
            int i;

            for (i = 0; i < size; i += mode) {
                tmp = data->imgData[i];
                data->imgData[i] = data->imgData[i + 2];
                data->imgData[i + 2] = tmp;
            }
        }
    }

    return data;
}

int free_tga_data(TGADATA *tgadata)
{
    if (tgadata == NULL) {
        return -1;
    } else {
        free(tgadata->imgData);
        free(tgadata);
        return 0;
    }
}

Let's start at the top with the load_tga_data function.

In C the first thing we need to do is set up a few variables.
We have one for our structure, the file, the mode and the size. More on the mode and size later.

We use fopen with "rb" to open up the file to read binary data.
If the file open was successful we can go ahead and start getting data.

The first thing we do here is use malloc to reserve memory for our structure and use sizeof so we know how much memory we need.

Now we load the header data. I use the fseek function to get in position for the first read.
fseek takes a pointer to our opened file as its first argument. The second argument is the offset we want to move to, and SEEK_SET says to count that offset from the beginning of the file. An offset is the number of bytes into a file. The specification for the tga file tells us that the width of the image starts at offset 12. It is two bytes in size, so we read exactly 2 bytes with sizeof(SHORT) and tell fread to do 1 read of that size. The internal file position is then at offset 14, which is where our height is. We read it the same way, then finally read the depth at offset 16, which is one byte in size, placing us at offset 17.
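One thing worth knowing: TGA stores its 16-bit fields low byte first, so reading straight into a SHORT with fread only works on a little-endian machine. Here is a hedged sketch of an alternative that takes the 18 header bytes as a plain array and assembles the fields by hand from the same offsets. The names TGAHEADERINFO, read_le16 and parse_tga_header are hypothetical and not part of tgaimage.c.

```c
#include <stddef.h>

#define TGA_HEADER_SIZE 18  /* fixed-size part of the TGA header */

/* Hypothetical holder for the three fields the loader cares about. */
typedef struct {
    unsigned int width;
    unsigned int height;
    unsigned int depth;
} TGAHEADERINFO;

/* Combine two bytes stored low byte first into one 16-bit value. */
static unsigned int read_le16(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/*
 * Pull width, height and depth out of a raw header buffer.
 * return 0 on success, -1 if the buffer is missing or too short
 */
int parse_tga_header(const unsigned char *header, size_t len,
                     TGAHEADERINFO *out)
{
    if (header == NULL || out == NULL || len < TGA_HEADER_SIZE) {
        return -1;
    }

    out->width  = read_le16(header + 12);  /* offset 12, 2 bytes */
    out->height = read_le16(header + 14);  /* offset 14, 2 bytes */
    out->depth  = header[16];              /* offset 16, 1 byte  */
    return 0;
}
```

With this approach you would fread the first 18 bytes of the file into an unsigned char array once and hand that array to parse_tga_header, instead of seeking to each field.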

Now that the header data we need is read and stored, we need to handle the actual image data, which is the tricky part. This is where our mode and size variables come into play.

You find the mode of the image data by dividing the depth by 8. So if you have a 24 bit depth and divide it by 8 you get a mode of 3.
This mode is the number of components (bytes) each pixel in the data has. The tga spec stores a mode of 3 as BGR and a mode of 4 as BGRA: Blue Green Red and Blue Green Red Alpha respectively. The size of the image data section varies from image to image, so we need to calculate it to make sure we don't read too far into the file and corrupt our data. To do this we need the width, height, and mode. Multiplying them together gives the size of the section in bytes: for a mode of 3, that is 3 bytes for every one of the width * height pixels. Hope that makes sense.
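The arithmetic above can be sketched as a small standalone function. This version also rejects values that look wrong, which the loader in this post does not do. The name tga_pixel_bytes is hypothetical, and restricting the depth to 24 or 32 bits is an assumption made to keep the sketch short.

```c
#include <stddef.h>

/*
 * Work out how many bytes of pixel data an image needs.
 * Returns 0 if the values look invalid or the result would overflow.
 * Assumption: only 24 bit (BGR) and 32 bit (BGRA) images are accepted.
 */
size_t tga_pixel_bytes(unsigned int width, unsigned int height,
                       unsigned int depth)
{
    size_t mode;
    size_t limit;

    if (width == 0 || height == 0) {
        return 0;
    }
    if (depth != 24 && depth != 32) {
        return 0;
    }

    /* components (bytes) per pixel: 24 / 8 = 3, 32 / 8 = 4 */
    mode = depth / 8;

    /* largest width that still fits in a size_t after multiplying */
    limit = (size_t)-1 / height / mode;
    if (width > limit) {
        return 0;
    }

    return (size_t)width * height * mode;
}
```

For example, a 4 x 2 image at 24 bits works out to 4 * 2 * 3 = 24 bytes, and the same image at 32 bits to 32 bytes.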

Now that we have the size of this image data section we can dynamically allocate our imgData section of the structure to the appropriate memory size.

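We then need to fseek to the appropriate section of the file, which is offset 18 for this data (right after the 18 byte header, assuming the file has no image ID field or color map and is not compressed). We read in the full section in one go because it is defined as a run of bytes.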
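The loader ignores the return values of fseek and fread, so a truncated or bogus file goes unnoticed. As a hedged sketch, a small helper could wrap the seek-then-read pattern and report failure. The name read_at is hypothetical and not part of tgaimage.c.

```c
#include <stdio.h>

/*
 * Seek to an absolute offset and read exactly count bytes.
 * return 0 on success, -1 if the seek fails or the read comes up short
 */
int read_at(FILE *handle, long offset, void *dest, size_t count)
{
    if (handle == NULL || dest == NULL) {
        return -1;
    }

    /* fseek returns non-zero on failure */
    if (fseek(handle, offset, SEEK_SET) != 0) {
        return -1;
    }

    /* with an element size of 1, fread returns the number of bytes read */
    if (fread(dest, 1, count, handle) != count) {
        return -1;
    }

    return 0;
}
```

The image read would then become something like read_at(handle, 18, data->imgData, size), with the caller cleaning up and returning NULL when it gets -1 back.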

Now that we have the data, we close the file to release the resources held by fopen.

OK, remember just above I said mode 3 and 4 are BGR and BGRA respectively. This is not good, because if we use this as a texture in, say, OpenGL, the common GL_RGB and GL_RGBA formats expect the bytes in RGB or RGBA order. So we check the mode here and flip the red and blue bytes around in the data.
To flip the bytes we do some very basic index math. Because the data behind the pointer is technically an array, we can hop the right number of indices. In this case that is 2, because blue and red always sit two bytes apart within a pixel, and we don't care about G or A because they are already in the proper location. If you don't understand the pointer and array nomenclature, feel free to ask in the comments or read the K&R book if you can get hold of a copy.
Finally we return our structure to the caller.
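To make the byte flip concrete, here is a minimal standalone sketch of the same index math working on a plain buffer. The function name swap_red_blue is hypothetical and not part of tgaimage.c.

```c
#include <stddef.h>

/*
 * Swap the first and third byte of every pixel in place,
 * turning BGR into RGB (mode 3) or BGRA into RGBA (mode 4).
 * size is the total number of bytes, mode the bytes per pixel.
 */
void swap_red_blue(unsigned char *pixels, size_t size, size_t mode)
{
    size_t i;

    /* nothing to swap for missing data or fewer than 3 components */
    if (pixels == NULL || mode < 3) {
        return;
    }

    /* step one whole pixel at a time, skipping any partial pixel at the end */
    for (i = 0; i + mode <= size; i += mode) {
        unsigned char tmp = pixels[i];  /* blue */
        pixels[i] = pixels[i + 2];      /* red moves to the front */
        pixels[i + 2] = tmp;            /* blue moves to the third slot */
    }
}
```

Given two BGR pixels stored as the bytes 1 2 3 4 5 6, the buffer comes out as 3 2 1 6 5 4: the middle (green) byte of each pixel never moves.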

Our last function is free_tga_data. This one is important due to the rule above: the tgaimage module allocated the data, so it is its responsibility to provide the means to clean it up.

This one is really simple. We take in the structure as an argument and make sure it is not NULL. Calling free on a NULL pointer is actually harmless (the standard defines it as a no-op), but reaching through a NULL tgadata to get at imgData is undefined behavior and will likely segfault the application. If all is good we FIRST clean up the imgData portion of the structure. If we don't do this it will leak, as it was a separate malloc. Then we free the rest of the tgadata structure from memory.

Hopefully this post was helpful to some of the C programmers out there. It is a nice example of how clean a solution in C can be, and of how a few best practices help you avoid memory leaks in C applications. It also demonstrates how to traverse a binary file using the file's offsets, working from nothing more than the specification.

That is all for now. Have a great day.

Good stuff! I was actually just looking for a simple C image loader (for any common file format) and this seems to fit the bill quite nicely.

Couple of suggestions though:
- Check the return value on malloc and fail-fast if malloc didn't succeed
- Check the return value on those fread and fseek calls and fail gracefully if they're not what you'd expect. You never know what you're reading
- Maybe do a sanity check on data->width, data->height, data->depth after reading? In case I'll pass a non-TGA file these may get insanely high.
- Change the load function signature to "TGADATA* load_tga_data(const char *file)". I may have a const string with the file path and with strict compiler setting I wouldn't be able to use that as path otherwise.
Anyway thanks for the writeup and code!

Glad you liked it, and thanks for the suggestions. I will keep those in mind as I refine the code for later use in the project I am planning.