Compiled Bitmaps

Published October 19, 1999 by John Amato, posted by Myopic Rhino

[font="Verdana, Tahoma, Arial"][size="2"]Drawing simple, transparent bitmaps is still a very common task even in today's 3D games. Gauges, radar screens, text, and power meters are but a few of the elements that may need high-speed blitting, and optimizing this process can help bump up a lagging frame rate. Here I discuss what is probably the fastest general method of drawing transparent bitmaps, in the context of the most commonly used PC videogame development platform: the Watcom IDE and its 32-bit DOS4GW extender. First we'll talk about the method in a "pure C" environment, and then we'll make assembly-level optimizations and produce a bitmap compiler for use with the much-detested but nonetheless convenient WASM assembler.

[/font]

[font="Verdana, Tahoma, Arial"][size="2"]* * *

[/font]
[font="Verdana, Tahoma, Arial"][size="2"]
The standard method of drawing a bitmap with transparency against a background buffer (be it an off-screen buffer or video RAM) is to set up a nested loop and simply traverse the bitmap just as one would read a book: top to bottom, left to right. For each pixel in the source image, the process checks to see if the pixel value is the designated "transparent" value (almost always black, which is conventionally represented by zero). If zero, the pixel is skipped; if non-zero, the pixel is plotted onto the background buffer.
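
For reference, the scanning approach described above might look like the following minimal sketch. The 320-byte row width matches the mode assumed later in this article; the name draw_sprite_scan and its parameter list are illustrative and not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>

#define SCREEN_WIDTH 320   /* assumed row width of the background buffer */

/* Conventional transparent blit: visit every source pixel and test it
   against the transparent value (0) before writing. No clipping is done,
   so the caller must keep the sprite inside the buffer. */
void draw_sprite_scan(unsigned char *vscreen, const unsigned char *bitmap,
                      int x, int y, int width, int height)
{
    int row, col;

    assert(vscreen != NULL && bitmap != NULL);

    for (row = 0; row < height; row++) {
        unsigned char *dest = vscreen + (y + row) * SCREEN_WIDTH + x;
        const unsigned char *src = bitmap + row * width;

        for (col = 0; col < width; col++) {
            if (src[col] != 0)      /* the per-pixel test we want to remove */
                dest[col] = src[col];
        }
    }
}
```

The inner `if` is executed once per source pixel, transparent or not, which is the cost the rest of this article sets out to eliminate.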

This algorithm works fine, but it is very slow. The biggest speed issue is that every pixel is tested for zero before it is written. A better alternative is to generate a sequence of data moves that plots only the non-zero pixels into the correct positions in the background buffer. This can be done by pre-processing each bitmap image into a separate function.

This function takes as arguments the x and y coordinates of the current sprite position, and perhaps a pointer to the destination buffer (although if you know ahead of time that the destination buffer will always be the same, it is probably faster to use a global pointer than to pass it as an argument every time one of these functions is called). The function uses the x and y coordinates to calculate a starting offset into the destination buffer, and then moves the actual pixel data into the appropriate positions. Each non-zero pixel in a bitmap image has an offset from the first position in the image, and that offset is used by the function to plot it onto the background buffer.

This is best seen with an example of a very small bitmap. Listing 1 shows a "blitter" function for one of the missiles in Drone. The odd-looking calculation at the beginning is a well-known optimization of the commented line preceding it: since 320 = 256 + 64, the product 320 * y can be computed as (y << 8) + (y << 6), which replaces a multiply with two shifts and an add. The actual image of the missile is shown in Figure 1. This particular bitmap graphic is a good learning example because it is only one pixel wide, and you can see the obvious pattern in the offsets into the background buffer.
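
The shift-based offset calculation mentioned above relies on 320 = 256 + 64. A small sketch of it, with a hypothetical helper name row_offset, is shown here; the identity is specific to a 320-byte row.

```c
/* Offset of the start of row y in a 320-byte-wide buffer, computed with
   two shifts and an add instead of a multiply: 320*y == 256*y + 64*y. */
int row_offset(int y)
{
    return (y << 8) + (y << 6);
}
```

For any row in a 320x200 screen, this should produce the same value as 320 * y.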

The result is that all of the testing for zero is gone and, furthermore, only the non-zero pixels are actually "processed" (drawn) by the drawing function. This particular example is a very small bitmap; larger bitmaps will obviously generate much larger blitter functions. This blitter was generated using a bitmap compiler (or sprite compiler, as they are sometimes called). I wrote my own bitmap compiler for my game development, and it is shown in Listing 2. It takes a sprite name, a pointer to a bitmap, and the bitmap's width and height, and produces a .c file named after the sprite. This .c file contains a C blitter that is ready to be compiled and linked into your program. When you want to draw that bitmap, you simply call the function with the x and y coordinates of the sprite's position and a pointer to your background buffer, and the image is blitted there.

Since most sprites have multiple frames of animation, you will want to use this bitmap compiler to generate a blitter for each frame of your sprites. Nomenclature becomes important here: I named each blitter as the sprite name followed by an underscore and a number which is the index for that frame. The missiles in this particular game each have three frames, so I named the blitter functions for these miss_0, miss_1, and miss_2. When I go to draw the missiles, I call these functions with the appropriate parameters just like any other function. For example, to draw the first frame (frame 0) of a missile in the center of the screen, the function call looks like the following:

[font="Courier New, fixedsys"][size="2"][color="#000088"]miss_0(160, 100, background_buffer); [/color][/font]
[/font][hr][size="3"]Listing 1: [font="Verdana, Tahoma, Arial"][size="2"]

void miss_0 (int x, int y, char *vscreen)
{
    char *buffer;

    //buffer = vscreen + (320 * y) + x;
    buffer = vscreen + (y << 8) + (y << 6) + x;

    *(buffer + 0) = 12;
    *(buffer + 320) = 12;
    *(buffer + 640) = 12;
    *(buffer + 960) = 12;
    *(buffer + 1280) = 56;
    *(buffer + 1600) = 24;
    *(buffer + 1920) = 24;
}

[hr][/font][size="3"]Figure 1:

[font="Verdana, Tahoma, Arial"][size="2"][hr][/font][size="3"]Listing 2: [font="Verdana, Tahoma, Arial"][size="2"]

////////////////////////// "C" sprite compiler: //////////////////////

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int Compile_Sprite (char *spritename, char *bitmap, int width, int height) {

// OK - pass this function a string for a filename, a pointer to a single
// frame of a sprite, and the sprite's width and height.  The function will
// then produce a file (of the sprite's name) containing C code which will
// compile to the fastest way to blit a bitmap while still in pure C...

int x, y, offset, bitmap_offset;
unsigned char pixel;
FILE *outfile;
char *filename;

// Allocate room for the name plus ".c" and the terminator (strdup alone
// leaves no space for strcat to append the extension):
filename = (char *) malloc(strlen(spritename) + 3);
if (filename == NULL)
  return(1);
strcpy(filename, spritename);
strcat(filename, ".c");

if((outfile = fopen(filename,"wt")) == NULL)
{
  printf("Could not open output file %s\n", filename);
  free(filename);
  return(1);
}

fprintf(outfile,"\n\nvoid %s (int x, int y, char *vscreen)\n{\n\n",spritename);
fprintf(outfile,"char *buffer;\n\n");
fprintf(outfile,"buffer = vscreen + (y << 8) + (y << 6) + x;\n\n");

bitmap_offset = 0;

for (y = 0; y < height; y++) {
  offset = y * 320;
  for (x = 0; x < width; x++) {
    // Read the pixel as unsigned so palette indices above 127 stay positive:
    pixel = (unsigned char) *(bitmap + bitmap_offset);
    if (pixel != 0) {
      fprintf(outfile,"*(buffer + %d) = %d;\n", offset, pixel);
    }
    bitmap_offset++;
    offset++;
  }
}

fprintf(outfile,"\n}\n\n");

fclose(outfile);
free(filename);

return(0);

}

[hr]I have assumed the popular 320x200 resolution mode for this discussion, but the notion can easily be extended to other (higher) resolutions: the row width of 320 and the shift-based multiply that goes with it are the only values in the code that need to change.

Although significantly faster than the scanning algorithm, there are two big drawbacks to this method: one is that clipping to the viewport is just about impossible. The other is that these blitter functions take up significantly more memory than standard bitmaps. This latter result shouldn't be an issue since by now you've realized that extended-DOS and Windows are the only ways to go for developing games... right? So the only real problem is the clipping issue.

If objects in your game "wrap around" to the other side of the viewport when they exceed an edge, then you're OK for the left and right sides. If you think about it for a while, you can come up with a trick to get the top and bottom edges to work in a similar way.

But most modern games don't "wrap" objects around when they cross a screen boundary, so we need a way to clip our bitmaps. The answer is one you probably won't like: we're going to keep both sprite-drawing methods, using the blitters only when the object in question is totally within the viewport boundaries and the standard sprite-drawing function when the object needs clipping. In most games, the objects spend most of their lives either totally within or totally outside the screen boundaries, and only a small percentage of the time are they partially visible. So we can probably trim quite a bit of time off the rendering section by using the blitters when we can, and the clipping version only when necessary.

Therefore, in every iteration of our event loop, we must check each object to see if it is totally within the viewport. If so, we set a flag in that sprite (we'll call it "clipped") to false (0). Otherwise, we set it to true (1). In the case where the object is totally outside the viewport, the standard bitmap-drawer should be designed to catch this and simply not draw the image. (Such a sprite-drawing function can be found in almost any book that covers PC graphics, including my personal favorite, Black Art of 3D Game Programming by Andre LaMothe, Waite Group Press.)
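
The per-frame check described above could be sketched as follows. The viewport constants and the helper name sprite_is_clipped are assumptions made for illustration; the return value is what would be stored in the sprite's "clipped" flag.

```c
#define VIEW_WIDTH  320   /* assumed viewport size (320x200 mode) */
#define VIEW_HEIGHT 200

/* Returns 0 when the sprite lies entirely inside the viewport (safe for a
   compiled blitter) and 1 when any part of it crosses or lies beyond an
   edge (must go through the clipping sprite-drawer instead). */
int sprite_is_clipped(int x, int y, int width, int height)
{
    if (x < 0 || y < 0)
        return 1;
    if (x + width > VIEW_WIDTH || y + height > VIEW_HEIGHT)
        return 1;
    return 0;
}
```

A sprite that is completely off-screen also returns 1 here, which matches the scheme above: the standard sprite-drawer is the one expected to notice that case and draw nothing.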

When we go to draw all of our objects, we test each object's clipped flag. If true, we make a call to the standard sprite-drawing function which will clip the bitmap. Otherwise, we make a call to the appropriate blitter.

Sound simple? Well it is, except for one key word in that last sentence: appropriate. Since a blitter is only good for one frame of a sprite, each sprite will have a blitter for each frame. Our bitmap compiler function will generate blitters with unique names (if we're careful about how we use it), but how do we know which of the blitters to call for any given sprite-draw?

Well, we know which frame the sprite is on by its curr_frame field, so we could set up huge switch statements with the appropriate blitter call for each frame index. This would work, but there would have to be a case label for each frame of each sprite object in the entire game. This is not an attractive solution.

The real answer - as you might have guessed - is to use pointers. In C, you can actually define pointers to functions. The syntax is a little goofy, because when you define a pointer to a function you have to provide information about the parameters the function will take. Any good reference or even introductory book on C/C++ should cover this notion.
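
As a minimal sketch of the syntax, the following declares a function-pointer type matching the blitter signature used in this article and calls through a small table. The typedef name blitter_fn, the table, and the two stand-in blitters are hypothetical; real blitters would come from the bitmap compiler.

```c
/* Pointer to a function taking (x, y, destination buffer), returning void. */
typedef void (*blitter_fn)(int x, int y, char *vscreen);

/* Stand-in blitters: each plots one pixel with a different color. */
void demo_0(int x, int y, char *vscreen) { *(vscreen + 320 * y + x) = 12; }
void demo_1(int x, int y, char *vscreen) { *(vscreen + 320 * y + x) = 24; }

/* One entry per animation frame, indexed by frame number. */
blitter_fn demo_blitters[2] = { demo_0, demo_1 };

/* Draw the given frame by indexing the table and calling through the pointer. */
void draw_frame(int frame, int x, int y, char *vscreen)
{
    demo_blitters[frame](x, y, vscreen);
}
```

The call through the table costs one indexed load plus an indirect call, no matter how many frames or sprites exist, which is why it scales better than a switch statement.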
[hr][/font][size="3"]Listing 3: [font="Verdana, Tahoma, Arial"][size="2"]

typedef struct sprite_typ
{
    int x,y;                    // position of sprite
    int save_x,save_y;          // saved position of sprite
    int dx, dy;                 // velocity of sprite
    int width,height;           // dimensions of sprite in pixels

    int counter_1;              // some counters for timing and animation
    int counter_2;
    int counter_3;

    int threshold_1;            // thresholds for the counters (if needed)
    int threshold_2;
    int threshold_3;

    // This is an array of pointers to the bitmap images:
    unsigned char *frames[MAX_SPRITE_FRAMES];

    // Array of pointers to the blitter functions:
    void (*blitters[MAX_SPRITE_FRAMES])(int, int, char *);

    int curr_frame;             // current frame index
    int num_frames;             // total number of frames
    int state;                  // state of sprite, alive, dead...

    char clipped;               // flag to determine if clipped or not

    int x_clip,y_clip;          // clipped position of sprite
    int width_clip,height_clip; // clipped size of sprite
    int visible;                // used by sprite engine to flag whether visible or not

} sprite, *sprite_ptr;

So the next step, of course, is to create a new field in the sprite structure that is an array of pointers to the blitters for that sprite. A typical sprite data structure is in Listing 3; the additions are the blitters array of function pointers and the new "clipped" field. During initialization, the appropriate function pointers to the blitters are placed in this array. When we go to call the blitter in our sprite-drawing section, we index into this array using the curr_frame field. The actual code for this will look something like this:

for (i = 0; i < MAX_MISSILES; i++) {
    if (missiles[i].state == SPRITE_ALIVE) {
        if (missiles[i].clipped == 0)
            missiles[i].blitters[missiles[i].curr_frame] (missiles[i].x, missiles[i].y, background_buffer);
        else
            Sprite_Draw_Clip((sprite_ptr)&missiles[i]);
    }
}

That's it!

[/font]

[font="Verdana, Tahoma, Arial"][size="2"]* * *

[/font]
[font="Verdana, Tahoma, Arial"][size="2"]
There are additional improvements that can be made. The sprite compiler I have listed here generates only C code. While much faster than the standard sprite-draw function, the C blitters still move only one byte (one pixel) at a time. On a 32-bit machine, that's 25% of potential capacity! If you take the time to study some very basic machine language, you will find that it's possible to compile your sprites directly to assembly, where you can move 2 bytes (one word) or even 4 bytes (a double word) at a time. In the best case, this can increase sprite-drawing performance by another 100% to 300%.

[/font][size="5"]Assembly Compiled Bitmaps: The Pragma Auxiliary Method

OK - are you ready to make the jump to assembly-compiled bitmaps? While developing Drone, I was targeting the 486 as my platform. That idea has been abandoned, but at the time I wanted to increase performance any way possible, and so I actually wrote another bitmap compiler that wrote directly to assembly language. Since I was using Watcom's C/C++ compiler, I had to learn all about their seemingly ridiculous pragma aux syntax for inserting in-line assembly code. At first, I hated it and couldn't imagine their reasons for using the pragma aux syntax when everyone else in the world (Microsoft, Borland, etc.) uses some form of asm block syntax. Anyway, the upshot is that Watcom provides you with much greater optimization potential by giving you control over how arguments are passed during function calls. This may sound trivial, but in today's development world, when everyone is so happy over the object-oriented paradigm, function-call overhead can eat your performance alive. Hence, by passing the arguments to your functions directly in registers you can save even more precious clock cycles per call. It can add up. Believe me.

This is especially true when you consider that you can control which registers get which arguments. This only makes sense if you have a reasonable understanding of assembly programming and how computers really work. Most CPU instructions expect certain registers to have certain values pre-loaded into them. Since most compilers pass arguments on the stack, your first job inside the function is to load the necessary arguments from the stack into the appropriate registers. Under Watcom, you can specify that these arguments are already in the correct registers, and this saves the extra loads from the stack.

Anyway, on to the bitmap compiler. I am assuming you are already thoroughly familiar with the C bitmap compiler (described above). If not, you'd better go re-read it until you do completely understand it, because the assembly-compiler is based on it and is significantly more complicated.

Why are we moving to assembly language? In case you forgot, the C version would only plot a single byte (one pixel) at a time. In assembly, we'll be able to move 4 bytes (4 pixels) at a time when possible. Theoretically, assuming your game sprites are relatively "normal" in shape, this could speed up the drawing of these bitmaps by as much as 300%. The idea is like this: we have the ability to move 4 bytes (a double-word) at a time, 2 bytes (a single word) at a time, and one byte at a time. These are our choices under the 32-bit x86 architecture. If you think about it for a while, and perhaps play with some examples, you will quickly realize that you cannot compile any given bitmap to a series of nothing but 4-byte moves. The simplest case is when you happen to be processing a row that has an odd number of pixels.

See what I mean?

So, we need an algorithm that will optimize the 4-byte, 2-byte, and 1-byte moves for any given bitmap. Here goes:

We start out just like we did in the C bitmap compiler: looking for the first non-zero pixel. Once we've found one, we simply store it in a variable and continue to see if the next one is non-zero. If the next pixel is zero, our present job is over because we have an isolated pixel. The best we can do in this case is a single-byte move, so we write that line out.

However, if the next pixel is non-zero, then we already know we can do at least a 2-byte move. But we should still look ahead more to see if we can do a 4-byte move, so we store that pixel in yet another variable and look at the next pixel. If this next one is zero, the best we can do is the 2-byte move, so we write that line out and start again with the remainder of the current line. If the third pixel is non-zero, however, we store it in a third variable and continue to see if the fourth pixel is non-zero.

If the fourth pixel is non-zero, we're in the best of luck and can go ahead and write out a 4-byte move and start all over again with the remainder of the current line. But, if the fourth pixel is zero, we have a case of three pixels in a row and the best we can do is write out a 2-byte move followed by a 1-byte move.
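
The look-ahead just described can be sketched for a single row as follows. This is only an illustration under stated assumptions: the move_t record and the split_row name are hypothetical, and the moves are collected in an array rather than written out as assembly text. The value field is packed with the first pixel in the low byte, which is what a little-endian x86 store needs in order to place that pixel at the lowest address.

```c
#include <assert.h>
#include <stddef.h>

/* One generated store: where it goes, how many bytes, and the packed value. */
typedef struct {
    int offset;            /* offset from the start of the row */
    int size;              /* 1, 2, or 4 bytes */
    unsigned long value;   /* pixels packed first-pixel-in-low-byte */
} move_t;

/* Split one row of pixels into 4-, 2-, and 1-byte moves, skipping
   transparent (zero) pixels. Returns the number of moves written to out,
   which must have room for at least 'width' entries. */
int split_row(const unsigned char *row, int width, move_t *out)
{
    int x = 0, count = 0;

    assert(row != NULL && out != NULL);

    while (x < width) {
        int run = 0, i;

        if (row[x] == 0) { x++; continue; }   /* skip transparent pixel */

        /* Count up to four consecutive non-zero pixels. */
        while (run < 4 && x + run < width && row[x + run] != 0)
            run++;

        if (run == 3)
            run = 2;   /* three in a row: a word now, the byte on the next pass */

        out[count].offset = x;
        out[count].size = run;
        out[count].value = 0;
        for (i = run - 1; i >= 0; i--)
            out[count].value = (out[count].value << 8) | row[x + i];

        count++;
        x += run;
    }
    return count;
}
```

For example, a row containing the pixels 0, 44, 45, 46, 0 would yield a 2-byte move at offset 1 followed by a 1-byte move at offset 3, the "three in a row" case from the text.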
[hr][size="3"]Listing 4:

///////////////// Optimized Assembly sprite compiler: ////////////////////

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int Compile_Sprite_ASM (char *spritename, char *bitmap, int width, int height) {

// OK - pass this function a string for a filename, a pointer to a single
// frame of a sprite, and the sprite's width and height.  The function will
// then produce a .h file (of the sprite's name) containing ASM code which will
// compile to the fastest possible way to blit a bitmap PERIOD (without
// specialized hardware)!	...in Watcom/DOS4GW.

int x, y, offset, offset1, offset3, bitmap_offset;
FILE *outfile;
unsigned char byte1, byte2, byte3, byte4;   // unsigned so %02x prints two hex digits
char *filename;

// Allocate room for the name plus ".h" and the terminator (strdup alone
// leaves no space for strcat to append the extension):
filename = (char *) malloc(strlen(spritename) + 3);
if (filename == NULL)
  return(1);
strcpy(filename, spritename);
strcat(filename, ".h");

if((outfile = fopen(filename,"wt")) == NULL)
{
  printf("Could not open output file %s\n", filename);
  free(filename);
  return(1);
}

// The backslash must be the last character on each generated line,
// otherwise it is not a line continuation:
fprintf(outfile,"\n\nvoid %s (int x, int y, char *vscreen);\n\n",spritename);
fprintf(outfile,"#pragma aux %s =               \\\n", spritename);
fprintf(outfile,"        \"push ecx\"           \\\n");
fprintf(outfile,"        \"mov  ecx,ebx\"       \\\n");
fprintf(outfile,"        \"shl  ecx,08H\"       \\\n");
fprintf(outfile,"        \"shl  ebx,06H\"       \\\n");
fprintf(outfile,"        \"add  edi,ecx\"       \\\n");
fprintf(outfile,"        \"add  edi,ebx\"       \\\n");
fprintf(outfile,"        \"add  edi,eax\"       \\\n");

bitmap_offset = 0;

for (y = 0; y < height; y++) {
  offset = y * 320;

  x = 0;
  while (x < width) {

    if (*(bitmap + bitmap_offset) != 0) {
      // Pixel is non-black; save it and try to get more:
      byte1 = (unsigned char) *(bitmap + bitmap_offset);
      offset1 = offset;
      bitmap_offset++;
      offset++;
      x++;
      // Test x < width first so we never read past the end of the row:
      if ((x < width) && (*(bitmap + bitmap_offset) != 0)) {
        // 2nd pixel is non-black; save it and try to get more:
        byte2 = (unsigned char) *(bitmap + bitmap_offset);
        bitmap_offset++;
        offset++;
        x++;
        if ((x < width) && (*(bitmap + bitmap_offset) != 0)) {
          // 3rd pixel is non-black; save it and try to get one more:
          byte3 = (unsigned char) *(bitmap + bitmap_offset);
          offset3 = offset;
          bitmap_offset++;
          offset++;
          x++;
          if ((x < width) && (*(bitmap + bitmap_offset) != 0)) {
            // BEST CASE!  Write a double-word.  x86 is little-endian, so the
            // first pixel must sit in the low byte of the immediate.  Consume
            // the fourth pixel too, so it is not emitted a second time:
            byte4 = (unsigned char) *(bitmap + bitmap_offset);
            bitmap_offset++;
            offset++;
            x++;
            fprintf(outfile,"        \"mov  dword ptr [edi+0%xH],0%02x%02x%02x%02xH\"  \\\n",
                    offset1, byte4, byte3, byte2, byte1);
          }
          else {
            // Fourth pixel is black; need a word write for first two pixels
            // and a byte write for the third:
            fprintf(outfile,"        \"mov  word ptr [edi+0%xH],0%02x%02xH\"  \\\n",
                    offset1, byte2, byte1);
            fprintf(outfile,"        \"mov  byte ptr [edi+0%xH],0%02xH\"  \\\n",
                    offset3, byte3);
          }
        }
        else {
          // Third pixel black; write a word for first two pixels:
          fprintf(outfile,"        \"mov  word ptr [edi+0%xH],0%02x%02xH\"  \\\n",
                  offset1, byte2, byte1);
        }
      }
      else {
        // WORST CASE!  Second pixel black; need to write only first pixel:
        fprintf(outfile,"        \"mov  byte ptr [edi+0%xH],0%02xH\"  \\\n",
                offset1, byte1);
      }
    }
    else {
      // Encountered black pixel; skip it:
      bitmap_offset++;
      offset++;
      x++;
    }

  }  // end while

}  // end for

fprintf(outfile,"        \"pop  ecx\"           \\\n");
fprintf(outfile,"        parm [eax] [ebx] [edi] \\\n");
fprintf(outfile,"        modify [edi ebx ecx];\n\n");

fclose(outfile);
free(filename);

return(0);

}

[hr] And this is how we progress through each line. The source code for the assembly bitmap compiler I wrote is in Listing 4. It works just like the C version, but generates a .h file whose contents should be copied into one of the header files in your project. Listing 5 shows the C and assembly versions of the blitter for a blue "probe" missile in one of my games. The bitmap image is shown in Figure 2. Here, you can plainly see the results of our new bitmap compiler in action. The assembly version has six fewer copy instructions (14 instead of 20), because it is able to draw more than one pixel at a time.

[size="3"]Figure 2:

[size="3"]Listing 5:

void probe_0 (int x, int y, char *vscreen)
{
    // C-blitter for probe (frame 0)

    char *buffer;

    buffer = vscreen + (y << 8) + (y << 6) + x;

    *(buffer + 1) = 44;
    *(buffer + 320) = 44;
    *(buffer + 321) = 44;
    *(buffer + 322) = 46;
    *(buffer + 640) = 44;
    *(buffer + 641) = 45;
    *(buffer + 642) = 46;
    *(buffer + 960) = 44;
    *(buffer + 961) = 46;
    *(buffer + 962) = 47;
    *(buffer + 1280) = 45;
    *(buffer + 1281) = 46;
    *(buffer + 1282) = 47;
    *(buffer + 1600) = 44;
    *(buffer + 1601) = 46;
    *(buffer + 1602) = 47;
    *(buffer + 1920) = 44;
    *(buffer + 1921) = 45;
    *(buffer + 1922) = 46;
    *(buffer + 2241) = 44;
}


// assembly-blitter for probe (frame 0)
// (in each word immediate the first pixel is the low byte, since x86
// stores the low byte at the lowest address)

void probe_0 (int x, int y, char *vscreen);

#pragma aux probe_0 =                               \
        "push ecx"                                  \
        "mov  ecx,ebx"                              \
        "shl  ecx,08H"                              \
        "shl  ebx,06H"                              \
        "add  edi,ecx"                              \
        "add  edi,ebx"                              \
        "add  edi,eax"                              \
        "mov  byte ptr [edi+01H],02cH"              \
        "mov  word ptr [edi+0140H],02c2cH"          \
        "mov  byte ptr [edi+0142H],02eH"            \
        "mov  word ptr [edi+0280H],02d2cH"          \
        "mov  byte ptr [edi+0282H],02eH"            \
        "mov  word ptr [edi+03c0H],02e2cH"          \
        "mov  byte ptr [edi+03c2H],02fH"            \
        "mov  word ptr [edi+0500H],02e2dH"          \
        "mov  byte ptr [edi+0502H],02fH"            \
        "mov  word ptr [edi+0640H],02e2cH"          \
        "mov  byte ptr [edi+0642H],02fH"            \
        "mov  word ptr [edi+0780H],02d2cH"          \
        "mov  byte ptr [edi+0782H],02eH"            \
        "mov  byte ptr [edi+08c1H],02cH"            \
        "pop  ecx"                                  \
        parm [eax] [ebx] [edi]                      \
        modify [edi ebx ecx];

The placement of in-line function definitions in Watcom is a confusing scheme, and the only way I know to get it to work consistently is to place them as I've shown in one of your headers. At any rate, this resulting blitter is the absolute fastest possible way to draw a bitmap transparently against a background buffer without specialized hardware. Bank on it.

An interesting thing to be aware of, and this is what really disappointed me about the pragma aux facility: there is a limit on the amount of in-line assembly code you can insert using it. And it's none too generous, either. I got about 20 lines into compiling an old version of Drone with a large assembly blitter and it barfed announcing I had blown the limit. Sort of makes you wonder just what good the pragma aux syntax is, no?

Anyway, the bottom line is that this in-line method will only work for fairly small bitmaps. This is disappointing, but still worth it for smaller objects in a game such as missiles, bullets, explosion debris, or perhaps small font letters. For larger bitmaps you'll either have to settle for the C blitters, or figure out a way to convert these assembly blitters into source for WASM, Watcom's bundled macro assembler, and then figure out how to link everything into your project. It's a real pain in the throne.

If you really don't want to do that (and I wouldn't blame you), there is another possibility. You could conceivably compile as much of a bitmap as possible into one in-line function, and then continue to compile the next portion of the bitmap into yet another (uniquely named) in-line function, and so on until the whole bitmap is compiled across multiple in-line functions. Then, you could create a "wrapper" function (which could be purely C) that calls all of the component blitters that make up that one bitmap. You'd be adding some function-call overhead by calling the wrapper function, which in turn would call the several component blitter functions. But for very large bitmaps, this may be a viable option that could still trim a lot of processing time off your drawing section, while at the same time keeping you from having to get your hands dirty messing with the external assembler.
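
A rough sketch of that wrapper idea follows. To keep the example self-contained, the two component blitters are plain C stand-ins; in practice each would be a separate pragma aux in-line function holding part of the compiled bitmap. All names here (big_0_part0, big_0_part1, big_0) are hypothetical.

```c
/* Component blitter for the top portion of the bitmap (stand-in). */
void big_0_part0(int x, int y, char *vscreen)
{
    char *buffer = vscreen + (y << 8) + (y << 6) + x;
    *(buffer + 0) = 12;
    *(buffer + 320) = 12;
}

/* Component blitter for the bottom portion; its offsets continue where
   the first part stopped. */
void big_0_part1(int x, int y, char *vscreen)
{
    char *buffer = vscreen + (y << 8) + (y << 6) + x;
    *(buffer + 640) = 56;
    *(buffer + 960) = 24;
}

/* Wrapper with the usual blitter signature, so it can sit in a sprite's
   blitters array like any single-function blitter. */
void big_0(int x, int y, char *vscreen)
{
    big_0_part0(x, y, vscreen);
    big_0_part1(x, y, vscreen);
}
```

Because the wrapper keeps the (x, y, vscreen) signature, the drawing loop does not need to know whether a frame was compiled into one function or several.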

I sincerely hope Watcom will correct this flaw in the next version of their compiler, but don't count on it: Watcom has been sold twice in the last couple of years, and is currently owned by Powersoft (yes, the creators of PowerBuilder), itself now part of Sybase. I suspect this vendor has a totally different agenda for the Watcom compiler, and therefore will probably not elect to spend its time helping game developers produce in-line assembly code. It's a shame. But such is the way of things.

[size="5"]Assembly Compiled Bitmaps: The WASM Method

OK - so now you really need to create compiled bitmaps in pure assembly, right? One thing you should probably start getting used to is the notion that assembly language is really not as tough as it seems when you first start looking at it. There is a lot to know, but the best way to get started is to create a simple function in C, and then let the compiler "disassemble" it. Every compiler should offer this functionality. When you disassemble the code for a C function, the compiler should offer you the ability to create a .ASM file that is in proper syntax to be fed into the external assembler. In this way, you can read the resulting file and get a good feel for how to write your own assembly functions that can be assembled and linked into your C project.

Every compiler integrates all this mess in its own way. Since we are interested in high-performance real-time simulations (games), we will focus here on the Watcom 10.x Integrated Development Environment (IDE). When I develop, I use Watcom's Windows IDE - even though my targets are always DOS4GW executables. But everything discussed here can be done in the command-line environment as well, and the Windows IDE almost always displays the corresponding command-line equivalents of what it's doing in the results window.

If you take a C-blitter generated using our original C-bitmap compiler, and disassemble it into a .ASM file, you will see some header information, some startup overhead instructions, and then a long series of instructions that look like the following:

[font="Courier New"][color="#0000FF"]mov byte ptr [EAX + offset], value [/color][/font]

Each of these mov instructions corresponds to a line from the original C-blitter source code. The inefficiency is that we are always moving a single byte at a time. We saw in the previous section how to combine one-, two-, and four-byte moves to create the fastest possible blitters. The only problem was that we were trying to do this in-line using the pragma auxiliary instead of through an external assembler, and we were then horrified to learn that the pragma auxiliary places a very restrictive limit on the amount of assembly code that can be placed in-line. So all we need to do is hack the in-line assembly bitmap compiler into something that will generate a true .ASM file in proper syntax for the external assembler (WASM). Then, instead of including the .C file for that blitter function, just include the .ASM file in your project. When you go to re-build your project, the IDE will automatically invoke WASM and then link in your assembly blitters. It's really pretty simple.

So the rest of our job is essentially just changing the "before" and "after" sections of the assembly bitmap-compiler to produce a native .ASM file for WASM, using a disassembled C-blitter as an example. The source code for the new WASM-bitmap-compiler is in Listing 6.

I also enhanced this compiler to take an entire sprite structure instead of only a single frame at a time. Now, you pass to the compiler a sprite pointer and a name (which will end up being a filename, so keep it 8 characters or less and let the compiler generate the .ASM extension). The rest of the information, including the number of frames, the pointers to the bitmaps for those frames, and the width and height are all in the sprite structure. The resulting .ASM file will contain the blitters for all the frames of that sprite, each named with the passed-in filename followed by an underscore and the frame index... just like we're used to. You will need to include prototype definitions for these blitters in one of your header files.
[hr][size="3"] Listing 6:

///////////////// Optimized WASM sprite compiler: ////////////////////



int Compile_Sprite_WASM (sprite_ptr the_sprite, char *spritename) {



// OK - pass this function a sprite pointer and a string for a filename.

// The function will then produce a .ASM file (of the sprite name) containing ASM code

// in proper syntax for WASM.  All source code is copyright (C) 1996 John Amato



int i, x, y, offset, offset1, offset2, offset3, bitmap_offset;

FILE *outfile;

char byte1, byte2, byte3, byte4;

char *filename;

unsigned char *bitmap;



filename = strdup(spritename);

strcat(filename, ".ASM");



if((outfile = fopen(filename,"wt")) == NULL)

{

  printf("Could not open output file %s\n", filename);

  return(1);

};



fprintf(outfile,".386p\n");

fprintf(outfile,"            	NAME	%s\n", spritename);

fprintf(outfile,"DGROUP      	GROUP   CONST,CONST2,_DATA,_BSS\n");

fprintf(outfile,"_TEXT       	SEGMENT PARA PUBLIC USE32 'CODE'\n");

fprintf(outfile,"            	ASSUME  CS:_TEXT ,DS:DGROUP,SS:DGROUP\n");



for (i=0; i < the_sprite->num_frames; i++)

  fprintf(outfile,"            	PUBLIC  %s_%d_\n", spritename, i);



for (i=0; i < the_sprite->num_frames; i++) {

  fprintf(outfile,"%s_%d_: 	push	ECX\n", spritename, i);

  fprintf(outfile,"            	mov 	ECX,EDX\n");

  fprintf(outfile,"            	shl 	ECX,08H\n");

  fprintf(outfile,"            	shl 	EDX,06H\n");

  fprintf(outfile,"            	add 	EBX,ECX\n");

  fprintf(outfile,"            	add 	EDX,EBX\n");

  fprintf(outfile,"            	add 	EAX,EDX\n");  



  bitmap_offset = 0;



  bitmap = the_sprite->frames;



  for (y = 0; y < the_sprite->height; y++) {

	offset = y * 320;



	x = 0;

	while (x < the_sprite->width) {



  	if (*(bitmap + bitmap_offset) != 0) {

    	// Pixel is non-black; save it and try to get more:  

    	byte1 = *(bitmap + bitmap_offset);

    	offset1 = offset;

    	bitmap_offset++;

    	offset++;

    	x++;

     	if ((*(bitmap + bitmap_offset) != 0) && (x < the_sprite->width)) {

      	// 2nd pixel is non-black; save it and try to get more:  

      	byte2 = *(bitmap + bitmap_offset);

      	offset2 = offset;

      	bitmap_offset++;

      	offset++;

      	x++;

      	if ((x < the_sprite->width) && (*(bitmap + bitmap_offset) != 0)) {

        	// 3rd pixel is non-black; save it and try to get one more:  

        	byte3 = *(bitmap + bitmap_offset);

        	offset3 = offset;

        	bitmap_offset++;

        	offset++;

        	x++;

        	if ((x < the_sprite->width) && (*(bitmap + bitmap_offset) != 0)) {

          	// BEST CASE!  Write a double-word:

          	byte4 = *(bitmap + bitmap_offset);

          	fprintf(outfile,"            	mov 	dword ptr +0x%x[EAX],0x%02x%02x%02x%02xH\n",

              	offset1, byte4, byte3, byte2, byte1);

          	// Consume the fourth pixel so it is not emitted a second time:

          	bitmap_offset++;

          	offset++;

          	x++;

        	}

        	else {

          	// Fourth pixel is black; need a word write for first two pixels

          	// and a byte write for the third:

          	fprintf(outfile,"            	mov 	word ptr +0x%x[EAX],0x%02x%02xH\n",

              	offset1, byte2, byte1);

          	fprintf(outfile,"            	mov 	byte ptr +0x%x[EAX],0x%02xH\n",

              	offset3, byte3);

        	}

       	}

      	else {

        	// Third pixel black; write a word for first two pixels:

        	fprintf(outfile,"            	mov 	word ptr +0x%x[EAX],0x%02x%02xH\n",

            	offset1, byte2, byte1);

      	}  

    	}

    	else {

      	// WORST CASE!  Second pixel black; need to write only first pixel:

      	fprintf(outfile,"            	mov 	byte ptr +0x%x[EAX],0x%02xH\n",

          	offset1, byte1);

    	}

  	}

  	else {

    	// Encountered black pixel; skip it:

    	bitmap_offset++;  

    	offset++;

    	x++;

  	}



	};  // end while x



  };  // end for y





  fprintf(outfile,"            	pop 	ECX\n");

  fprintf(outfile,"            	ret\n");

  fprintf(outfile,"            	nop\n");



}; // end for frames







// Closing statements:

fprintf(outfile,"_TEXT       	ENDS\n\n");



fprintf(outfile,"CONST       	SEGMENT DWORD PUBLIC USE32 'DATA'\n");

fprintf(outfile,"CONST       	ENDS\n\n");



fprintf(outfile,"CONST2      	SEGMENT DWORD PUBLIC USE32 'DATA'\n");

fprintf(outfile,"CONST2      	ENDS\n\n");



fprintf(outfile,"_DATA       	SEGMENT DWORD PUBLIC USE32 'DATA'\n");

fprintf(outfile,"_DATA       	ENDS\n\n");



fprintf(outfile,"_BSS        	SEGMENT DWORD PUBLIC USE32 'BSS'\n");

fprintf(outfile,"_BSS        	ENDS\n\n");



fprintf(outfile,"            	END");



fclose(outfile);

free(filename);



return(0);



}
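
For reference, the seven-instruction prologue emitted at the top of every frame computes the address of the sprite's top-left pixel in the destination buffer. The sketch below restates that arithmetic in plain C. It assumes, judging only from the register usage in the emitted code, that y arrives in EDX and that x and the buffer pointer arrive in EAX and EBX; the name `blit_origin` is made up for illustration, and coordinates are assumed to be non-negative.

```c
/* Hypothetical C restatement of the emitted prologue for a
   320-byte-wide buffer.  y * 320 is computed as y * 256 + y * 64,
   which is what the two shl instructions and the adds do. */
unsigned char *blit_origin(unsigned char *buffer, int x, int y)
{
    int row = (y << 8) + (y << 6);   /* shl ECX,08H and shl EDX,06H */
    return buffer + row + x;         /* the three add instructions  */
}
```

Every `+0x...[EAX]` displacement in the generated moves is then relative to this address.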

[hr]The difference in the amount of executable code between the C blitters and the optimized assembly blitters can be substantial. To show this, we must consider larger bitmap graphics. Figure 3 shows one of the animation frames of an explosion from Drone.

[size="3"]Figure 3: An explosion bitmap from Drone

[size="3"]Listing 7: C blitter code for above explosion bitmap.

See exp_c.txt in attached file

[size="3"]Listing 8: Optimized assembly-blitter code for above explosion bitmap.

See exp_asm.txt in attached file

Listing 7 shows the C blitter for this bitmap, and Listing 8 shows the assembly blitter. The overwhelming majority of the instructions inside these blitters are the actual pixel data moves. Counting these, the C version has 609 pixel moves, and the assembly version has only 203. Amazingly, for this example that's exactly a factor of three. If each move costs roughly the same, we can therefore expect the assembly blitter to draw this bitmap approximately three times faster than the C version. Not bad for an afternoon of work, eh?
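
To get a feel for where a ratio like this comes from, the move counts can be estimated per row without generating any assembly. The sketch below is not part of the original compilers: `count_row_moves` is a hypothetical helper that assumes the C blitter issues one move per opaque pixel and that the optimized blitter groups each run of up to four opaque pixels into dword, word, and byte moves.

```c
/* Counts the moves needed for one row of 8-bit pixels (0 = transparent).
   byte_moves:    one move per opaque pixel.
   grouped_moves: greedy grouping of runs of up to four opaque pixels. */
void count_row_moves(const unsigned char *row, int width,
                     int *byte_moves, int *grouped_moves)
{
    int x = 0;
    *byte_moves = 0;
    *grouped_moves = 0;
    while (x < width) {
        int run = 0;
        if (row[x] == 0) {      /* transparent pixel: nothing to emit */
            x++;
            continue;
        }
        /* Measure the run of opaque pixels, capped at one dword. */
        while (run < 4 && x + run < width && row[x + run] != 0)
            run++;
        *byte_moves += run;
        /* 4 -> dword, 3 -> word + byte, 2 -> word, 1 -> byte */
        *grouped_moves += (run == 3) ? 2 : 1;
        x += run;
    }
}
```

Summing both counters over every row of a frame gives the two totals to compare. The ratio is only a proxy for speed, since it ignores alignment and memory effects.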

So you see, this assembly language stuff really can make a big difference.

That's just about all the tricks I can think of for transparent bitmaps. Some points to be aware of:

These compilers are currently designed for a 320x200 screen mode. If you want to compile bitmaps for a higher resolution, you will need to modify the compiler: the row width of 320 is hard-coded both in the per-row offset calculation and in the shift-and-add multiply at the top of each generated blitter. The changes should be pretty obvious to you by now.
In general, compiled bitmaps are not really useful in a 3-D game, because these blitters don't scale the bitmaps. You could conceivably store multiple scaled bitmaps for each frame as well as for each viewing angle, but now we're talking about some pretty outrageous memory requirements. Then again, these days, 32 and 64 MB systems are starting to become mainstream. As is always the case, technology becomes cheaper and more abundant by the day. Perhaps by the time your game is ready for marketing, the everyday household platform will have progressed through at least a generation of microprocessors and will have at least double the RAM of the one you developed on. A wise programmer will always count on technology expanding.
Some images don't really need to be rendered from every viewpoint. Explosions are one example. If you think about it, a typical explosion looks pretty much the same from all sides, so you can get away with only scaling the images depending on how close the camera is to the explosion. You probably could afford to create blitters for all reasonable scales of all frames of your explosions, and since explosions tend to be large images, this would really buy you some more processing time between frames.
Compiled bitmaps are also still very usable for objects that don't need to be scaled, such as gauges, radar screens, status bars, font letters, and other things that might be found on, say, an instrument panel.
The notion of a 64-bit CPU and bus is not silly - it will be here sooner than you think. The compiler can be easily altered to optimize for 8-byte moves as well as 4-byte, 2-byte, and single-byte moves.
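
The first and last points above can be sketched together. Neither function below comes from the original compilers: `split_run` and `pixel_offset` are hypothetical names, `max_width` is assumed to be a power of two (8, 4, 2, or 1), and `pitch` stands for the buffer width in bytes that the listings hard-code as 320.

```c
/* Breaks a run of run_len opaque pixels into move widths, largest
   first, never exceeding max_width.  Stores the widths in widths[]
   (at most max_moves of them) and returns how many were stored. */
int split_run(int run_len, int max_width, int *widths, int max_moves)
{
    int count = 0;
    int w = max_width;
    while (run_len > 0 && count < max_moves) {
        while (w > run_len)
            w >>= 1;            /* step down 8 -> 4 -> 2 -> 1 */
        widths[count++] = w;
        run_len -= w;
    }
    return count;
}

/* Offset of pixel (x, y) in a buffer that is pitch bytes wide. */
long pixel_offset(int x, int y, int pitch)
{
    return (long)y * pitch + x;
}
```

With `max_width` set to 4, a run of seven pixels splits into a dword, a word, and a byte, the same pattern the WASM compiler above produces. For a different resolution the shift pair in the emitted prologue has to change along with the pitch; 640, for instance, is 512 + 128.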

At any rate, the point to take home from this is that no matter how powerful the tools and languages get, you will always benefit from understanding the bottom-line language that all PC programs execute in - native protected-mode machine language. The coders who take the time to delve into the actual assembly that gets generated by the compiler will find ways to beat the pants off of the higher-level C or C++ code. This means they'll be able to draw cooler graphics and special effects faster, which translates into a superior game. Looking at disassembled C code is a great way to start. Indeed, this is how even experienced assembly coders optimize key functions within their programs. And this technique will prove even more valuable and effective when chip makers introduce next-generation processors. By looking at the disassembly, you will get a direct look at how the new technology is used in the generated assembly code, which will give you an opportunity to optimize it for your specific needs. Intel's much-anticipated MMX technology is finally here. The programmers who figure out how to manipulate this new technology will have the first opportunity to utilize it in a game implementation, while those who refuse to leave the realm of C/C++ will be forced to rely on their compiler to fashion assembly code around the new technology. As we've seen, compilers often do not do this as well as we'd like.

Assembly language is not old-fashioned, and people who insist that it is are the ones who really don't understand how computers work. In fact, assembly language is on the bleeding edge of advances in CPU technology. When CPU technology changes, new instructions are usually added to the chip's core instruction set... or existing ones are modified. Once the machine code has been tweaked to accommodate the new chip feature, all the C/C++ and other compilers have to be re-written to take advantage of the new assembly instructions so that the rest of the programming world has access to the new chip feature. This is precisely the reason your compiler offers you the choice of compiling for 286, 386, 486, or Pentium code. Each of these chips builds new features onto the previous generation, and the compiler needs to know whether or not to attempt to take advantage of, say, the Pentium's advanced pipelining capabilities, or the 486's floating-point accelerator. The compiler will, in a very general sense, try to generate efficient code using the assembly language constructs it has at its disposal. But the best optimizer is between your ears, and you will almost always be able to find ways to use the technology specifically for your particular procedure that the designers of the compiler couldn't possibly have preconceived. And we've seen what can (and usually does) happen when you let the compiler decide how to construct assembly code for a high-performance library function.

End of lecture. Happy coding!

* * *

A final caveat - if you attempt to WASM-compile very large bitmaps (especially with multiple frames), the resulting .ASM file may be so large that it will breach a protected-mode code segment.

WHAT! Did you say code segment? Yes, I said code segment. Even in protected mode we are not completely free from segmentation. The CPU still has to separate code from data, and in the process it ends up segmenting the code altogether. But take heart, because even though there are segments, they are HUGE segments. Also, there is an unlimited number of them... at least, up to the amount of space you have on the target machine (or 4GB, which is the absolute ceiling a 32-bit architecture can address).

If you happen to compile a series of bitmap frames that are large enough to breach a code segment, you'll know when you go to run your completed project and it generates a GP fault. I found it alarming that WASM did not generate even so much as a warning that the code segment was being exceeded... maybe there's something more to it that I just don't know about. Anyway, all you have to do is break the blitter functions down into several smaller .ASM files. You can test pretty easily for which frame the sprite object is crashing on. If it's crashing at frame 4, but is working fine for frames 0 through 3, then you know the .ASM file is okay from frame 0 through frame 3, and that each .ASM file can hold up to 4 frames' worth of blitters. Just create a new .ASM file that has frames 4 through 7, and another one that has frames 8 through 11, and so on. Then just make sure you include all of the .ASM files you've created in your project. That's all for now; maybe next time I'll show you how to render compressed bitmaps... [font="Verdana, Tahoma, Arial"][size="2"]

- John Amato [/font]
