DirectDraw rendering loop problems (Gurus please help!)

Started by
5 comments, last by Evil 24 years, 7 months ago
I guess the first thing that needs to be mentioned is that DirectDraw Blt and BltFast are asynchronous operations. That means the call returns right away, even though the Blt isn't done yet. The extra line that you added to wait until the Blt is complete will slow it down a lot. That's the main reason your second code example is running slower.

Now... let's see if we can make it run faster. I am going to assume that you are using a 2D array to store your tile information. There is nothing wrong with that. The problem may lie, however, in how you are referencing your tiles in that array. You are probably referencing the tile information via indexes. For example:

for(int Y = 0; Y < VIS_HEIGHT; Y++)
    for(int X = 0; X < VIS_WIDTH; X++){
        //Calculate the destination rect
        //Calculate the source rect
        //Blt the tile
    }

That piece of code, although clean and easy to read, is slow and inefficient.

Here's a segment of a drawing routine that is better optimized:

//------------------------------
//Assumed:
// Map - 2D array of TILEINFO, exactly VIS_WIDTH x VIS_HEIGHT tiles

TILEINFO *cur_tile;
long num_tiles, cnt;
int X, Y;
RECT dest_rect, src_rect;

//Set the dest_rect to the top left coord
SetRect(&dest_rect, 0, 0, TILE_WIDTH, TILE_HEIGHT);

//Initialize some variables
num_tiles = VIS_WIDTH * VIS_HEIGHT;
X = 0; Y = 0; cnt = 0;
cur_tile = &Map[0][0]; //Set the pointer to the beginning of the array

while(cnt < num_tiles){
    //Get the source rect
    //Blt your tile

    cnt++; //Count the tile just drawn so the loop terminates
    X++; //Increment the X position
    OffsetRect(&dest_rect, TILE_WIDTH, 0);
    if(X >= VIS_WIDTH){
        //Wrap to the next line
        X = 0;
        Y++;
        SetRect(&dest_rect, 0, dest_rect.bottom, TILE_WIDTH, dest_rect.bottom + TILE_HEIGHT);
    }

    cur_tile++; //Move the pointer to the next element in the array
}
//------------------------------

Now not everything is there, but the general idea is that you should limit the number of calculations you're doing. Also, for loops have a reputation for being slow. Notice how the destination rectangle simply slides to the right until the last tile in the row is drawn. At that point it wraps to the next row and starts drawing that row.

The next important part of this is that we are not referencing the array via indexes. We start off with a pointer to the beginning of the array and simply traverse the entire array. In memory, a 2D array is still linear.
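To illustrate the point about memory layout, here is a minimal sketch showing that walking one pointer across a 2D array visits the tiles in the same order as the nested-index loop. The TileInfo struct, the TileGrid wrapper, and the sizes are placeholders made up for the example, not anything from the engine being discussed. It relies on the rows of a built-in 2D array being stored back to back.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the TILEINFO structure used above.
struct TileInfo { int id; };

const int kRows = 3;  // placeholder map height, in tiles
const int kCols = 4;  // placeholder map width, in tiles

// Placeholder wrapper around a fixed-size 2D array of tiles.
struct TileGrid { TileInfo tiles[kRows][kCols]; };

// Visit every tile with nested indexes: tiles[y][x].
std::vector<int> ids_by_index(const TileGrid &grid) {
    std::vector<int> ids;
    for (int y = 0; y < kRows; ++y)
        for (int x = 0; x < kCols; ++x)
            ids.push_back(grid.tiles[y][x].id);
    return ids;
}

// Visit every tile by walking a single pointer across the same memory.
std::vector<int> ids_by_pointer(const TileGrid &grid) {
    std::vector<int> ids;
    const TileInfo *cur_tile = &grid.tiles[0][0];
    const std::size_t count = static_cast<std::size_t>(kRows) * kCols;
    for (std::size_t n = 0; n < count; ++n, ++cur_tile)
        ids.push_back(cur_tile->id);
    return ids;
}
```

Both functions return the IDs in the same order, which is why the pointer walk can replace the index arithmetic when the array is exactly the size of the area being drawn.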

Since you are doing 3 layers, you may want 3 dest_rects. That way you're not recalculating the destination rectangle for each blt. You should be simply moving it in one direction until it has to wrap around.

There are probably a few minor bugs in here simply because I just typed it up and didn't compile it to see if it's perfect. It should give you the basic idea, though.


Hope this helps.

------------------
Dino M. Gambone
http://www.xyphus.com/users/dino

Good judgement is gained through experience. Experience, however, is gained through bad judgement.


Dino M. Gambone
Good judgment is gained through experience. Experience, however, is gained through bad judgment.

Currently working on Rise of Praxis MUD: http://www.riseofpraxis.net/

Thanks for the reply Dino! You've given me some good things to think about.

In your reply, you implied that I was waiting until the blit completed in my second code example. I may not have been clear enough in my pseudo code, or it could have been the loss of indentation, but I don't think the second version does any waiting. In fact, I wrote it specifically to take advantage of the asynchronous nature of the blits. I'll try to explain what I should have in my first post:

The idea was to actually do something (calculate RECTs) during the time I would normally be waiting for the BLT to be accepted by the video card. My understanding of the DDBLT_WAIT flag was that it would cause the BLT call not to return until any previous/current BLTs had finished. If it doesn't cause waiting at the BLT command, MS really poorly described its function.

So, in the second set of code I tried handing off blits, but only if the surface was ready... Otherwise, the code would just calculate another blit and try again. See, I thought I was making use of cycles that were normally wasted by DDBLT_WAIT. Apparently, I'm either mistaken about what DDBLT_WAIT actually does, or queueing/dequeueing several thousand RECTs is more inefficient than my video card (sounds more likely).

Another thing I tried, that I didn't mention in the first post, was a version of the code like the second, except that I didn't use GetBltStatus. In this third version, I simply did the blit (without DDBLT_WAIT) and only dequeued the RECTs when the blit was successful. This, too, was very slow.

On to other things....

Regarding the storing of my tile map in memory: I'm 'almost' using a 2D array. I originally wanted to use a 2D array, but then, thanks to compile time errors, I found that you cannot create a 2D array of dynamic size at run time. Why this isn't allowed, I can't fathom. I'm writing my engine to handle any size of map, so 2D arrays were no good for me.

Instead, I manually allocate the memory and load the map data into it. I access the map like a single dimensional array. When I want a specific tile, I just say tile=((row*map_width)+column).
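For reference, the usual row-major offset into a flat map is row * width + column. Here is a minimal sketch of a map sized at run time that wraps that calculation. The TileMap struct and its member names are hypothetical, and tile IDs are assumed to be plain ints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical map container: one flat allocation sized at run time.
struct TileMap {
    int width;               // map width in tiles
    int height;              // map height in tiles
    std::vector<int> tiles;  // width * height tile IDs, stored row by row

    TileMap(int w, int h)
        : width(w), height(h), tiles(static_cast<std::size_t>(w) * h, 0) {}

    // Row-major offset: skip `row` full rows, then step `column` tiles in.
    std::size_t index(int row, int column) const {
        assert(row >= 0 && row < height);
        assert(column >= 0 && column < width);
        return static_cast<std::size_t>(row) * width + column;
    }

    // Read/write access to a single tile ID.
    int &at(int row, int column) { return tiles[index(row, column)]; }
};
```

With a 5-wide map, at(1, 2) refers to element 7 of the flat array: one full row of 5, plus 2.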

At first, I liked your idea of simply traversing the entire array when getting the tiles to be rendered, but my tile map is bigger than the visible area. Traversing the entire array would cause me to check many times more cells than I'm doing at the moment.

I don't think I can use your idea of 3 dest_rects either, because all of my background layers can be positioned independently of each other (for parallax scrolling and such). So, they don't necessarily share destination RECTs.

You certainly got me looking at my tile-map code though. I have several hacks in it right now that were just meant to get it working with a raw file format I threw together. It just reads ASCII files at the moment. I need to create a better binary format... I'm sure I could lose one or two multiplications I'm currently doing.

Later,
Evil

------------------------
E.N.D. - http://listen.to/evil
I'm thinking that you may be drawing the 3 tile layers back to front, so that the frontmost layer writes over pixels of the layers behind it. If this is true, you should switch it around so that the front layer is drawn first, then the second, and then the third. To make it turn out right, so that the 2nd layer doesn't overwrite what is on the 1st layer, you have to use DirectDraw to set a destination color key - almost the same as setting a source color key for sprites. That info is in the help docs.

If you're not already doing that, and there's a fair amount of overdraw, then that is probably the biggest cause of your speed problems.
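As a rough software model of that idea (this is not DirectDraw API code, and whether a given card supports destination color keys on blits is something to check in the driver capabilities), the sketch below composites layers front to back and writes a pixel only while the destination still holds the key color. The Pixel type, the kKey value, and the function name are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint16_t Pixel;  // 16-bit pixels, as in a 16 bpp mode
const Pixel kKey = 0xF81F;    // assumed "transparent" color for the example

// Composite layers front to back; layers[0] is the frontmost layer.
// A destination pixel is written only while it still holds kKey (the
// destination-key test), and transparent source pixels are skipped (the
// source-key test), so each screen pixel is written at most once.
std::vector<Pixel> compose_front_to_back(
        const std::vector<std::vector<Pixel> > &layers,
        std::size_t pixel_count) {
    std::vector<Pixel> dest(pixel_count, kKey);  // start with an "empty" buffer
    for (std::size_t l = 0; l < layers.size(); ++l) {
        assert(layers[l].size() >= pixel_count);
        for (std::size_t i = 0; i < pixel_count; ++i) {
            if (dest[i] == kKey && layers[l][i] != kKey)
                dest[i] = layers[l][i];
        }
    }
    return dest;
}
```

The result matches what back-to-front drawing with a source color key would produce, but pixels hidden behind a nearer layer are never written.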

Another idea...

You said that you can't really use Dino's method of traversing the tile array because your maps are larger than the screen. What you could do is quickly copy all of the tile IDs that ARE on the screen into a little buffer that is the exact size of the screen - use memcpy to copy the IDs per row. Call it once per tile row. Then you will have a tile map that IS the size of the screen, and you can use Dino's method. I don't actually know if that will speed things up or slow them down because I didn't really look over Dino's code well, but you can decide.

Also, if you have to clip tiles to the edge of the screen, you should draw all of the tiles that are fully visible first without clipping them, and then go around the edge of the screen drawing the tiles that require clipping, as opposed to using a single blitter that checks ALL tiles for clipping.

Also, the reason that you can't use 2D arrays and have dynamic map sizes at the same time is that the compiler works out the layout of a 2D array when the program is built, not while it runs. It needs the row width as a constant to turn Map[y][x] into an address, so the dimensions have to be known at compile time.
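A minimal sketch of that per-row memcpy, assuming the map is a flat row-major array of int tile IDs. The function and parameter names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy the visible window of tile IDs out of a larger row-major map.
// All names are illustrative; tile IDs are assumed to be plain ints.
std::vector<int> copy_visible(const std::vector<int> &map,
                              int map_width, int map_height,
                              int vis_left, int vis_top,
                              int vis_width, int vis_height) {
    assert(vis_width > 0 && vis_height > 0);
    assert(vis_left >= 0 && vis_left + vis_width <= map_width);
    assert(vis_top >= 0 && vis_top + vis_height <= map_height);
    assert(map.size() == static_cast<std::size_t>(map_width) * map_height);

    std::vector<int> visible(static_cast<std::size_t>(vis_width) * vis_height);
    for (int row = 0; row < vis_height; ++row) {
        const int *src =
            &map[static_cast<std::size_t>(vis_top + row) * map_width + vis_left];
        int *dst = &visible[static_cast<std::size_t>(row) * vis_width];
        // One memcpy call per visible tile row.
        std::memcpy(dst, src, sizeof(int) * static_cast<std::size_t>(vis_width));
    }
    return visible;
}
```

The returned buffer is exactly vis_width x vis_height, so it can be walked with a single incrementing pointer. Whether the extra copy pays for itself would need measuring.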

Your main problem lies in the fact that you're doing too many blts per frame. Each blit requires a lock, and locks are expensive. At 16x16 per tile x 3 layers you do about 3600 locks per frame on a 640x480 screen, and many more at higher resolutions. This will ultimately limit your framerate.

I suggest doing several things. If you can change your tile size, just doubling it to 32x32 will reduce the number of blits to 900, a 4-fold reduction. And if you double it again, you'll get another 4-fold reduction. That's a powerful technique, but maybe unacceptable if you need 16x16 tiles.

I'm hesitant to suggest this next one, as I don't know the exact memory layout of your surfaces, which would greatly affect this optimization. Since you're doing so many small blts, you lose the advantage of hardware acceleration to the locking overhead. So it's probably faster to do your own memory copies, locking the surfaces only once and copying the tiles into the larger buffer. For this to work, I suggest all surfaces reside in system memory, and that your destination buffer not be the primary but a backbuffer, which you will eventually blit onto the primary with one blt call.

Also, there isn't any point in blting more than you need to. You should clip your surface to the screen size, irrespective of your tile layout. Even with as little as 5% overdraw you increase your number of blits to about 3780.

Another possible optimization is dynamically creating sets. Sets are collections of tiles which rarely change, but exist at different levels of your tile layer. You trade the overhead of reblitting so many small tiles for the overhead of an additional surface. On today's machines another 256x256x16 surface isn't going to even slow it down; however, as you have learned, those 256 blits/frame can 8^). I've never tried this, but I suppose it's worth exploring. Good luck!
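The blit counts for 16x16 and 32x32 tiles can be checked with a few lines. This is a sketch that assumes the tile size divides the screen size evenly and ignores partially visible tiles at the edges; the function name is made up for the example.

```cpp
// Number of tile blits needed to cover the screen once per layer,
// assuming the tile size divides the screen dimensions evenly.
int blits_per_frame(int screen_w, int screen_h, int tile_size, int layers) {
    int tiles_across = screen_w / tile_size;
    int tiles_down = screen_h / tile_size;
    return tiles_across * tiles_down * layers;
}
```

For a 640x480 screen and 3 layers this gives 40 * 30 * 3 = 3600 blits with 16x16 tiles and 20 * 15 * 3 = 900 with 32x32 tiles.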

-ddn

Hello Gurus! This is my first post here. I appreciate any advice you can give me.

To put it bluntly: Is DirectDraw’s Blt too slow to use for several (3 at the moment) layers of 16x16 tiles?

I’ve been playing around with a game engine I’ve written, trying to improve performance, and I’m not getting the speed I should.

I started with a simple loop for blitting them. Here’s some pseudo-code of what I’m doing in the renderer; the actual program is written in Visual C++.

Do while more tiles
    Calculate Source RECT
    Calculate Destination RECT

    DDSurface->Blt(DDBLT_WAIT)
EndDo

Then, in an attempt to get more speed, I tried to do a portion of the blitting concurrently with tile calculations, like so:

Do while more tiles
    Calculate Source RECT
    Calculate Destination RECT

    Push RECTs onto tile Stack

    If DDSurface->GetBltStatus(Not Busy)
        Pull First Tile from Stack
        DDSurface->Blt()
    EndIf
EndDo
// Blt remaining tiles
Do until stack empty
    Pull First Tile from Stack
    DDSurface->Blt(DDBLT_WAIT)
EndDo

But that code actually runs SLOWER than the original. I guess the few extra cells I render during the calculations don’t make up for all that push pulling I’m doing from the tile stack (actually a linked list).

I don’t think my routines which calculate the RECTs are doing any unnecessary math. But I’d guess I’m only getting 15ish frames per second fullscreen and half that windowed.

I had considered trying to write my own blit operation using memcpy, but most of my tiles are transparent. Also, I load all the tiles into video memory, so I would expect a video-to-video blt to be faster than anything I could write in C++.
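For what it is worth, transparency rules out a plain memcpy but not a hand-written copy: a per-pixel loop can skip the key color. The sketch below assumes 16-bit pixels and takes pitches in pixels for simplicity (a locked DirectDraw surface reports its pitch in bytes, so it would need converting first). The function name and parameters are made up for the example. Reading video memory with the CPU is typically slow, so a routine like this usually assumes the surfaces live in system memory.

```cpp
#include <cstddef>
#include <cstdint>

// Copy one tile of 16-bit pixels, skipping pixels equal to `key`.
// src and dst point at the top-left pixel of the tile in each buffer;
// src_pitch and dst_pitch are row lengths in pixels (an assumption of
// this sketch, not the byte pitch a surface lock reports).
void blit_tile_keyed(const std::uint16_t *src, std::size_t src_pitch,
                     std::uint16_t *dst, std::size_t dst_pitch,
                     int tile_w, int tile_h, std::uint16_t key) {
    for (int y = 0; y < tile_h; ++y) {
        const std::uint16_t *s = src + static_cast<std::size_t>(y) * src_pitch;
        std::uint16_t *d = dst + static_cast<std::size_t>(y) * dst_pitch;
        for (int x = 0; x < tile_w; ++x) {
            if (s[x] != key)  // transparent pixels leave the destination alone
                d[x] = s[x];
        }
    }
}
```

Whether this beats a hardware video-to-video Blt depends on the card and on how many tiles are drawn per frame, so it is something to measure rather than assume.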

Can anyone think of some beginner’s mistakes I may have made?

Thanks in advance!

------------------------
E.N.D. - http://listen.to/evil
I re-read your second example and I see what you are trying to accomplish. I don't know all the specifics of what you are trying to accomplish so I can't really tell if it's actually better doing it your way. I'll assume it's the same

--------------------------------------------------

The second issue that you mentioned was that you couldn't traverse the array because it's too big. Well, here's a little trick that may help you walk through the array so that only the visible portion is drawn.

//------------------------------
//Assumptions:
// mMap is an array of TILEINFO
// mMapWidth, mMapHeight are the dimensions of the map
// mVisWidth, mVisHeight are the dimensions of the screen (in tiles)
// mVisLeft, mVisTop is the top-left tile in the visible area

TILEINFO *cur_tile, *row_start;
long num_tiles, cnt;
int X, Y;
RECT dest_rect, src_rect;

//Set the dest_rect to the top left coord
SetRect(&dest_rect, 0, 0, TILE_WIDTH, TILE_HEIGHT);

//Initialize some variables
num_tiles = mVisWidth * mVisHeight;
X = 0; Y = 0; cnt = 0;

//Get the pointer to the top-left tile
row_start = &mMap[(mVisTop * mMapWidth) + mVisLeft];
cur_tile = row_start;

while(cnt < num_tiles){
    //Get the source rect
    //Blt your tile

    cnt++; //Count the tile just drawn so the loop terminates
    X++; //Increment the X position
    OffsetRect(&dest_rect, TILE_WIDTH, 0);
    if(X >= mVisWidth){
        //Wrap to the next line
        X = 0;
        Y++;
        row_start += mMapWidth; //Move to the next row
        cur_tile = row_start;
        SetRect(&dest_rect, 0, dest_rect.bottom, TILE_WIDTH, dest_rect.bottom + TILE_HEIGHT);
    }else
        cur_tile++; //Move the pointer to the next element in the array
}
//------------------------------

The change was the addition of row_start. row_start points to the first tile in the row. When you reach the end of the visible row, then you simply add mMapWidth to move the pointer to the next row. This should help you out with traversing the array.
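Here is a self-contained sketch of the same row_start idea that can be compiled without DirectDraw. Instead of blitting, it records which tile would be drawn and where. It uses nested loops rather than the single while loop, and the TileInfo and DrawCmd structs and all parameter names are placeholders made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for TILEINFO.
struct TileInfo { int id; };

// One tile to draw: which tile, and where its top-left corner lands on screen.
struct DrawCmd { int id; int dest_x; int dest_y; };

// Walk only the visible window of a row-major map: cur_tile steps along a
// row, and row_start jumps down by the full map width at the end of each row.
std::vector<DrawCmd> visible_tiles(const std::vector<TileInfo> &map, int map_width,
                                   int vis_left, int vis_top,
                                   int vis_width, int vis_height,
                                   int tile_w, int tile_h) {
    assert(map_width > 0 && vis_width > 0 && vis_height > 0);
    assert(vis_left >= 0 && vis_left + vis_width <= map_width);
    assert(vis_top >= 0);
    assert(static_cast<std::size_t>(vis_top + vis_height) * map_width <= map.size());

    std::vector<DrawCmd> cmds;
    const TileInfo *row_start =
        &map[static_cast<std::size_t>(vis_top) * map_width + vis_left];
    for (int y = 0; y < vis_height; ++y) {
        const TileInfo *cur_tile = row_start;
        for (int x = 0; x < vis_width; ++x, ++cur_tile) {
            DrawCmd cmd = { cur_tile->id, x * tile_w, y * tile_h };
            cmds.push_back(cmd);  // a real renderer would blit here instead
        }
        if (y + 1 < vis_height)
            row_start += map_width;  // same column, next map row
    }
    return cmds;
}
```

Only vis_width * vis_height tiles are touched, however large the map is, and the only per-tile work on the map side is a pointer increment.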

--------------------------------------------------

The parallax scrolling isn't something I have ever tried to implement so I'm not even going to try and take that one on.

--------------------------------------------------

There are a few technical articles on isometric programming on my web site which could give you some ideas. Stop by and read one or two.

Good Luck

------------------
Dino M. Gambone
http://www.xyphus.com/users/dino

Good judgement is gained through experience. Experience, however, is gained through bad judgement.


