This topic is now archived and is closed to further replies.

pimstead

Here's the Deal - 2D tile based blit w/ scroll.

Okay. Here's what I came up with for a scrolling algorithm that blits 32x32 tiles from a map to a screen of any width and height. I can run two of these backgrounds, full of tiles, at 640x480 in 16-bit mode; any more than that cuts the frame rate in half. I've already gone through this code a couple of times and taken out almost all multiplies and all divides. If people could take a look at this code and see whether there is any way to speed it up, I would appreciate it. Here's the code. Width and Height are the width and height of the tile map in tiles. Instead of indexing into the map (for example Map[x][y]), the code steps a pointer through it. As some of you may or may not know, indexing is very slow compared to stepping a pointer (just look at the asm it produces).
  
void GenericTileHandler(ObjectPtr Obj)
{
	int TileX, TileY, XStart, SrcLeftOffset, Width, Height;
	char *Map, *TempMap;
	LPDIRECTDRAWSURFACE	DDSurface;
	RECT SrcRect, DestRect;

	DDSurface = Obj->ObjectSurface;
	Width = Obj->Width;
	Height = Obj->Height;

	XStart = Obj->X_World >> 5;
	TileY = Obj->Y_World >> 5;
	TileX = XStart;
	SrcLeftOffset = Obj->X_World & 31;
	Map = (char*)Obj->Ptr1;
	Map += (TileY * Width) + TileX;
	TempMap = Map;

	SrcRect.top = Obj->Y_World & 31;
	SrcRect.bottom = 32;
	SrcRect.left = *TempMap << 5;
	SrcRect.right = SrcRect.left + 32;
	SrcRect.left += SrcLeftOffset;

	DestRect.top = 0;
	DestRect.bottom = SrcRect.bottom - SrcRect.top;
	DestRect.left = 0;
	DestRect.right = SrcRect.right - SrcRect.left;

	// y loop

	for(;;)
	{
		// x loop

		for(;;)
		{
			DDBackBufferSurface->Blt(&DestRect, DDSurface, &SrcRect, 0, NULL);
			
			// next screen column

			TileX++;
			TempMap++;
			DestRect.left = DestRect.right;
			// check if we are past the screen width or done rendering the available map

			if((DestRect.left >= ScreenWidth) || (TileX >= Width))
				break;
			// clip the right with the right edge of the screen if necessary

			if((DestRect.left + 32) >= ScreenWidth)
				DestRect.right = ScreenWidth;
			else
				DestRect.right = DestRect.left + 32;
			
			SrcRect.left = *TempMap << 5;
			// match the src clip with the dest clip

			SrcRect.right = (DestRect.right - DestRect.left) + SrcRect.left;
		}
		// next screen row

		TileY++;
		Map += Width;
		TempMap = Map;
		DestRect.top = DestRect.bottom;
		SrcRect.top = 0;
		TileX = XStart;

		// check if we are past the screen height or done rendering the available map
		// (checked before *TempMap is read, so we never read past the last map row)

		if((DestRect.top >= ScreenHeight) || (TileY >= Height))
			break;

		SrcRect.left = *TempMap << 5;
		SrcRect.right = SrcRect.left + 32;
		SrcRect.left += SrcLeftOffset;
		DestRect.left = 0;
		DestRect.right = SrcRect.right - SrcRect.left;

		// clip with the bottom of the screen if necessary

		if((DestRect.top + 32) >= ScreenHeight)
			DestRect.bottom = ScreenHeight;
		else
			DestRect.bottom = DestRect.top + 32;
		// match the src clip with the dest clip

		SrcRect.bottom = DestRect.bottom - DestRect.top;
	}
}
  
So again, any input on speeding this baby up would be appreciated. Realistically, a tile-based game should be able to handle 3 or 4 layers of tiles (although not necessarily with a tile at every x,y), and the layers above would be blitted with transparency. So we need speed, speed, speed. For anyone reading this who is trying to figure out how to scroll and finds that the tutorials are kinda lame, please feel free to use this code. However, I am looking for some experienced people to give feedback on speeding it up. Thanks in advance.

Guest Anonymous Poster
You'd be better off just using one array containing the tile rects and another array containing the tile numbers. Don't worry about the speed, since even a Pentium 233 MMX can handle it with no slowdown.

rcTileRects[y][x]; // 0,0,32,32 32,0,64,32 etc..
nTileLayer[y][x];

for(y=0;y<480/32;y++)
{
	for(x=0;x<640/32;x++)
	{
		// Blit...start at the XStart and YStart offset
	}
}
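
A minimal sketch of the rect-table idea above, under stated assumptions: the tiles sit side by side in one horizontal strip (which is what `*TempMap << 5` in the original post implies), a plain `Rect` struct stands in for the Win32 `RECT`, and `BuildTileRects` and `kTileSize` are hypothetical names. It only covers whole tiles; the partial tiles at the screen edges still need per-frame clipping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the Win32 RECT so the sketch compiles without windows.h.
struct Rect { int left, top, right, bottom; };

constexpr int kTileSize = 32;  // assumed tile edge length in pixels

// Build one source rect per tile id, assuming the tiles are laid out
// left to right in a single strip on the tile surface.
std::vector<Rect> BuildTileRects(int tileCount)
{
    assert(tileCount >= 0);
    std::vector<Rect> rects;
    rects.reserve(static_cast<std::size_t>(tileCount));
    for (int id = 0; id < tileCount; ++id) {
        const int left = id * kTileSize;  // computed once at load time, not per frame
        rects.push_back(Rect{left, 0, left + kTileSize, kTileSize});
    }
    return rects;
}
```

The table is built once when the tile set is loaded, so the render loop only does a lookup by tile id for every unclipped tile.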

I appreciate your response. However, it does not take several things into account: first, different screen sizes; second, actual scrolling (when you scroll, the edges aren't whole tiles, and you will NEVER know in advance what your rect sizes are for the 4 edges); third, speed IS an issue. You DO need to be able to run several simultaneous background layers without slowing the frame rate. Even this code (which has been somewhat optimized) isn't fast enough to run more than two full layers without losing frame rate.

If your suggestion was to precompute all the needed tile ids for the screen blit and precompute all of the rects for each tile blit, then that is a fine idea. The only problem is that it wouldn't be one iota faster, since you would still be precomputing every single frame.

More suggestions are appreciated. Also please limit the discussion to the implementation of the code. It is not necessary to go over memory implementations and surface locations, etc. I am just looking for ways to speed up the render cycle.

Thanks again.

Guest Anonymous Poster
No matter what system you use to blit the tiles, it will always slow down. The main slowdown factor is the actual ->Blt function itself, which in turn depends directly on the video card you use. Try it on a GeForce, which is capable of many layers of blits at frame rates ranging from 150 to 600; you will find that you can have 4 layers with no noticeable slowdown at all. Just limit your FPS to the monitor's refresh rate (75 FPS).

As for the clipping, if the screen width is 640, you would blit 672 pixels' worth of tiles (one extra tile at the end) to account for the 32-pixel offset gap. Instead of calling ->Blt within your function, create your own blit routine that clips everything you throw at it. Trust me, this will save a hell of a lot of repeated code later on. Better still, create a graphics wrapper with alpha, clipping, etc. built in.

When calculating what tile goes where, start at 0:0 (minus the current 32-pixel offset) and work your way to 672:512 (or whatever your screen size is), incrementing 32 pixels at a time. Use your tile array to figure out which tile goes where using offset/32. If the current horizontal offset is, say, 1000, you would read the array from nTile[0][31] (1000/32), then nTile[0][32], and so on until 672 is reached.

There was a scrolling tutorial somewhere on the web; I think it was a Dutch site (in English)...

What you should be doing is coding your routines as simply as possible so it's all working correctly, then optimizing the code if performance is an issue, which it shouldn't be in this case, but it's up to you.
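
A rough sketch of the "blit routine that clips everything" suggestion, written as a helper that only adjusts the rectangles before the real blit call. `Rect` is a stand-in for the Win32 `RECT`, `ClipBlitRects` is a hypothetical name, and the blit is assumed to be 1:1 (no stretching), so the source is trimmed by exactly the amount the destination is trimmed.

```cpp
#include <cassert>

// Stand-in for the Win32 RECT so the sketch compiles without windows.h.
struct Rect { int left, top, right, bottom; };

// Clip dest against a screenW x screenH screen and trim src by the same
// amounts. Returns false when nothing of the tile is visible.
bool ClipBlitRects(Rect& dest, Rect& src, int screenW, int screenH)
{
    // Unstretched blit assumed: both rects must have the same size.
    assert(dest.right - dest.left == src.right - src.left);
    assert(dest.bottom - dest.top == src.bottom - src.top);

    if (dest.left < 0)         { src.left -= dest.left; dest.left = 0; }
    if (dest.top < 0)          { src.top  -= dest.top;  dest.top  = 0; }
    if (dest.right > screenW)  { src.right  -= dest.right - screenW;  dest.right  = screenW; }
    if (dest.bottom > screenH) { src.bottom -= dest.bottom - screenH; dest.bottom = screenH; }

    return dest.left < dest.right && dest.top < dest.bottom;
}
```

With a helper like this, the tile loop can start at a negative destination coordinate and run one tile past the screen edge, and only call the real blit when the function returns true.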

At this point the code I wrote works well. However, it does not yet account for transparency and translucency, and eventually it will have to.

I did wait to optimize the code until it was working and then did the cleanup. I see a lot of people throwing around divides and multiplies like they were free candy. I'm not here to criticize, and I do quite appreciate the help. However, no professional company developing on a console (PS2, GameCube, AGB) will accept this coding. People need to get into the habit of using bitwise operations whenever they are dealing with powers of 2. Divide by 32: >> 5. Multiply by 64: << 6. Mod by 8: & 7. These are very powerful coding techniques that will get you hired over another person, or get your code running faster, especially when you see the assembly that is generated by other coding.

I do understand that I can't write code to speed up the actual blit. However, all of the overhead of calculating which tile to blit where, and how to clip it for scrolling and screen size, is where you can speed up the code. I know that even this simple code, which blits a screen full of tiles and has to generate rects for clipping and scrolling, can create a performance hit. When you have a game like Starcraft, with tons of things happening at once and many objects on top of layers of tiles, a performance hit is just unacceptable. The code needs to be lean and mean. Anyone who doesn't care about performance down to the cycle and memory down to the byte will never program anything other than a PC and definitely won't develop games professionally. The AGB, for example, has only 32K of internal RAM and 256K of external RAM. Performance is just as important in the coding as the product.

If anyone can offer some insight into speeding up this code or making it more elegant (simpler, yet faster), I would appreciate it. If anyone really knows how professional companies writing 2D tiled DirectX games put their stuff together, I would appreciate knowing if I am even close. Otherwise, please don't point me to tutorials on this site. I appreciate their value for beginners; however, they are nowhere near advanced enough to really dig into this subject.

Another thing I thought of out of this discussion is the need for an assembly section on this website. Even as Windows programmers writing DirectX code, we are not above writing assembly language routines where extra performance is required. I think it would be appropriate to have information on this site on setting up and calling DirectX and Windows functions from assembly.
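
The power-of-two shortcuts above, as a minimal sketch. `TilePos`, `SplitWorld`, `TimesSixtyFour` and `ModEight` are hypothetical names. Two caveats worth stating: the shift and mask forms only match `/` and `%` for non-negative values, and optimizing compilers commonly make the same substitution themselves for unsigned operands and constant power-of-two divisors, so it is worth comparing the generated assembly before assuming a gain.

```cpp
#include <cassert>

// Tile index and pixel offset inside that tile for one world coordinate.
struct TilePos { int tile; int offset; };

// Split a non-negative world coordinate for 32-pixel tiles using the
// shift/mask forms from the post instead of / and %.
TilePos SplitWorld(int world)
{
    // For negative values >> and & do not match / and %, which truncate
    // toward zero, so this sketch only accepts non-negative input.
    assert(world >= 0);
    return TilePos{world >> 5, world & 31};
}

// Multiply by 64 as a shift (unsigned, so no sign bit is involved).
constexpr unsigned TimesSixtyFour(unsigned v) { return v << 6; }

// Mod by 8 as a mask.
constexpr unsigned ModEight(unsigned v) { return v & 7u; }

static_assert(TimesSixtyFour(3) == 3 * 64, "shift matches multiply");
static_assert(ModEight(13) == 13 % 8, "mask matches modulo");
```

`SplitWorld` computes the same two values the original handler derives from `X_World >> 5` and `X_World & 31`.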

Thanks again to the people who have helped out on this.

I'm not sure if this will help you at all, but have you tried doing this in D3D instead? I'm not good at C++ or DirectX (yet), but I noticed a large speed increase when I coded my tile engine in D3D with textures instead of blocks of "sprites" and DirectDraw (I don't know the actual name for it in English).

Please mail me or add me on ICQ; I would be glad to discuss these techniques with you!

Best regards
Fredrich

Guest Anonymous Poster

I am working on a multi-layer tile engine, and something I've done that improved the speed is this: for every layer in front of the first, as I loop through my Y and X, I check each tile to see if it is tile zero. If it is, I skip it and go to the next tile. Since more and more of the tiles are zero/empty as the layers increase, it becomes quicker to do the check than to blit all those empty tiles.
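
A small sketch of the skip-empty-tiles check, assuming tile id 0 means "empty" as described above. `DrawOverlayLayer` is a hypothetical name, and the `blit` callback stands in for the real transparent blit call; the map is walked with a pointer, as in the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Visit every non-empty tile of an overlay layer and return how many
// blits were issued. `blit(x, y, id)` is a placeholder for the real call.
template <typename BlitFn>
int DrawOverlayLayer(const std::vector<unsigned char>& layer,
                     int widthInTiles, int heightInTiles, BlitFn blit)
{
    assert(widthInTiles >= 0 && heightInTiles >= 0);
    assert(layer.size() ==
           static_cast<std::size_t>(widthInTiles) * static_cast<std::size_t>(heightInTiles));

    int blitCount = 0;
    const unsigned char* tile = layer.data();
    for (int y = 0; y < heightInTiles; ++y) {
        for (int x = 0; x < widthInTiles; ++x, ++tile) {
            if (*tile == 0)
                continue;  // empty cell: one compare instead of a blit
            blit(x, y, *tile);
            ++blitCount;
        }
    }
    return blitCount;
}
```

The sparser the upper layers are, the more blits this one comparison saves.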

Guest Anonymous Poster
faster up with assembly ...
and you dont want 8-------D ????

The only real bottleneck I can see here is the Blt function. Writing your own will probably improve performance, depending on your needs. I am assuming that you have done the standard memory optimizations and are keeping all of your surfaces either in video memory or in system memory. If they are all in video memory, then your speed is limited by the graphics card's bandwidth, and there is not much you can do about that. On the other hand, if your surfaces are in system memory, it will be faster to write your own Blt function that is hand-optimized for the routine you are using it for. In your case the Blt function does a whole bunch of extra work because it cannot assume anything about the tiles, while you know beforehand that you have 32x32 tiles and that the vast majority of tiles aren't clipped. That leads to another optimization: call Blt without the clipping rectangle for all the non-clipped tiles and special-case the edges of the screen. Hope this helps.
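
One way to picture the "special-case the edges" idea is to split each screen axis into a clipped leading tile, a run of whole tiles, and a clipped trailing tile, so the inner loop over whole tiles needs no clipping logic. This is only a sketch under assumptions: 32-pixel tiles, a non-negative scroll position, a map large enough to fill the screen, and the hypothetical names `AxisSplit` and `SplitAxis`.

```cpp
#include <cassert>

struct AxisSplit {
    int leadPixels;  // size of the clipped tile at the left/top edge (0 if none)
    int wholeTiles;  // number of unclipped 32-pixel tiles after it
    int tailPixels;  // size of the clipped tile at the right/bottom edge (0 if none)
};

// Partition one screen axis for a given scroll position (in pixels).
AxisSplit SplitAxis(int world, int screen)
{
    assert(world >= 0 && screen >= 0);
    const int offset = world & 31;           // pixels scrolled into the first tile
    AxisSplit s{};
    s.leadPixels = offset ? 32 - offset : 0; // visible part of the first tile
    if (s.leadPixels > screen)
        s.leadPixels = screen;               // screen narrower than the partial tile
    const int rest = screen - s.leadPixels;
    s.wholeTiles = rest >> 5;
    s.tailPixels = rest & 31;
    return s;
}
```

For a 640-pixel axis scrolled to 1000, this gives a 24-pixel leading tile, 19 whole tiles and an 8-pixel trailing tile; only the first and last need source-rect adjustment.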

Premandrake

I'm not sure what kind of issue you have with using nested for loops to loop through the tiles on a scan-line. I did a generic 2D engine that used alpha transparency and two layers per tile. It could handle any tile size and any screen size. At 640x480 with 64x64 tiles it ran fine at about 80 fps with no game code attached; with the game code (including a very cool pixel-based collision detection and a script-based AI that read its script from a text file and interpreted it) it dropped to about 48 fps. I just told the loops to go from 0 to (screendim/tilesize)+1 at tilesize steps, with an offset on each tile to handle the scrolling (so each tile would be rendered at X+XOffset, Y+YOffset).

This was done in Visual Basic 6 with DX8, so you're bound to get more speed out of C++. I tried doing it with four layers (two for the architecture and then two in the background for parallax), and that also worked fine, but we dropped the parallax layers at the end.

'Doing the impossible is kind of fun' - Walt Disney
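
The nested-loop approach described above might look roughly like this. It is a sketch, not the poster's engine: `TileDraw` and `VisibleTiles` are hypothetical names, the scroll position is assumed non-negative, and the row and column counts are computed exactly from the scroll offset rather than with a fixed "+1", so screen sizes that are not a multiple of the tile size are also covered. Map bounds are not checked here.

```cpp
#include <cassert>
#include <vector>

// One tile to draw: which map cell, and where its top-left corner lands
// on screen (may be negative or partly past the screen edge).
struct TileDraw { int mapX, mapY, screenX, screenY; };

std::vector<TileDraw> VisibleTiles(int scrollX, int scrollY,
                                   int screenW, int screenH, int tileSize)
{
    assert(scrollX >= 0 && scrollY >= 0 && tileSize > 0);
    const int firstX = scrollX / tileSize, firstY = scrollY / tileSize;
    const int offX = scrollX % tileSize,   offY = scrollY % tileSize;
    // Enough tiles to cover the screen once the first one is shifted by the offset.
    const int cols = (screenW + offX + tileSize - 1) / tileSize;
    const int rows = (screenH + offY + tileSize - 1) / tileSize;

    std::vector<TileDraw> out;
    for (int ty = 0; ty < rows; ++ty)
        for (int tx = 0; tx < cols; ++tx)
            out.push_back(TileDraw{firstX + tx, firstY + ty,
                                   tx * tileSize - offX, ty * tileSize - offY});
    return out;
}
```

Tiles along the edges come out with coordinates partly off screen, so the blit that consumes this list still has to clip them or rely on a clipper.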

Hey everyone. Thanks for the additional help. For the person who mentioned skipping blank tiles: that will be there for more layers. Right now the focus is the render loop. For the guy sending me to the obsessive compulsive website, =-). Truth be told, though, production code does require this much attention to detail, so deal with it! For the person with the asm suggestion: that's definitely a valid idea. I was checking into it too.

To whoever was commenting on my loops: I'm not quite sure what your point is. But if you are complaining about my for(; then you should take a look at the asm you end up with from different kinds of loops. A compiler will produce slower assembly for a while(1) than a for(;. Besides, you never know your loop size on this type of thing if you have different possible display modes and different size tile maps. Precalculation will just eat up CPU cycles.

I hope some people see the importance of this extra work. As an example, Starcraft can run happily on a P166. Now that's optimized code!

I don't think that "for(;" will compile.


Seriously though, do you want to try to optimize a finished game, or do you want to try to finish an optimized game? A lot of obsessive coders (myself included) don't know when to say "when", so they end up making an unfinished game with perfect code.

You may be right that optimization techniques can give you an edge over the other job applicants. However, if those other guys have already created three slow games before you could finish one fast one, then guess who the company is going to hire? Not "the obsessive one."

Trying to fix a problem before it actually becomes a problem (i.e. optimizing code that hasn't been proven to be a bottleneck in the finished game) is a hallmark of unhealthy obsession. Take it from someone who knows; I sent you to the OC Foundation for a reason.
