Poor Textured Quad Performance

8 comments, last by Zxylin 20 years, 8 months ago
Hi, I'm working on a 2D tile engine in C# using DX9. I've gotten to the point where I can draw the tiles on the screen and have found that my framerate is horrible. I'm getting around 20 fps and all I'm doing is drawing one set of tiles (32x20 = 640 tiles). Without the tiles, I get well over 200 fps with my test data structures and I initially had around 700 fps with a single triangle. I've narrowed the performance problem down to the section of code that actually puts the data in the vertex buffers.

stm = vbScreenObjects[X,Y].Lock(0, 0, 0);

// Calculate Screen Co-ords

LeftX = (X * TileSizeX) * iScaleFactor;
RghtX = (X * TileSizeX + TileSizeX) * iScaleFactor;
TopY = (Y * TileSizeY) * iScaleFactor;
BtmY = (Y * TileSizeY + TileSizeY) * iScaleFactor;

// Calculate Texture Co-ords

int tilesPerRow = TileSetX / TileSizeX;            // tiles across the tile-set texture
float tileU = (float)TileSizeX / (float)TileSetX;  // width of one tile in texture space
float tileV = (float)TileSizeY / (float)TileSetY;  // height of one tile in texture space

// Column of the tile within the tile set gives U, row gives V
LeftTexX = tileU * (ScreenData[X,Y] % tilesPerRow);
RghtTexX = LeftTexX + tileU;
TopTexY = tileV * (ScreenData[X,Y] / tilesPerRow);
BtmTexY = TopTexY + tileV;

// Put some Vertices in

verts[0].SetPosition(new Vector4(LeftX,TopY,0.0f,1.0f));
verts[0].Tu = LeftTexX; verts[0].Tv = TopTexY;

verts[1].SetPosition(new Vector4(RghtX,TopY,0.0f,1.0f));
verts[1].Tu = RghtTexX; verts[1].Tv = TopTexY;

verts[2].SetPosition(new Vector4(LeftX,BtmY,0.0f,1.0f));
verts[2].Tu = LeftTexX; verts[2].Tv = BtmTexY;

verts[3].SetPosition(new Vector4(RghtX,BtmY,0.0f,1.0f));
verts[3].Tu = RghtTexX; verts[3].Tv = BtmTexY;

stm.Write(verts);
vbScreenObjects[X,Y].Unlock();

Now, I don't really need to update all the vertex buffers every frame, but I'm a little concerned that my performance will keep getting worse as I get further into it. Any suggestions on how to speed this up?

[edited by - zxylin on August 5, 2003 1:27:40 AM]
Open your eyes and free your mind
That you may see the subtle wonder
Of the worlds we live in...
Hi - it's hard for me to tell from your code, but are you doing the following:

1: write tile to dynamic vertex buffer
2: render tile
3: repeat 1 until all quads are rendered

If so, then this would be one cause of your slowdown.

Looking at your code, it seems you use only 1 quad and keep changing it to suit your needs. When you want to change your quad, you update it dynamically in your vertex buffer, then you render, then you repeat this process 640 times...? Wow...

At the least, what you can do is this:

1: define a single textured quad in a single vertex buffer. Make the quad 1 unit high and 1 unit wide.
2: use a scaling matrix and a translation matrix to scale and position the tile where you want it. (I think that 1x1 equates to 64 pixels x 64 pixels, so to render a sprite that has a 128 x 128 texture, your scaling matrix would double the size of your original quad. I'm typing this from memory, so I might be wrong about the figures, but you get the concept.)
3: render
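
A minimal sketch of the unit-quad idea in the steps above. It uses System.Numerics as a stand-in for the Managed DirectX matrix type, and the TileTransform class and its parameter names are hypothetical. Note that this only applies to untransformed vertices; the pre-transformed (x, y, z, rhw) vertices in the original post skip the transform stage entirely.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: builds the matrix that maps a 1x1 quad
// (corners at 0,0 and 1,1) onto the pixel rectangle of tile (x, y).
public static class TileTransform
{
    public static Matrix4x4 ForTile(int x, int y, int tileSizeX, int tileSizeY, float scaleFactor)
    {
        float w = tileSizeX * scaleFactor;   // tile width in pixels
        float h = tileSizeY * scaleFactor;   // tile height in pixels

        // Scale first, then translate. System.Numerics uses row vectors (v * M),
        // so the leftmost matrix is applied first.
        return Matrix4x4.CreateScale(w, h, 1.0f) *
               Matrix4x4.CreateTranslation(x * w, y * h, 0.0f);
    }
}
```

For tile (2, 3) with 32x32 tiles and a scale factor of 1, the quad corner (1, 1) lands on pixel (96, 128), which matches the RghtX/BtmY values computed in the original post.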

The single-quad approach gets rid of the locking and unlocking of the dynamic vertex buffer, but it is still very inefficient: lots of small DrawIndexedPrimitive calls will overwhelm even a fast CPU (there's an NVIDIA paper on this on the NVIDIA website). You should also sort your tiles by texture before rendering them, because switching textures can be slow: draw all of your grass textures, then all of your rock textures, etc.
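
To illustrate the texture-sorting advice, here is a small sketch that counts how many texture changes a given draw order would cause. The TextureBatching class and the integer texture IDs are assumptions made for the example; it only matters when the tiles come from more than one texture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextureBatching
{
    // Counts how often the active texture would change if tiles
    // were drawn in the given order (the first tile counts as one change).
    public static int CountTextureSwitches(IEnumerable<int> textureIds)
    {
        int switches = 0;
        int? current = null;
        foreach (int id in textureIds)
        {
            if (current != id)
            {
                switches++;
                current = id;
            }
        }
        return switches;
    }

    // Returns the same tiles grouped so that equal texture IDs are adjacent.
    public static int[] SortedByTexture(IEnumerable<int> textureIds)
    {
        return textureIds.OrderBy(id => id).ToArray();
    }
}
```

With the draw order 0, 1, 0, 1, 2, 0 this reports six texture changes; after sorting, the same tiles need only three.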

Also consider (and this will take some work) building large vertex buffers before you render. Don't update your vertex buffer for each of your tiles; instead, update your tiles in an array in advance, then copy the entire array to the dynamic vertex buffer in one call, using the appropriate lock flags (D3DLOCK_DISCARD).

I know I'm being vague here, but you've actually asked a complicated question. The key is to render as many tiles as possible with as few vertex buffer updates, SetTexture calls, and DrawIndexedPrimitive calls as possible. The problem is that there is no ideal way to do this. In other words, you'll have to experiment with different methods until you find the one that works most efficiently for your application.
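
A rough sketch of the "build the array in advance" step, using plain C# with no Direct3D calls. TileVertex and TileMesh are hypothetical names, and the layout (x, y, z, rhw, u, v) is only assumed to match the vertex format in the original post. The resulting array would then be written to the vertex buffer with a single lock/write/unlock.

```csharp
using System;

// Stand-in for a pre-transformed, textured vertex (x, y, z, rhw, u, v).
public struct TileVertex
{
    public float X, Y, Z, Rhw, U, V;
    public TileVertex(float x, float y, float u, float v)
    {
        X = x; Y = y; Z = 0.0f; Rhw = 1.0f; U = u; V = v;
    }
}

public static class TileMesh
{
    // Builds a triangle list (6 vertices per tile) for the whole map in one array.
    public static TileVertex[] Build(int[,] screenData, int tileSizeX, int tileSizeY,
                                     int tileSetX, int tileSetY, float scale)
    {
        int cols = screenData.GetLength(0), rows = screenData.GetLength(1);
        int tilesPerRow = tileSetX / tileSizeX;
        float du = (float)tileSizeX / tileSetX, dv = (float)tileSizeY / tileSetY;
        var verts = new TileVertex[cols * rows * 6];
        int i = 0;
        for (int y = 0; y < rows; y++)
        {
            for (int x = 0; x < cols; x++)
            {
                float l = x * tileSizeX * scale, r = l + tileSizeX * scale;
                float t = y * tileSizeY * scale, b = t + tileSizeY * scale;
                float u0 = du * (screenData[x, y] % tilesPerRow), u1 = u0 + du;
                float v0 = dv * (screenData[x, y] / tilesPerRow), v1 = v0 + dv;
                // Two clockwise triangles: (TL, TR, BL) and (BL, TR, BR).
                verts[i++] = new TileVertex(l, t, u0, v0);
                verts[i++] = new TileVertex(r, t, u1, v0);
                verts[i++] = new TileVertex(l, b, u0, v1);
                verts[i++] = new TileVertex(l, b, u0, v1);
                verts[i++] = new TileVertex(r, t, u1, v0);
                verts[i++] = new TileVertex(r, b, u1, v1);
            }
        }
        return verts;
    }
}
```

For the 32x20 map from the original post this produces one array of 3840 vertices, so the buffer needs to be locked once instead of 640 times.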

At any rate, I hope this helps-

BM
Thanks BM, that does help...

My code is doing exactly as you said except I'm using 640 separate quads with each one being updated.

I was thinking of building chains of tiles to reduce the number of draw calls (much like you suggested). I'll pursue that and see what happens.
I couldn't really tell whether you are doing the rendering this way or not, but the fastest way I've found is to render the on-screen map tiles all at once, with one SetTexture call and one DrawPrimitive call, using one static vertex buffer for all the map tile information. If you make more than one of these calls for the map tiles per frame, it is going to start slowing down.
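
One possible way to pack every tile into a single buffer is to keep 4 vertices per quad and add an index list, so the whole map can go out in one indexed draw. This is a swapped-in variation rather than what the post above describes; the QuadIndices class is hypothetical, and the vertex order (top-left, top-right, bottom-left, bottom-right) is assumed from the original post.

```csharp
using System;

public static class QuadIndices
{
    // Builds 16-bit indices for a triangle list, 6 indices per quad.
    // Assumes each quad stores its 4 vertices as TL, TR, BL, BR.
    public static short[] Build(int quadCount)
    {
        if (quadCount < 0 || quadCount * 4 > short.MaxValue + 1)
            throw new ArgumentOutOfRangeException("quadCount");

        var indices = new short[quadCount * 6];
        for (int q = 0; q < quadCount; q++)
        {
            int v = q * 4;   // first vertex of this quad
            int i = q * 6;   // first index of this quad
            indices[i + 0] = (short)(v + 0);   // TL
            indices[i + 1] = (short)(v + 1);   // TR
            indices[i + 2] = (short)(v + 2);   // BL
            indices[i + 3] = (short)(v + 2);   // BL
            indices[i + 4] = (short)(v + 1);   // TR
            indices[i + 5] = (short)(v + 3);   // BR
        }
        return indices;
    }
}
```

For 640 tiles that is 2560 vertices and 3840 indices, describing 1280 triangles.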

[edited by - OneBitWonder on August 5, 2003 11:30:27 AM]
One common misconception among coders doing 2D in 3D is that you shouldn't use a dynamic vertex buffer. You should really look into using one. The thing to remember is that the vertex buffer is just that: a buffer. It's a temporary place for your vertices to sit before being sent to render. That way, all the animation and positioning of quads can be done before you send them into the buffer, which saves many transforms and render calls.
If you're just doing 2D using quads, there is no need to continuously lock the vertex buffer (which creates a massive overhead). Actually, you should never lock the buffer during a rendering loop, but that's a whole other issue altogether.

Here's some pseudocode of how I'd do it.

(this would be your initialization function)


Initialize()
{
Lock the buffer
Load vertices
Load whatever else you need.
}


(this would be the rendering function)


Render()
{
Move the quad to the specified position with a simple vertex transformation (ideally, do this in a vertex shader as it will be considerably faster)

Draw the quad using a vertex shader that adjusts the texture coordinates.
}


Judging from what I'm seeing here, you're essentially locking the vertex buffer 640 times every frame and only writing 4 vertices to it each time. I'm going to assume that you also render each quad separately as well. That's a massive overhead.

If the idea suggested above is too much, do this once per frame instead:


Render()
{
Lock the buffer
Write all 640 tiles (3840 vertices if you're using triangle lists) to the buffer. Ideally, write them in an order such that they generate a single triangle strip.
Render the buffer
}
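
The "single triangle strip" ordering can be sketched as follows. Because each tile has its own texture coordinates, neighbouring quads cannot share vertices, so this version stitches them together with degenerate (zero-area) triangles. That stitching is an assumption on my part, not something spelled out above, and the QuadStrip class is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class QuadStrip
{
    // Returns the order in which to emit vertices so that quadCount
    // independent quads (4 vertices each: TL, TR, BL, BR) form one strip.
    public static int[] BuildOrder(int quadCount)
    {
        var order = new List<int>();
        for (int q = 0; q < quadCount; q++)
        {
            int first = q * 4;
            if (q > 0)
            {
                // Two extra vertices create degenerate triangles that
                // join the previous quad to this one without drawing anything.
                order.Add(first - 1);   // repeat last vertex of previous quad
                order.Add(first);       // repeat first vertex of this quad
            }
            for (int k = 0; k < 4; k++)
            {
                order.Add(first + k);
            }
        }
        return order.ToArray();
    }
}
```

Each join adds an even number of vertices, so the winding order of every real triangle stays the same. For 640 tiles this gives 3838 strip vertices instead of 3840 for a triangle list, so the saving for isolated quads is small; whether it is worth the extra bookkeeping would need measuring.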


Essentially, try to avoid locking the buffer as much as possible, as it's a terribly slow operation. You can count on each lock operation costing you as much as 1 millisecond per lock (usually less, but it depends on how many vertices you write each lock). 1 millisecond may not seem like much, but when you consider that a 60fps scene renders each frame in 16ms, you're talking about a 3-4fps loss for EVERY additional lock you perform. That adds up very quickly.
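
The frame-rate arithmetic above can be checked with a few lines of code. This is only a back-of-the-envelope model; the FrameBudget class is hypothetical, and the 1 ms per lock figure is the worst-case estimate from the post, not a measurement.

```csharp
using System;

public static class FrameBudget
{
    // Frame rate after adding extraMs of work to every frame.
    public static double FpsAfter(double baseFps, double extraMs)
    {
        double frameMs = 1000.0 / baseFps + extraMs;   // new time per frame
        return 1000.0 / frameMs;
    }
}
```

FrameBudget.FpsAfter(60, 1) comes out at about 56.6 fps, a loss of roughly 3.4 fps for the first extra millisecond. Each further millisecond costs slightly fewer fps because the frame is already longer, but 640 locks at the worst-case 1 ms each would leave about 1.5 fps.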

---------------------------Hello, and Welcome to some arbitrary temporal location in the space-time continuum.

quote: Essentially, try to avoid locking the buffer as much as possible, as it's a terribly slow operation. You can count on each lock operation costing you as much as 1 millisecond per lock (usually less, but it depends on how many vertices you write each lock). 1 millisecond may not seem like much, but when you consider that a 60fps scene renders each frame in 16ms, you're talking about a 3-4fps loss for EVERY additional lock you perform. That adds up very quickly.


That's basically what I discovered. I've modified my code to work like your second suggestion, except that I split the locking/writing of the vertex buffers out from the rendering. My future plans are for a turn-based game, so I'll only need to relock and rewrite whenever the player does something.
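
Splitting the buffer update from the rendering can be sketched with a simple dirty flag. The TileLayer class is hypothetical and the Direct3D work is left as placeholder comments; UploadCount only exists so the behaviour can be observed.

```csharp
using System;

public class TileLayer
{
    private readonly int[,] tiles;
    private bool dirty = true;   // force one upload before the first frame

    // Number of times the vertex buffer would have been rewritten.
    public int UploadCount { get; private set; }

    public TileLayer(int cols, int rows)
    {
        tiles = new int[cols, rows];
    }

    // Called when the player does something that changes the map.
    public void SetTile(int x, int y, int tile)
    {
        if (tiles[x, y] == tile) return;   // nothing changed, keep the buffer
        tiles[x, y] = tile;
        dirty = true;
    }

    // Called every frame.
    public void Render()
    {
        if (dirty)
        {
            // Placeholder: lock the vertex buffer once, write every tile, unlock.
            UploadCount++;
            dirty = false;
        }
        // Placeholder: issue the draw call(s) for the already-filled buffer.
    }
}
```

Rendering many frames in a row costs a single upload; another upload happens only on the first frame after SetTile actually changes a tile.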



You should also batch all your triangles (or quads) so that each VB contains 1000 or so vertices.
"Let Us Now Try Liberty"-- Frederick Bastiat
quote:
you should also batch all your triangles(or quads) so that each VB contains 1000 or so vertices.


So instead of having one VB with 3840 vertices in it, break it out in to chunks of around 1000?


According to MSDN, about 1000 vertices per buffer is the right balance between batching and avoiding concurrency issues, but try it for yourself.
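
If you want to experiment with the batch size, a small sketch like this splits the quads into chunks without cutting a quad in half. The VertexChunks class is hypothetical, and the 1000-vertex limit is simply the figure quoted above, passed in as a parameter so it is easy to change.

```csharp
using System;
using System.Collections.Generic;

public static class VertexChunks
{
    // Splits quadCount quads into batches of at most maxVertices vertices,
    // never splitting a quad across two batches. Returns quads per batch.
    public static List<int> Split(int quadCount, int verticesPerQuad, int maxVertices)
    {
        if (verticesPerQuad < 1)
            throw new ArgumentException("verticesPerQuad must be at least 1");
        int quadsPerBatch = maxVertices / verticesPerQuad;
        if (quadsPerBatch < 1)
            throw new ArgumentException("maxVertices is smaller than one quad");

        var batches = new List<int>();
        for (int remaining = quadCount; remaining > 0; remaining -= quadsPerBatch)
        {
            batches.Add(Math.Min(quadsPerBatch, remaining));
        }
        return batches;
    }
}
```

With 640 quads at 6 vertices each and a 1000-vertex limit, this gives four batches of 166, 166, 166 and 142 quads (996, 996, 996 and 852 vertices).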
