synth_cat

How many Lock() calls are prohibitive?


I'm sure this must seem like a somewhat dumb question to ask, but I need advice from people who have actually done this before, rather than from some MSDN article...

Anyway, I was discussing in another thread a situation I had with dynamic vertex buffers. I would create vertex arrays which I would write to with AddQuad() functions I wrote. Then at the end of each step I would Lock() the vertex buffers and memcpy() the data from the vertex arrays into the true vertex buffers. Then at the beginning of the next step, I would ZeroMemory() the two "proxy" vertex arrays, preparing them for new AddQuad() calls.

For one thing, I was thinking about converting the vertex arrays into std::vectors because I'm afraid of a stack overflow. Would it be worth the bother?

The other problem: the dynamic vertex buffers I have been talking about draw quads, and the quads are separated into six different drawing calls (low-additive, high-additive, and stuff like that) between the two vertex buffers. This means that I have to keep track of which position I am supposed to write to in the vertex arrays and remember a bunch of indices marking where the vertices for each call begin or end. So I was considering splitting the two vertex arrays into six arrays, one for each call. This would mean I would have to Lock() and memcpy() to the vertex buffers a total of six times instead of twice. Would that make a big difference? Would it be a big performance hit on some computers?

My problem is fully described here, for reference: http://www.gamedev.net/community/forums/topic.asp?whichpage=2&pagesize=25&topic_id=404549

Thanks for any help!
-synth_cat
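On the std::vector question: arrays declared at global (namespace) scope have static storage, not stack storage, so only large local arrays risk a stack overflow. A std::vector keeps its elements on the heap and also makes the per-frame reset cheap. Below is a minimal C++ sketch, assuming a hypothetical Vertex struct standing in for the real D3D vertex format; clear() empties the vector but leaves its capacity unchanged, so it can take the place of ZeroMemory() without reallocating every frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the poster's vertex format (position, colour, texcoords).
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// CPU-side staging storage for one draw call's quads.
// Heap storage replaces a fixed-size array; clear() replaces ZeroMemory().
class QuadStaging {
public:
    explicit QuadStaging(std::size_t expectedQuads) {
        vertices_.reserve(expectedQuads * 6);  // avoid reallocation in the common case
    }

    // Appends one axis-aligned quad (two triangles, six vertices) in the XZ plane,
    // using the same corner order as the code later in this thread.
    void addQuad(float x, float z, float size, std::uint32_t color) {
        const float xs[6] = {x, x + size, x + size, x, x + size, x};
        const float zs[6] = {z, z, z + size, z, z + size, z + size};
        for (int i = 0; i < 6; ++i) {
            vertices_.push_back(Vertex{xs[i], 0.0f, zs[i], color, 0.0f, 0.0f});
        }
    }

    // Called at the start of each frame: drops old quads, keeps the capacity.
    void reset() { vertices_.clear(); }

    std::size_t vertexCount() const { return vertices_.size(); }
    std::size_t quadCount() const { return vertices_.size() / 6; }
    const Vertex* data() const { return vertices_.data(); }

private:
    std::vector<Vertex> vertices_;
};
```

At draw time, data() and vertexCount() * sizeof(Vertex) give the source pointer and byte count for the copy into the locked buffer.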

Well, someone once advised me to keep calls to Lock below 200-300 per frame, so if that was correct advice, you should be fine. It sounds a bit suspicious now that I come to pass it on to someone else, though.

[EDIT: I've just been reading your other post, so I appreciate the next paragraph is a bit redundant.]

Can I ask, though, why you don't just write to the vertex buffers directly? It seems like a waste of memory to duplicate the buffer in system memory and then lock and copy like that.

There is an article on GameDev in the DirectX Graphics section that describes how to batch quads using the vertex buffer directly.

[Edited by - EasilyConfused on July 22, 2006 3:36:36 AM]
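Writing to the vertex buffer directly means filling the pointer returned by Lock() in place, with no system-memory copy. A rough C++ sketch, with Direct3D replaced by a plain Vertex* plus a capacity so it runs anywhere; the Vertex type and LockedQuadWriter class are hypothetical. In real code the pointer would come from IDirect3DVertexBuffer9::Lock(), and when addQuad() returns false the caller would Unlock(), draw what has been written, and lock again. Locked memory of a dynamic buffer is generally best treated as write-only, which the sketch respects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in vertex format.
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// Writes quads straight into memory obtained from a vertex buffer Lock().
// Here the "locked" memory is any Vertex* plus its capacity in vertices.
class LockedQuadWriter {
public:
    LockedQuadWriter(Vertex* base, std::size_t capacityVertices)
        : base_(base), capacity_(capacityVertices), used_(0) {}

    // Returns false (and writes nothing) when the quad would not fit.
    bool addQuad(float x, float z, float size, std::uint32_t color) {
        if (used_ + 6 > capacity_) return false;
        const float xs[6] = {x, x + size, x + size, x, x + size, x};
        const float zs[6] = {z, z, z + size, z, z + size, z + size};
        for (int i = 0; i < 6; ++i) {
            Vertex& v = base_[used_ + i];  // only written, never read back
            v.x = xs[i]; v.y = 0.0f; v.z = zs[i];
            v.color = color; v.u = 0.0f; v.v = 0.0f;
        }
        used_ += 6;
        return true;
    }

    std::size_t vertexCount() const { return used_; }
    std::size_t primitiveCount() const { return used_ / 3; }  // triangles to draw

private:
    Vertex* base_;
    std::size_t capacity_;
    std::size_t used_;
};
```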

Quote:
Original post by EasilyConfused
Well, someone advised me once that you should look to keep your calls to Lock down below 200-300 per frame
That doesn't sound right to me. 200-300 drawing calls per frame would be more common advice, but that many locks is going to really hurt.

Two bits of advice from my own personal experience:

1. Lazy evaluation: buffer all quads in system memory and only upload them to VRAM when it's absolutely essential (e.g. when you need to draw them). Ideally, try to structure it so that you don't draw until you absolutely have to... It takes some getting used to, but writing efficient D3D code is not always the most intuitive thing for the programmer. Simple, straightforward code is nice, but doesn't necessarily get you the best performance.

2. Consider the Draw*PrimitiveUP() calls. I've found them to be as fast as, and in some cases faster than, rendering from VBs. If you're making lots of changes to the data, then it might be faster to skip the whole VB/IB thing and thus avoid any overhead from locking/modifying. VRAM resources are best suited to static (or mostly static) data and aren't necessarily the best choice in other cases.
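Point 1 (lazy evaluation) can be sketched as a batch that only "uploads" when it has to: when it is full, or when a state change or the end of the frame forces a draw. In this C++ sketch the upload is a callback standing in for Lock/copy/Unlock/DrawPrimitive; the LazyBatch class and its flush-when-full policy are assumptions for illustration, not code from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in vertex: only position matters for this sketch.
struct Vertex { float x, y, z; };

// Vertices accumulate in system memory and are handed to `flush_` (where real
// code would Lock/copy/Unlock/DrawPrimitive) only when the batch is full or
// when drawing can no longer be postponed.
class LazyBatch {
public:
    using FlushFn = std::function<void(const Vertex*, std::size_t)>;

    LazyBatch(std::size_t maxVertices, FlushFn flush)
        : max_(maxVertices), flush_(std::move(flush)) {
        pending_.reserve(maxVertices);
    }

    void add(const Vertex& v) {
        if (pending_.size() == max_) flush();  // upload only when forced to
        pending_.push_back(v);
    }

    // Call when a render-state change or the end of the frame forces the draw.
    void flush() {
        if (pending_.empty()) return;  // nothing pending: skip the Lock entirely
        flush_(pending_.data(), pending_.size());
        pending_.clear();
    }

private:
    std::size_t max_;
    FlushFn flush_;
    std::vector<Vertex> pending_;
};
```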

WRT #2, you should be able to convert to std::vector quite easily.
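For #2 with a std::vector: its elements are contiguous, so data() (or &v[0]) can be passed as the vertex pointer. IDirect3DDevice9::DrawPrimitiveUP takes the primitive type, the primitive count, a pointer to the vertex data and the stride in bytes. The sketch below only computes those arguments so that it compiles without the DirectX SDK; Vertex, UpCallArgs and makeTriangleListArgs are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the poster's vertex format.
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// The arguments a triangle-list DrawPrimitiveUP call needs after the
// primitive type: primitive count, vertex data pointer and stride.
struct UpCallArgs {
    unsigned primitiveCount;
    const void* vertexData;
    unsigned stride;
};

// std::vector is contiguous, so data() can serve as the "user pointer";
// no vertex buffer and no Lock() are involved.
UpCallArgs makeTriangleListArgs(const std::vector<Vertex>& vertices) {
    assert(vertices.size() % 3 == 0);  // a triangle list uses 3 vertices per triangle
    return UpCallArgs{
        static_cast<unsigned>(vertices.size() / 3),
        vertices.data(),
        static_cast<unsigned>(sizeof(Vertex))};
}

// In real code (not compiled here) the result would be used roughly as:
//   d3ddev->DrawPrimitiveUP(D3DPT_TRIANGLELIST, a.primitiveCount, a.vertexData, a.stride);
```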

hth
Jack

You should keep the number of locks to a minimum, but the difference between 6 and 2 is negligible. Also, unless you are doing something really tricky, it would be better to write to the VB directly.

More important than the raw number of Lock calls is how they are made. If you double buffer (so a vertex buffer that is being rendered this frame will not be locked and updated until the next frame), the GPU is never using the data you are locking, so the locks should be very quick (though I'm sure there is still some small CPU overhead in calling them).

[Edited by - griffin2000 on July 22, 2006 2:30:13 PM]
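The double-buffering scheme reduces to picking a buffer by frame parity, so the buffer filled this frame is never the one being drawn. A minimal C++ sketch with std::vector standing in for the two vertex buffers; FakeBuffer and DoubleBuffered are hypothetical. (Locking a dynamic buffer with D3DLOCK_DISCARD lets the driver hand back fresh memory in a similar spirit, so a manual version mostly matters when that flag is not used.)

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vertex buffer.
using FakeBuffer = std::vector<float>;

// Two buffers swap roles (not contents) every frame: one is filled with this
// frame's data while the other, filled last frame, is drawn.
class DoubleBuffered {
public:
    // Buffer that receives new data this frame.
    FakeBuffer& fillTarget() { return buffers_[frame_ % 2]; }
    // Buffer that was filled last frame and is drawn this frame.
    const FakeBuffer& drawSource() const { return buffers_[(frame_ + 1) % 2]; }
    // Advance to the next frame: roles swap, no data is copied.
    void endFrame() { ++frame_; }

private:
    FakeBuffer buffers_[2];
    std::size_t frame_ = 0;
};
```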

Thanks for all the replies!

I think it sounds best to write directly to the vertex buffer (in other words, write through a VERTEX* pointer between a Lock() and Unlock() call.)

However, I still need a way of storing the vertex info outside of the true vertex buffer. This is because my game makes AddQuad() calls throughout the entire game (and it's not just as simple as doing one AddQuad() call for each entity.) Basically, if I were to rewrite my AddQuad() functions so that they would write to a vertex buffer that has already been locked, I would have to stick my entire game logic process between a Lock() and an Unlock() call.

I assume that using up a lot of clock cycles while a vertex buffer is locked is bad - am I right?

So I had an idea: why not just store an array of QUAD structs outside of the vertex buffer, where all AddQuad() does is write to that array, and then at the end of each step Lock() the vertex buffer and run through all the quads, writing directly to the vertex buffer?

Here's what I mean:



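struct QUAD
{
bool alive;
bool face_camera;
D3DXVECTOR3 center;
float u, v;
DWORD color;
float radius;

//default constructor: needed so that arrays of QUAD can be declared
QUAD() : alive(false), face_camera(false), center(0.0f, 0.0f, 0.0f), u(0.0f), v(0.0f), color(0), radius(0.0f)
{}

//initializer list entries are separated by commas, after a single colon
QUAD(bool face_camera, D3DXVECTOR3 center, float u, float v, DWORD color)
: alive(true), face_camera(face_camera), center(center), u(u), v(v), color(color), radius(0.0f)
{}
};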


//the actual vertex buffers
LPDIRECT3DVERTEXBUFFER9 vb_cells;
LPDIRECT3DVERTEXBUFFER9 vb_fx;

//temporary storage for quad data until it is needed; note that I am using
//six different arrays, one for each draw call (my other thread explains this more fully)
QUAD fx_temp[2][2][NUM_FX_QUADS];
QUAD cells_temp[2][NUM_CELL_QUADS];

void AddFXQuad(
bool high, bool additive,
D3DXVECTOR3 center, float u, float v, DWORD color,float radius)
{
for(int new_quad=0; new_quad<NUM_FX_QUADS; new_quad++)
{
if(fx_temp[high][additive][new_quad].alive==false)
{
fx_temp[high][additive][new_quad].alive=true;

fx_temp[high][additive][new_quad].center=center;
fx_temp[high][additive][new_quad].color=color;
//and so on . . .

break;
}
}

}

void AddCellQuad(
bool high,
D3DXVECTOR3 center, float u, float v, DWORD color,float radius)
{
for(int new_quad=0; new_quad<NUM_CELL_QUADS; new_quad++)
{
if(cells_temp[high][new_quad].alive==false)
{
cells_temp[high][new_quad].alive=true;

cells_temp[high][new_quad].center=center;
cells_temp[high][new_quad].color=color;
//and so on . . .

break;
}
}

}


//THE GAME LOOP//
while(1)
{
//refresh the quad-storers
ZeroMemory(&fx_temp,sizeof(fx_temp));
ZeroMemory(&cells_temp,sizeof(cells_temp));

//do game logic, handling cells, explosions, shadows, etc.
//AddFXQuad() called at various points
//AddCellQuad() called at various points


/////////FILL AND DRAW THE TWO VERTEX BUFFERS///////////////////
//////////////////////////////////////////////////////////////

//now we get to the drawing part (this part may not be completely correct;
//I've just written this quickly to give an idea of what I'm doing)

VERTEXFX* vb_fxpoint;
for(int fx_height=0; fx_height<2; fx_height++)
{
for(int fx_blendmode=0; fx_blendmode<2; fx_blendmode++)
{

vb_fx->Lock(&vb_fxpoint . . .);

//writing directly to vertex buffer
for(int fx_quad=0; fx_quad<NUM_FX_QUADS; fx_quad++)
{
if(fx_temp[fx_height][fx_blendmode][fx_quad].alive)
{
//use QUAD to define two tris in the buffer
vb_fxpoint[index]. . . .
vb_fxpoint[index]. . . . //and so on
//. . . . //
}
}

vb_fx->Unlock();
d3ddev->DrawPrimitive(. . .); //DrawPrimitive() is a device method, not a vertex buffer method
}
}

VERTEXCELLS* vb_cellspoint;
for(int cell_height=0; cell_height<2; cell_height++)
{
vb_cells->Lock(&vb_cellspoint . . .);

//writing directly to vertex buffer
for(int cell_quad=0; cell_quad<NUM_CELL_QUADS; cell_quad++)
{
if(cells_temp[cell_height][cell_quad].alive)
{
//use QUAD to define two tris in the buffer
vb_cellspoint[index]. . . .
vb_cellspoint[index]. . . . //and so on
//. . . . //
}
}

vb_cells->Unlock();
d3ddev->DrawPrimitive(. . .);
}



}







So does the method above look OK to you guys? Please tell me if I'm going in the right direction with this!

Thanks a lot!

-synth_cat
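One change that may help the method above: AddFXQuad() and AddCellQuad() search linearly for a slot with alive == false, which gets slower the fuller the array is, and every slot has to be zeroed each frame. Keeping a running count per list makes adding O(1), makes clearing a single assignment, and lets the draw loop visit only the first count entries instead of testing alive on every slot. A C++ sketch, with a simplified hypothetical Quad struct and an arbitrary NUM_FX_QUADS value chosen for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical simplified quad record (the poster's QUAD also has
// face_camera, u, v and radius).
struct Quad {
    float cx, cy, cz;
    std::uint32_t color;
};

// Fixed-capacity quad list with a running count: adding needs no search for
// a free slot, and clearing just resets the count (no ZeroMemory()).
template <std::size_t Capacity>
class QuadList {
public:
    bool add(const Quad& q) {
        if (count_ == Capacity) return false;  // full: caller decides what to do
        quads_[count_++] = q;
        return true;
    }
    void clear() { count_ = 0; }
    std::size_t size() const { return count_; }
    const Quad& operator[](std::size_t i) const { return quads_[i]; }

private:
    Quad quads_[Capacity];
    std::size_t count_ = 0;
};

// Mirrors fx_temp[high][additive][...]: one list per draw call.
// The value 4 is arbitrary, for illustration only.
constexpr std::size_t NUM_FX_QUADS = 4;
QuadList<NUM_FX_QUADS> fx_lists[2][2];
```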

I'm sorry to keep dragging this up, but I only really want to know a little:

Is there a problem with the following?

Handling dynamic buffers used for drawing quads (or whatever) by creating a "copy" of the dynamic buffer (an array of vertices) and writing to it throughout the game, finally dumping it into the actual vertex buffer just before it is drawn, using memcpy().

That's all I really wanted to know.

Thanks!

-synth_cat

Quote:

Handling dynamic buffers used for drawing quads (or whatever) by creating a "copy" of the dynamic buffer (an array of vertices) and writing to it throughout the game, finally dumping it into the actual vertex buffer just before it is drawn, using memcpy().



From my understanding, a better approach would be either to:

- Use IDirect3DDevice9::DrawPrimitiveUP to render your Quads straight from the array your code is accessing (internally the device will copy the data to a location the GPU can access directly, but it will avoid the lock).
- Double buffer your vertices. You have two vertex buffers: one is locked at the start of the frame and your AddQuad calls add directly to it, while the other is being rendered. Every frame you swap them over, so the quads that were added last frame are now rendered, and the quads that were rendered last frame are overwritten with the quads that are to be added.

Are there any bad things about DrawPrimitiveUP()?

How do you swap data between two vertex buffers?

I recently tried splitting my Lock() calls from only two into six, and I used six different arrays for storing temporary quads (recall that my engine requires six draw-quad calls). So I had a vb_cells_temp[2][NUM_CELL_QUADS] and a vb_fx_temp[2][2][NUM_FX_QUADS]. (I believe this is more or less what you advised me to do, Zahlman.) At Draw(), I would memcpy() each segment into the buffer and draw it. However, the results were terrible: I couldn't even see most of the quads, and the ones I could see were flickering very badly.

Is memcpy() slow, and, if so, would that explain why the above method failed so spectacularly?

Thanks,
synth_cat

Quote:
Original post by synth_cat
How do you swap data between two vertex buffers?

You don't swap the data between the vertex buffers; you simply have two vertex buffers for every vertex buffer you had before (the ones you were changing, of course).
Then you use the first for the first frame, the second for the second frame, the first for the third frame, and so on. Essentially you have one vertex buffer that receives the new data for the _next_ frame and one vertex buffer that is used to draw the _current_ frame. These two swap their roles every frame, but not their data.

Quote:
Original post by synth_cat
I recently tried splitting my Lock() calls from only two into six, and I used six different arrays for storing temporary quads (recall that my engine requires six draw-quad calls). So I had a vb_cells_temp[2][NUM_CELL_QUADS] and a vb_fx_temp[2][2][NUM_FX_QUADS]. (I believe this is more or less what you advised me to do, Zahlman.) At Draw(), I would memcpy() each segment into the buffer and draw it. However, the results were terrible: I couldn't even see most of the quads, and the ones I could see were flickering very badly.

Is memcpy() slow, and, if so, would that explain why the above method failed so spectacularly?


I don't think that memcpy() or your Lock()/Unlock() calls themselves would be causing such behaviour, though I don't have much experience with dynamic VBs (I've only used them once so far, which was terribly slow, but worked). The things you describe sound more like an error in the arrangement of the vertices in memory. This could cause the wrong winding order for triangles (you wouldn't see those triangles) or triangles being drawn from vertices that were not supposed to form one (flickering, perhaps some triangles that are too big, and so on).
However, as I mentioned, I don't have much experience with dynamic VBs, so I might be wrong.

Hope that helped,
good luck!
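The winding-order explanation can be checked numerically: the sign of the 2D cross product of a triangle's edges tells which way it winds. Direct3D 9 culls counter-clockwise triangles by default (D3DRS_CULLMODE is D3DCULL_CCW), so a quad whose two triangles wind in opposite directions would show only one half. A C++ sketch, assuming the quads lie in the XZ plane as in the later code in this thread; P, signedArea2 and sameWinding are hypothetical helpers. Whether a winding looks clockwise on screen also depends on the view, so this only checks that the two halves of a quad agree.

```cpp
#include <cassert>

// Hypothetical 2D point in the XZ plane.
struct P { float x, z; };

// Twice the signed area of triangle (a, b, c); the sign gives the winding.
float signedArea2(P a, P b, P c) {
    return (b.x - a.x) * (c.z - a.z) - (b.z - a.z) * (c.x - a.x);
}

// True when both triangles of a six-vertex quad wind the same way.
// A degenerate (zero-area) triangle is counted with the negative ones.
bool sameWinding(const P (&quad)[6]) {
    const float first = signedArea2(quad[0], quad[1], quad[2]);
    const float second = signedArea2(quad[3], quad[4], quad[5]);
    return (first > 0.0f) == (second > 0.0f);
}
```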

Quote:

Are there any bad things about DrawPrimitiveUP()?


It effectively does what you are doing internally (it will copy the data you pass it into a location the GPU can access, just as you are copying your quad data from your array to the vertex buffer). So if you pass the same data to it each frame, it will be a lot less efficient than using a vertex buffer, because it does an unnecessary copy each frame. But in your case it should be more efficient, as it avoids locking the vertex buffer (though probably not as efficient as double buffering).

Quote:

How do you swap data between two vertex buffers?

The way I suggested you don't have to. One frame you fill buffer A with quads. The next frame you render buffer A while you are filling buffer B with quads. The next frame you render buffer B, and OVERWRITE the contents of the buffer A (which was rendered the previous frame and can now be discarded) with new data for this frame. This way you never have to transfer data between buffers, you just alternate which buffer is being rendered and which is being filled.

Quote:

- Use IDirect3DDevice9::DrawPrimitiveUP to render your Quads straight from the array your code is accessing (internally the device will copy the data to a location the GPU can access directly, but it will avoid the lock).

I have researched this option and discovered that it can be slow and/or exhibit other problems, so unfortunately I won't be able to use it.

Okay, I tried yet another approach at drawing quads dynamically.

This time I tried to cut out the use of a copy of my vertex buffer (and the subsequent need to make a memcpy() call) and instead created arrays of QUADS, which are written directly to the vertex buffer at draw time. Here is my code, below:


//DRAW QUADS
VERTEXCELLS* vb_cells_pt=0;
d3ddev->SetRenderState(D3DRS_DESTBLEND,D3DBLEND_INVSRCALPHA);
d3ddev->SetTexture(0, t_cells);
d3ddev->SetStreamSource(0, vb_cells, 0, sizeof(VERTEXCELLS));
d3ddev->SetFVF(VERTEX_CELLS);



//draw quads1[]
vb_cells->Lock(0, 3*sizeof(VERTEXCELLS), (void**)&vb_cells_pt, D3DLOCK_DISCARD);
//write quad1 contents to buffer
for(int quad1=0; quad1<NUM_CELL_QUADS; quad1++)
{
vb_cells_pt[quad1*6].color=quads1[quad1].color;
vb_cells_pt[quad1*6].u=quads1[quad1].u;
vb_cells_pt[quad1*6].v=quads1[quad1].v;
vb_cells_pt[quad1*6].pos=quads1[quad1].pos;

vb_cells_pt[quad1*6+1].color=quads1[quad1].color;
vb_cells_pt[quad1*6+1].u=quads1[quad1].u;
vb_cells_pt[quad1*6+1].v=quads1[quad1].v;
vb_cells_pt[quad1*6+1].pos=quads1[quad1].pos;
vb_cells_pt[quad1*6+1].pos.x+=100;
vb_cells_pt[quad1*6+1].pos.z+=0;

vb_cells_pt[quad1*6+2].color=quads1[quad1].color;
vb_cells_pt[quad1*6+2].u=quads1[quad1].u;
vb_cells_pt[quad1*6+2].v=quads1[quad1].v;
vb_cells_pt[quad1*6+2].pos=quads1[quad1].pos;
vb_cells_pt[quad1*6+2].pos.x+=100;
vb_cells_pt[quad1*6+2].pos.z+=100;

vb_cells_pt[quad1*6+3].color=quads1[quad1].color;
vb_cells_pt[quad1*6+3].u=quads1[quad1].u;
vb_cells_pt[quad1*6+3].v=quads1[quad1].v;
vb_cells_pt[quad1*6+3].pos=quads1[quad1].pos;
vb_cells_pt[quad1*6+3].pos.x+=0;
vb_cells_pt[quad1*6+3].pos.z+=0;

vb_cells_pt[quad1*6+4].color=quads1[quad1].color;
vb_cells_pt[quad1*6+4].u=quads1[quad1].u;
vb_cells_pt[quad1*6+4].v=quads1[quad1].v;
vb_cells_pt[quad1*6+4].pos=quads1[quad1].pos;
vb_cells_pt[quad1*6+4].pos.x+=100;
vb_cells_pt[quad1*6+4].pos.z+=100;

vb_cells_pt[quad1*6+5].color=quads1[quad1].color;
vb_cells_pt[quad1*6+5].u=quads1[quad1].u;
vb_cells_pt[quad1*6+5].v=quads1[quad1].v;
vb_cells_pt[quad1*6+5].pos=quads1[quad1].pos;
vb_cells_pt[quad1*6+5].pos.x+=0;
vb_cells_pt[quad1*6+5].pos.z+=100;

}
vb_cells->Unlock();
d3ddev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, NUM_CELL_QUADS);

//draw quads2[]
vb_cells->Lock(0, 3*sizeof(VERTEXCELLS), (void**)&vb_cells_pt, D3DLOCK_DISCARD);
//write quads2 contents to buffer
for(int quad2=0; quad2<NUM_CELL_QUADS; quad2++)
{
vb_cells_pt[quad2*6].color=quads2[quad2].color;
vb_cells_pt[quad2*6].u=quads2[quad2].u;
vb_cells_pt[quad2*6].v=quads2[quad2].v;
vb_cells_pt[quad2*6].pos=quads2[quad2].pos;

vb_cells_pt[quad2*6+1].color=quads2[quad2].color;
vb_cells_pt[quad2*6+1].u=quads2[quad2].u;
vb_cells_pt[quad2*6+1].v=quads2[quad2].v;
vb_cells_pt[quad2*6+1].pos=quads2[quad2].pos;
vb_cells_pt[quad2*6+1].pos.x+=100;
vb_cells_pt[quad2*6+1].pos.z+=0;

vb_cells_pt[quad2*6+2].color=quads2[quad2].color;
vb_cells_pt[quad2*6+2].u=quads2[quad2].u;
vb_cells_pt[quad2*6+2].v=quads2[quad2].v;
vb_cells_pt[quad2*6+2].pos=quads2[quad2].pos;
vb_cells_pt[quad2*6+2].pos.x+=100;
vb_cells_pt[quad2*6+2].pos.z+=100;

vb_cells_pt[quad2*6+3].color=quads2[quad2].color;
vb_cells_pt[quad2*6+3].u=quads2[quad2].u;
vb_cells_pt[quad2*6+3].v=quads2[quad2].v;
vb_cells_pt[quad2*6+3].pos=quads2[quad2].pos;
vb_cells_pt[quad2*6+3].pos.x+=0;
vb_cells_pt[quad2*6+3].pos.z+=0;

vb_cells_pt[quad2*6+4].color=quads2[quad2].color;
vb_cells_pt[quad2*6+4].u=quads2[quad2].u;
vb_cells_pt[quad2*6+4].v=quads2[quad2].v;
vb_cells_pt[quad2*6+4].pos=quads2[quad2].pos;
vb_cells_pt[quad2*6+4].pos.x+=100;
vb_cells_pt[quad2*6+4].pos.z+=100;

vb_cells_pt[quad2*6+5].color=quads2[quad2].color;
vb_cells_pt[quad2*6+5].u=quads2[quad2].u;
vb_cells_pt[quad2*6+5].v=quads2[quad2].v;
vb_cells_pt[quad2*6+5].pos=quads2[quad2].pos;
vb_cells_pt[quad2*6+5].pos.x+=0;
vb_cells_pt[quad2*6+5].pos.z+=100;

}
vb_cells->Unlock();
d3ddev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, NUM_CELL_QUADS);

//draw quads3[]
vb_cells->Lock(0, 3*sizeof(VERTEXCELLS), (void**)&vb_cells_pt, D3DLOCK_DISCARD);
//write quads3 contents to buffer
for(int quad3=0; quad3<NUM_CELL_QUADS; quad3++)
{
vb_cells_pt[quad3*6].color=quads3[quad3].color;
vb_cells_pt[quad3*6].u=quads3[quad3].u;
vb_cells_pt[quad3*6].v=quads3[quad3].v;
vb_cells_pt[quad3*6].pos=quads3[quad3].pos;

vb_cells_pt[quad3*6+1].color=quads3[quad3].color;
vb_cells_pt[quad3*6+1].u=quads3[quad3].u;
vb_cells_pt[quad3*6+1].v=quads3[quad3].v;
vb_cells_pt[quad3*6+1].pos=quads3[quad3].pos;
vb_cells_pt[quad3*6+1].pos.x+=100;
vb_cells_pt[quad3*6+1].pos.z+=0;

vb_cells_pt[quad3*6+2].color=quads3[quad3].color;
vb_cells_pt[quad3*6+2].u=quads3[quad3].u;
vb_cells_pt[quad3*6+2].v=quads3[quad3].v;
vb_cells_pt[quad3*6+2].pos=quads3[quad3].pos;
vb_cells_pt[quad3*6+2].pos.x+=100;
vb_cells_pt[quad3*6+2].pos.z+=100;

vb_cells_pt[quad3*6+3].color=quads3[quad3].color;
vb_cells_pt[quad3*6+3].u=quads3[quad3].u;
vb_cells_pt[quad3*6+3].v=quads3[quad3].v;
vb_cells_pt[quad3*6+3].pos=quads3[quad3].pos;
vb_cells_pt[quad3*6+3].pos.x+=0;
vb_cells_pt[quad3*6+3].pos.z+=0;

vb_cells_pt[quad3*6+4].color=quads3[quad3].color;
vb_cells_pt[quad3*6+4].u=quads3[quad3].u;
vb_cells_pt[quad3*6+4].v=quads3[quad3].v;
vb_cells_pt[quad3*6+4].pos=quads3[quad3].pos;
vb_cells_pt[quad3*6+4].pos.x+=100;
vb_cells_pt[quad3*6+4].pos.z+=100;

vb_cells_pt[quad3*6+5].color=quads3[quad3].color;
vb_cells_pt[quad3*6+5].u=quads3[quad3].u;
vb_cells_pt[quad3*6+5].v=quads3[quad3].v;
vb_cells_pt[quad3*6+5].pos=quads3[quad3].pos;
vb_cells_pt[quad3*6+5].pos.x+=0;
vb_cells_pt[quad3*6+5].pos.z+=100;

}
vb_cells->Unlock();
d3ddev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, NUM_CELL_QUADS);




Not only does this run much slower than my old write-to-copy-of-vertex-buffer method, but I don't even see any quads! Plus, this new method seems to be crunching the whole app - my DirectInput starts to read keystates incorrectly, causing the camera to keep moving for a second after I let go of the move key, etc. Can anyone please explain why it does this?
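Two details in the code above may matter more than the number of locks. Lock()'s second argument is the number of bytes to lock, and 3*sizeof(VERTEXCELLS) covers only three vertices while each loop writes 6*NUM_CELL_QUADS, so the writes run past the locked range (passing 0 for both the offset and the size locks the whole buffer, which must have been created large enough, and with D3DUSAGE_DYNAMIC for D3DLOCK_DISCARD to be valid). DrawPrimitive()'s last argument is a primitive count, and with D3DPT_TRIANGLELIST each quad is two triangles, so NUM_CELL_QUADS quads need 2*NUM_CELL_QUADS primitives. The small C++ sketch below only computes these numbers; VertexCells and sizesFor are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in with the same role as VERTEXCELLS.
struct VertexCells {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// Numbers a non-indexed triangle-list quad batch needs.
struct QuadBatchSizes {
    std::size_t vertices;    // 6 per quad (two triangles, no index buffer)
    std::size_t lockBytes;   // SizeToLock argument for Lock()
    std::size_t primitives;  // PrimitiveCount argument for DrawPrimitive()
};

constexpr QuadBatchSizes sizesFor(std::size_t quads) {
    return QuadBatchSizes{quads * 6, quads * 6 * sizeof(VertexCells), quads * 2};
}

// With these values the calls would read (not compiled here):
//   vb_cells->Lock(0, s.lockBytes, (void**)&vb_cells_pt, D3DLOCK_DISCARD);
//   ... write s.vertices vertices ...
//   d3ddev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, s.primitives);
```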

I am amazed at how difficult it has become just to figure out how to draw my quads.

-DrawPrimitiveUP won't work because it's too slow

-memcpy()'ing a copy of the vertex buffer into the vertex buffer at draw time doesn't work, because it behaves strangely and won't work properly when I do it more than once with multiple copies, which is something I must be able to do. (See my previous post.)

Can anyone please help me out here? All I want to do is to be able to call some sort of AddQuadToBeDrawn() function from anywhere within the game (within the handling of cells, shots, lightning, weather, etc.) I cannot just lock the vertex buffer and then iterate through all my entities, writing directly to the vbuffer, for two reasons.

#1, the quads need to be split into 6 different DrawPrimitive() calls based on whether they are additive/subtractive/source_blend/high/low.

#2, adding quads in my game is more complicated than just "draw a quad for each entity" - in short, if I were to just write directly to the vbuffer while it was locked, there would need to be a large amount of game logic run while the vbuffer was locked, which would obviously be bad.
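For #1, one way to keep a single Lock() per buffer while still making six draw calls is to keep six CPU-side lists, copy them back to back into the locked memory, and remember where each one starts; DrawPrimitive()'s StartVertex parameter then selects each list. A C++ sketch under those assumptions; Vertex, Range and packBuckets are hypothetical, and the locked memory is simulated with a std::vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in vertex format.
struct Vertex { float x, y, z; unsigned color; float u, v; };

constexpr std::size_t kBuckets = 6;  // one per blend/height combination

// Where each bucket's vertices start inside the single locked buffer.
struct Range { std::size_t startVertex; std::size_t vertexCount; };

// Packs the six per-call vertex lists back to back into `dest` (the memory
// returned by one Lock) and records each bucket's range for its DrawPrimitive.
std::array<Range, kBuckets> packBuckets(
        const std::array<std::vector<Vertex>, kBuckets>& buckets,
        Vertex* dest, std::size_t destCapacity) {
    std::array<Range, kBuckets> ranges{};
    std::size_t offset = 0;
    for (std::size_t b = 0; b < kBuckets; ++b) {
        const std::size_t n = buckets[b].size();
        assert(offset + n <= destCapacity);  // never write past the locked range
        if (n > 0) std::memcpy(dest + offset, buckets[b].data(), n * sizeof(Vertex));
        ranges[b] = Range{offset, n};
        offset += n;
    }
    return ranges;
}

// Afterwards, for each bucket b with range r (not compiled here):
//   set the blend states for b, then
//   if (r.vertexCount) d3ddev->DrawPrimitive(D3DPT_TRIANGLELIST, r.startVertex, r.vertexCount / 3);
```

Here memcpy() into the locked memory is just an ordinary write; the assert marks the constraint that matters most, staying within the locked size.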

I'm sorry to keep bothering you all about this. . .

Thanks!
-synth_cat

Quote:

-DrawPrimitiveUP won't work because it's too slow

Personally, I'd test this. It's very easy to implement (you just have to pass your quad arrays directly to DrawPrimitiveUP).

Quote:

#1, the quads need to be split into 6 different DrawPrimitive() calls based on whether they are additive/subtractive/source_blend/high/low.

If you went the double buffering route each vertex buffer would have to be doubled-up.


Quote:

need to be a large amount of game logic run while the vbuffer was locked, which would obviously be bad.

If you are not trying to render while this is going on that should not be a big problem.


Well, after a lot of tinkering with this, I have come to the conclusion that all the errors and problems resulted from locking a vertex buffer more than once per step.

To tell the truth, I like my method of writing to an array of vertices and then memcpy()'ing it into my vertex buffer at draw time. This method allows me to only lock my vertex buffer once, which is good for performance.

So I have only one question to ask - Is there any problem at all with doing memcpy() to a vertex buffer? (Does it crash on some video cards, is it extraordinarily slow on some video cards, etc.)

Thanks to all of you for helping me out with this problem - I just hope that the answer to the above question can put this issue to rest for good!

Thanks,
synth_cat
