How many Lock() calls is prohibitive?

13 comments, last by synth_cat 17 years, 8 months ago
I'm sure this must seem like a somewhat dumb question to ask, but I need advice from people who have actually done this before, rather than from some MSDN article...

Anyway, I was discussing in another thread a situation I had with dynamic vertex buffers. I would create vertex arrays and write to them with AddQuad() functions I wrote. Then at the end of each step I would Lock() the vertex buffers and memcpy() the data from the vertex arrays into the true vertex buffers. At the beginning of the next step, I would ZeroMemory() the two "proxy" vertex arrays, preparing them for new AddQuad() calls.

For one thing, I was thinking about converting the vertex arrays into std::vectors because I'm afraid of a stack overflow. Would it be worth the bother?

The other problem: the dynamic vertex buffers I have been talking about draw quads, and the quads are split into six different draw calls (low-additive, high-additive, and so on) between the two vertex buffers. This means I have to keep track of which position I am supposed to write to in the vertex arrays, and remember a bunch of indices marking where the vertices for each call begin and end.

So I was considering splitting the two vertex arrays into six arrays, one for each call. That would mean I would have to Lock() and memcpy() into the vertex buffers six times instead of twice. Would that make a big difference? Would it be a big performance hit on some computers?

My problem is fully described here, for reference: http://www.gamedev.net/community/forums/topic.asp?whichpage=2&pagesize=25&topic_id=404549

Thanks for any help!
-synth_cat
Greg Philbrick, Game Developer
coming soon . . . Overhauled CellZenith
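A minimal sketch of the six-array idea, assuming a hypothetical `Vertex` layout, a hypothetical `Batch` enum naming the six draw calls, and axis-aligned quads expanded to two triangles (six vertices) each. Giving each call its own `std::vector` removes the begin/end index bookkeeping, and because a vector keeps its elements on the heap, a large quad count cannot overflow the stack the way a big local array can.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout standing in for the game's real FVF struct.
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// Hypothetical names for the six draw calls described above.
enum Batch {
    CELLS_LOW, CELLS_HIGH,
    FX_LOW_NORMAL, FX_LOW_ADDITIVE,
    FX_HIGH_NORMAL, FX_HIGH_ADDITIVE,
    BATCH_COUNT
};

// One heap-backed vertex list per draw call: each call's vertices start at
// index 0 of its own list, so no begin/end offsets have to be tracked.
std::array<std::vector<Vertex>, BATCH_COUNT> g_batches;

// Appends an axis-aligned quad as two triangles (six vertices) to one batch.
void AddQuad(Batch b, float cx, float cy, float z, float r, std::uint32_t color) {
    const Vertex tl = {cx - r, cy - r, z, color, 0.0f, 0.0f};
    const Vertex tr = {cx + r, cy - r, z, color, 1.0f, 0.0f};
    const Vertex bl = {cx - r, cy + r, z, color, 0.0f, 1.0f};
    const Vertex br = {cx + r, cy + r, z, color, 1.0f, 1.0f};
    std::vector<Vertex>& v = g_batches[b];
    v.push_back(tl); v.push_back(tr); v.push_back(bl);
    v.push_back(bl); v.push_back(tr); v.push_back(br);
}

// Start of each step: clear() empties every batch but keeps its capacity,
// so a step that adds no more quads than before does not reallocate.
void BeginStep() {
    for (std::vector<Vertex>& v : g_batches) v.clear();
}

// Triangle count for the batch's draw call (triangle list).
std::size_t TriangleCount(Batch b) { return g_batches[b].size() / 3; }
```

Each batch would then be uploaded and drawn with its own call, using `TriangleCount()` as the primitive count.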
Well, someone advised me once that you should look to keep your calls to Lock down below 200-300 per frame, so if that was correct advice, you should be fine. It sounds a bit suspicious now that I come to pass it on to someone else, though.

[EDIT: I've just been reading your other post, so I appreciate that the next paragraph is a bit redundant.]

Can I just ask, though, why you don't write to the vertex buffers directly? Duplicating the buffer in system memory and then locking and copying like that seems like a waste of memory to me.

There is an article on GameDev in the DirectX Graphics section that describes how to batch quads using the vertex buffer directly.

[Edited by - EasilyConfused on July 22, 2006 3:36:36 AM]
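A sketch of writing straight into the locked buffer, assuming a hypothetical `Quad` record kept by the game and a `FakeVertexBuffer` class standing in for `IDirect3DVertexBuffer9` so the logic can run without Direct3D. With the real interface the pointer would come from `Lock(0, 0, (void**)&p, D3DLOCK_DISCARD)` on a buffer created with `D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY`, and that memory should only be written, never read back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout.
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// Hypothetical stand-in for IDirect3DVertexBuffer9: Lock() hands out a
// pointer to the buffer's memory, Unlock() ends the write.
class FakeVertexBuffer {
public:
    explicit FakeVertexBuffer(std::size_t capacity) : data_(capacity), locked_(false) {}
    Vertex* Lock() { assert(!locked_); locked_ = true; return data_.data(); }
    void Unlock() { assert(locked_); locked_ = false; }
    std::size_t Capacity() const { return data_.size(); }
    const Vertex& At(std::size_t i) const { return data_[i]; }
private:
    std::vector<Vertex> data_;
    bool locked_;
};

// Game-side description of a quad (hypothetical fields).
struct Quad { float cx, cy, z, r; std::uint32_t color; };

// Expands each quad straight into the locked memory, with no intermediate
// system-memory vertex array, and returns the triangle count to draw.
std::size_t FillDirect(FakeVertexBuffer& vb, const std::vector<Quad>& quads) {
    const std::size_t maxQuads = vb.Capacity() / 6;
    const std::size_t n = quads.size() < maxQuads ? quads.size() : maxQuads;
    Vertex* out = vb.Lock();
    for (std::size_t i = 0; i < n; ++i) {
        const Quad& q = quads[i];
        const Vertex tl = {q.cx - q.r, q.cy - q.r, q.z, q.color, 0.0f, 0.0f};
        const Vertex tr = {q.cx + q.r, q.cy - q.r, q.z, q.color, 1.0f, 0.0f};
        const Vertex bl = {q.cx - q.r, q.cy + q.r, q.z, q.color, 0.0f, 1.0f};
        const Vertex br = {q.cx + q.r, q.cy + q.r, q.z, q.color, 1.0f, 1.0f};
        Vertex* v = out + i * 6;  // sequential writes only; never read back
        v[0] = tl; v[1] = tr; v[2] = bl;
        v[3] = bl; v[4] = tr; v[5] = br;
    }
    vb.Unlock();
    return n * 2;
}
```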
Quote: Original post by EasilyConfused
Well, someone advised me once that you should look to keep your calls to Lock down below 200-300 per frame
That doesn't sound right to me. 200-300 draw calls per frame would be more common advice, but that many locks is really going to hurt.

Two bits of advice from my own personal experience:

1. Lazy Evaluation - buffer all quads in system memory and only update them to VRAM when it's absolutely essential (e.g. when you need to draw them). Ideally, try to structure it so that you don't draw until you absolutely have to... It takes some getting used to, but efficient D3D code is not always the most intuitive to the programmer. Simple, straightforward code is nice, but doesn't necessarily get you the best performance.

2. Consider the Draw*PrimitiveUP() calls. I've found them to be as fast as, and in some cases faster than, rendering from VBs. If you're making lots of changes to the data, it might be faster to skip the whole VB/IB thing and avoid any overhead from locking and modifying. VRAM is best suited to static (or mostly static) resources and isn't necessarily the best choice in other cases.
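Point 1 can be sketched as a batch that keeps its vertices in system memory and only pays for an upload when a draw needs them and the data has changed since the last upload. `LazyBatch` is a hypothetical name, and its `gpu_` vector merely stands in for a Lock()/copy/Unlock() on a real vertex buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout.
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// Hypothetical batch that defers the upload until a draw actually needs it.
class LazyBatch {
public:
    void Add(const Vertex& v) { cpu_.push_back(v); dirty_ = true; }
    void Clear() { if (!cpu_.empty()) { cpu_.clear(); dirty_ = true; } }

    // Returns the triangle count to draw; uploads only if something changed.
    std::size_t PrepareDraw() {
        if (dirty_) {
            gpu_ = cpu_;  // stands in for Lock() + copy + Unlock()
            ++uploads_;
            dirty_ = false;
        }
        return gpu_.size() / 3;
    }
    int Uploads() const { return uploads_; }

private:
    std::vector<Vertex> cpu_;  // system-memory copy the game writes to
    std::vector<Vertex> gpu_;  // stand-in for the vertex buffer contents
    bool dirty_ = false;
    int uploads_ = 0;
};
```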

WRT #2, you should be able to convert to std::vector quite easily.
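Because `std::vector` stores its elements contiguously, its `data()` pointer can go straight into DrawPrimitiveUP, whose arguments after the primitive type are `PrimitiveCount`, `pVertexStreamZeroData` and `VertexStreamZeroStride`. A minimal sketch, assuming a triangle list and a hypothetical `Vertex` layout; `UpArgs` and `MakeTriangleListArgs` are made-up helpers that only compute those arguments, so the code runs without Direct3D.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout.
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// The data arguments DrawPrimitiveUP expects after the primitive type.
struct UpArgs {
    unsigned primitiveCount;
    const void* data;
    unsigned stride;
};

// Fills the arguments for a triangle-list draw from a vector's storage.
// Returns false when there is not even one triangle, so the call is skipped.
bool MakeTriangleListArgs(const std::vector<Vertex>& verts, UpArgs& out) {
    if (verts.size() < 3) return false;
    out.primitiveCount = static_cast<unsigned>(verts.size() / 3);
    out.data = verts.data();
    out.stride = static_cast<unsigned>(sizeof(Vertex));
    return true;
}
```

With a real device, and the matching FVF set, the call would then be `device->DrawPrimitiveUP(D3DPT_TRIANGLELIST, a.primitiveCount, a.data, a.stride);`.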

hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

You should keep the number of locks to a minimum, but the difference between 6 and 2 is negligible. Also, unless you are doing something really tricky, it would be better to write to the VB directly.

John Bolton
Locomotive Games (THQ)
Current Project: Destroy All Humans (Wii). IN STORES NOW!
More important than the raw number of Lock() calls is how they are made. If you double buffer (so a vertex buffer that is being rendered this frame will not be locked and updated until the next frame), the GPU is never using the data you are locking, and the locks should be very quick (though I'm sure there is still some small CPU overhead in calling them).

[Edited by - griffin2000 on July 22, 2006 2:30:13 PM]
Thanks for all the replies!

I think it sounds best to write directly to the vertex buffer (in other words, write through a VERTEX* pointer between a Lock() and an Unlock() call).

However, I still need a way of storing the vertex info outside the true vertex buffer. This is because my game makes AddQuad() calls throughout the game logic (and it's not as simple as one AddQuad() call per entity). Basically, if I rewrote my AddQuad() functions to write to a vertex buffer that had already been locked, I would have to put my entire game logic between a Lock() and an Unlock() call.

I assume that using up a lot of clock cycles while a vertex buffer is locked is bad - am I right?

So I had an idea: why not store an array of QUAD structs outside the vertex buffer, have AddQuad() do nothing but write to that array, and then at the end of each step Lock() the vertex buffer, run through all the quads, and write them directly into the vertex buffer?

Here's what I mean:

struct QUAD
{
    bool face_camera;
    D3DXVECTOR3 center;
    float u, v;
    DWORD color;
    float radius;

    QUAD() {}  //arrays of QUAD need a default constructor
    QUAD(bool face_camera, D3DXVECTOR3 center, float u, float v, DWORD color, float radius)
        : face_camera(face_camera), center(center), u(u), v(v), color(color), radius(radius)
    {}
};

//the actual vertex buffers (dynamic, each big enough for 6 vertices per quad)
LPDIRECT3DVERTEXBUFFER9 vb_cells;
LPDIRECT3DVERTEXBUFFER9 vb_fx;
LPDIRECT3DDEVICE9 d3ddev;  //the device (this name is assumed)

//the things that will store quad data until needed - see that I am using
//six different arrays, one for each call (my other thread explains this more
//fully), each with a count of how many quads it currently holds
QUAD fx_temp[2][2][NUM_FX_QUADS];
int  fx_count[2][2];
QUAD cells_temp[2][NUM_CELL_QUADS];
int  cells_count[2];

void AddFXQuad(bool high, bool additive,
               D3DXVECTOR3 center, float u, float v, DWORD color, float radius)
{
    int& n = fx_count[high][additive];
    if(n >= NUM_FX_QUADS) return;  //array full: drop the quad
    QUAD& q = fx_temp[high][additive][n++];  //append, no search for a free slot
    q.center = center;
    q.u = u; q.v = v;
    q.color = color;
    q.radius = radius;
    //and so on . . .
}

void AddCellQuad(bool high,
                 D3DXVECTOR3 center, float u, float v, DWORD color, float radius)
{
    int& n = cells_count[high];
    if(n >= NUM_CELL_QUADS) return;
    QUAD& q = cells_temp[high][n++];
    q.center = center;
    q.u = u; q.v = v;
    q.color = color;
    q.radius = radius;
    //and so on . . .
}

//THE GAME LOOP//
while(1)
{
    //refresh the quad-storers: resetting the counts is enough
    ZeroMemory(fx_count, sizeof(fx_count));
    ZeroMemory(cells_count, sizeof(cells_count));

    //do game logic, handling cells, explosions, shadows, etc.
    //AddFXQuad() called at various points
    //AddCellQuad() called at various points

    /////////FILL AND DRAW THE TWO VERTEX BUFFERS///////////////////
    //////////////////////////////////////////////////////////////
    //now we get to the drawing part (this part may not be written completely
    //correctly - I've just written this quickly to give an idea of what I'm doing)
    //(FVF, textures and blend states for each call are set elsewhere)
    d3ddev->SetStreamSource(0, vb_fx, 0, sizeof(VERTEXFX));
    for(int fx_height=0; fx_height<2; fx_height++)
    {
        for(int fx_blendmode=0; fx_blendmode<2; fx_blendmode++)
        {
            int quads = fx_count[fx_height][fx_blendmode];
            if(quads == 0) continue;

            VERTEXFX* vb_fxpoint = NULL;
            //DISCARD: the whole buffer is rewritten, so the driver can hand
            //back fresh memory instead of waiting for the previous draw
            if(FAILED(vb_fx->Lock(0, quads*6*sizeof(VERTEXFX), (void**)&vb_fxpoint, D3DLOCK_DISCARD)))
                continue;
            //writing directly to vertex buffer
            for(int fx_quad=0; fx_quad<quads; fx_quad++)
            {
                const QUAD& q = fx_temp[fx_height][fx_blendmode][fx_quad];
                //use q to define two tris in the buffer:
                //vb_fxpoint[fx_quad*6 + 0] . . . vb_fxpoint[fx_quad*6 + 5]
            }
            vb_fx->Unlock();
            d3ddev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, quads*2);
        }
    }

    d3ddev->SetStreamSource(0, vb_cells, 0, sizeof(VERTEXCELLS));
    for(int cell_height=0; cell_height<2; cell_height++)
    {
        int quads = cells_count[cell_height];
        if(quads == 0) continue;

        VERTEXCELLS* vb_cellspoint = NULL;
        if(FAILED(vb_cells->Lock(0, quads*6*sizeof(VERTEXCELLS), (void**)&vb_cellspoint, D3DLOCK_DISCARD)))
            continue;
        //writing directly to vertex buffer
        for(int cell_quad=0; cell_quad<quads; cell_quad++)
        {
            const QUAD& q = cells_temp[cell_height][cell_quad];
            //use q to define two tris in the buffer:
            //vb_cellspoint[cell_quad*6 + 0] . . . vb_cellspoint[cell_quad*6 + 5]
        }
        vb_cells->Unlock();
        d3ddev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, quads*2);
    }
}



So does the method above look OK to you guys? Please tell me if I'm going in the right direction with this!

Thanks a lot!

-synth_cat
What I'm also wondering is: would it be better to use a vector or just an array for these QUADs?
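Either should work; what probably matters more is appending at a running count instead of scanning for the first slot with `alive == false`, which makes each AddQuad() cost grow with the number of quads already added. A sketch of the array version, using a hypothetical `FixedList` template: it never reallocates but has a hard cap, and `T` needs a default constructor. Declared at namespace scope (or allocated with `new`) it does not live on the stack; a `std::vector` with `reserve()` and `clear()` behaves similarly without the cap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity list: a plain array plus a running count.
// Appending is O(1), unlike scanning for the first unused slot.
template <typename T, std::size_t N>
class FixedList {
public:
    // Returns false (and drops the item) once all N slots are used.
    bool Push(const T& item) {
        if (count_ == N) return false;
        items_[count_++] = item;
        return true;
    }
    // Replaces a per-frame ZeroMemory(): old slots are simply ignored.
    void Reset() { count_ = 0; }
    std::size_t Size() const { return count_; }
    const T& operator[](std::size_t i) const { assert(i < count_); return items_[i]; }

private:
    T items_[N];
    std::size_t count_ = 0;
};
```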
I'm sorry to keep dragging this up, but I only really want to know a little:

Is there a problem with -

Handling dynamic buffers used for drawing quads (or whatever) by creating a "copy" of the dynamic buffer (an array of vertices) and writing to it throughout the game, finally dumping it into the actual vertex buffer just before it is drawn, using memcpy().

That's all I really wanted to know.

Thanks!

-synth_cat
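The approach is workable; its cost is one extra copy of every vertex per frame. A minimal sketch of the upload step, assuming a hypothetical `Vertex` layout and a made-up `UploadStaged()` helper; `locked` and `lockedBytes` stand in for the pointer and size obtained from `IDirect3DVertexBuffer9::Lock()`, so the test uses an ordinary array as the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical vertex layout (trivially copyable, so memcpy is safe).
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// Copies a system-memory staging array into locked memory with a single
// memcpy(). Returns the number of whole triangles copied, clamped to what
// fits in the locked region.
std::size_t UploadStaged(const std::vector<Vertex>& staged, void* locked, std::size_t lockedBytes) {
    std::size_t n = staged.size();
    const std::size_t fits = lockedBytes / sizeof(Vertex);
    if (n > fits) n = fits;
    n -= n % 3;  // never submit a partial triangle
    if (n > 0) std::memcpy(locked, staged.data(), n * sizeof(Vertex));
    return n / 3;
}
```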
Quote:
Handling dynamic buffers used for drawing quads (or whatever) by creating a "copy" of the dynamic buffer (an array of vertices) and writing to it throughout the game, finally dumping it into the actual vertex buffer just before it is drawn, using memcpy().



From my understanding a better approach would be to either:

- Use IDirect3DDevice9::DrawPrimitiveUP to render your quads straight from the array your code is already writing to (internally the device copies the data to a location the GPU can access, but you avoid the Lock() yourself).
- Double buffer your vertices: keep two vertex buffers. One is locked at the start of the frame and your AddQuad() calls write directly into it, while the other is being rendered. Every frame you swap them over, so the quads that were added last frame are now rendered, and the buffer that was rendered last frame is overwritten with the quads being added.
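A sketch of the double-buffering scheme in the second point, assuming two fixed-size buffers and a hypothetical `DoubleBufferedQuads` class whose `std::vector` storage stands in for two dynamic vertex buffers. No data is copied between them; only their roles swap each frame. On real hardware the driver may queue more than one frame, so locking the "free" buffer without a flag such as D3DLOCK_DISCARD might still stall; this sketch does not model the GPU at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout.
struct Vertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

// storage_[0] and storage_[1] stand in for two vertex buffers: the "write"
// one is filled by AddQuad() during the frame, the other is drawn.
class DoubleBufferedQuads {
public:
    explicit DoubleBufferedQuads(std::size_t maxQuads) : maxQuads_(maxQuads) {
        for (int i = 0; i < 2; ++i) {
            storage_[i].resize(maxQuads * 6);
            count_[i] = 0;
        }
    }
    // Stand-in for Lock(): start refilling the buffer that is not being drawn.
    void BeginFrame() { count_[write_] = 0; }
    // Writes six vertices straight into the "locked" buffer.
    bool AddQuad(const Vertex (&six)[6]) {
        if (count_[write_] == maxQuads_) return false;  // buffer full
        std::copy(six, six + 6, storage_[write_].data() + count_[write_] * 6);
        ++count_[write_];
        return true;
    }
    // Quads to draw this frame: the ones added during the previous frame.
    std::size_t QuadsToDraw() const { return count_[ReadIndex()]; }
    int WriteIndex() const { return write_; }
    int ReadIndex() const { return 1 - write_; }
    // Stand-in for Unlock() plus the swap at the end of the frame.
    void EndFrame() { write_ = 1 - write_; }

private:
    std::size_t maxQuads_;
    std::vector<Vertex> storage_[2];
    std::size_t count_[2];
    int write_ = 0;
};
```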
Are there any bad things about DrawPrimitiveUP()?

How do you swap data between two vertex buffers?

I recently tried splitting my Lock() calls from two into six, and I used six different arrays for storing temporary quads (recall that my engine requires six draw-quad calls). So I had a vb_cells_temp[2][NUM_CELL_QUADS] and a vb_fx_temp[2][2][NUM_FX_QUADS]. (I believe this is more or less what you advised me to do, Zahlman.) In Draw(), I would memcpy() each segment into the buffer and draw it. However, the results were terrible: I couldn't even see most of the quads, and the ones I could see were flickering very badly.

Is memcpy() slow, and, if so, would that explain why the above method failed so spectacularly?

Thanks,
synth_cat
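A common D3D9 pattern for several draws that share one dynamic buffer (not something proposed in this thread) is to give each segment its own region: append after the previous segment with D3DLOCK_NOOVERWRITE while there is room, and restart at the front with D3DLOCK_DISCARD when there is not. A minimal sketch of the bookkeeping only, with a hypothetical `DynamicRing` class and placeholder `LockMode` values instead of the real flags; the real buffer must be created with D3DUSAGE_DYNAMIC.

```cpp
#include <cassert>
#include <cstddef>

// Placeholders mirroring D3DLOCK_DISCARD / D3DLOCK_NOOVERWRITE in meaning
// only; they are not the real constants.
enum LockMode { LOCK_DISCARD, LOCK_NOOVERWRITE };

struct Segment {
    std::size_t startVertex;  // offset for Lock() and StartVertex for DrawPrimitive()
    LockMode mode;            // which flag the Lock() call would use
};

// Hypothetical allocator handing out back-to-back regions of one dynamic
// vertex buffer, so each of the six draw calls gets its own region.
class DynamicRing {
public:
    explicit DynamicRing(std::size_t capacityVertices)
        : capacity_(capacityVertices), cursor_(0) {}

    Segment Allocate(std::size_t vertexCount) {
        assert(vertexCount > 0 && vertexCount <= capacity_);
        if (cursor_ == 0 || cursor_ + vertexCount > capacity_) {
            // Starting over at the front: DISCARD lets the driver supply
            // fresh memory instead of waiting for earlier draws to finish.
            cursor_ = vertexCount;
            Segment s = {0, LOCK_DISCARD};
            return s;
        }
        // Appending after regions already queued for drawing: NOOVERWRITE
        // promises not to touch them, so no wait is needed.
        Segment s = {cursor_, LOCK_NOOVERWRITE};
        cursor_ += vertexCount;
        return s;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_;
};
```

In real code each `Segment` would become `Lock(s.startVertex * stride, count * stride, (void**)&p, flag)`, one memcpy(), `Unlock()`, and `DrawPrimitive(D3DPT_TRIANGLELIST, s.startVertex, count / 3)`. A slow memcpy() would lower the frame rate rather than make quads disappear, so it is worth checking that each draw's start vertex and primitive count match the region that was actually written.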

This topic is closed to new replies.
