VertexBuffer.Lock is eating my memory...

6 comments, last by GroZZleR 18 years, 10 months ago
Hey all, I'm in the midst of creating my 2D engine. I've decided to go with one fairly large vertex cache that I fill each frame, instead of a bunch of independent vertex buffers. This lets me batch triangles in huge doses and is (excluding this problem) working quite nicely. This is what happens every render update:

    Sort sprites (layer, then Y, then texture)
    for every texture:
        lock vertex buffer
        fill in all sprite data
        unlock
        draw (position in cache, number of polygons)

I added some timing code today and found out I'm running at a whopping 20 frames per second. I was shocked, because I was only using 2 textures, so I shouldn't have been hit hard by the fairly expensive lock calls. I downloaded a profiler and ran it, and found that every single VertexBuffer.Lock() call is allocating up to 36KB of memory per frame. My allocation time-chart shows more time spent allocating this memory than rendering the frame itself, so this has to be the problem.

I'm using Managed DirectX, and my lock call looks like this:

    vbData = (Direct3D.CustomVertex.PositionColoredTextured[])_vertices.Lock(0, 0);

I know that's locking the whole buffer; I will fix that later. Can anyone make any recommendations on what I can do? Surely the Sprite interface uses a similar locking system to achieve its goals.
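For reference, the layer-then-Y-then-texture ordering described above boils down to a single three-key comparator. This is only an illustrative sketch — the Sprite struct and its field names are assumptions for the example, not the poster's actual code:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sprite record -- field names are assumptions for illustration.
struct Sprite {
    int   layer;    // draw layer, lowest drawn first
    float y;        // screen Y, for painter-style ordering within a layer
    int   texture;  // texture id, grouped so each texture is locked/drawn once
};

// Strict weak ordering: layer, then Y, then texture, as in the post.
static bool SpriteOrder(const Sprite& a, const Sprite& b)
{
    if (a.layer != b.layer) return a.layer < b.layer;
    if (a.y != b.y)         return a.y < b.y;
    return a.texture < b.texture;
}

void SortSprites(std::vector<Sprite>& sprites)
{
    std::sort(sprites.begin(), sprites.end(), SpriteOrder);
}
```

With this ordering, all sprites sharing a texture within a layer end up contiguous, so each texture needs only one fill-and-draw pass.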
Locking is generally slow as all the data goes over the GPU-system bus. You should at all times avoid locking during a render loop unless it cannot be done any other way. Surely the sprite interface does not do this but uses billboarding instead.

Why must your buffer be locked? Can the data be static? Can it be computed by transforming static data only? Etc.

Greetz,

Illco
Have you tried running your application through the debug runtimes? Often they'll scream and shout if you try to do an inefficient Lock() operation.

If you hadn't realised it, you can get very different performance characteristics by specifying different memory pools (Default/SystemMem/Managed..) and different usage flags (Dynamic, WriteOnly, Discard). The debug runtime will often tell you (in the form of a warning) if what you're doing is not following these "best practices".

But, as Illco said, if you can find a way of avoiding the locks and/or resource modification then do so. Even a simple case where you cache the VB's for any/every object that doesn't change every frame.

hth
Jack

<hr align="left" width="25%" />
Jack Hoxley <small>[</small><small> Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]</small>

I don't see why you access the vertex buffer at all. The way I envision it you place the necessary sprites into the buffer during initialization, then only allocate an index buffer to determine which sprites get drawn.

If the sprites additionally need to move about, then you can use the world-transform while rendering each of those sprites.
Quote:Original post by LoreKeeper
If the sprites additionally need to move about, then you can use the world-transform while rendering each of those sprites.

This is usually the problem. A self-rolled TL-vertex based implementation cannot use the transform (or lighting) pipeline, so the world-transform matrix (D3DTS_WORLD) is useless.

I don't use ID3DXSprite, but I think I read somewhere that it can accept 2D transformations to achieve the same effect. As far as I can tell for a closed-source solution, it'll be doing something similar internally...

EDIT: Then again, Direct3D.CustomVertex.PositionColoredTextured is not a TL-vertex, so I'm wrong... [sad]

Jack


I'm locking with D3DLOCK_DISCARD, filling the buffer each time with new data, and have never had any performance loss. On the contrary, I can lock and unlock tens of times per frame and not see -any- loss in FPS. And I'm even working with a low-end graphics card.
Maybe setting your buffer reference to null (Nothing) before locking will help...
Quote:Original post by jollyjeffers
If you hadn't realised it, you can get very different performance characteristics by specifying different memory pools (Default/SystemMem/Managed..) and different usage flags (Dynamic, WriteOnly, Discard). The debug runtime will often tell you (in the form of a warning) if what you're doing is not following these "best practices".
I'm guessing that your biggest problem, and its solution, lies here. I have a program at work that requires perfect synchronization with the monitor refresh rate, and I lock/unlock a VB and IB multiple times per frame to manually add in vertices that are more or less already transformed/lit (though I still use the view/projection transforms). This works perfectly fine, and I'm guessing the difference between your buffer locks and mine is that my buffers were created with dynamic usage in the default memory pool, and are locked with the discard flag. The dynamic usage seems to keep memory around both on the video card and in system memory in a way that allows really quick access to either. And discarding the whole buffer when you lock it means the driver doesn't actually have to lock the video-memory buffer, merely the system-memory one; it can then write the system memory out to the video card when you're done.
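The discard lock described above is commonly paired with a no-overwrite flag in a ring-buffer scheme: append new vertices after a write cursor while they fit (promising not to touch data the GPU may still be reading), and only lock with discard once the buffer is exhausted. A minimal sketch of just that cursor bookkeeping — the enum and struct are illustrative stand-ins, not the real D3D API:

```cpp
#include <cstddef>

// Illustrative stand-ins for the real lock flags.
enum class LockFlag { Discard, NoOverwrite };

// Tracks the write cursor of a dynamic vertex buffer across locks.
struct DynamicVB {
    size_t capacity; // total vertices the buffer can hold
    size_t cursor;   // next free vertex slot

    // Decide how to lock for an append of `count` vertices.
    // `offset` receives the slot the caller should write at.
    LockFlag BeginAppend(size_t count, size_t& offset) {
        if (cursor + count > capacity) {
            // Buffer exhausted: discard hands back a fresh buffer
            // without stalling on the GPU's use of the old contents.
            offset = 0;
            cursor = count;
            return LockFlag::Discard;
        }
        // Room left: no-overwrite promises we won't touch
        // vertices the GPU may still be reading.
        offset = cursor;
        cursor += count;
        return LockFlag::NoOverwrite;
    }
};
```

Either way the driver never has to block waiting for the GPU to finish with the old data, which is what makes tens of locks per frame affordable.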
"We should have a great fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke
Thanks everyone for the help; changing the locking flags was the main solution. I also had a bottleneck within the loop itself (note: logging debug information to a file every frame is not efficient).

The batching solution is on par with the sprite interface, and I have the full control I'm seeking.

Thanks again.

This topic is closed to new replies.
