DrawIndexedPrimitive spike

This may be a rhetorical question, but DrawIndexedPrimitive is synchronous when called from C++, isn't it? I'm assuming it is, but some strange behaviour is making me think maybe it's not. If I start a high-performance timer before a bunch of DrawIndexedPrimitive calls (around 25 or so) and take the reading when the calls are finished, is that indicative of how long in milliseconds my actual rendering literally takes? I'm running in optimized release mode using DX9 and unmanaged C++.

Without going into too much detail, my view is drawing a terrain which is held in around 25 vertex buffers (60 bytes per vertex, 37249 vertices per buffer). The terrain is drawn using simple filled tris (no textures) and everything is indexed. There are around 288 tris per vertex buffer, so around 7200 tris in total for the entire draw set. I know this can [and will] be optimized, but as I haven't implemented the different LODs yet, I took the average LOD for the whole view.

The issue I'm seeing is a sporadic spike in the milliseconds it takes to draw the terrain (25 DrawIndexedPrimitive calls). Mostly it's around 0.5-1ms, but every second or so it jumps up to between 10-30ms. I haven't put NVPerfHUD on it yet - that's my next task - but I thought I'd throw it out there to see if anyone can think of anything or has seen similar things happen. (I'm running on a Dell Latitude D820 with 2GB RAM, and an NV250 Quadro 120M with 250MB of VRAM - which I think may actually literally be 128MB; apparently some kind of marketing ploy.)

Thanks for any help/suggestions
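The measurement looks roughly like this - just a minimal sketch, where g_pDevice and g_chunks are placeholder names rather than my actual code:

```cpp
#include <windows.h>
#include <d3d9.h>

LARGE_INTEGER freq, start, end;
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&start);

for (int i = 0; i < 25; ++i)   // one call per terrain chunk
{
    // (vertex declaration, shaders/FVF etc. assumed already set)
    g_pDevice->SetStreamSource(0, g_chunks[i].vb, 0, 60);   // 60-byte vertices
    g_pDevice->SetIndices(g_chunks[i].ib);
    g_pDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                                    g_chunks[i].numVerts,   // 37249 per buffer
                                    0, g_chunks[i].numTris);
}

QueryPerformanceCounter(&end);
double ms = 1000.0 * double(end.QuadPart - start.QuadPart) / double(freq.QuadPart);
```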
DrawIndexedPrimitive is asynchronous. However, it takes a reasonably long time to make the call. DrawIndexedPrimitive makes a transition to kernel mode, I believe, which is a pretty expensive thing to do. It then pushes the data into a buffer for the video card driver to handle, and returns. The video card then reads batches out of the buffer and renders them.

Timing the duration of individual D3D calls is pointless, because it's mostly async. NVPerfHUD and/or PIX should help point out what the problem is.
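If you do want your timer to include the GPU work rather than just call submission, you can drain the pipeline with an event query before reading the timer - something like this sketch (g_pDevice is a placeholder for your IDirect3DDevice9*):

```cpp
#include <d3d9.h>

IDirect3DQuery9* pQuery = NULL;
g_pDevice->CreateQuery(D3DQUERYTYPE_EVENT, &pQuery);

// ... issue the DrawIndexedPrimitive calls here ...

pQuery->Issue(D3DISSUE_END);
while (pQuery->GetData(NULL, 0, D3DGETDATA_FLUSH) == S_FALSE)
    ;  // spin until the GPU has consumed everything submitted so far

// ... read the high-performance timer here ...
pQuery->Release();
```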


How many DrawIndexedPrimitive calls are you making per frame? You really want to keep it under 500 or so.
Hi, thanks for the response

I'm only making 25 calls per frame - it's the only thing I'm drawing at the moment. If it's asynchronous, that makes more sense, but I'll hook up NVPerfHUD and see if I can see what's happening.

I would have thought 25 DrawIndexedPrimitive calls with a total of 7200 tris with no texturing would be super quick, so I guess there must be something not quite right.

Cheers
I believe DrawIndexedPrimitive() doesn't always do the swap to kernel mode - it'll only do that every so often to improve performance. Also some of the rendering will probably be done during the Present().

Those long delays could well be texture uploads, or vertex buffer uploads - you have around 56MB of vertex buffers there (25 buffers × 37249 vertices × 60 bytes)...
Actually, thinking about this a bit more, my loop is currently simple:

1) Adjust geometry and indices (~1ms) (this includes locking vertex/index buffers)
2) Draw geometry (DrawIndexedPrimitive x 25)

I assume my timing of part 1) is realistic, unless vertex/index buffer locking is asynchronous, which it can't be. I don't use any kind of frame timing at the moment, so frames are drawn back to back regardless of timing.

So if the DIP calls are asynchronous, it could be that the second, third, umpteenth, etc. call to DIP is locking against itself causing a build-up and, consequently, the spike.

How are you supposed to know when it is okay to make the next set of DIP calls? Is there some kind of signal or callback in the API that will let you know when it's next available for drawing?
Hi Adam

Yes, it's around 56MB of vertices (xyz, normal, diffuse + 4 sets of texture coords), but shouldn't these stay in VRAM once created?

I do lock a number of the vertex buffers each frame (the ones for new chunks coming into the view frustum) and adjust the y values (on the CPU, not in the vertex shader). If I lock the entire contents of a vertex buffer and change the values, does that mean it has to go across the bus to the GPU again when it's used in a DIP call? If so, that'll be my problem.
They should stay in VRAM as long as you haven't run out of VRAM; once you have, data gets swapped out on a least-recently-used basis. Managed resources may also not be uploaded until they are first used.

Locking a non-dynamic buffer will force an upload of the new vertices to the GPU the next time it's used. I'd suggest either doing all the modification up front, or using the vertex shader, since you probably don't want the terrain in a dynamic buffer.
My terrain is completely represented by dynamic vertex buffers. I don't see any other way of doing it. To have static buffers pre-built would take up far too much memory, so I rotate the use of a set number of vertex buffers. Most of them don't get updated each frame (the portions that remain on screen), I only update the buffers that include new 'chunks' into the frustum.
You should make sure you read and understand the Accurately Profiling Direct3D API Calls (Direct3D 9) paper in the SDK documentation. Performance measurement for D3D applications is an extremely complex beast - not only have you got coarse level parallelism (CPU/GPU) but the GPU has a complex pipeline that can be off-balance...

PIX for Windows and NVPerfHUD are invaluable [grin]

Jack


Ahh, in that case there's a simple way to speed things up.

The simplest solution is to make sure you're locking with the NOOVERWRITE or DISCARD flags as appropriate, and that the vertex buffer is write only.
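Something like this sketch - a dynamic, write-only buffer in the default pool, locked with DISCARD so the driver can hand back fresh memory instead of stalling (sizes and names here are placeholders):

```cpp
#include <d3d9.h>

// Dynamic buffers must use D3DPOOL_DEFAULT; one terrain chunk's worth here.
IDirect3DVertexBuffer9* pVB = NULL;
g_pDevice->CreateVertexBuffer(37249 * 60,
                              D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
                              0, D3DPOOL_DEFAULT, &pVB, NULL);

void* pData = NULL;
// DISCARD when rewriting the whole buffer; NOOVERWRITE when only
// appending to regions the GPU isn't reading from.
if (SUCCEEDED(pVB->Lock(0, 0, &pData, D3DLOCK_DISCARD)))
{
    // ... write the updated chunk vertices into pData ...
    pVB->Unlock();
}
```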

If that doesn't fix it, or isn't possible then you'll probably need to manage the buffers manually:

What's probably happening is that when you go to lock your dynamic VB the hardware is sometimes still drawing from it. When that happens the CPU has to wait for the GPU to finish before it can obtain the write lock.

Normally using the right lock flags will get round that, as long as the buffer is write only. The other way around that is to avoid reusing a buffer until a frame or two after it was last drawn from, which means you'll probably need an extra buffer or two spare for that purpose.
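A rough sketch of that spare-buffer idea - cycle through a couple more buffers than you draw from per frame, so the one you lock hasn't been rendered from for a frame or two (NUM_BUFFERS and the rotation scheme are placeholders, not a drop-in implementation):

```cpp
#include <d3d9.h>

const int NUM_BUFFERS = 27;                    // 25 drawn per frame + 2 spares
IDirect3DVertexBuffer9* g_ring[NUM_BUFFERS];   // created up front elsewhere
int g_next = 0;

IDirect3DVertexBuffer9* AcquireChunkBuffer()
{
    // Hand out buffers round-robin: the one returned is the least
    // recently drawn from, so locking it shouldn't stall on the GPU.
    IDirect3DVertexBuffer9* vb = g_ring[g_next];
    g_next = (g_next + 1) % NUM_BUFFERS;
    return vb;
}
```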

