RobMaddison

DrawIndexedPrimitive spike

This may be a rhetorical question, but DrawIndexedPrimitive is synchronous when called from C++, isn't it? I'm assuming it is, but some strange behaviour is making me think maybe it's not. If I start a high-performance timer prior to a bunch of DrawIndexedPrimitive calls (around 25 or so) and take the reading when the calls are finished, is that indicative of how long in milliseconds my actual rendering literally takes? I'm running an optimized release build using DX9 and unmanaged C++.

Without going into too much detail, my view is drawing a terrain held in around 25 vertex buffers (60 bytes per vertex, 37,249 vertices per buffer). The terrain is drawn as simple filled tris (no textures) and everything is indexed. There are around 288 tris per vertex buffer, so around 7,200 tris in total for the entire draw set. I know this can [and will] be optimized, but as I haven't implemented the different LODs yet, I took the average LOD for the whole view.

The issue I'm seeing is a sporadic spike in the milliseconds it takes to draw the terrain (25 DrawIndexedPrimitive calls). Mostly it's around 0.5-1ms, but every second or so it jumps up to between 10-30ms. I haven't put NVPerfHUD on it yet - that's my next task - but I thought I'd throw it out there to see if anyone can think of anything or has seen similar things happen. (I'm running on a Dell Latitude D820 with 2GB RAM and an NV250 Quadro 120M with 250MB of VRAM - which I think may actually literally be 128MB - apparently some kind of marketing ploy.)

Thanks for any help/suggestions.
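
For concreteness, a minimal sketch of the timing I'm describing (the device/buffer names are just placeholders; only the QueryPerformanceCounter usage and the draw calls themselves matter):

#include <windows.h>
#include <d3d9.h>

// Times the submission of the ~25 terrain draw calls, in milliseconds.
double TimeTerrainDraw(IDirect3DDevice9* device,
                       IDirect3DVertexBuffer9* vbs[],
                       IDirect3DIndexBuffer9* ibs[],
                       int bufferCount, int vertsPerBuffer, int trisPerBuffer)
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    for (int i = 0; i < bufferCount; ++i)   // ~25 buffers
    {
        device->SetStreamSource(0, vbs[i], 0, 60);   // 60 bytes per vertex
        device->SetIndices(ibs[i]);
        device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                                     vertsPerBuffer, 0, trisPerBuffer);
    }

    QueryPerformanceCounter(&end);
    return (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
}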

DrawIndexedPrimitive is asynchronous. However, it takes a reasonably long time to make the call. DrawIndexedPrimitive makes a transition to kernel mode, I believe, which is a pretty expensive thing to do. It then pushes the data into a buffer for the video card driver to handle and returns. The video card then reads batches out of the buffer and renders them.

Timing the duration of individual D3D calls is pointless, because it's mostly asynchronous. NVPerfHUD and/or PIX should help point out what the problem is.


How many DrawIndexedPrimitive calls are you making per frame? You really want to keep it under 500 or so.

Hi, thanks for the response

I'm only making 25 calls per frame - it's the only thing I'm drawing at the moment. If it's asynchronous, that makes more sense, but I'll hook up NVPerfHUD and see if I can see what's happening.

I would have thought 25 DrawIndexedPrimitive calls with a total of 7200 tris with no texturing would be super quick, so I guess there must be something not quite right.

Cheers

I believe DrawIndexedPrimitive() doesn't always do the swap to kernel mode - it'll only do that every so often to improve performance. Also some of the rendering will probably be done during the Present().

Those long delays could well be texture uploads or vertex buffer uploads - you have around 56MB of vertex buffers there (25 buffers × 37,249 vertices × 60 bytes)...

Actually, thinking about this a bit more, my loop is currently simple:

1) Adjust geometry and indices (~1ms) (this includes locking vertex/index buffers)
2) Draw geometry (DrawIndexedPrimitive x 25)

I assume my timing of part 1 is realistic, unless vertex/index buffer locking is asynchronous, which surely it can't be. I don't use any kind of frame timing at the moment, so frames are drawn one after another regardless of timing.

So if the DIP calls are asynchronous, it could be that the second, third, umpteenth, etc. call to DIP is stalling against the earlier ones, causing a build-up and, consequently, the spike.

How are you supposed to know when it is okay to make the next set of DIP calls? Is there some kind of signal or callback in the API that will let you know when it's next available for drawing?

Hi Adam

Yes, it's around 56MB of vertices (xyz, normal, diffuse + 4 sets of texture coords), but shouldn't these stay in VRAM once created?

I do lock a number of the vertex buffers each frame (the new ones that come into the view frustum) and adjust the y values (on the CPU, not in the vertex shader). If I lock the entire contents of a vertex buffer and change the values, does that mean it has to go across the bus to the GPU again when it's used in a DIP call? If so, that'll be my problem.

They should stay in VRAM as long as you haven't run out of it; once you do, data is swapped out on a least-recently-used basis. Managed resources may also not be uploaded until they are first used.

Locking a non-dynamic buffer will force an upload of the new vertices to the GPU the next time it's used. I'd suggest either doing all the modification up front, or using the vertex shader, since you probably don't want the terrain in a dynamic buffer.
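
To make the distinction concrete, this is roughly what the two creation paths look like in D3D9 (the function, sizes and names are placeholders, not taken from your actual code):

#include <windows.h>
#include <d3d9.h>

// Illustrative only: 'device' and 'numVerts' stand in for the app's own values.
void CreateTerrainBuffers(IDirect3DDevice9* device, UINT numVerts,
                          IDirect3DVertexBuffer9** staticVB,
                          IDirect3DVertexBuffer9** dynamicVB)
{
    // Static terrain: fill once (or rarely); D3D manages residency and will
    // re-upload it behind the scenes if it gets evicted from VRAM.
    device->CreateVertexBuffer(numVerts * 60, D3DUSAGE_WRITEONLY, 0,
                               D3DPOOL_MANAGED, staticVB, NULL);

    // Frequently rewritten terrain: dynamic, default pool, write-only, meant to
    // be locked with the DISCARD/NOOVERWRITE flags discussed further down.
    device->CreateVertexBuffer(numVerts * 60,
                               D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY, 0,
                               D3DPOOL_DEFAULT, dynamicVB, NULL);
}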

My terrain is completely represented by dynamic vertex buffers - I don't see any other way of doing it. Having static buffers pre-built would take up far too much memory, so I rotate the use of a set number of vertex buffers. Most of them don't get updated each frame (the portions that remain on screen); I only update the buffers that bring new 'chunks' into the frustum.

You should make sure you read and understand the Accurately Profiling Direct3D API Calls (Direct3D 9) paper in the SDK documentation. Performance measurement for D3D applications is an extremely complex beast - not only have you got coarse-level parallelism (CPU/GPU), but the GPU has a complex pipeline that can be thrown off balance...

PIX for Windows and NVPerfHUD are invaluable [grin]

Jack

Ahh, in that case there's a simple way to speed things up.

The simplest solution is to make sure you're locking with the NOOVERWRITE or DISCARD flags as appropriate, and that the vertex buffer is write-only.

If that doesn't fix it, or isn't possible, then you'll probably need to manage the buffers manually.

What's probably happening is that when you go to lock your dynamic VB, the hardware is sometimes still drawing from it. When that happens, the CPU has to wait for the GPU to finish before it can obtain the write lock.

Normally using the right lock flags will get around that, as long as the buffer is write-only. The other way around it is to avoid reusing a buffer until a frame or two after it was last drawn from, which means you'll probably need an extra buffer or two spare for that purpose.
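
A sketch of what the lock-flag approach looks like when a chunk's contents are being fully replaced (the function and parameter names are just placeholders; the flag is the real one):

#include <windows.h>
#include <d3d9.h>
#include <cstring>

// Refill a dynamic, write-only VB (created with D3DUSAGE_DYNAMIC |
// D3DUSAGE_WRITEONLY in D3DPOOL_DEFAULT). DISCARD tells the driver the old
// contents aren't needed, so it can hand back a fresh block of memory instead
// of stalling until the GPU has finished reading the previous one.
void RefillBuffer(IDirect3DVertexBuffer9* vb, const void* verts, UINT sizeBytes)
{
    void* dest = NULL;
    if (SUCCEEDED(vb->Lock(0, sizeBytes, &dest, D3DLOCK_DISCARD)))
    {
        memcpy(dest, verts, sizeBytes);
        vb->Unlock();
    }
}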

After reading the last couple of posts again, I just thought I'd clear up the fact that when I prepare the geometry (by locking/unlocking index and vertex buffers), prior to the DIP calls, there is never any delay.

This process takes around 1ms but is usually less. I'm fairly sure my timing of this is accurate as there are no asynchronous calls involved. This process is:

Lock each vertex buffer that has new viewable data, adjust the height values of a portion of that vertex buffer (my VBs are separated into 9 65x65 vertex chunks) - this generally involves updates to around 7 or 8 vertex buffers (depending on how fast the view is rotating).


I'm using dynamic DEFAULT pool vertex buffers with WRITE_ONLY set. I think if you use DISCARD, it throws away the buffer before writing to it which may be slower - although I haven't tried it yet.

I think what's happening is that the geometry preparation stage is very quick and is immediately followed by the DIP calls. As the DIP calls are asynchronous, the next geometry preparation happens before the previous DIP calls have finished on the GPU, which might cause lag in the DIP calls. What I think I need to do is:

1) Carry out normal game loop processing/event handling and wait until the card is ready for drawing
2) Prepare geometry
3) Call DIPs

I need to work out how to tell when the card is free for drawing before drawing the next set. Is there something in the API that can tell me when the card is ready for drawing perhaps?
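
Edit: for reference, D3D9 does seem to have something along these lines - an event query (D3DQUERYTYPE_EVENT) that signals when the GPU has processed everything issued before it. A rough, untested sketch:

#include <windows.h>
#include <d3d9.h>

// Blocks until the GPU has consumed all commands submitted so far.
void WaitForGpu(IDirect3DDevice9* device)
{
    IDirect3DQuery9* query = NULL;
    if (FAILED(device->CreateQuery(D3DQUERYTYPE_EVENT, &query)))
        return;  // event queries not supported on this device/driver

    query->Issue(D3DISSUE_END);  // marks "everything submitted up to here"

    BOOL done = FALSE;
    while (query->GetData(&done, sizeof(BOOL), D3DGETDATA_FLUSH) == S_FALSE)
    {
        // GPU still busy; a real app would do other CPU work here instead of spinning
    }
    query->Release();
}

Although deliberately waiting like this throws away the CPU/GPU overlap, so it's probably only useful for measurement rather than the main loop.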

Quote:
Original post by RobMaddison
I'm using dynamic DEFAULT pool vertex buffers with WRITE_ONLY set. I think if you use DISCARD, it throws away the buffer before writing to it which may be slower - although I haven't tried it yet.
Have a look at the SDK docs in Direct3D9 -> Programming Guide -> Programming Tips -> Performance Optimizations.

You want to create a large VB and lock it in chunks with NOOVERWRITE, then lock it once with DISCARD when it's full. That allows D3D to hand you a pointer to a fresh block of memory behind the scenes, which prevents a stall.
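
Roughly, the pattern looks like this (buffer size, stride and names are illustrative only, not from the thread):

#include <windows.h>
#include <d3d9.h>
#include <cstring>

// Sketch of the append pattern described above: one large dynamic VB, filled
// in chunks with NOOVERWRITE and reset with DISCARD when it wraps around.
class VertexRing
{
public:
    VertexRing() : m_vb(NULL), m_size(0), m_cursor(0) {}

    bool Init(IDirect3DDevice9* device, UINT sizeBytes)
    {
        m_size = sizeBytes;
        m_cursor = 0;
        return SUCCEEDED(device->CreateVertexBuffer(
            sizeBytes, D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY, 0,
            D3DPOOL_DEFAULT, &m_vb, NULL));
    }

    // Copies 'bytes' of vertex data in and returns the byte offset it landed
    // at (pass that offset to SetStreamSource when drawing).
    UINT Append(const void* verts, UINT bytes)
    {
        DWORD flags = D3DLOCK_NOOVERWRITE;   // promise not to touch in-use data
        if (m_cursor + bytes > m_size)
        {
            m_cursor = 0;
            flags = D3DLOCK_DISCARD;         // full: get a fresh buffer, no stall
        }

        void* dest = NULL;
        if (SUCCEEDED(m_vb->Lock(m_cursor, bytes, &dest, flags)))
        {
            memcpy(dest, verts, bytes);
            m_vb->Unlock();
        }

        UINT offset = m_cursor;
        m_cursor += bytes;
        return offset;
    }

private:
    IDirect3DVertexBuffer9* m_vb;
    UINT m_size;
    UINT m_cursor;
};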
