Grugnorr

How many polys to send in a DrawPrimitive() call

I have seen many threads talking about this. Of course, calling DrawPrimitive() (or DrawIndexedPrimitive()) with only a few polys is bad performance-wise, BUT it is also bad to use LOTS of polys in a single call. I have read about using THOUSANDS of polys in a single call, BUT this is what Microsoft says in one of its technical articles about DX8:

"The following are key areas to look at when optimizing performance:

Batch size. Direct3D is optimized for large batches of primitives. The more polygons that can be sent in a single call, the better. A good rule of thumb is to aim to average over 100 polygons per call. Below that level you're probably not getting optimal performance, above that and you're into diminishing returns and potential conflicts with concurrency considerations (see below).

Concurrency. If you can arrange to perform rendering concurrently with other processing, then you will be taking full advantage of system performance. This goal can conflict with the goal of reducing renderstate changes. You need to strike a balance between batching to reduce state changes and pushing data out to the driver early to help achieve concurrency. Using multiple vertex buffers in round-robin fashion can help with concurrency."

So they say either to use just over 100 polys per call, or to use a complex multi-threaded model of execution. Any comments are GREATLY welcomed... What the hell!
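For anyone who wants to check their own numbers against the two points in that quote, here is a minimal sketch in plain C++ (no DirectX headers). The names `averagePolysPerCall` and `RoundRobinSlots` are made up for illustration and are not part of Direct3D. The first helper computes the average batch size to compare against the "over 100 polygons per call" rule of thumb, and the second just rotates through N buffer slots, which is one way to read "multiple vertex buffers in round-robin fashion".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Direct3D API): average polygons per draw call,
// given the polygon count of each call recorded over a frame.
inline double averagePolysPerCall(const std::vector<int>& polysPerCall) {
    if (polysPerCall.empty()) return 0.0;
    long long total = 0;
    for (int polys : polysPerCall) {
        assert(polys >= 0);  // a draw call cannot submit a negative count
        total += polys;
    }
    return static_cast<double>(total) / static_cast<double>(polysPerCall.size());
}

// Hypothetical helper (not a Direct3D API): hands out buffer slots
// 0, 1, ..., N-1, 0, 1, ... so the CPU can fill one vertex buffer while the
// card may still be reading from a previously submitted one.
template <std::size_t N>
class RoundRobinSlots {
    static_assert(N >= 2, "round-robin needs at least two buffers");
public:
    // Returns the slot to fill next and advances the rotation.
    std::size_t acquire() {
        const std::size_t slot = next_;
        next_ = (next_ + 1) % N;
        return slot;
    }
private:
    std::size_t next_ = 0;
};
```

In a real renderer each slot index would select one of N vertex buffers created up front. How many buffers are worth having is something to measure rather than assume.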

I'd also like to know the answer to this question. If you haven't already seen this in the SDK, MS also reckons that thousands of vertices or tens of vertices per call can both be OK, and then concludes that there is no single correct answer. Here's a bit of cut and paste from the SDK:

Dynamic vertex and index buffers have a difference in performance based on their size and usage. The usage styles below help to determine whether to use D3DLOCK_DISCARD or D3DLOCK_NOOVERWRITE for the Flags parameter of the Lock method.

Usage Style 1:

for loop()
{
    pBuffer->Lock(...D3DLOCK_DISCARD...); // Ensures that hardware doesn't
                                          // stall by returning a new pointer.
    Fill data (optimally 1000s of vertices/indices, no fewer) in pBuffer.
    pBuffer->Unlock();
    Change state(s).
    DrawPrimitive() or DrawIndexedPrimitive()
}

Usage Style 2:

for loop()
{
    pBuffer->Lock(...D3DLOCK_DISCARD...); // Ensures that hardware doesn't
                                          // stall by returning a new pointer.
    Fill data (optimally 1000s of vertices/indices, no fewer) in pBuffer.
    pBuffer->Unlock();                    // Unlock before drawing from the buffer.
    for loop( 100s of times )
    {
        Change state.
        DrawPrimitive() or DrawIndexedPrimitive() // Tens of primitives
    }
}

Usage Style 3:

for loop()
{
    If there is space in the buffer
    {
        // Append vertices/indices.
        pBuffer->Lock(...D3DLOCK_NOOVERWRITE...);
    }
    Else
    {
        // Reset to beginning.
        pBuffer->Lock(...D3DLOCK_DISCARD...);
    }
    Fill a few 10s of vertices/indices in pBuffer.
    pBuffer->Unlock();
    Change state.
    DrawPrimitive() or DrawIndexedPrimitive() // A few primitives
}

Style 1 is faster than either style 2 or 3, but is generally not very practical. Style 2 is usually faster than style 3, provided that the application fills at least a couple thousand vertices/indices for every Lock, on average. If the application fills fewer than that on average, then style 3 is faster. There is no guaranteed answer as to which lock method is faster and the best way to find out is to experiment.
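The append-or-reset decision in Style 3 can be sketched in plain C++ without the DirectX headers. This is only an illustration of the bookkeeping, not SDK code: `LockMode`, `LockRequest` and `DynamicBufferCursor` are hypothetical names, and the two enum values merely stand in for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD. A real application would pass the matching flag and offset to Lock() and then fill, unlock and draw as in the pseudocode above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD, so the
// sketch compiles without the DirectX headers.
enum class LockMode { NoOverwrite, Discard };

// What the caller would pass on to Lock(): which flag to use and where in
// the buffer (in vertices) to start writing.
struct LockRequest {
    LockMode mode;
    std::size_t offset;
};

// Hypothetical helper: tracks how much of a dynamic buffer has been used.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity) : capacity_(capacity) {}

    // Reserve room for `count` vertices, following Style 3: append with
    // NoOverwrite while there is space, otherwise restart with Discard.
    LockRequest reserve(std::size_t count) {
        assert(count <= capacity_ && "batch is larger than the whole buffer");
        if (used_ + count <= capacity_) {
            const LockRequest request{LockMode::NoOverwrite, used_};
            used_ += count;
            return request;
        }
        used_ = count;  // new data starts at the beginning of the buffer
        return LockRequest{LockMode::Discard, 0};
    }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

With a 100-vertex buffer, two 40-vertex batches would be appended at offsets 0 and 40, and a third would not fit, so it would come back as a Discard at offset 0. Whether this beats Style 2 for a given workload is, as the SDK text says, something to find out by experiment.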

Microsoft should know what they're talking about, so overall they're probably right.

As for my own experience, I don't see any speed difference between sending batches of 500 polys and batches of 10,000 polys, *but* my programs don't involve multiple threads or complex physics and AI, so stalling the CPU while the video card is drawing isn't an issue.

If you were writing a serious project that taxed the CPU as much as the video hardware, then I'd take Microsoft's advice and keep batches at a reasonable size.

Listen to Microsoft. They know what they're doing. However, here's my 2 cents on this issue:
For almost every game people around here are making, send as many polys as you can per call. Unless you're developing a super-L337 ultra-high sysreq game that needs to be hyper-optimized, you probably don't need to worry too much about running your AI while the card is rendering. And most of us here aren't developing a game of quite that caliber (although, don't get me wrong, I know there are a few). Basically, don't make it more of a hassle than it needs to be. Go with what works for you.

-------------------------------
NeXe: NeHe DirectX-style.

Follow the orange rabbit.
