Massing of the primitives

Started by
7 comments, last by Mythics 16 years, 6 months ago
My goal is to render thousands of triangles individually, each with its own object matrix. What's the best path to get there? I'm talking non-textured polygons, DirectX-handled lighting, and not much of anything else just yet. Setting a different world matrix per object (with its own draw call) seems ridiculous of course, but manually editing the entire vertex buffer every frame or so sounds pretty far-fetched as well. Any advice?
It sounds like instancing is what you're after. There is a sample in the SDK called Instancing and IIRC a section in the documentation about the various instancing options (some require newer hardware, others require more set up work).

Skinning could also be of use - with indexed triangle list primitives, there is nothing preventing you from having bones control blobs of mesh geometry that aren't connected.

Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing | LinkedIn | Personal site

Quote:Original post by S1CA
It sounds like instancing is what you're after. There is a sample in the SDK called Instancing and IIRC a section in the documentation about the various instancing options (some require newer hardware, others require more set up work).

Until I get a firmer grasp on C#, I'm kinda in the dark ages with VB6 and DX8. In the VB SDK for DX8, I'm not having much luck finding anything regarding instancing. Are you aware of something along those lines for DX8?

Quote:Original post by S1CA
Skinning could also be of use - with indexed triangle list primitives, there is nothing preventing you from having bones control blobs of mesh geometry that aren't connected.

Another area I barely know anything of. Any suggested reading material that might apply to DX8?
I do believe HW instancing came about during DX9. Shader instancing requires lots of spare shader constants, and I don't think you're going to get many of those below SM2.0 either.

You can manually edit the vertex buffer every frame, you just have to be careful about how you do it. Normally locking a vertex buffer that you're going to draw from will stall the GPU, but if you double or triple-buffer it you can avoid stalls. I'm pretty sure that's going to be how you'll have to do it in DX8.
Quote:Original post by MJP
You can manually edit the vertex buffer every frame, you just have to be careful about how you do it. Normally locking a vertex buffer that you're going to draw from will stall the GPU, but if you double or triple-buffer it you can avoid stalls. I'm pretty sure that's going to be how you'll have to do it in DX8.


So, locking and multiplying every single vertex by its respective matrix is going to be the optimum choice? (Outside of frustum culling and other little tricks I assume, but I'd rather have steady fps than have it slow down as you look at tons of moving objects.)

Also, double/triple buffering in DX8 is simply:
D3DPP.SwapEffect = D3DSWAPEFFECT_DISCARD
D3DPP.BackBufferCount = 3
???
Quote:Original post by Mythics
Also, double/triple buffering in DX8 is simply:
D3DPP.SwapEffect = D3DSWAPEFFECT_DISCARD
D3DPP.BackBufferCount = 3
Yes and no. That's for triple buffering your swap chain (which IMO isn't really worth it these days). What MJP meant is using 2 or 3 vertex buffers in a round-robin fashion, like so:
Init: Fill VB 1
Frame 1: Fill VB 2, render from VB 1
Frame 2: Fill VB 3, render from VB 2
Frame 3: Fill VB 1, render from VB 3
Frame 4: Fill VB 2, render from VB 1
Etc

That way, you're never locking a vertex buffer that D3D is using. You could probably get away with double buffering your VB here, but triple buffering might give you slightly better performance, at the cost of 3 frames of lag.
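The fill/render schedule above can be sketched as a small index ring. This is only an illustration: VbRing and its members are hypothetical names, not part of Direct3D, and the actual lock, fill and draw calls are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Direct3D: tracks which of N vertex
// buffers is filled and which one is drawn on a given frame.
struct VbRing {
    std::size_t count;      // number of vertex buffers in the ring (2 or 3)
    std::size_t drawIndex;  // buffer filled on the previous frame, drawn this frame

    explicit VbRing(std::size_t n) : count(n), drawIndex(0) {
        assert(n >= 2 && "need at least two buffers to avoid locking the one in use");
    }

    // Buffer to lock and fill this frame: always the one after the draw buffer.
    std::size_t fillIndex() const { return (drawIndex + 1) % count; }

    // Call once per frame, after the draw call has been submitted.
    void advance() { drawIndex = fillIndex(); }
};
```

With a count of 3 and zero-based indices, index 0 is filled at init, frame 1 draws index 0 while filling index 1, frame 2 draws index 1 while filling index 2, and so on, which matches the schedule listed above.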
D3D has a mechanism for doing this VB round robin for you.

Create your VB with D3DUSAGE_DYNAMIC.

When you lock your VB:

If you're starting from offset 0, lock with flag D3DLOCK_DISCARD. If the VB is in use, you'll magically get a hidden alternate VB. Ensure anything you wanted to render from the VB has been rendered before you lock this way.

If you're locking from another offset, to append to the buffer, lock with flag D3DLOCK_NOOVERWRITE. This is a promise that you won't touch any previous data, just append.


typical usage:

lock 0, discard
write
write more
write more
unlock
Draw from 0
lock n, nooverwrite
write something else
write
unlock
Draw from n
(want to append, but no room...)
lock 0, discard
write
...

If you're writing everything at once per frame you can just always use DISCARD. If you're doing many writes it's preferred to make a larger VB and use NOOVERWRITE more often.
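The discard/append decision in the typical usage above could be tracked with a little bookkeeping like this. It is a rough sketch under assumptions: LockMode, LockRequest and DynamicVbCursor are made-up names, and the enum only stands in for the D3DLOCK_DISCARD and D3DLOCK_NOOVERWRITE flags; no real D3D call is made.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the D3DLOCK_DISCARD / D3DLOCK_NOOVERWRITE flags named above.
// These enum values are illustrative and are not the real D3D constants.
enum class LockMode { Discard, NoOverwrite };

struct LockRequest {
    std::size_t offset;  // byte offset to lock at
    LockMode mode;       // which flag to pass to the lock call
};

// Hypothetical bookkeeping for one dynamic vertex buffer.
struct DynamicVbCursor {
    std::size_t capacity;  // total buffer size in bytes
    std::size_t next;      // first byte not yet written since the last discard

    explicit DynamicVbCursor(std::size_t bytes) : capacity(bytes), next(0) {}

    // Reserve `bytes` and report where and how to lock.
    LockRequest reserve(std::size_t bytes) {
        assert(bytes <= capacity && "a single batch must fit in the buffer");
        if (next == 0 || next + bytes > capacity) {
            // Starting (or restarting) at offset 0: ask for a fresh buffer.
            next = bytes;
            return {0, LockMode::Discard};
        }
        // Enough room left: append without touching earlier data.
        const std::size_t offset = next;
        next += bytes;
        return {offset, LockMode::NoOverwrite};
    }
};
```

Each call to reserve corresponds to one "lock ... write ... unlock" group in the listing, followed by a draw from the returned offset.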

If you're calculating quite a bit to fill in the VB data there are a few tips:

If you can, unlock and draw every few thousand triangles. This lets the GPU work while you're filling in more data.

AGP is slower than optimal unless you write sequentially in 64 byte blocks. Instead of trying to guarantee this everywhere that builds data, just make a local copy. Calculate to a system memory buffer first, lock, and memcpy your data at once (may be the data for just a few thousand triangles). This takes advantage of AGP fastwrites.
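A minimal sketch of the "build in system memory, then copy once" tip, assuming a simple position-plus-normal vertex. The Vertex layout and copyBatch are illustrative names, and the `locked` pointer stands in for whatever the vertex buffer lock returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Illustrative vertex layout (position + normal); not taken from the thread.
struct Vertex {
    float x, y, z;
    float nx, ny, nz;
};

// Copies a batch that was built in system memory into a locked region in one
// sequential pass. `locked` stands for the pointer a VB lock would return.
// Returns the number of bytes written, or 0 if the batch is empty or too big.
std::size_t copyBatch(void* locked, std::size_t lockedBytes,
                      const std::vector<Vertex>& staging) {
    assert(locked != nullptr);
    const std::size_t bytes = staging.size() * sizeof(Vertex);
    if (bytes == 0 || bytes > lockedBytes) {
        return 0;
    }
    std::memcpy(locked, staging.data(), bytes);
    return bytes;
}
```

All per-vertex math writes into the staging vector in whatever order is convenient; only the final memcpy touches the locked memory, and it does so front to back.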
Quote:Original post by Evil Steve
That's for triple buffering your swap chain (which IMO isn't really worth it these days).

What's the cost of triple buffering the swap chain? For it to not be worth it, I'm assuming there is some kind of penalty, correct?

Quote:Original post by Evil Steve
That way, you're never locking a vertex buffer that D3D is using. You could probably get away with double buffering your VB here, but triple buffering might give you slightly better performance, at the cost of 3 frames of lag.


Thank you for the detailed description. :)

On to getting something accomplished it would seem.

[Edited by - Mythics on October 13, 2007 1:23:44 PM]
Quote:Original post by Namethatnobodyelsetook

Many thanks for all the suggestions. With your help, I'm now at the point where my only issue is having so many matrix transformations per loop. Considering I'm pushing about 8k quads and multiplying each coordinate by its respective matrix, that's a lot of processing time.

The draw calls mean nothing now, but I need to figure out if there are some cleaner methods of calculating exactly what I want to fill the VB with. Every frame, every object moves in some direction, and generally gets rotated. I'm the type that wants to get 60fps minimum, but what I'm doing seems extremely CPU dependent.

This topic is closed to new replies.
