A question about how DrawIndexedPrimitive works.

9 comments, last by Adam_42 13 years, 1 month ago
Hi,

I became confused about how the Direct3D function DrawIndexedPrimitive works after reading an article recently. For example, let's say we have a vertex buffer of 1000 vertices and we want to draw only a single triangle from it, using vertices 50, 600 and 900. The index buffer will then hold only 3 values (50, 600, 900). I thought that in such a case the hardware would read vertices 50, 600 and 900 from the vertex buffer and transform only those. But according to what I read, it does not do that; instead it reads and transforms all vertices between 50 and 900, which means processing 851 vertices for a single triangle! Does it really work this way, or did I misread something? Such a thing should be clearly documented; it would be a big performance hit if the developer did not know about it, and as usual I could not find a single hint about this issue in the Direct3D documentation...



Thanks in advance.



It's probably pretty rare to load up a vertex buffer full of triangles and then only draw one from it. That's pathological behavior, and it wouldn't surprise me one bit if that ended up being inefficient from a performance perspective.
Mike Popoloski | Journal | SlimDX
Of course it is not a sensible thing to do; I just want to learn how the function behaves. And it is not rare to use DrawIndexedPrimitive to render only a fairly large subset of a bigger vertex buffer.
Just out of curiosity, what's the reference to the article? I can certainly understand if caching or transferring of the vertex buffer might be involved, e.g., from memory to AGP.

Please don't PM me with questions. Post them in the forums for everyone's benefit, and I can embarrass myself publicly.

You don't forget how to play when you grow old; you grow old when you forget how to play.

It is a gamedev article actually.

Here

This is the part which made me ask this:

"Lets say we have a vertex buffer (VB) that has 500 elements, and we want to draw two triangles. The first has indices 3, 100 and 456 and the second has indices 10, 300 and 375.

If we were to draw the triangles with two separate calls to DIP the first call would transform the 4th vertex all the way up to the 457th vertex. The second call would transform the 11th vertex up to the 376th vertex. The second 365 transformations are thus performed twice."
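To make the quoted arithmetic concrete, here is a minimal sketch that counts the vertices each call would touch, assuming (as the article claims) that a call transforms every vertex between its smallest and largest index. The names `spanSize` and `overlapSize` are hypothetical helpers for illustration, not Direct3D functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of vertices in the inclusive index span [first, last].
std::uint32_t spanSize(std::uint32_t first, std::uint32_t last) {
    assert(first <= last);
    return last - first + 1;
}

// Number of vertices shared by two inclusive spans (0 if they are disjoint).
std::uint32_t overlapSize(std::uint32_t a0, std::uint32_t a1,
                          std::uint32_t b0, std::uint32_t b1) {
    const std::uint32_t lo = std::max(a0, b0);
    const std::uint32_t hi = std::min(a1, b1);
    return lo <= hi ? hi - lo + 1 : 0;
}
```

With the article's indices, `spanSize(3, 456)` covers the first call and `spanSize(10, 375)` the second. Counted inclusively, the second span (index 10 through 375) works out to 366 vertices, one more than the 365 the article quotes, and all of them also lie inside the first span.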
Thanks for the reference. If that Mark Reilly is Hewlett-Packard's Mark Reilly, I kind of assume he had/has some decent knowledge of the subject. However, note that the article was written in 2002 (possibly based on hardware/software of an even earlier generation), nearly a decade ago. Also, he seems to be talking exclusively about triangle strips (not triangle lists)*. The methods he suggested seem rather crude by today's standards but, perhaps, back then, would've made a performance difference. It may still be true today, I suppose, but I have to believe (if I understand the article) that performance ugliness like that would've been optimized out long ago.

*It's way too late at night to try to figure out if that has any bearing whatsoever on the problem he discusses.

Also, as Mike Popoloski mentioned above, one would almost have to search for instances where the optimization that Reilly talked about would be applicable. I rarely (I almost think I can say never) use strips, so I certainly don't have any first-hand evidence to add.


DrawIndexedPrimitive has the parameters MinIndex and NumVertices which, according to the DirectX documentation, are used for optimizing memory access. In my first example, MinIndex would be 50 and NumVertices would be 851. I don't know what optimization the hardware performs with that information, but if it really transforms all 851 vertices between 50 and 900, then such behavior should be clearly documented to begin with, since it means a huge performance hit! What is the point of using a small index buffer to draw from a larger vertex buffer if the system transforms all vertices in the range [MinIndex, MaxIndex]? This still makes no sense to me. Clearly, a clarification is needed on this issue.
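For reference, here is a minimal sketch of how those two values follow from the indices a call uses. `VertexRange` and `computeVertexRange` are hypothetical names for illustration; they are not part of Direct3D, and the sketch says nothing about what the driver does with the range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type, not part of Direct3D.
struct VertexRange {
    std::uint32_t minIndex;     // smallest vertex index referenced
    std::uint32_t numVertices;  // size of the inclusive span [min, max]
};

// Computes the span of vertices referenced by a set of indices.
VertexRange computeVertexRange(const std::vector<std::uint32_t>& indices) {
    assert(!indices.empty());
    const auto mm = std::minmax_element(indices.begin(), indices.end());
    const std::uint32_t lo = *mm.first;
    const std::uint32_t hi = *mm.second;
    return VertexRange{lo, hi - lo + 1};
}
```

For the indices 50, 600 and 900 this gives a minimum index of 50 and a vertex count of 851, matching the numbers above.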
It can't be documented because it is an implementation detail.
Maybe one hardware vendor really transforms all vertices in between, maybe with another driver version this behavior is no longer happening.
What the docs tell you, however, is how to arrange your vertices/indices for best performance: as closely together as possible.
Ok, then I have another question here.

I have a big terrain, which I divide into smaller chunks using a quadtree. Each leaf node is assigned the geometry that falls inside it. For the time being, my approach is to hold the whole terrain geometry in one big vertex buffer. There is a single, shared index buffer, and every leaf node uses it by passing the appropriate offset parameters to DrawIndexedPrimitive. But after reading the article, I wonder whether it would be better to divide the vertex buffer into smaller chunks, one per leaf node, and assign each leaf its own chunk directly. That way, the vertices for every leaf node would be packed completely together. In my current approach, the minIndex-maxIndex range of every leaf node covers some unused vertices.
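A rough sketch of what that per-leaf packing could look like on the CPU side, assuming each leaf starts with a list of indices into the big shared vertex buffer. `LeafGeometry` and `packLeaf` are hypothetical names, and copying the actual vertex data into per-leaf storage is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical result type for one leaf node.
struct LeafGeometry {
    std::vector<std::uint32_t> globalVertexIds; // global vertices to copy, in packed order
    std::vector<std::uint32_t> localIndices;    // indices into globalVertexIds
};

// Rewrites a leaf's global indices so they refer to a tightly packed,
// leaf-local vertex list with no unused vertices in between.
LeafGeometry packLeaf(const std::vector<std::uint32_t>& globalIndices) {
    LeafGeometry out;
    std::unordered_map<std::uint32_t, std::uint32_t> remap; // global -> local
    out.localIndices.reserve(globalIndices.size());
    for (std::uint32_t g : globalIndices) {
        auto it = remap.find(g);
        if (it == remap.end()) {
            // First time this vertex is seen: give it the next local slot.
            const auto local =
                static_cast<std::uint32_t>(out.globalVertexIds.size());
            it = remap.emplace(g, local).first;
            out.globalVertexIds.push_back(g);
        }
        out.localIndices.push_back(it->second);
    }
    assert(out.localIndices.size() == globalIndices.size());
    return out;
}
```

After packing, a leaf's index range spans exactly the vertices it uses, so the minimum index is 0 and the vertex count equals the number of distinct vertices. Whether this beats a single shared buffer in practice depends on the hardware and driver.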
The vertex range is primarily a vestigial remnant from pre-T&L hardware. There are a few cases where it comes up:

  • Running on hardware that lacks hardware transform. Most of the time this means various integrated graphics chipsets, especially ones from Intel.
  • Running a vertex shader model newer than the hardware supports (say, wanting to run vertex shader 3.0 on vs 1.1 hardware).

Most calls to draw can just set the range on the whole buffer, but multiple calls on the same buffer can be optimized by setting good ranges, or by manually transforming the geometry into a secondary buffer with ProcessVertices() first.
http://www.gearboxsoftware.com/
