
A question about how DrawIndexedPrimitive works.

This topic is 2490 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi,

I became confused about how the Direct3D function DrawIndexedPrimitive works after reading an article recently. For example, let's say we have a vertex buffer of 1000 vertices and we want to draw a single triangle from it using vertices 50, 600 and 900, so the index buffer holds only three values (50, 600, 900). I thought that in such a case the hardware would read only vertices 50, 600 and 900 from the vertex buffer and transform them. But according to what I read, it does not do that; instead it reads and transforms all vertices from 50 to 900, which means processing 851 vertices for a single triangle! Does it really work this way, or did I read something incorrect? Such behavior should be clearly documented, since it would be a great performance hit if the developer did not know about it, and as usual I could not find a single hint about this in the Direct3D documentation...



Thanks in advance.



Of course it is not a feasible thing to do; I just want to learn how the function behaves. And it is not very rare to use DrawIndexedPrimitive only to render a fairly large subset of a bigger vertex buffer.

Just out of curiosity, what's the reference to the article? I can certainly understand if [i]caching[/i] or transferring of the vertex buffer might be involved, e.g., from memory to AGP.

It is a gamedev article actually.

[url="http://www.gamedev.net/page/resources/_/reference/programming/sweet-snippets/concatenating-triangle-strips-r1871"]Here[/url]

This is the part which made me ask this:

"Lets say we have a vertex buffer (VB) that has 500 elements, and we want to draw two triangles. The first has indices 3, 100 and 456 and the second has indices 10, 300 and 375.

If we were to draw the triangles with two separate calls to DIP the first call would transform the 4th vertex all the way up to the 457th vertex. The second call would transform the 11th vertex up to the 376th vertex. The second 365 transformations are thus performed twice."

Thanks for the reference. If that Mark Reilly is Hewlett-Packard's Mark Reilly, I assume he had (and has) decent knowledge of the subject. However, note that the article was written in 2002, nearly a decade ago, and was possibly based on hardware/software of an even earlier generation. Also, he seems to be talking exclusively about triangle [b]strips[/b] (not a triangle [b]list[/b])*. The methods he suggested seem rather crude by today's standards but, perhaps, back then would have made a performance difference. It may still be true today, I suppose, but I have to believe (if I understand the article) that performance ugliness like that would have been optimized out long ago.

*It's way too late at night to try to figure out if that has any bearing whatsoever on the problem he discusses. :blink:

Also, as Mike Prop mentioned above, one would almost have to [i]search[/i] for instances where the optimization that Reilly talked about would be applicable. I rarely (I almost think I can say never) use strips, so I certainly don't have any first-hand evidence to add.

DrawIndexedPrimitive has the parameters MinIndex and NumVertices which, according to the DirectX documentation, are used for optimizing memory access. In my first example, MinIndex would be 50 and NumVertices would be 851. I don't know what optimization the hardware does with that information, but if it really transforms all 851 vertices between 50 and 900, then such behavior must be clearly documented to begin with, since it means a huge performance hit! What is the point of using a small index buffer to draw from a larger vertex buffer if the system transforms [i]all[/i] vertices in the range [MinIndex, MaxIndex]? This still makes no sense to me. Clearly, a clarification is needed on that issue.
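To make the arithmetic above concrete, here is a minimal sketch of how the two range values can be derived from the indices a draw call uses. The `VertexRange` type and the `computeVertexRange` helper are hypothetical names invented for illustration; they are not part of Direct3D, and the sketch says nothing about what the driver does with the range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not a Direct3D type): the two range values the
// post refers to as MinIndex and NumVertices.
struct VertexRange {
    std::uint32_t minIndex;     // lowest vertex index referenced by the draw
    std::uint32_t numVertices;  // size of the inclusive span [min, max]
};

// Scans the indices used by one draw call and returns the span they cover.
inline VertexRange computeVertexRange(const std::vector<std::uint32_t>& indices) {
    assert(!indices.empty());
    const auto mm = std::minmax_element(indices.begin(), indices.end());
    const std::uint32_t lo = *mm.first;
    const std::uint32_t hi = *mm.second;
    return VertexRange{lo, hi - lo + 1};
}
```

For the indices 50, 600 and 900 this gives a minimum index of 50 and a span of 851 vertices, matching the numbers in the post.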

It can't be documented because it is an implementation detail.
Maybe one hardware vendor really transforms all vertices in between, maybe with another driver version this behavior is no longer happening.
What the docs tell you, however, is how to arrange your vertices/indices for best performance: as closely together as possible.

Ok, then I have another question here.

I have a big terrain, which I divide into smaller chunks using a quadtree. Each leaf node is assigned the geometry that falls inside it. For the time being, my approach is to hold the whole terrain geometry in one big vertex buffer. There is a single, shared index buffer, and every leaf node uses it by passing the appropriate offset parameters to DrawIndexedPrimitive. But after reading the article, I wonder whether splitting the vertex buffer into smaller chunks, one per leaf node, would be a better idea. That way the vertices for every leaf node would be packed tightly together. In my current approach, the minIndex-maxIndex range of every leaf node covers some unused vertices.
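As a rough illustration of how many unused vertices such a range can cover, the sketch below assumes the terrain vertices are stored row by row in a regular grid and that a leaf covers an axis-aligned rectangle of that grid. The post does not state the layout, so this is an assumption, and `LeafRect`, `LeafCost` and `leafCost` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one quadtree leaf: an inclusive rectangle of
// vertex columns and rows inside a row-major terrain grid (assumed layout).
struct LeafRect {
    std::uint32_t x0, y0, x1, y1;
};

struct LeafCost {
    std::uint32_t minIndex;      // first vertex the leaf references
    std::uint32_t rangeSize;     // vertices spanned by [minIndex, maxIndex]
    std::uint32_t usedVertices;  // vertices the leaf actually references
};

// Compares the index range a leaf spans in the shared buffer with the number
// of vertices it really uses. gridWidth is the number of vertices per row.
inline LeafCost leafCost(const LeafRect& r, std::uint32_t gridWidth) {
    assert(r.x0 <= r.x1 && r.y0 <= r.y1 && r.x1 < gridWidth);
    const std::uint32_t minIndex = r.y0 * gridWidth + r.x0;
    const std::uint32_t maxIndex = r.y1 * gridWidth + r.x1;
    const std::uint32_t used = (r.x1 - r.x0 + 1) * (r.y1 - r.y0 + 1);
    return LeafCost{minIndex, maxIndex - minIndex + 1, used};
}
```

Under these assumptions, a 17x17 leaf in the corner of a grid with 257 vertices per row spans 4129 vertices while referencing only 289 of them. Whether that span costs anything in practice depends on the hardware and driver.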

The vertex range is primarily a vestigial remnant from pre-T&L hardware. There are a few cases where it still comes up:

[list][*]Running on hardware that lacks hardware transform. This ends up being various integrated graphics chipsets most of the time, especially ones from Intel.[*]Running a shader model newer than the hardware supports (say, wanting to run vertex shader 3.0 on vs 1.1 hardware).[/list]
Most calls to draw can just set the range on the whole buffer, but multiple calls on the same buffer can be optimized by setting good ranges, or by manually transforming the geometry into a secondary buffer with ProcessVertices() first.

If you want to know how to optimize your mesh there's some guidance at http://tomsdxfaq.blogspot.com/ which might be helpful.

In general modern hardware wants two things:

1. The indices should reference recently used vertices more often than new ones (it caches the results of the last few vertex shaders so if it hits the cache you save work). There are various algorithms you can use to optimize this.

2. The vertex data is read roughly linearly through memory. It's simple to rearrange the vertices based on the index buffer and renumber the index buffer to do this.

The first one is generally more important for good performance.
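Both points can be sketched in a few lines. The first function estimates how often an index list misses the post-transform cache, modelling the cache as a plain FIFO of a given size; that model and the cache size are assumptions, since real hardware varies. The second renumbers the index buffer in order of first use so that vertex data is read roughly linearly. The function names are hypothetical, and this is only one simple way to do it, not a full cache optimizer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <limits>
#include <vector>

// Counts post-transform cache misses for an index list, treating the cache
// as a FIFO holding `cacheSize` vertices (an assumed, simplified model).
inline std::size_t countCacheMisses(const std::vector<std::uint32_t>& indices,
                                    std::size_t cacheSize) {
    std::deque<std::uint32_t> fifo;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) {
            continue;  // hit: the transformed vertex is reused
        }
        ++misses;
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) {
            fifo.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}

// Renumbers `indices` in place in order of first use. Returns, for each new
// vertex slot, the old vertex index whose data should be copied there.
// Vertices that are never referenced are dropped from the result.
inline std::vector<std::uint32_t> reorderByFirstUse(std::vector<std::uint32_t>& indices,
                                                    std::size_t vertexCount) {
    const std::uint32_t unassigned = std::numeric_limits<std::uint32_t>::max();
    std::vector<std::uint32_t> oldToNew(vertexCount, unassigned);
    std::vector<std::uint32_t> newToOld;
    for (std::uint32_t& idx : indices) {
        assert(idx < vertexCount);
        if (oldToNew[idx] == unassigned) {
            oldToNew[idx] = static_cast<std::uint32_t>(newToOld.size());
            newToOld.push_back(idx);
        }
        idx = oldToNew[idx];
    }
    return newToOld;
}
```

Applied to the indices 50, 600, 900 from the original question, `reorderByFirstUse` rewrites them to 0, 1, 2 and reports that the new vertex buffer should contain old vertices 50, 600 and 900 in that order, so the range covered by the draw shrinks to three vertices.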
