
Is there any way to determine vertex cache size?


Recommended Posts

Greetings! I was wondering if it is possible to determine the vertex cache size. I don't believe D3DXMeshOptimize considers the real vertex cache; I think it assumes a value of 16, though I'm not 100% sure about that. In any case, for an NVIDIA GeForce 6800 I believe the cache holds about 24 vertices or more. Is there any way to query this value?

I think the canonical values are 16 for the GeForce 2 and 4 MX, and 24 for newer cards. The problem is that these values almost certainly depend on your vertex format, and the cache is probably sized in bytes rather than vertices. There's no direct way to query the size; IIRC you can determine it indirectly through some clever tricks, but I don't know how offhand.

Just one note: if you use vertex-cache-optimized tri-strips, the cache size required to produce a considerable speed increase grows exponentially, so cache sizes of 24-32 entries won't change much in the near future.

Not that it's particularly useful right now, but for trivia...

In D3D10 there's ID3D10Device::CheckVertexCache(), which in theory returns the information you're after. But I've only been able to use it on the reference rasterizer, which apparently doesn't have a vertex cache...

I suppose it's about the same level of information that Muhammad posted, but it strikes me as being a bit easier to get hold of [smile]

Cheers,
Jack

Are you unable to run a little benchmark with each differently optimized mesh to find the fastest variation?
A cache-size query can lie, and even if you know the true value, you still need to prove that your optimizations are actually an improvement.

Note that Radeons have smaller vertex caches than NVIDIA cards; on the order of 12 entries IIRC.

Quote:
if you use vertex cache optimized tristrips


That doesn't even make sense, unless you're restarting your tri-strips a LOT, which generates tons of degenerate triangles.

A well sorted triangle list will outperform your typical triangle strip on modern cards, because a triangle strip will require 1.0 transformed vertices per triangle generated, whereas a well-cached triangle list (on a regular mesh) can get close to 0.6 transformed vertices per triangle generated. Strips are not worth the trouble on modern PC hardware, and may actually hurt performance.

Quote:

That doesn't even make sense, ...


Can a triangle in a strip end up reusing older vertices of the strip, which would still be in the cache if they are reused soon enough?

There aren't many ways to optimize tri-strips without restarting them.
The degenerate triangles cost you almost nothing, since their vertices have already been transformed and stored in the cache.

Have a look at this paper:
http://research.microsoft.com/~hoppe/tvc.pdf

With vertex-cache-optimized strips you get a miss rate of ~0.5 vertices/triangle.

In other words, you only transform 0.5 vertices per triangle in the average case.

There's also a PowerPoint presentation by Hoppe on the MS Research page that describes the individual cache strategies (e.g. LRU, FIFO) and their influence on tri-strips.

Quote:
Original post by hplus0603

A well sorted triangle list will outperform your typical triangle strip on modern cards, because a triangle strip will require 1.0 transformed vertices per triangle generated, whereas a well-cached triangle list (on a regular mesh) can get close to 0.6 transformed vertices per triangle generated. Strips are not worth the trouble on modern PC hardware, and may actually hurt performance.



This is definitely true. Go play with the "OptimizedMesh" sample in the DirectX SDK (jack the mesh count up to 36 and make sure you're using the release DirectX runtime).

On NVIDIA cards, the vertex-cache-optimized indexed triangle list outperforms the vertex-cache-optimized single strip. ATI cards seem to be about the same (i.e. the performance of the cache-optimized list is about the same as the cache-optimized single strip).

I have to disagree with Tom's DirectX FAQ on one point:
Quote:
On the face of it, strips should be better than lists, right? Well, not really - if you need to keep making degenerate tris to join "striplets" together, then you'll generate more indices than lists (which don't need degenerates).


This only matters if you have a lot of isolated triangles, so that the number of degenerate tris needed to add them to a strip is high. However, there are very robust algorithms that handle isolated triangles well, and those algorithms can be combined with Hoppe's transparent vertex caching optimization, where you restart strips every now and then.

He says the number of indices is the only thing that matters. OK: to add an isolated triangle to the end of a strip I need 2 duplicate indices, but in the average case I only need 1 index per triangle that follows another one in a strip,

whereas you need 3 indices per triangle in an indexed list.

To restart a strip you also need only 2 indices, so I don't understand how people come to the conclusion that indexed cache-optimized triangle lists perform equally to indexed cache-optimized strips.

If someone could explain this to me, I would really appreciate it.

Thanks in advance.
