VertexCache question:

Started by
9 comments, last by Krohm 16 years, 11 months ago
Ok, I am considering makeing my own generic priming the vertex cacher (i.e. reorder triangle's to get the most out of a vertex cache of a GPU), but now i am wondering how the caching is decided when a cache hit occurs: is it really simple and dumb FIFO? i.e. regardless of cache hit the front vertex of the cache is dropped from the cache and the next vertex is placed at the back, this method might have that the same vertex is stored several times in the cache. or is it something brighter: if a cache hit occurs, that vertex is placed at the back of the cache and the verices that were newer than it are all pushed up. (so the front vertex gets dropped unless that was the vertex that caused the hit). I imagine that it is the second, but having no docs to really reference, does anyone know?
Close this Gamedev account, I have outgrown Gamedev.
Advertisement
From my understanding of it the post-T&L cache is an associative array; each group of vertex data which is stored has an id number assocated with it (thus why you need to use index'd data to get the most from it).

When each vertex arrives to be transformed the id number is compared with the ids in the cache; if there is a match then the cached data is used, otherwise the data is transformed and the oldest data in the cache is over written.

Basically, to aid caching you need to index/order the data in such a way that you can get maximal reusage of the data in a small number of vertex submissions.
I figured it would be associative, what I am asking is this, how does it decide what to dump from the cache, so for example, lets say the cache size is 6 thingies (lets call them A[0], A[1], .. A[6])... lets label what is inside them associatively, i.e. what vertex index is in the cache. so lets say the cahce is storing:

A[0]=a, A[1]=b, A[2]=c, A[3]=d, A[4]=e, A[5]=f, i.e. A stores the post TnL value for the vertex at VertexArray[ A ]

as the GPU streams along the glDrawWhatever(GL_TRIANGLES, stuff)... the next triangle uses vertices (i.e. vertex indexes), c,d and g

what is the contents of the cache after the triangle is processes? is the cache _very_ dumb and simple FIFO so the results would be (start of cache gets killed)

-->after c gets entered: (notice that c appears twice in the cache now)
A[0]=b, A[1]=c, A[2]=d, A[3]=e, A[4]=f, A[5]=c

-->after d gets entered: (notice both c and d are in the cache twice each)
A[0]=c, A[1]=d, A[2]=e, A[3]=f, A[4]=c, A[5]=d

-->after g gets entered:
A[0]=d, A[1]=e, A[2]=f, A[3]=c, A[4]=d, A[5]=g


or is the cache more intelligent and does something like this:

-->after c gets entered, c was already in the cache, so cache puts c as most recently used:
A[0]=a, A[1]=b, A[2]=d, A[3]=e, A[4]=f, A[5]=c

-->d is also in cache, so d gets put into most recently used slot:
A[0]=a, A[1]=b, A[2]=e, A[3]=f, A[4]=c, A[5]=d

-->g not in cache, so cache drops of oldest element, the fellow stored in A[0]=a
A[0]=b, A[1]=e, A[2]=f, A[3]=c, A[4]=d, A[5]=g


or is it something else?

Close this Gamedev account, I have outgrown Gamedev.
You could always check the source code of the NVidia tri striper. It should tell you how the cache works.
Plane9 - Home of the free scene based music visualizer and screensaver
I believe (I'm not 100% sure, but I've read this several places including some threads here) the cache in modern cards is a simple FIFO cache, but it's not so dumb to include duplicates. So using your example when it comes across vertices c and d it will find them in the cache and use that, then leave the cache untouched. Then when it comes to vertex g it doesn't find it in the cache so it drops vertex a and adds vertex g. So the cache will be [b, c, d, e, f, g] after this new triangle.
All vertex-cache optimization approaches I've seen (which work, so apparently they got it right) assume an LRU replacement scheme. Thus if your cache has size 3 and you access vertices 1,2,1,3,4,5 then vertex 2 will be discarded to make room for vertex 4 and vertex 1 will be discarded to make room for vertex 5. This is pretty easy to implement in hardware so there's no particular reason to use something stupider.

EDIT: Actually, now that I think about it, I recall Nvidia cards in particular using a FIFO instead of an LRU. That was several years ago, though, so I'm not sure if it's still true.
Yes, sorry, when I said 'oldest data' I meant 'least recently used'
so, if the cache is LRU, then it is functionally the same as the second (i.e. not the uber dumb FIFO one) but if it is FIFO but don't allow duplicates, this will significantly complicate my vertex priming algorithm (which is not even written yet)...

as for checking the source for NvTriStripper... (shudders) it is probably a good place to look, but it is from the age of GeForce3, when nVidia *I think* was saying the cache was FIFO... but now adays being up to GeForce8 (and I am targetting GeForce6 class) I don't know.... also NvTriStripper is tries to make triangle strips and prime the cache, not quite the same as just priming the cache...
Close this Gamedev account, I have outgrown Gamedev.
If this seems hard to code, consider not optimizing at all. The vertex processing stage is seldom a bottleneck.

Previously "Krohm"

I, beg to differ to atleast some extent... optimizing vertex cache usage can give a significant performance boost, especially with terrains and models with skinning... though right now I have been using NvTriStripper and another TriStripper out there, the second one wants to make triangle strips only... so.. ehh...
Close this Gamedev account, I have outgrown Gamedev.

This topic is closed to new replies.

Advertisement