Zorbas-E-

Vertex Cache priming using VBOs


Hi all,

First of all, I'd like to thank you in advance for taking the time to answer my question.

I've read a lot about cache priming when drawing normal terrain meshes. Everybody talks about vertex cache size in terms of entries, e.g. the GeForce 6 has a vertex cache of 32 entries. But what about the actual size in bytes?

The proposed techniques take for granted that the data of a single vertex corresponds to the size of one vertex cache entry. For example, suppose we send the following vertex attributes down the pipeline:
vertex coords, normal coords, texture coords, that is 8 floats or 32 bytes. If we have a vertex cache of 12 entries, then in order to prime it we have to send 10 initial vertices, ASSUMING a 32-byte cache entry size.

What if an entry in the cache's FIFO cannot hold a whole vertex (that is, not all of its attributes fit in one cache entry)? How do we correlate the size of the vertex attributes that we send down the pipeline with the actual size of the vertex cache? Is the rule of 32-byte multiples (padding vertex attributes to multiples of 32 bytes) related to that?
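For reference, the byte counts mentioned here can be checked with a few lines of Python. This is only a sketch of the arithmetic: the interleaved layout string and the helper names `vertex_size_bytes` and `padded_size` are assumptions made for illustration, and the code says nothing about how any particular GPU stores a vertex in its cache.

```python
import struct

# Hypothetical tightly packed layout: position (3 floats),
# normal (3 floats), texture coordinates (2 floats).
VERTEX_FORMAT = "<3f3f2f"


def vertex_size_bytes(fmt: str = VERTEX_FORMAT) -> int:
    """Size in bytes of one vertex with the given struct layout."""
    return struct.calcsize(fmt)


def padded_size(size: int, multiple: int = 32) -> int:
    """Round a vertex size up to the next multiple (32 bytes by default)."""
    return -(-size // multiple) * multiple


print(vertex_size_bytes())   # 8 floats * 4 bytes
print(padded_size(44))       # a 44-byte vertex padded to a 32-byte multiple
```

With this layout the vertex is 32 bytes, and a 44-byte vertex padded to a multiple of 32 bytes would occupy 64 bytes in the vertex buffer.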

Thanks for your reply!

To sum up, and maybe clarify a little more what I mean:
Let's assume that my vertex attributes are 44 bytes and your vertex attributes are 32 bytes. How many of my vertices and how many of your vertices does a vertex cache of 16 entries hold?

On the three or four previous generations of hardware, there was a "slot"-like cache like the one you describe. If it has 16 slots (ATI cards), then it holds 16 vertices. If it has 24 slots (nVidia cards), it holds 24 vertices, as easy as that.
It does not matter how large one vertex is in bytes or in attributes; the post-transform cache has enough "room" in each slot for MAX_ATTRIBUTES.
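The slot model can be sketched as a small simulation. This is a minimal sketch, assuming a strict FIFO replacement policy (the actual policy of any given card is an assumption here) and a hypothetical helper name `count_transforms`. Note that the only inputs are the index stream and the slot count; the vertex size in bytes never appears.

```python
from collections import deque


def count_transforms(indices, cache_slots):
    """Count how often a vertex must be transformed for an index stream,
    modelling the post-transform cache as a FIFO of `cache_slots` entries.
    Each entry holds one whole vertex, whatever its size in bytes."""
    cache = deque(maxlen=cache_slots)  # appending to a full deque drops the oldest entry
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses


# Two triangles sharing the edge (1, 2): four distinct vertices.
quad = [0, 1, 2, 2, 1, 3]
print(count_transforms(quad, 16))  # every vertex transformed once
print(count_transforms(quad, 1))   # a one-slot cache only catches the repeated 2
```

Under this model a 16-slot cache holds 16 vertices whether each one is 32 or 44 bytes.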

On current generation hardware (ATI and nVidia alike) no such thing exists. The post-transform cache is implemented via local memory in a shader unit, which is (fast) general purpose memory with a general purpose cache. You don't know how large that local memory is (usually 64kB, but no guarantees), and you don't know how large the cache is. You don't know whether a particular card/driver implements a post-transform cache at all, either (although it most likely will, in some form).

Samoth, thank you for the brief and accurate reply.

If I understand correctly, current GeForce 9 architectures do not have a special TnL vertex cache, but a large general purpose cache. Therefore a question arises:
Are cache priming techniques meaningful in such a case? What would a good cache priming technique look like in your opinion? For example, if I feed my GeForce 9500 GPU 30 vertices of degenerate triangles for priming, would it have any serious impact on performance?

Thanks

Quote:
If I understand correctly, current GeForce 9 architectures do not have a special TnL vertex cache

GeForce 8/9 certainly still have a dedicated post-transform cache.

I am not entirely sure about the 2xx series. They're somewhat more modern and more general purpose, but not in the same sense as the later models. According to some sources they not only have a dedicated post-transform cache, but also an "increased cache size" (without any details as to how much, though).
Those generations of hardware are mostly specialized GPUs which are first and foremost designed to read textures, transform vertices, and write pixels, but can also be programmed to do "somewhat general purpose stuff" within some limits.

The 4xx series and later series (and the corresponding ATI generations) do not have a dedicated post-transform cache any more, and actually are not so much what you would call GPUs any more.

Those are rather something like massively-parallel-auto-thread-general-purpose-CPUs with dedicated local per-block memory, fast local-main interconnects, and some other special bits here and there. But in principle they are not so much GPUs as parallel CPUs, which, incidentally, among many other things, can be programmed to implement a graphics pipeline.

Quote:
Are cache priming techniques meaningful in such a case?
I asked almost the same question a month or so ago, without a good conclusion.

It is probably still worth taking caching into account (caching is "always good"), but who knows.
If caching is only available via some general purpose cache, then your transformed vertices would compete for cache with many other (high bandwidth) resources, such as texture reads or pre-transform vertex data. On the other hand, much more memory may be available for caching.
Which might mean that, in principle, you don't know anything. Good, bad, whatever, no idea. There seems to be no single piece of decent documentation either.

I would take a little care that vertices are somewhat in order so there's a chance of them being reused, and when it's easy and opportune, such as in a regular grid, I would do some priming, if for no other reason than to account for older cards. It won't be much hassle and won't cost much in any case.
But other than that, I would not do too much, since you really don't know what it's good for in the end.
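To illustrate what priming a regular grid could look like on an older slot-style cache, here is a minimal sketch. It assumes a strict FIFO cache model, a row-major vertex numbering, and hypothetical helper names (`fifo_misses`, `grid_indices`). Whether a real card behaves like this model is exactly the open question above.

```python
from collections import deque


def fifo_misses(indices, slots):
    """Vertex transforms needed under an assumed FIFO cache with `slots` entries."""
    cache = deque(maxlen=slots)  # a full deque drops its oldest entry on append
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)
    return misses


def grid_indices(cols, rows, prime=False):
    """Triangle-list indices for a grid of cols x rows vertices (row-major).
    With prime=True, degenerate triangles over the first vertex row are
    emitted first so that row is already cached when real triangles start."""
    out = []
    if prime:
        for c in range(0, cols, 2):
            a, b = c, min(c + 1, cols - 1)
            out += [a, a, b]  # repeated index: zero-area triangle
    for r in range(rows - 1):
        for c in range(cols - 1):
            top = r * cols + c
            bot = top + cols
            out += [top, bot, top + 1, top + 1, bot, bot + 1]
    return out


# 4 x 3 vertices, cache one slot larger than the row width.
print(fifo_misses(grid_indices(4, 3), 5))              # unprimed
print(fifo_misses(grid_indices(4, 3, prime=True), 5))  # primed
```

In this model the unprimed grid needs 16 transforms for its 12 vertices, while the primed version needs 12, one per vertex, at the cost of two extra degenerate triangles. For grids wider than the cache, the same idea would be applied per column strip narrower than the cache.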
