Alright, I ran your numbers, and you've convinced me it isn't as big an issue as I thought... but I'm hazy on one figure of yours. I don't know why you insist that much on bandwidth.
edit - also didn't you forget to take into account the Hi-Z buffer bandwidth for per triangle depth culling?
Yes I did. I don't know the exact memory footprint, but 33.33% overhead (like in mipmapping) sounds like a reasonable estimate.
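The 33.33% guess can be sanity-checked with a quick sketch. It assumes the Hi-Z buffer is stored as a full mip pyramid of a 1920x1080 depth buffer, with each level half the size of the previous one (rounded down, minimum 1). Real hardware layouts may differ, and `mip_chain_overhead` is a hypothetical helper name, not something from any API.

```python
def mip_chain_overhead(width, height):
    """Texels in all mip levels below the base, as a fraction of the base level."""
    base = width * height
    extra = 0
    w, h = width, height
    # Halve each dimension (rounding down, never below 1) until we reach 1x1.
    while w > 1 or h > 1:
        w = max(1, w // 2)
        h = max(1, h // 2)
        extra += w * h
    return extra / base


# Assumption: the Hi-Z pyramid mirrors a regular mip chain of the depth buffer.
overhead = mip_chain_overhead(1920, 1080)
print(f"Hi-Z pyramid overhead: {overhead:.2%}")  # roughly one third
```

Each level has about a quarter of the texels of the one above it, so the total converges towards 1/4 + 1/16 + ... = 1/3, which is where the mipmapping figure comes from.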
How did you get the 309MB per frame figure? When I did it, I got completely different numbers.
edit - specifically the 305MB number.
Thanks for pointing it out.
1,000,000 * 32 bytes = 30.51MB... dammit, I added a 0 and calculated it for 10 million vertices.
The 305MB came from 10 million vertices, not 1 million.
Well... crap.
For 10 million vertices, the index data is 35MB, not 3.5MB. But for 1 million vertices, the vertex data is 30.51MB, not 305.5MB.
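To make the slip easy to re-check, here is the vertex data arithmetic as a small sketch. The 32 bytes per vertex comes from the calculation above; `vertex_data_mb` is a hypothetical helper name, and "MB" here means 1024 * 1024 bytes, which is the convention the 30.51MB figure matches.

```python
BYTES_PER_VERTEX = 32   # per-vertex size used in the calculation above
MB = 1024 * 1024        # "MB" in the binary sense (1,048,576 bytes)


def vertex_data_mb(vertex_count, bytes_per_vertex=BYTES_PER_VERTEX):
    """Size of the vertex data in MB for a given vertex count."""
    return vertex_count * bytes_per_vertex / MB


for count in (1_000_000, 10_000_000):
    # About 30.5 MB for 1 million vertices, about 305 MB for 10 million.
    print(f"{count:>10,} vertices: {vertex_data_mb(count):.2f} MB")
```

The size scales linearly with the vertex count, so one extra zero in the count is exactly the factor of 10 between the two figures.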
The correction only makes my point easier to prove. Like I said, at 1920x1080 there shouldn't be much more than 2 million vertices (since there would be about one vertex per pixel, and 1920 * 1080 is roughly 2.07 million pixels). Maybe 3 million? Profiling would be needed.
So if you provide a massive number of input vertices (such as 10 million), the culler will end up discarding a lot of them.
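A rough sketch of that estimate, assuming the one-vertex-per-pixel bound is taken literally. That bound is a guess, not a measurement, and `vertex_budget` is a hypothetical helper name. The real number of surviving vertices depends on the scene and the culler, hence the need for profiling.

```python
WIDTH, HEIGHT = 1920, 1080
INPUT_VERTICES = 10_000_000   # the "massive" input case from the discussion


def vertex_budget(width, height, vertices_per_pixel=1.0):
    """Upper bound on useful vertices, assuming a fixed vertex density per pixel."""
    return int(width * height * vertices_per_pixel)


budget = vertex_budget(WIDTH, HEIGHT)            # one vertex per pixel (assumption)
discarded = max(0, INPUT_VERTICES - budget)      # vertices the culler would throw away
print(f"budget:    {budget:,} vertices")
print(f"discarded: {discarded:,} of {INPUT_VERTICES:,} "
      f"({discarded / INPUT_VERTICES:.0%})")
```

Raising `vertices_per_pixel` to about 1.45 gives the 3 million case; either way most of a 10 million vertex input would not survive.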