I need help proving this

You'd be amazed, but 90% of games out there (give or take) are still CPU limited. That means the CPU is working harder than the graphics board - the CPU, not the GPU, is what caps the frame rate.

However, if you set up your render loop correctly, the CPU can be doing stuff while the pixels are drawing. Something like:


ClearBuffer          // clear the colour/depth buffers
RenderAllTriangles   // queue every draw call up to the card
DO CALCULATION       // AI, physics, sound - runs while the GPU is filling pixels
FlipBuffers          // present the frame; may block until the GPU is done



What this does is set up the render (by passing all the vertex data to the card); then, while the pixels are rendering, it does all the calculations (which need to be done anyway). Then it flips the buffer to the screen.

FlipBuffers (or whatever your API calls it) has to block until the queued operations are done. When you call DrawPrimitive (or whatever), that queues the data up for the card, but calling Flip forces the card to finish all rendering before it proceeds. Since fillrate has been identified as the problem, doing most of your CPU calculation (sound, AI, physics, etc.) after all the geometry is queued up is the best place to put it.
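To make that concrete: the post talks in Direct3D terms (DrawPrimitive/Flip), but the same ordering looks like this in a minimal OpenGL/GLUT sketch. DoCalculation is just a stand-in for the game logic; none of these names come from the original post.

#include <GL/glut.h>

void DoCalculation() { /* AI, physics, sound... placeholder for game logic */ }

void display()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);   // ClearBuffer

    glBegin(GL_TRIANGLES);                                 // RenderAllTriangles
    glVertex3f(-1.0f, -1.0f, 0.0f);
    glVertex3f( 1.0f, -1.0f, 0.0f);
    glVertex3f( 0.0f,  1.0f, 0.0f);
    glEnd();
    glFlush();            // hand the queued work to the driver; does not wait for it

    DoCalculation();      // CPU work overlaps with the GPU filling pixels

    glutSwapBuffers();    // FlipBuffers: may block until the frame is finished
    glutPostRedisplay();  // keep the loop running
}

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutCreateWindow("overlap sketch");
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}

The key detail is that the draw calls and glFlush only submit work, so DoCalculation runs on the CPU while the card is still drawing; only the buffer swap can end up waiting for the frame to finish.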

On that note, if you use hardware T&L (as opposed to do-it-yourself transforms), that gives you more time for the "DO CALCULATION" part of the game, meaning you can add more physics detail, better AI, more complex sound processing, whatever. If you're not using hardware T&L, your "RenderAllTriangles" step takes longer on the CPU, because the CPU has to do all those transformations itself.
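For a rough sense of what the do-it-yourself version costs, here is a sketch of a software transform loop (the struct and function names are illustrative, not from the post): every vertex eats a 4x4 matrix multiply on the CPU every frame, and that is exactly the time hardware T&L hands back to you.

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

Vec4 Transform(const Mat4& M, const Vec4& v)
{
    Vec4 r;
    r.x = M.m[0][0]*v.x + M.m[0][1]*v.y + M.m[0][2]*v.z + M.m[0][3]*v.w;
    r.y = M.m[1][0]*v.x + M.m[1][1]*v.y + M.m[1][2]*v.z + M.m[1][3]*v.w;
    r.z = M.m[2][0]*v.x + M.m[2][1]*v.y + M.m[2][2]*v.z + M.m[2][3]*v.w;
    r.w = M.m[3][0]*v.x + M.m[3][1]*v.y + M.m[3][2]*v.z + M.m[3][3]*v.w;
    return r;
}

// ~16 multiplies and 12 adds per vertex, per frame, all on the CPU
void SoftwareTnL(const Mat4& worldViewProj, const Vec4* in, Vec4* out, int count)
{
    for (int i = 0; i < count; ++i)
        out[i] = Transform(worldViewProj, in[i]);
}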

Point is, the more you can offload from the CPU to other devices, the more CPU time is left for the things that can't be done anywhere else.

Finally, since the vertex processor and the pixel processor are generally two separate units on the graphics board, yes, they can work in parallel.

Another interesting thing to note (from an ATI optimization paper) is that you can work out which stage of the pipeline is the bottleneck, and if you can't reduce the time the bottleneck stage takes, you can pile more work onto the other stages essentially for free (as long as they run in parallel). For instance, if your application really IS fillrate limited, you can generally add more vertices to each object at no cost (up to a point). This stops being true when:

1. The number of vertices increases enough to become the bottleneck itself (this is rare - on the GPU, vertex processing is, I believe, the least common limiting factor)

2. The number of DrawPrimitive calls increases - each DrawPrimitive call is CPU intensive, so the fewer DrawPrimitive calls, the better. If you increase the number of calls in order to add polygons (vertices), you've increased the CPU load, and that DOES become a problem very quickly (see the batching sketch below).
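A hedged illustration of point 2, in fixed-function OpenGL rather than Direct3D (glDrawArrays plays the role of a single DrawPrimitive call here; DrawUnbatched/DrawBatched are names I made up). Both functions submit the same triangles, but the second one pays the per-call CPU overhead once instead of N times.

#include <GL/gl.h>
#include <vector>

struct Vertex { float x, y, z; };

// Costly: one draw call per triangle, so N triangles cost N lots of call overhead
void DrawUnbatched(const std::vector<Vertex>& tris)
{
    for (size_t i = 0; i + 2 < tris.size(); i += 3)
    {
        glBegin(GL_TRIANGLES);
        glVertex3fv(&tris[i].x);
        glVertex3fv(&tris[i + 1].x);
        glVertex3fv(&tris[i + 2].x);
        glEnd();
    }
}

// Better: submit everything through one vertex array in a single call
void DrawBatched(const std::vector<Vertex>& tris)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), &tris[0].x);
    glDrawArrays(GL_TRIANGLES, 0, (GLsizei)tris.size());
    glDisableClientState(GL_VERTEX_ARRAY);
}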


Hope that clears stuff up, and I hope I typed that all correctly. It's really very late.
One thing people are failing to mention is that the GPU is highly pipelined. It is set up to process the vertices of the next batch of triangles while previous triangles are being filled to the screen. The GPU doesn't simply stop processing vertices to do some pixel filling and vice versa; it does both at once, like products moving down a conveyor belt. What this means is that certain parts of the pipeline can sit idle waiting for a bottleneck further along to clear.

I can't imagine not using T&L if it's there, and I would never offload vertex processing to the CPU. I'd look for why I'm processing more vertices than the fragment shader can keep up with, and try to simplify meshes or cull more geometry. If I'm more fill bound, I might instead increase the resolution of meshes, since the vertex processor is begging for more work. Either that, or I'd fill fewer pixels by reducing overdraw where possible or using simpler pixel shaders.
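On the "reducing overdraw" suggestion at the end: one common trick is to draw opaque geometry roughly front-to-back, so that on hardware with early depth rejection the pixels hidden behind nearer objects are thrown away before the expensive fill work happens. A rough sketch, with Object and DrawObject purely hypothetical:

#include <algorithm>
#include <vector>

// Hypothetical stand-in for whatever per-object data the engine keeps
struct Object { float distanceToCamera; /* mesh, materials, ... */ };

static bool CloserFirst(const Object& a, const Object& b)
{
    return a.distanceToCamera < b.distanceToCamera;
}

void DrawOpaqueFrontToBack(std::vector<Object>& objects)
{
    // Nearest objects first: anything hidden behind them can fail the depth
    // test early instead of burning fill rate
    std::sort(objects.begin(), objects.end(), CloserFirst);

    for (size_t i = 0; i < objects.size(); ++i)
    {
        // DrawObject(objects[i]);   // hypothetical per-object submit
    }
}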

