Calculating GPU workload

How do you actually find out the number of operations per second going on in the GPU? I have pretty much everything I'm doing squeezed into the pixel shader, and I was wondering how it compares to an FPGA, so I've been trying to work out how many operations per second the GPU is doing. Is it the case that every pixel on the screen inside the rectangle runs the pixel shader, as long as it's showing something? Say I zoomed in fullscreen so the entire screen was covered by part of the rendered image; does every single pixel then get its own pixel shader invocation? So if I were doing, say, 10 multiplies in the pixel shader, rendering at 200 fps at full HD (1920x1080 pixels), is my GPU doing 1920 x 1080 x 200 x 10 = 4,147,200,000 multiply operations per second? That seems way too high for the frame rates I was seeing with my shader.
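For reference, here is that back-of-the-envelope estimate as a snippet. All figures (resolution, frame rate, ops per pixel) are just the example numbers from the question, not measured values:

```cpp
#include <cstdio>

int main()
{
    const double pixels      = 1920.0 * 1080.0; // full HD
    const double fps         = 200.0;           // example frame rate
    const double opsPerPixel = 10.0;            // multiplies in the shader

    // ~4.15 billion multiplies per second -- the figure from the post.
    std::printf("%.0f multiplies/sec\n", pixels * fps * opsPerPixel);
    return 0;
}
```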
Yes, it runs the program for each output pixel.
And it can run more than once per screen pixel if you have triangles behind other triangles, or if antialiasing is turned on.

Yes, it's a lot, and it's why modern GPUs have so many stream processors: so they can run many (possibly several hundred) copies of the program in parallel.
A random Google search revealed:
My ATI 5770 has a peak output of 1.36 TFLOPS. That is 1.36 trillion floating-point operations per second. The biggest performance hit is likely to be waiting on memory, as the card can manage roughly 0.5 million math operations per pixel per second.
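To make that per-pixel budget concrete, here is a hedged sketch of where a figure like that can come from, assuming the advertised peak were actually reachable and shared evenly across a full-HD frame:

```cpp
#include <cstdio>

int main()
{
    const double peakFlops = 1.36e12;          // ATI 5770 peak, per the quote
    const double pixels    = 1920.0 * 1080.0;  // one full-HD frame

    // ~656,000 ops per pixel per second...
    std::printf("ops/pixel/sec: %.0f\n", peakFlops / pixels);
    // ...or ~10,900 ops per pixel per frame at 60 fps.
    std::printf("ops/pixel/frame at 60 fps: %.0f\n",
                peakFlops / (pixels * 60.0));
    return 0;
}
```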

Things to note:

Each pixel may have several shaders run on it due to overdraw and blending overlap.

Each vertex requires at least a matrix multiply and a clipping operation, eating up a lot of math cycles. Especially consider something like GPU skinning, where all those matrix multiplies add up (a rough sketch follows this list).

Depending on what you are doing, you could be limited more by texture or vertex memory throughput than by math operations.
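Here is the skinning sketch. The mesh size, bone count, and frame rate are made-up numbers purely for illustration; the 28-op figure assumes one plain 4x4 matrix times vec4 transform (16 multiplies + 12 adds):

```cpp
#include <cstdio>

int main()
{
    const double opsPerTransform = 28.0;     // one mat4 * vec4
    const double vertices        = 100000.0; // hypothetical skinned mesh
    const double bonesPerVertex  = 4.0;      // typical skinning weight count
    const double fps             = 60.0;

    // ~672 million ops/sec just for skinning transforms on this one mesh.
    std::printf("skinning ops/sec: %.0f\n",
                vertices * bonesPerVertex * opsPerTransform * fps);
    return 0;
}
```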
I thought so too, but I was getting some unrealistic results when I calculated it that way.

I'm using a single pixel shader and a single vertex shader; the vertex shader handles two triangles that make up a single rectangle, and then everything is processed in the pixel shader. The numbers just don't seem to make sense when I work them out that way.

I'm using a single GTX 275, good for maybe 0.5-1 TFLOP, but I had it fullscreen at full HD running at about 1-3 frames/second with >1200 samples/pixel (calls to tex.Sample, not counting the bilinear interpolation between 4 points, etc.) and >10,000 arithmetic operations (*, +, -, /, %, pow, abs). Just counting the symbols, that works out to tens of billions of operations per second, and that leaves out all the background work needed for everything.
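Redoing that estimate with the numbers above (the sample and op counts are the rough figures quoted in the post, not exact counts):

```cpp
#include <cstdio>

int main()
{
    const double pixels  = 1920.0 * 1080.0;
    const double samples = 1200.0;   // tex.Sample calls per pixel
    const double aluOps  = 10000.0;  // arithmetic ops per pixel

    // ~2.3e10 ops/sec at 1 fps, ~7.0e10 at 3 fps -- tens of billions,
    // well under the card's theoretical peak, which is consistent with
    // the shader stalling on all those texture fetches.
    for (double fps = 1.0; fps <= 3.0; fps += 2.0)
        std::printf("%.0f fps: %.3e ops/sec\n",
                    fps, pixels * (samples + aluOps) * fps);
    return 0;
}
```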

It could have been my frame counter, which counts USB transfers instead of render calls, but for every transfer I sent data to the GPU and processed it, and I could see the image changing, so it can't have been off by much, if at all. Would FRAPS or PIX give me a solid frame count? I just don't see how the numbers could be so far off.
If you run your shader through NVShaderPerf it will give you some relevant stats on how many cycles it takes to run your shader and how many pixels it can shade per second. For finding out the number of pixel shader invocations you can use D3D11_QUERY_PIPELINE_STATISTICS/D3D10_QUERY_PIPELINE_STATISTICS if you're using D3D10 or D3D11. Keep in mind that it won't be as simple as counting the number of pixels in your quad...you'll always have extra invocations because GPUs run pixel shaders in 2x2 quads. Early z-cull can also come into play if you're using z-testing.
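A minimal sketch of issuing that query in D3D11. It assumes you already have an ID3D11Device and ID3D11DeviceContext lying around; all error handling is omitted for brevity:

```cpp
#include <d3d11.h>
#include <cstdio>

void CountPixelShaderInvocations(ID3D11Device* device,
                                 ID3D11DeviceContext* context)
{
    D3D11_QUERY_DESC desc = {};
    desc.Query = D3D11_QUERY_PIPELINE_STATISTICS;

    ID3D11Query* query = nullptr;
    device->CreateQuery(&desc, &query);

    context->Begin(query);
    // ... issue the draw calls you want to measure here ...
    context->End(query);

    // Poll until the GPU has finished and the result is available.
    D3D11_QUERY_DATA_PIPELINE_STATISTICS stats = {};
    while (context->GetData(query, &stats, sizeof(stats), 0) == S_FALSE)
    {
    }

    // PSInvocations also counts the helper lanes from the 2x2 quads
    // mentioned above, so it will generally exceed the visible pixel count.
    std::printf("PS invocations: %llu\n",
                (unsigned long long)stats.PSInvocations);

    query->Release();
}
```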

PerfHUD can also give you a lot more detailed statistics about your performance.
