"Big O" notation for (pixel) shader algorithms?

Started by Jack Hoxley; 2 comments, last by Demirug 18 years ago
Bear with me on this one... it's not a fully formed idea, so I don't know if it'd work, which is the primary reason for posting about it here [wink]

My current project does a lot of post-processing, and it's quite blatantly pushing my GPU to the limits of its pixel-processing bandwidth. I was wondering about trying to come up with some statistics regarding the complexity of my algorithms. It's probably not quite the same as the Big O used elsewhere, but it's the closest analogy I could think of.

The basic idea is that you could take an ASM pixel shader and, in the simplest case, extract the number of ALU and TEX ops (there's a rough sketch of this counting after the list below). Further analysis could weigh the cost of these operations, or break them down into dependent reads and so on. In the context of texture reading, the application knows the filtering modes and texture formats (e.g. how much raw data needs to be fetched per pixel) and so on.

For (most) post-processing the number of pixels processed is an easily known constant, but that's not the case with arbitrary scene geometry. For that you could fairly easily add a separate debug/profiling code path that splits the scene per shader and keeps an accumulation count that can be read back by the CPU. Probably not real-time, but that's not too important...

Thus you could, as I see it, end up with some sort of general metrics on read/write texture bandwidth or arithmetic throughput per shader. With some sort of automated/scripted execution of the target application you could probably work out where the "hotspots" were. Known hotspots could allow for optimizations, or simply tell you which hardware can handle which level of effects (e.g. the average bandwidth is greater than a known GPU can deliver)...

Bottom line:
  1. Useful?
  2. Meaningful?
  3. Already been done? (I've not seen this level of info in PIX)
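
For what it's worth, here's roughly what I mean by "extract the number of ALU and TEX ops". This is only a sketch in C++: it assumes you feed it the text of a disassembled ps_2_0/ps_3_0 shader (e.g. the output of D3DXDisassembleShader), the mnemonic list is incomplete, and everything that isn't a texture fetch or a declaration gets lumped in as "ALU":

```cpp
// Minimal sketch: count TEX vs. ALU instructions in a disassembled
// ps_2_0/ps_3_0 listing. Feeding it D3DXDisassembleShader() output and the
// exact mnemonic list are my assumptions, not an existing tool.
#include <iostream>
#include <sstream>
#include <string>
#include <set>

struct OpCounts { int tex = 0; int alu = 0; };

OpCounts CountOps(const std::string& asmListing)
{
    // Texture-sampling mnemonics in D3D9 shader assembly (not exhaustive).
    static const std::set<std::string> texOps =
        { "texld", "texldb", "texldl", "texldp", "texldd" };

    OpCounts counts;
    std::istringstream lines(asmListing);
    std::string line;
    while (std::getline(lines, line))
    {
        std::istringstream ls(line);
        std::string mnemonic;
        if (!(ls >> mnemonic)) continue;               // blank line
        if (mnemonic[0] == '/' || mnemonic[0] == ';')  // comment
            continue;
        if (mnemonic.rfind("dcl", 0) == 0 ||           // declarations
            mnemonic.rfind("def", 0) == 0 ||           // constant defs
            mnemonic.rfind("ps_", 0) == 0)             // version token
            continue;
        if (texOps.count(mnemonic)) ++counts.tex;
        else                        ++counts.alu;
    }
    return counts;
}

int main()
{
    const std::string listing =
        "ps_2_0\n"
        "dcl t0.xy\n"
        "dcl_2d s0\n"
        "texld r0, t0, s0\n"
        "mul r0, r0, c0\n"
        "mov oC0, r0\n";

    OpCounts c = CountOps(listing);
    std::cout << "TEX: " << c.tex << "  ALU: " << c.alu << "\n"; // TEX: 1  ALU: 2
}
```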
Example: My tone-mapping pixel shader boils down to 6 texture and 45 arithmetic instructions. Five of those texture reads are 128-bit FP and one is 64-bit FP, which pulls in 88 bytes per pixel. For my 640x480 rendering that should generate about 25.8 MB of texture reads per frame. It'll write out 640x480x32 bits, roughly 1.17 MB, to the frame buffer...
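
To spell the arithmetic out (so anyone can check my numbers), here's a tiny sketch that reproduces the figures above. It assumes point-sampled reads, so each fetch pulls exactly one texel, and 1 MB = 1024x1024 bytes:

```cpp
// Reproducing the bandwidth arithmetic from the example above.
// Assumes 128-bit FP = 16 bytes/texel, 64-bit FP = 8 bytes/texel, and one
// texel fetched per read (point sampling, no filtering overhead).
#include <cstdio>

int main()
{
    const int width  = 640, height = 480;
    const long long pixels = static_cast<long long>(width) * height;

    const long long bytesReadPerPixel  = 5 * 16 + 1 * 8;   // 88 bytes
    const long long bytesWritePerPixel = 4;                // 32-bit frame buffer

    const double MB = 1024.0 * 1024.0;
    std::printf("Texture reads : %.2f MB per frame\n",
                pixels * bytesReadPerPixel  / MB);   // ~25.78 MB
    std::printf("Colour writes : %.2f MB per frame\n",
                pixels * bytesWritePerPixel / MB);   // ~1.17 MB
}
```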

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Quote:Already been done? (I've not seen this level of info in PIX)


ATI and NVIDIA both have PIX plugins. ATI is also supposed to have a new tool with lots of info that they presented at GDC (I attended part of that presentation). So you should be able to get a good idea of bottlenecks with current and upcoming tools.

However, a general measurement isn't a bad idea. What we need, though, is some way to correlate this measurement with performance on various cards. The performance profiles of the cards would be more important than the exact measurement, and there are quite a few variables involved. For example, dynamic branching is relatively efficient on ATI X1x00 cards, but works best when all pixels in a batch take the same branch. That's something that's hard to judge using basic counting.
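
To illustrate the batch point: here's a toy cost model where a batch that diverges pays for both sides of the branch, while a coherent batch pays only for the side it takes. The batch size and per-side instruction costs are made-up numbers purely for illustration, not figures for any particular GPU:

```cpp
// Toy cost model for batch coherence: a divergent batch executes both sides
// of the branch, a coherent batch executes only the side it takes.
// Batch size and per-side costs are illustrative assumptions.
#include <cstdio>
#include <vector>

int BatchCost(const std::vector<bool>& takesBranch, int costTaken, int costNotTaken)
{
    bool anyTaken = false, anyNotTaken = false;
    for (bool t : takesBranch)
    {
        if (t) anyTaken = true; else anyNotTaken = true;
    }
    return (anyTaken ? costTaken : 0) + (anyNotTaken ? costNotTaken : 0);
}

int main()
{
    const int batchSize = 48;                       // illustrative batch size
    std::vector<bool> coherent(batchSize, true);    // every pixel takes the branch
    std::vector<bool> divergent(batchSize, true);
    divergent[0] = false;                           // one pixel goes the other way

    std::printf("coherent batch cost : %d\n", BatchCost(coherent, 40, 10));  // 40
    std::printf("divergent batch cost: %d\n", BatchCost(divergent, 40, 10)); // 50
}
```
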
Thanks for the comments!

Quote:Original post by ET3D
Quote:Already been done? (I've not seen this level of info in PIX)


ATI and NVIDIA both have PIX plugins. ATI is also supposed to have a new tool with lots of info that they presented at GDC (I attended part of that presentation). So you should be able to get a good idea of bottlenecks with current and upcoming tools.
Yup, the PIX plugins (and similar) are very good at getting "real world" data and so on... but I was thinking a step back from that (I think!): a way of taking an arbitrary shader and being able to reason about what sort of performance characteristics it'll have...

Quote:Original post by ET3D
However a general measurement isn't that bad. What we need, though, is some way to correlate this measurement with performance on various cards. The performance profiles of the cards would be more important than the exact measurement.
Agreed - but I suppose this is the big problem with my idea... getting hold of that information isn't easy unless you have a lot of time/resources available. The fact that a new GPU with new characteristics pops up every 3-6 months doesn't help either!

Cheers,
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

I have played around with FLOP calculations for shaders, and with shader benchmarks based on shaders from current games. As ET3D already said, there is no 1:1 correlation between these theoretical numbers and real performance. But I want to improve the system a little so that it can tell me how much "power" each single draw call would require in the different pipeline stages.
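
Roughly, the idea is to combine the per-pixel counts from a shader with the number of pixels a draw call actually shades (e.g. from an occlusion query or a debug counting pass) to get per-stage requirements for that draw call. The sketch below is only an illustration of that with assumed structure and numbers, not my actual system:

```cpp
// Illustrative per-draw-call bookkeeping: per-pixel shader counts multiplied
// by the pixels shaded in that draw call. Structure and numbers are assumptions.
#include <cstdio>

// Per-pixel profile extracted from a shader (see the counting sketch above).
struct ShaderProfile
{
    int aluOpsPerPixel;    // arithmetic instructions
    int texOpsPerPixel;    // texture fetches
    int texBytesPerPixel;  // total bytes fetched
};

struct DrawCallCost
{
    long long aluOps;
    long long texFetches;
    long long texBytes;
};

DrawCallCost EstimateDrawCall(const ShaderProfile& s, long long pixelsShaded)
{
    DrawCallCost c;
    c.aluOps     = s.aluOpsPerPixel * pixelsShaded;
    c.texFetches = s.texOpsPerPixel * pixelsShaded;
    c.texBytes   = static_cast<long long>(s.texBytesPerPixel) * pixelsShaded;
    return c;
}

int main()
{
    // The tone-mapping example from the first post: 45 ALU, 6 TEX, 88 bytes/pixel.
    const ShaderProfile toneMap = { 45, 6, 88 };
    const DrawCallCost cost = EstimateDrawCall(toneMap, 640LL * 480LL);

    std::printf("ALU ops per draw call        : %lld\n", cost.aluOps);
    std::printf("Texture fetches per draw call: %lld\n", cost.texFetches);
    std::printf("Texture bytes per draw call  : %lld\n", cost.texBytes);
}
```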

