Vertex/pixel shader performance

Started by tokaplan. 5 comments, last by ET3D 16 years, 9 months ago
Hi, I've been wondering why certain vertex/pixel shader operations take much more time than others; for example, a pow operation may be much slower than a texture look-up. So I feel I need to learn more about the way on-board vertex/pixel processors work. So far everything I've read focuses on what shader instructions do, not how they are processed. Is there any decent online article or a book you can recommend? Thank you
Quote:Original post by tokaplan
Hi,

I've been wondering why certain vertex/pixel shader operations take much more time than others; for example, a pow operation may be much slower than a texture look-up. So I feel I need to learn more about the way on-board vertex/pixel processors work. So far everything I've read focuses on what shader instructions do, not how they are processed. Is there any decent online article or a book you can recommend?

In my opinion, you are quite unlikely to find information about this topic other than small hints (stuff like "Use feature A instead of B, it's 20% faster").

The reason is pretty much twofold:
1) Both IHVs (NVIDIA and ATI) regard this as sensitive technical information. While nothing would likely happen to either company if this info got out, anyone who knows this kind of detail is likely under NDA.
2) This is very, very, very specific. Performance can depend on the exact card you're using, the driver installed on the system, the format of the texture you're sampling, and even the instruction ahead of it in the shader. It's practically impossible to publish information that's accurate enough to be useful.

I would suggest you focus your energy on testing and profiling instead. If you want to find out which operation is faster than another, profile it. Set up a system where you can easily test small bits of VS/PS code for efficiency (by avoiding other bottlenecks) and simply test anything you're unsure of. This will likely give you the best set of information you'll get without signing an NDA.
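For example, a throw-away pair of shaders along these lines (a rough sketch - the sampler, registers and iteration count are arbitrary) can isolate a single operation in a full-screen pass so its cost dominates everything else:

// Micro-benchmark sketch (ps_3_0): draw a full-screen quad many times with
// each variant and compare frame times. The repeated operation dominates,
// so the difference between the two variants is what you're measuring.

sampler2D gLookup : register(s0);   // used only by the fetch variant
float4    gParams : register(c0);   // x = exponent for the pow variant

float4 PS_PowVariant(float2 uv : TEXCOORD0) : COLOR0
{
    float x = uv.x;
    for (int i = 0; i < 64; ++i)              // repeat the op under test
        x = pow(abs(x) + 0.0001, gParams.x);  // keep the base positive
    return float4(x, x, x, 1);
}

float4 PS_FetchVariant(float2 uv : TEXCOORD0) : COLOR0
{
    float x = uv.x;
    for (int i = 0; i < 64; ++i)              // the same chain of work, done as lookups
        x = tex2Dlod(gLookup, float4(x, 0.5, 0, 0)).r;
    return float4(x, x, x, 1);
}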

Lastly, be sure you understand how the API works, and how to profile it correctly. Some calls might seem to take longer than others while in reality that isn't the case. Be sure you understand when the CPU and GPU sync, and who is waiting for whom, and when. Get the tools you need to help you identify possible issues (NVPerfHUD if you have NV hardware) and start inspecting the ASM output of compiled HLSL snippets. With all this information, you're sure to learn a great deal.
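For example, the DirectX SDK's command-line compiler can write out an assembly listing; something like the following (the file names and entry point here are placeholders) lets you see and count the instructions a snippet really generates:

// Compile to an assembly listing to inspect the generated instructions,
// e.g. (file names and the entry point are placeholders):
//   fxc /T ps_3_0 /E SpecularPS /Fc specular_asm.txt specular.hlsl
// The end of the listing reports an approximate instruction-slot count,
// which is a handy first-order cost estimate.

float4 SpecularPS(float3 n : TEXCOORD0, float3 h : TEXCOORD1) : COLOR0
{
    // A single pow() here expands into several instructions in the listing.
    float s = pow(saturate(dot(normalize(n), normalize(h))), 32.0);
    return float4(s, s, s, 1);
}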

Of course, if you have more specific questions, feel free to raise them for debate, and hear what others think.

Hope this helps.
sirob
Quote:Original post by tokaplan
I've been wondering why certain vertex/pixel shader operations take much more time than others; for example, a pow operation may be much slower than a texture look-up.
Essentially because of architectural decisions. Since most shaders up to now have been texture-fetch heavy and use relatively little math, there's no point in having really strong ALUs.
Keep in mind this is very architecture-specific. The normalize instruction, for example, was slower than a texture lookup on the GeForce FX but it's faster on NV4x and later (and practically free on half vectors). There will come a day when fully procedural shaders both look better and run faster.
In the case of pow (or sqrt, for that matter) there may be an iterative approximation process going on.
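To make that concrete, the classic trade-off from that generation of hardware was normalization: pure math versus a lookup into a normalization cubemap. Which one wins depends entirely on the chip. A sketch (the sampler name and register are made up):

samplerCUBE gNormCube : register(s1);   // cubemap whose texels encode unit vectors

// Pure-ALU version: cheap on NV4x-class parts, especially at half precision.
half3 NormalizeMath(half3 v)
{
    return normalize(v);
}

// Texture-fetch version: often the faster choice on GeForce FX-class parts,
// trading a texture unit and some precision for ALU work.
half3 NormalizeCube(half3 v)
{
    return texCUBE(gNormCube, v).rgb * 2 - 1;   // texels are biased into [0,1]
}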
Quote:Original post by tokaplan
So I feel I need to learn more about the way on-board vertex/pixel processors work.
You shouldn't!

Previously "Krohm"

Understanding the GPU architecture can be bloody difficult, but it is very interesting and useful.

As Sirob pointed out, you can't get solid details from the IHVs, but many sites (such as the excellent Beyond3D) carry detailed articles focusing on the hardware - their R600 and G80 write-ups being good examples.

With the move to unified shader cores (USCs), the actual shader units tend to be very simple - previously MADD and VEC3 units, now generalised FP32 MADDs and so on. Complex instructions like pow() must be broken down into these basic atomic operations before the GPU can actually execute them, so they simply require many more cycles than other operations.
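For instance (a sketch of the idea, not the exact lowering any particular compiler performs), pow() and normalize() are roughly equivalent to short sequences of those simpler operations:

// Roughly what these intrinsics lower to; the real assembly uses sequences
// like log/mul/exp and dp3/rsq/mul, and the details vary by profile.

float PowExpanded(float x, float y)
{
    return exp2(y * log2(x));      // pow(x, y), valid for x > 0
}

float3 NormalizeExpanded(float3 v)
{
    return v * rsqrt(dot(v, v));   // normalize(v)
}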

Look up a good computer science theory book for details on this side of things - it's quite a deep topic spanning computation theory, machine language, compilers and so on.

hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Thanks a lot, this article is exactly what I need - http://www.beyond3d.com/content/reviews/16/12
I have a pretty good idea of what DSPs look like inside, but I've never seen any interesting information on GPUs.
Quite the contrary to what's been said, it's often possible to get very good information from the hardware vendor sites. I haven't tried NVIDIA's lately, but ATI not only has pretty detailed articles about all kinds of performance issues, but also a tool that can estimate how long your shaders will take on various ATI architectures.

IHVs want developers to write the fastest code for their hardware. I'd strongly suggest going to the IHVs' sites looking for such articles and tools, and keeping up to date with them, if you're truly interested in low-level optimisations for specific cards.
Another source of info on the IHVs' sites is the documentation for the GPGPU languages/architectures (CTM, CUDA); it can give you an idea of what the chips are capable of.

