Quote:Original post by stanlo
That's really interesting, do you know why this was done?
By using scalar processors instead of vector ones, NVIDIA is able to get 100% utilization out of them and reach peak performance. With vec4 processors, a good number of instructions will not fully utilize the ALUs (e.g. most lighting calculations are vec3 or even scalar). Also, since the GPU already processes fragments in a massively parallel fashion, it makes sense to parallelize only at that level, rather than at that *and* the instruction level. The G80's design avoids wasted ALU cycles and needless code restructuring while maintaining peak performance even on fully scalar (instruction-level) code.
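To put a rough number on the utilization argument, here's a toy model in Python. It's only a sketch: the instruction mix is made up, the function names are mine, and it assumes a vec4 ALU issues one instruction per cycle regardless of how many components it uses, while a scalar ALU issues one component per cycle. Real hardware (co-issue, scheduling, latency hiding) is more complicated than this.

```python
from typing import Sequence

VEC_WIDTH = 4  # lanes in one vec4 ALU (assumption of this toy model)


def _check(widths: Sequence[int]) -> None:
    """Validate that every instruction uses between 1 and VEC_WIDTH components."""
    if not widths:
        raise ValueError("instruction mix must not be empty")
    for w in widths:
        if not 1 <= w <= VEC_WIDTH:
            raise ValueError(f"component count out of range: {w}")


def vec4_utilization(widths: Sequence[int]) -> float:
    """Fraction of vec4 lanes doing useful work.

    Each instruction occupies all four lanes for one cycle,
    but only `w` of them produce a needed result.
    """
    _check(widths)
    useful_lanes = sum(widths)
    issued_lanes = VEC_WIDTH * len(widths)
    return useful_lanes / issued_lanes


def scalar_utilization(widths: Sequence[int]) -> float:
    """Fraction of scalar ALU slots doing useful work.

    Each instruction is split into `w` scalar ops, so every
    issued slot produces a needed result in this idealized model.
    """
    _check(widths)
    useful_ops = sum(widths)
    issued_ops = sum(widths)
    return useful_ops / issued_ops


# Hypothetical shader: mostly vec3 math with a couple of scalar ops
# and a single full vec4 op (component counts per instruction).
lighting_mix = [3, 3, 1, 3, 1, 4]

if __name__ == "__main__":
    print("vec4 utilization:  ", vec4_utilization(lighting_mix))
    print("scalar utilization:", scalar_utilization(lighting_mix))
```

For this made-up mix, the vec4 unit has 15 useful lanes out of 24 issued, while the scalar model wastes none. The gap shrinks as the code gets closer to pure vec4 and grows as it gets more scalar.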
Quote:Original post by stanlo
Was there some sort of development that sped up scalar processors to the point that vector operations on each processor type were the same speed?
Parallelism is certainly how speed is achieved on GPUs, but as I mentioned above, there's really no need for two levels of parallelism. As for the hardware design, I can't comment, since I don't know the relative complexity of the two designs, but it seems NVIDIA figures that 128 scalar processors can pull better numbers overall than vec4 processors built from an equivalent transistor budget. We'll see what R600 does and how it compares.