I don't really get the point of normal mapping. Graphics cards have three things to spend: triangle output, fillrate, and math power.

Normal mapping spends fillrate, in the form of an extra texture lookup (right?), plus even more math power, in order to save triangle output.
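
To make that per-pixel cost concrete, here's a rough sketch in Python of what I understand each shaded pixel has to do: fetch one extra texel, decode it into a normal, and do a dot product with the light. This is not real shader code, and the function names, the texel value, and the light direction are all made up for illustration.

```python
import math

def decode_normal(rgb):
    """Map an 8-bit RGB texel (0..255 per channel) to a unit-length normal."""
    x, y, z = (c / 255.0 * 2.0 - 1.0 for c in rgb)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

def lambert(normal, light_dir):
    """Diffuse term: clamped dot product of two unit vectors."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))

# The extra texture lookup: one texel fetched from the normal map.
texel = (128, 128, 255)        # hypothetical "flat" texel, pointing roughly along +Z
normal = decode_normal(texel)  # the extra math: decode and renormalize
light_dir = (0.0, 0.0, 1.0)    # hypothetical light direction in tangent space
diffuse = lambert(normal, light_dir)
```

All of that runs once per pixel drawn, no matter how few triangles the model has, which is why it eats into fillrate and math power rather than triangle output.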

The problem is, math power and fillrate are much more precious on a graphics card than triangle output. You'll hit limits on them faster.

I've seen slides showing a 20k-triangle model with a normal map made from a 1-million-triangle model running 10x faster than the 1-million-triangle model with no normal map. The only problem with this test is that a 20k model with that normal map is going to end up looking like a 60k-500k model, not a 1-million-triangle model.