Ah yes, I thought that you were storing vertices in double-precision format.
I guess you're reading in some compact data (e.g. 16-bit elevation), doing a bunch of double-precision transforms on it, then outputting 32-bit floats?
That's much less offensive to performance than what I assumed you were doing.
Nope! In fact I'm generating terrain entirely on the GPU. Only the 16-bit elevation data and the various overlays are sent through textures; everything is rendered without a single vertex attribute (in the GLSL sense). The CPU calculates the precise position on the globe and the parameters needed for the full ellipsoid calculation and height correction, which is then done per vertex on the GPU. All the GPU-side math is FP, but the coefficients are computed on the CPU in DP, downcast to FP, and sent to the GPU as uniforms. Once again, no attributes are used, so the representation can't get any more compact. But I still need DP to do accurate math on the CPU.
While on this topic though, it's worth noting that some compilers, such as MSVC, can output really bad assembly code when you use floats, depending on the compiler settings. MSVC has an "Enhanced Instruction Set" option and a "Floating Point Model" option. With the FP model set to "strict" or "precise", it produces assembly with a LOT of redundant instructions that take every 80-bit intermediate value on the x87 FPU and round it down to 32-bit precision, so that your code behaves as if the FPU actually used 32-bit precision internally. When using double, it doesn't bother with all this redundant rounding code, which can actually make double seem much faster than float!
Personally, I always set the instruction set to SSE2 and the FP model to "fast", which makes MSVC produce more sensible x86 code for floats.
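For anyone setting this from the command line rather than the IDE, these project properties map to compiler switches roughly as follows (a sketch; exact property names vary between VS versions):

```shell
# IDE setting -> cl.exe switch:
#   "Enhanced Instruction Set" = SSE2  ->  /arch:SSE2
#   "Floating Point Model"     = fast  ->  /fp:fast  (vs. /fp:precise, /fp:strict)
cl /O2 /arch:SSE2 /fp:fast main.cpp
```

With /arch:SSE2 the compiler can use scalar SSE instructions that operate natively at 32-bit precision, so the redundant store/reload rounding of x87 intermediates disappears.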
Thank you for the advice! Although I've been using VS since version 4.1, I've never had the need to tweak the compiler options. I'll try what you've suggested!