We again misunderstood each other. I don't "promote" double-precision models, just double-precision calculation. There's no impact on bandwidth, since only a few floats are sent to the GPU.
... Can you elaborate this, please? I'm already using 16-bit storage for the height map (DEM).
Ah yes, I thought that you were storing vertices in double-precision format.
I guess you're reading in some compact data (e.g. 16-bit elevation), doing a bunch of double-precision transforms on it, then outputting 32-bit floats?
That's much less offensive to performance than what I assumed you were doing.
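To make sure we're talking about the same pipeline, here's a minimal sketch of what I mean. The names (`verticalScale`, `originZ`, `decodeHeights`) are made up for illustration, not your actual code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pipeline: read compact 16-bit elevation samples,
// do the transforms in double precision, emit 32-bit floats for the GPU.
std::vector<float> decodeHeights(const std::vector<std::uint16_t>& dem,
                                 double verticalScale, double originZ)
{
    std::vector<float> out;
    out.reserve(dem.size());
    for (std::uint16_t sample : dem)
    {
        // All intermediate math in double precision...
        double height = originZ + static_cast<double>(sample) * verticalScale;
        // ...rounded once to 32-bit float at the very end.
        out.push_back(static_cast<float>(height));
    }
    return out;
}
```

Only the final `float` array ever crosses the bus, so the double math costs you CPU time but no bandwidth.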
However, it may still be that double-precision calculations aren't necessary... you may be able to rearrange your order of operations, or the coordinate systems you're working in, so that everything works fine with just 32-bit precision. Whether that's at all worthwhile when you've already got a working solution is a whole other topic though!
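One common rearrangement, just as a hypothetical sketch (the tile-origin scheme here is my invention, not a claim about your terrain): subtract a per-tile origin *first*, while the coordinates are still exact integers, so the remaining math deals only with small numbers that fit comfortably in float.

```cpp
#include <cstdint>

struct Vec3f { float x, y, z; };

// Assumed setup: integer grid coordinates, and a tile origin chosen so
// that (grid - tileOrigin) stays small (e.g. within one tile's extent).
Vec3f localVertex(std::int64_t gridX, std::int64_t gridY, float height,
                  std::int64_t tileOriginX, std::int64_t tileOriginY,
                  float metresPerCell)
{
    // Exact integer subtraction first: no precision lost here.
    std::int64_t dx = gridX - tileOriginX;
    std::int64_t dy = gridY - tileOriginY;
    // The differences are small, so 32-bit float is now plenty.
    return { static_cast<float>(dx) * metresPerCell,
             static_cast<float>(dy) * metresPerCell,
             height };
}
```

Doing it the other way around (scale to huge world coordinates in float, then subtract) is where the low-order bits get destroyed.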
I guess if ALU time was a performance bottleneck for you and you wanted to make use of 4-wide SIMD (or 16-wide on new PC CPUs), then it might be worthwhile, otherwise if it ain't broke, don't fix it.
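For the 4-wide case, here's roughly what that buys you with SSE intrinsics (assuming an x86 target; `addArrays` is just a toy name). One instruction processes four float lanes, whereas double gets only two lanes per 128-bit register:

```cpp
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86 target

// Toy example of 4-wide float SIMD: out[i] = a[i] + b[i].
void addArrays(const float* a, const float* b, float* out, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i);   // load 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // 4 adds at once
    }
    for (; i < n; ++i)   // scalar tail for leftover elements
        out[i] = a[i] + b[i];
}
```

So sticking with float potentially doubles your SIMD throughput over double, but only if you were ALU-bound in the first place.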
While on this topic though, it's worth noting that some compilers, such as MSVC, actually output really horribly bad assembly code when you use floats, depending on the compiler settings. MSVC has "Enhanced Instruction Set" and "Floating Point Model" options. With the FP model set to "strict" or "precise", it will produce assembly code with a LOT of redundant instructions to take every 80-bit intermediate value and round it down to 32-bit precision, so that your code behaves as if the FPU actually used 32-bit precision internally. When using double, it doesn't bother with all this redundant rounding code, which can actually make double seem much faster than float!
Personally, I always set the instruction set to SSE2 and the FP model to "fast", which makes MSVC produce more sensible x86 code for floats.
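If you're building from the command line rather than the IDE, I believe the equivalent flags are `/arch:SSE2` for the instruction set and `/fp:fast` for the FP model, e.g.:

```shell
# Command-line equivalents of the IDE settings mentioned above:
# /arch:SSE2 = "Enhanced Instruction Set", /fp:fast = "Floating Point Model"
cl /O2 /arch:SSE2 /fp:fast main.cpp
```

(On x64 targets SSE2 is already the baseline, so `/arch:SSE2` mainly matters for 32-bit builds.)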