Original post by RobTheBloke
On the bog-standard x86 FPU (the x87), all calcs happen at 80 bits internally, whether the operands are double or float. However, a 32-bit x86 can read only 4 bytes at a time. So (assuming that the data is 4-byte aligned), a double requires 2 reads to get it into the FPU, while a float requires a single read.
Just to clarify, data is read out of main memory and into L1 cache a whole cache line at a time (typically 32 or 64 bytes on x86, not 4k, which is the size of a memory page). Values are then loaded from the cache into registers as needed. The latter step isn't especially expensive. The thing is, though, that a cache line, and the cache as a whole, will only fit half as many doubles as floats.