The most significant performance hit from using doubles will probably come from the extra CPU cache misses and memory bandwidth they cause, since they take up twice as much memory as floats.
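To put rough numbers on that, here is a minimal sketch in C. The 64-byte cache line is an assumption (it is typical on current x86 CPUs, but it is not queried from the hardware), and the helper names are made up for illustration.

```c
#include <stddef.h>

/* Assumed cache line size in bytes. 64 is typical on current x86 CPUs,
   but this is an assumption, not a value read from the hardware. */
#define CACHE_LINE_BYTES 64

/* Hypothetical helper: how many elements of a given size fit in one cache line. */
size_t elems_per_cache_line(size_t elem_size)
{
    return CACHE_LINE_BYTES / elem_size;
}

/* Hypothetical helper: bytes that must be moved to stream through n elements. */
size_t bytes_for(size_t n, size_t elem_size)
{
    return n * elem_size;
}
```

With the usual 4-byte float and 8-byte double, elems_per_cache_line(sizeof(float)) gives 16 and elems_per_cache_line(sizeof(double)) gives 8, so the same loop over doubles touches twice as many cache lines.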
In addition, a 128-bit SSE instruction can handle four floats at a time, but only two doubles. So for code that uses SSE, the performance hit can be significant.
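As a rough illustration of the four-versus-two difference, here is a small sketch using the SSE/SSE2 intrinsics from emmintrin.h. It assumes an x86 or x86-64 compiler with SSE2 available, and the function names are invented for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones); x86 only */

/* Adds four floats with one packed add (addps). */
void add4_floats(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);   /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Adds two doubles with one packed add (addpd). */
void add2_doubles(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);  /* unaligned load of 2 doubles */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}
```

Both functions issue a single packed add on a 128-bit register, but the float version processes twice as many elements per instruction.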
For basic operations on values already in registers, floats aren't significantly faster than doubles. For some more complex operations (like division), doubles will be slower than floats because there are more bits of precision to compute, but overall you probably won't notice much difference.
For details on specific instructions, look at Intel's optimization manual: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html (for example, you can compare the divsd and divss instructions there to see the timings for double vs float division).
Yep, that's what I thought. I have been reading a lot about cache optimization and fitting everything into a cache line. It's very interesting.