Also, there is a slight performance advantage to using the block-compressed texture formats (DXT / BC), since each block holds the data for several texels and typical texture access patterns read nearby texels. One block covers 4x4 texels and takes 8 bytes (DXT1) or 16 bytes (DXT3 / DXT5), so the compression ratio is 6:1 for DXT1 against 24-bit RGB, or 4:1 for DXT3 / DXT5 against 32-bit RGBA. Sampling a compressed texture therefore uses roughly 1/6 or 1/4 of the memory bandwidth of the uncompressed texture. The block decompression is implemented in hardware and is simple enough that it adds practically no cost to sampling.
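To make the ratios concrete, here is a small sketch of the size arithmetic. The function names and the format table are made up for this example (they are not part of any graphics API), and it only counts a single mip level:

```python
# Bytes stored per 4x4 texel block for the classic DXT / BC formats.
# (Illustrative table for this sketch, not taken from any API.)
BLOCK_BYTES = {
    "BC1": 8,   # DXT1
    "BC2": 16,  # DXT3
    "BC3": 16,  # DXT5
}


def compressed_size(width, height, fmt):
    """Size in bytes of one mip level stored in a block-compressed format."""
    # Dimensions are rounded up to whole 4x4 blocks.
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BLOCK_BYTES[fmt]


def uncompressed_size(width, height, bytes_per_texel):
    """Size in bytes of one mip level stored uncompressed."""
    return width * height * bytes_per_texel


def compression_ratio(width, height, fmt, bytes_per_texel):
    """Uncompressed size divided by block-compressed size."""
    return (uncompressed_size(width, height, bytes_per_texel)
            / compressed_size(width, height, fmt))


if __name__ == "__main__":
    # 256x256 texture: DXT1 against 24-bit RGB, DXT5 against 32-bit RGBA.
    print(compression_ratio(256, 256, "BC1", 3))
    print(compression_ratio(256, 256, "BC3", 4))
```

For a 256x256 texture this gives 32768 bytes for DXT1 against 196608 bytes for 24-bit RGB, and 65536 bytes for DXT5 against 262144 bytes for 32-bit RGBA. Textures whose dimensions are not multiples of 4 get padded to whole blocks, so the ratio is a bit worse for small or odd-sized textures.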
Of course you don't need to use DDS as the container, but there are already lots of tools for it.
Cheers!
Edited by kauna, 06 January 2013 - 03:47 PM.