Is it actually faster to pass, e.g., 4 float variables as a single 4-component vector than as 4 separate float variables?
At least, somebody told me that. I don't see the difference: either way it's just a 16-byte aligned array being transferred to the GPU as a constant buffer, isn't it? Is there some secret GPU magic going on that I'm missing?