Once that 64-byte block is loaded, anything you do within it is very nearly free. Thanks to out-of-order execution and the processor caches, doing one operation on one item in that block takes very nearly the same clock time as doing one operation on sixteen items in that block.
Would it actually be beneficial to pack our variables more tightly together (use char/short where possible) in order to reduce block loads?
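One rough way to reason about this question is to count how many items fit in a single block. The sketch below is only an illustration: the `WideItem` and `NarrowItem` structs, their field names, and the helper functions are all hypothetical, and the 64-byte line size is an assumption that holds on typical x86-64 parts but not everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; 64 bytes is typical on x86-64 but not universal. */
#define CACHE_LINE_BYTES 64

/* Hypothetical record that uses a full 32-bit integer for every field. */
struct WideItem {
    int32_t id;
    int32_t health;
    int32_t flags;
    int32_t kind;
};

/* The same fields, narrowed where the value range allows it. */
struct NarrowItem {
    int32_t id;
    int16_t health;
    uint8_t flags;
    uint8_t kind;
};

/* The numbers quoted below assume the usual ABIs, where these hold. */
static_assert(sizeof(struct WideItem) == 16, "unexpected padding in WideItem");
static_assert(sizeof(struct NarrowItem) == 8, "unexpected padding in NarrowItem");

/* How many whole items of the given size fit in one cache line. */
size_t items_per_line(size_t item_size) {
    return CACHE_LINE_BYTES / item_size;
}

/* How many cache lines a contiguous, line-aligned array of n items spans. */
size_t lines_touched(size_t n, size_t item_size) {
    return (n * item_size + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

Under those assumptions, `items_per_line` gives 4 for `WideItem` and 8 for `NarrowItem`, and a linear scan over 1000 items touches 250 lines versus 125. Whether halving the block loads shows up as a real speedup depends on whether the loop is memory-bound in the first place, and narrower fields can cost extra widening instructions, so it is worth measuring rather than assuming.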
I was pleasantly surprised when I ate at McDonald's in Washington DC. I had heard so many horrible things about the food from people that I didn't expect much. However, the food I was served had fresh tomatoes and lettuce, and the buns were whole-grain. I'd say that was one of the healthiest burgers I've ever seen at McDonald's.
That was the first and last time I ate a McDonald's burger in America, so maybe I just got lucky?