I wouldn't call it a micro-optimization at all, but rather a design issue, and one that becomes more important every year. There are two issues at stake. The first is that fetching from RAM is slow relative to the CPU, and the gap keeps widening, so how you fetch and cache the data you operate on is one of the most important decisions you make. There is really no reason not to design for it; it usually makes the code easier to read as well. On current-generation platforms (and likely future ones) it is critical: you can't afford to DMA a bunch of data that you don't actually need to work on. So I wouldn't call this the "worst kind of optimization" but rather "the best kind of design".
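To make the layout point concrete, here's a minimal sketch (the entity fields and names are hypothetical, not from any particular engine) contrasting an "array of structs" layout, which drags cold data through the cache on every update, with a "struct of arrays" layout that keeps only the hot data contiguous:

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: every cache line fetched for a position update also
// carries cold data (name, AI state) that the hot loop never touches.
struct EntityAoS {
    float x, y, z;
    char  name[64];      // cold: rarely read in the update loop
    int   ai_state[16];  // cold
};

// Struct-of-arrays: positions are packed contiguously, so each cache
// line fetched in the hot loop is full of data we actually use.
struct EntitiesSoA {
    std::vector<float> x, y, z;
};

void update_positions(EntitiesSoA& e, float dx, float dy, float dz) {
    for (std::size_t i = 0; i < e.x.size(); ++i) {
        e.x[i] += dx;
        e.y[i] += dy;
        e.z[i] += dz;
    }
}
```

Note the update loop only ever sees the three position arrays; the cold fields never enter the cache at all, which is exactly the "fetch only what you work on" discipline described above.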
That sounds like the worst kind of micro-optimization. If you're worrying about individual bytes and cycles, you're worrying about the wrong things. There are rarely meaningful performance gains to be had from that kind of optimization.
Also note that when data is separated out and designed this way, it usually goes hand in hand with being able to parallelize operations much more easily. You aren't passing a large kitchen-sink object (where realistically anything could be called or changed); you're passing large contiguous blocks of memory that hold exactly the data to be worked on.
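A rough sketch of that idea (the chunking scheme and function names here are illustrative, not a prescribed API): because the data is one contiguous array, each worker can be handed a non-overlapping half-open range and run with no locking at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Each worker owns a contiguous, non-overlapping range [begin, end),
// so no synchronization is needed while it works.
void scale_range(float* begin, float* end, float factor) {
    for (float* p = begin; p != end; ++p) *p *= factor;
}

// Split one contiguous block across N worker threads.
void scale_parallel(std::vector<float>& data, float factor, unsigned workers) {
    std::vector<std::thread> pool;
    std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t lo = std::min<std::size_t>(w * chunk, data.size());
        std::size_t hi = std::min<std::size_t>(lo + chunk, data.size());
        pool.emplace_back(scale_range, data.data() + lo, data.data() + hi, factor);
    }
    for (auto& t : pool) t.join();
}
```

Try doing the same with a heap of interlinked kitchen-sink objects and you end up reasoning about aliasing and shared mutable state instead of just slicing an array.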