#### Archived

This topic is now archived and is closed to further replies.

# which way is the most cache friendly?

This topic is 5302 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

This has always bothered me. Which of the following is the most efficient?  Assume that the array was created like so:
someobject** somearray = new someobject*[width];
for(int x=0;x < width;x++) {
somearray = new someobject[height];
}     
A)
for(int x=0;x < width;x++) {
for(int y=0;y < height;y++) {
...somearray[x][y]...
}
}     
B)
for(int x=0;x < width;x++) {
for(int y=0;y < height;y++) {
...somearray[y][x]...
}
}     
C)
for(int y=0;y < height;y++) {
for(int x=0;x < width;x++) {
...somearray[x][y]...
}
}     
D)
for(int y=0;y < height;y++) {
for(int x=0;x < width;x++) {
...somearray[y][x]...
}
}     
[edited by - Raloth on February 8, 2004 6:25:01 PM]

##### Share on other sites
I would think that all 4 would have O(n^2) behaviour.

##### Share on other sites
Unless you take into consideration width and height not being equal (and as long as you do it right that won''t be a cache issue, it will be a branch prediction issue) C and D are the same as A and B.

The reason one would be better then the other in terms of caching is because one is sequentially moving through memory (so that you can process an entire chunk of memory before requesting another) and one is making random accesses, which if your structure is large enough will require a new chunk to be brought into cache for every single iteration (which would be very slow). For this A seems to be the right one, since in memory you have A[0][0];A[0][1];A[0][2];A[1][0];A[1][1];A[1][2];A[2][0];A2][1];A[2][2];.

##### Share on other sites
porthios: not when a cache is involved

Right. B and C are dog slow, because you''re jumping around in memory. D transposes your matrix. A is the one, but only because of the non-canonical way you set up the matrix.
The ''correct'' way is:
int m[h][w];
for each y: for each x: m[y][x] = 0
You want to scan sequentially in memory in the inner loop.

##### Share on other sites
So...A and D are equally good, right?

##### Share on other sites
Only if width = height. The matrix is declared as [width][height], and D accesses it backwards.

##### Share on other sites
quote:
Original post by Jan Wassenberg
porthios: not when a cache is involved
Nah, it still boils down to O(n^2) (or O(w*h) more accurately), becuse incorrect lookup may only make it O(n^2 + n) where the latter n could have a high constant, but it''s still same as O(n^2)

##### Share on other sites
quote:
Original post by Anonymous Poster
O(n^2 + n)
Oops, more like O(n^2) for incorrect look-up (n^2 times normal mem access) and O(n^2+n) for good look-up (n times slow access, roughly n^2 times cache-friendly access)

##### Share on other sites
quote:
Original post by Anonymous Poster
Oops, more like O(n^2) for incorrect look-up (n^2 times normal mem access) and O(n^2+n) for good look-up (n times slow access, roughly n^2 times cache-friendly access)

i just have to mention here, why i dont like that academic "if it has roughly the same number of trailing zeroes for huge n it is equal". or maybe one should insist on the word _complexity_ which doesnt translate to _speed_. use that O-notation stuff to compare complexities of algorithms, not as a weird way to say "cache and parallel execution doesnt matter, its still equally fast".

in the end w*h*cacheaccess and w*h*cacheaccess+ w*h*overhead obviously isnt "the same", no matter what some "rough estimation notation" might say.

##### Share on other sites
quote:
This has always bothered me. Which of the following is the most efficient?

object* objectArray = new object[width * height];

Allocate a single contiguous array helps [hardware, especially] prefetchers do the right thing and allows you to walk the entire array with a pointer increment (strength reduction is your friend).

Whether you want the element at row r, column c to be at objectArray[r + (c * width)] or objectArray[(r * height) + c] will depend as much on what you want to do as anything else (i.e. row-major or column-major may be optimal, depending on the operations being performed -- for matrix premultiplication, for example, it would be preferred to have one matrix row-major and one column-major).

You might want also to investigate a cache agnostic configuration, though that will be more complicated.

1. 1
2. 2
3. 3
4. 4
frob
15
5. 5

• 16
• 12
• 20
• 12
• 14
• ### Forum Statistics

• Total Topics
632155
• Total Posts
3004477

×