Jump to content
  • Advertisement

Archived

This topic is now archived and is closed to further replies.

Raloth

which way is the most cache friendly?

This topic is 5302 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

This has always bothered me. Which of the following is the most efficient? [edit] Assume that the array was created like so:
someobject** somearray = new someobject*[width];
for(int x=0;x < width;x++) {
    somearray = new someobject[height];
}     
A)
for(int x=0;x < width;x++) {
    for(int y=0;y < height;y++) {
        ...somearray[x][y]...
    }
}     
B)
for(int x=0;x < width;x++) {
    for(int y=0;y < height;y++) {
        ...somearray[y][x]...
    }
}     
C)
for(int y=0;y < height;y++) {
    for(int x=0;x < width;x++) {
        ...somearray[x][y]...
    }
}     
D)
for(int y=0;y < height;y++) {
    for(int x=0;x < width;x++) {
        ...somearray[y][x]...
    }
}     
[edited by - Raloth on February 8, 2004 6:25:01 PM]

Share this post


Link to post
Share on other sites
Advertisement
Unless you take into consideration width and height not being equal (and as long as you do it right that won''t be a cache issue, it will be a branch prediction issue) C and D are the same as A and B.

The reason one would be better then the other in terms of caching is because one is sequentially moving through memory (so that you can process an entire chunk of memory before requesting another) and one is making random accesses, which if your structure is large enough will require a new chunk to be brought into cache for every single iteration (which would be very slow). For this A seems to be the right one, since in memory you have A[0][0];A[0][1];A[0][2];A[1][0];A[1][1];A[1][2];A[2][0];A2][1];A[2][2];.

Share this post


Link to post
Share on other sites
porthios: not when a cache is involved

Right. B and C are dog slow, because you''re jumping around in memory. D transposes your matrix. A is the one, but only because of the non-canonical way you set up the matrix.
The ''correct'' way is:
int m[h][w];
for each y: for each x: m[y][x] = 0
You want to scan sequentially in memory in the inner loop.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quote:
Original post by Jan Wassenberg
porthios: not when a cache is involved
Nah, it still boils down to O(n^2) (or O(w*h) more accurately), becuse incorrect lookup may only make it O(n^2 + n) where the latter n could have a high constant, but it''s still same as O(n^2)

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quote:
Original post by Anonymous Poster
O(n^2 + n)
Oops, more like O(n^2) for incorrect look-up (n^2 times normal mem access) and O(n^2+n) for good look-up (n times slow access, roughly n^2 times cache-friendly access)

Share this post


Link to post
Share on other sites
quote:
Original post by Anonymous Poster
Oops, more like O(n^2) for incorrect look-up (n^2 times normal mem access) and O(n^2+n) for good look-up (n times slow access, roughly n^2 times cache-friendly access)


i just have to mention here, why i dont like that academic "if it has roughly the same number of trailing zeroes for huge n it is equal". or maybe one should insist on the word _complexity_ which doesnt translate to _speed_. use that O-notation stuff to compare complexities of algorithms, not as a weird way to say "cache and parallel execution doesnt matter, its still equally fast".

in the end w*h*cacheaccess and w*h*cacheaccess+ w*h*overhead obviously isnt "the same", no matter what some "rough estimation notation" might say.

Share this post


Link to post
Share on other sites
quote:
This has always bothered me. Which of the following is the most efficient?



object* objectArray = new object[width * height];

Allocate a single contiguous array helps [hardware, especially] prefetchers do the right thing and allows you to walk the entire array with a pointer increment (strength reduction is your friend).

Whether you want the element at row r, column c to be at objectArray[r + (c * width)] or objectArray[(r * height) + c] will depend as much on what you want to do as anything else (i.e. row-major or column-major may be optimal, depending on the operations being performed -- for matrix premultiplication, for example, it would be preferred to have one matrix row-major and one column-major).

You might want also to investigate a cache agnostic configuration, though that will be more complicated.

Share this post


Link to post
Share on other sites

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!