Archived

This topic is now archived and is closed to further replies.

Russell

Multidimensional Arrays in C/C++

Recommended Posts

I have heard that doing a 2D array lookup is slower than doing a 1D array lookup, and 3D array lookup is slower than 2D, and so on. It seems to me that a 2D array lookup should be the same as two 1D array lookups, and a 3D array lookup should be the same as three 1D array lookups, but people seem to think 3D array lookups are REALLY slow. So what is the truth here? Here is why I think a 2D array lookup should be the same as two 1D array lookups. Let''s say I have two arrays: int b1[10], b2[10];. I make another array: int * a[2]; a[0] = b1; a[1] = b2; Now I should be able to do a 2D array lookup like: int i = a[1][5]; However, shouldn''t this be the same as doing two 1D array lookups? I''m just doing two 1D array lookups.

Share this post


Link to post
Share on other sites
No. These two things behave differently :

int a[10][20];
int * a[10]; //then fill a with int[20]'s

the first object is actually an array of the form a[200], and when you access a[j][k] it actually accesses a[j+10*k]. The "10*k" part is the slow part about it : to access an element you do 1 mul and 2 add, then a dereferencing :

a[j][k] == *(a+j+10*k)

the second object is actually an int** object, so you are basically adding, dereferencing, adding again, and dereferencing again :

a[j][k] == *(*(a+j)+k)

the second method is of course faster, but it's slower to set up (because you need to allocate additional memory and set it up. Still, a good way would be to allocate on the stack something like this :

int a[10][20];
int * b[10];
for( j = 0; j < 10; ++j ) { b[j] = &(a[j][0]); }

And then use b[j][k] to access the elements.

Victor Nicollet, INT13 game programmer



[edited by - ToohrVyk on March 20, 2004 11:07:48 AM]

Share this post


Link to post
Share on other sites
I thought that even a 1D array lookup required a multiply (unless it is an array of bytes). Since an int is 4 bytes (usually), to lookup in int a[10], it has to do *(a + (i * 4)).

Another issue is, how smart is the compiler? I hear a lot of people recommend to just do it the most straightforward way, avoid all of the pointer stuff, and let the compiler convert it to that method if that will be faster.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
With all of the "real" stuff that your game is probably doing--graphics, sound, physics, etc, etc--you should NOT be worrying about whether a[10][10] or a[100] is faster.

Worrying about optimizing the least important 1% of your code is simply wasted effort. You''d be better off trying to get more speed out of your rendering loop. (If you even need any, most computers are fast enough that you don''t care.)

Share this post


Link to post
Share on other sites
AP : why would optimizing the rendering system only mean he doesn''t have to optimize bidimensional arrays? Don''t forget bitmaps! In such cases even an array read can become a huge waste of time, since you usually do a lot of them each frame.

One of my collaborators recently worked on an application that did something like 15,360,000 pixel acecesses per second. In such cases optimization IS required (especially when you only have 200 MHz to work with in the first place, AND many other things to do, AND a lousy chipset overall).

OP : of course, any lookup requires a multiply (which is why []-accessing is slower than getting a copy of the array and incrementing it), but n-dimensional arrays require n*(n-1)/2 multiplies (in addition to the "normal" multiply of a 1D array).

Victor Nicollet, INT13 game programmer

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
And the multiply-by-four for integers is built in to the processor. AFAIK it takes no more time than a ''normal'' lookup

Share this post


Link to post
Share on other sites
If your data is two dimensional by nature then whether you store it all in one long 1D array, or in a 2D array you are going to have about the same asm instructions either way. So for readability''s sake, just use a 2D array.
If you want to make sure it is fast the best thing you can do is use array dimensions that are a power of two in size. If the compiler sees that it has to multiply by a power of two then it will replace that with a shift.

Share this post


Link to post
Share on other sites
Multi dimentional array access DOES require multiple multiplies - compared to SINGLE dimention lookups - but this is completely unavoidable ... cause you have to do the multiply sometime ... the only thing you can optimize is when ... the the user wants to get item [3] [2] on a chess board ... someone has to do the mutliple to figure out that this really is item index [26] (3 * 8 + 2).

also, when the array is declared all at once it is faster, not slower, than when the arrays are built using pointers to arrays ... in which case they use multiple multiplies AND a pointer dereference per dimension.

int a1[20];
int a2[20][20];
int a2b[20*20];
int a3[20][20][20];
int a3b[20*20*20];

when you use a2b[index];

the compiler does something like: *(a2b + (index * 4))

when you use a2[x][y];

the compiler does something like:

*(a2b + (x * 80) + (y * 4))

so the [ + x * 80 ] is the extra work ... BUT this work is actually DOING SOME FUNCTION that you need, it''s comverting a number from a logical domain (rows / column) to a physical domain (single dimentional address space) ... if you try to use a2b you will find YOU writing the equivelent code to solve the problem ... so it''s NOT an optimization, it''s a functional change.

also, smart compilers do optimize out that multiply in loops - it''s just standard variable computation hoisting ... if the compiler can see that for a given inner loop a certain computation is contanst, it compute it once, prior to the loop ...

so this:

// assume this might be a nested loop - variable x is already set ... outside this code ...
for(int y=0; y<20; ++y)
cout << a2[x][y];

if the compiler cannot or does not optimze this (such as in debug mode) - then EVERY time inside the loop, the [+ x * 80] code will be done ...

but most (all I know of) modern compilers can optimize this so that the [x * 80] part is done before the loop, and is simple added in each time ... - and most compilers can go even farther and precompute the [a2 + (x * 80)] part, so inside the loop the code is JUST this:

*(precomputedSum + (y * 4))

EXACTLY the same as a one dimentional array.

Share this post


Link to post
Share on other sites
quote:
Original post by Anonymous Poster
With all of the "real" stuff that your game is probably doing...
Ahem... My program is a chess program, so...
quote:
graphics
No graphics.
quote:
sound
No sound.
quote:
physics
No physics.
quote:
etc, etc
Aha! You hit on one. Enhanced Transposition Cutoffs (ETC)! A decent tree searching pruning technique.
quote:
you should NOT be worrying about whether a[10][10] or a[100] is faster.

Worrying about optimizing the least important 1% of your code is simply wasted effort. You''d be better off trying to get more speed out of your rendering loop. (If you even need any, most computers are fast enough that you don''t care.)
The chess board is an array. The piece lists are 2D arrays or 3D arrays, depending upon which I choose. A chess program will spend almost all of its time scanning one of these arrays (board or piece lists).

If there is a detectable difference between using a 2D array or a 3D array for piece lists, I''d like to know that. I''ll be accessing and scanning these quite a bit.

Share this post


Link to post
Share on other sites
I was just thinking about one time when I turned a 2D array lookup into a single bitwise OR. Instead of doing this: a[x][y];, I did this: a[x|y];. I just modifed the values that x and y contained so it would work.

If the size of each dimension of the array was a power of two, I imagine if I wanted to take the time, I could do this for a 3D array also, right? a[x|y|z];?

Share this post


Link to post
Share on other sites
A chess game doesnt even have to worry about ze small stuff like that... and doing a tree search can get real, real expensive; i ran my connect 4 AI game on a uni machine, and some moves took more than a few seconds, so it could get real hairy with chess.

However, with the approach you mentioned youre going to make the code beyond unreadable... and if it turns out that theres a bug in the system, youre going to kill yourself later..

Share this post


Link to post
Share on other sites
Take a look at how J or SAC treat multidimensional arrays. It is much cleaner and propably faster.

[edited by - Trap on March 20, 2004 7:48:57 PM]

Share this post


Link to post
Share on other sites
quote:
Original post by psamty10
A chess game doesnt even have to worry about ze small stuff like that... and doing a tree search can get real, real expensive; i ran my connect 4 AI game on a uni machine, and some moves took more than a few seconds, so it could get real hairy with chess.

However, with the approach you mentioned youre going to make the code beyond unreadable... and if it turns out that theres a bug in the system, youre going to kill yourself later..


It is true that speed is not the most important thing when it comes to a board game playing program of the tree searching variety, but it's definitely up there.

At this point I'm trying to decide between two approaches to storing the piece lists. I can either do:

Two players (white and black) with a maximum of 16 pieces each
Piece piece_list[2][16];

Two players, six piece types, maximum of 10 of one piece type
Piece piece_list[2][6][10];

The first approach (2D array) means I would have code like this when looping through the piece list:

for (n = 0; n < piece_count[WHITE]; n++)
{
switch (piece_list[WHITE][n].type)
{
case PAWN: ...; break;
case KNIGHT: ...; break;
case BISHOP: ...; break;
case ROOK: ...; break;
case QUEEN: ...; break;
case KING: ...; break;
}
}
The 3D array allows me to have code like this, which should avoid some (more or less) unpredictable branches in the switch statement above:
int knights_left = piece_count[WHITE][KNIGHT];
Piece * knight = piece_list[WHITE][KNIGHT];
while (knights_left--)
generate_move(knight++); // or whatever
I was just a little concerned with the "fake" 3D array I'm using in the latter version. As you can see I'm not really using the 3D array inside the loop, and I never actually do a 3D array lookup. I do a 2D lookup to get the piece list for 'white knights', then a series of 1D lookups.

[edited by - Russell on March 21, 2004 2:39:01 AM]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
as always do it one way and then profile or do it both ways and compare or whatever...

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
and... the smaller array might be more cache friendly

Share this post


Link to post
Share on other sites