Move Generation Using Bitboards (Connect 4)

12 comments, last by alvaro 11 years, 6 months ago
Hi all,
I am having speed (performance) issues while generating moves for Connect 4.

Previously I generated the moves with simple nested for-loops. Now I have tried to convert this to bitboards:
I find all the empty squares and AND them with a column mask (e.g. column1 = (1L<<1 | 1L<<10 ...)).
This gives me the empty bits in a particular column.
Then I find the MSB by right-shifting this value until it becomes 0 (a trick for finding the MSB when the value is a power of 2).
It gave me the correct answer, but surprisingly it was slower than the nested for-loops (the nested loops took 238 ms, whereas the bitboards took 1349 ms).

So then I tried another method, the folding trick as mentioned here.
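[source lang="csharp"]// Fold the highest set bit into every lower position,
// so x becomes 2^(k+1) - 1, where k is the MSB index.
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x |= (x >> 32);

// Keep only the highest set bit, then scan up to 54 positions for its index.
long msb = x & ~(x >> 1);
for (int n = 53; n >= 0; n--)
    if (((1L << n) & msb) != 0)
        return n;[/source]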
This was slow too.
What am I doing wrong? I was sure bitboards would be faster than nested loops.
How can I do this without nested loops, with something like a De Bruijn sequence for the MSB of a 64-bit number?

-Thank you.
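For the De Bruijn question above, here is a minimal sketch in C rather than C#, assuming an unsigned 64-bit input. DEBRUIJN64, init_debruijn and msb_index are illustrative names. It keeps the folding step, isolates the top bit, and replaces the 54-iteration scan with one multiply, one shift and one table lookup. The constant 0x022fdd63cc95386d is a 64-bit De Bruijn sequence, so every isolated bit maps to a distinct top-6-bit key. The table is built from the constant at startup rather than typed in by hand:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit De Bruijn sequence: each of its 64 six-bit windows is unique. */
#define DEBRUIJN64 UINT64_C(0x022fdd63cc95386d)

static int debruijn_index[64];

/* Build the lookup table once: key(2^i) -> i. */
void init_debruijn(void) {
    for (int i = 0; i < 64; i++)
        debruijn_index[((UINT64_C(1) << i) * DEBRUIJN64) >> 58] = i;
}

/* Index of the most significant set bit of x (x must be non-zero). */
int msb_index(uint64_t x) {
    assert(x != 0);
    /* Folding: copy the highest set bit into every lower position. */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    /* x is now 2^(k+1) - 1; this keeps only 2^k. */
    x ^= x >> 1;
    /* Multiply-and-shift turns the single bit into a unique 6-bit key. */
    return debruijn_index[(x * DEBRUIJN64) >> 58];
}
```

Call init_debruijn() once before searching. Compilers also expose a leading-zero count (for example GCC's `__builtin_clzll`), which equals 63 - msb_index(x) for non-zero x.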
I would use consecutive bits to represent each column, following the same convention as Fhourstones:

 .  .  .  .  .  .  .
 5 12 19 26 33 40 47
 4 11 18 25 32 39 46
 3 10 17 24 31 38 45
 2  9 16 23 30 37 44
 1  8 15 22 29 36 43
 0  7 14 21 28 35 42


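You can then generate moves as
u64 generate_moves() {
  u64 occupied = pieces[0] | pieces[1];
  // A square is playable if it is empty and either sits directly on an
  // occupied square (north is one bit up, hence occupied << 1) or is on the first row.
  return BOARD_MASK & ~occupied & ((occupied << 1) | FIRST_ROW);
}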


When you need to loop over the moves, you do something like this:
for (u64 moves = generate_moves(); moves; moves &= moves - 1) { // clear the lowest set bit each pass
  u64 move = moves & -moves; // isolate the lowest set bit
  // `move' now has a bitboard with a single 1 in the position where you can move.
  // You can use the De Bruijn sequence trick if you want to convert it to an index.
}
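A self-contained C sketch of this scheme, assuming the Fhourstones layout shown above (column c in bits 7c..7c+5, with an always-empty sentinel bit 7c+6). FIRST_ROW, BOARD_MASK, bit_index and list_moves are illustrative names, and the occupancy is passed in as a parameter instead of read from a global pieces[] array. The De Bruijn table is the "trick" for turning each isolated move bit into an index:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

enum { WIDTH = 7, HEIGHT = 6 };

/* Column c uses bits 7c..7c+5 (row 0 at the bottom); bit 7c+6 is a sentinel. */
#define FIRST_ROW  UINT64_C(0x40810204081)  /* bits 0, 7, 14, ..., 42 */
#define BOARD_MASK (FIRST_ROW * ((UINT64_C(1) << HEIGHT) - 1))  /* rows 0..5 */

#define DEBRUIJN64 UINT64_C(0x022fdd63cc95386d)
static int debruijn_index[64];

/* Build the De Bruijn lookup table once: key(2^i) -> i. */
void init_debruijn(void) {
    for (int i = 0; i < 64; i++)
        debruijn_index[((UINT64_C(1) << i) * DEBRUIJN64) >> 58] = i;
}

/* Index of the only set bit of b. */
int bit_index(u64 b) {
    assert(b != 0 && (b & (b - 1)) == 0);
    return debruijn_index[(b * DEBRUIJN64) >> 58];
}

/* Legal moves: empty squares on the first row or directly above a piece. */
u64 generate_moves(u64 occupied) {
    return BOARD_MASK & ~occupied & ((occupied << 1) | FIRST_ROW);
}

/* Write the bit index of every legal move into out[]; return the count. */
int list_moves(u64 occupied, int out[WIDTH]) {
    int n = 0;
    for (u64 moves = generate_moves(occupied); moves; moves &= moves - 1)
        out[n++] = bit_index(moves & -moves);  /* lowest remaining move */
    return n;
}
```

The sentinel bit is what makes the single shift safe: a piece on the top row shifts into the sentinel, which BOARD_MASK removes, instead of into the bottom of the next column.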
The Fhourstones representation is nice; it uses fewer bits than mine (which has borders).
However, I will switch to this representation in a second version (to compare it with my own implementation).

I found the bug that was causing the delay: it considered a move on index 0, which lies on the border (so approximately 7 times more, ouch!).

I used the Nalimov representation from the Chess Programming Wiki.
What I generally do is take a 2D array, store all moves as moves[depth, move], and access them by depth. I also keep another array that counts the number of moves at each depth, which I use to traverse them.

The array representation lets me sort the moves using the killer-move heuristic. However, I also noticed that storing moves in arrays seems to be slow (I may be wrong on this one; please correct me if I am). But I can't see a way to put killer moves first using only bitboards.

Here is what I did with my genMoves method.
[source lang="csharp"]public static void genMoves()
{
long empty = ((~(xBits | yBits)) & bitBoard);

int moveIndex = 0;

moveIndex = findIndex((ulong)(empty & column1));

if (moveIndex != 0)
moves[depth, nPly[depth]++] = moveIndex;

moveIndex = findIndex((ulong)(empty & column2));
if (moveIndex != 0)
moves[depth, nPly[depth]++] = moveIndex;

moveIndex = findIndex((ulong)(empty & column3));
if (moveIndex != 0)
moves[depth, nPly[depth]++] = moveIndex;

moveIndex = findIndex((ulong)(empty & column4));

if (moveIndex != 0)
moves[depth, nPly[depth]++] = moveIndex;

moveIndex = findIndex((ulong)(empty & column5));

if (moveIndex != 0)
moves[depth, nPly[depth]++] = moveIndex;

moveIndex = findIndex((ulong)(empty & column6));

if (moveIndex != 0)
moves[depth, nPly[depth]++] = moveIndex;

moveIndex = findIndex((ulong)(empty & column7));

if (moveIndex != 0)
moves[depth, nPly[depth]++] = moveIndex;


}

public static int findIndex(ulong bb)
{
int result = 0;
if (bb > 0xFFFFFFFF)
{
bb >>= 32;
result = 32;
}
if (bb > 0xFFFF)
{
bb >>= 16;
result += 16;
}
if (bb > 0xFF)
{
bb >>= 8;
result += 8;
}
return result + ms1bTable[(int)bb];
}[/source]
There's a negligible improvement of just one second over the for-loops.
With arrays it's simpler to order the moves, but with bits it's quicker.
Can you show me a way to order the moves using the bitboard approach?
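The post doesn't show how ms1bTable is filled. One plausible construction, sketched in C rather than C#: init_ms1b_table and find_index are illustrative names, and entry 0 is assumed to be 0 because genMoves treats an index of 0 as "no move":

```c
#include <assert.h>
#include <stdint.h>

/* ms1b_table[i] = index of the most significant set bit of i (0..255).
   Entries 0 and 1 stay 0 (static storage is zero-initialized). */
static int ms1b_table[256];

void init_ms1b_table(void) {
    for (int i = 2; i < 256; i++)
        ms1b_table[i] = ms1b_table[i >> 1] + 1;
}

/* Same narrowing as findIndex: skip 32, then 16, then 8 bits when the
   value is large enough, then finish with one lookup on the low byte. */
int find_index(uint64_t bb) {
    int result = 0;
    if (bb > 0xFFFFFFFFu) { bb >>= 32; result = 32; }
    if (bb > 0xFFFFu)     { bb >>= 16; result += 16; }
    if (bb > 0xFFu)       { bb >>= 8;  result += 8; }
    assert(bb <= 0xFFu);
    return result + ms1b_table[bb];
}
```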
I can't write code for you following your square-to-bit convention, because I don't know what it is. However, my code shows you how you don't have to consider each column individually to find all the valid moves: Just compute `empties & shift_north(occupied)'. Then extract all the bits that are set, using a loop like the one I showed you.

I don't know of any way to sort moves other than putting them in an array first, but that shouldn't be slow at all.
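To illustrate how cheap that can be, a small C sketch that moves a killer move to the front once the moves have been extracted into an array. killer_first is a hypothetical helper, and using bit indices as move identifiers is an assumption:

```c
#include <assert.h>

/* Move `killer` to the front of moves[0..n-1] if it is present, keeping
   the other moves in their original order. Returns 1 if it was found. */
int killer_first(int moves[], int n, int killer) {
    assert(n >= 0 && n <= 7);  /* Connect 4 has at most 7 legal moves */
    for (int i = 0; i < n; i++) {
        if (moves[i] == killer) {
            for (int j = i; j > 0; j--)
                moves[j] = moves[j - 1];  /* shift earlier moves right */
            moves[0] = killer;
            return 1;
        }
    }
    return 0;
}
```

With at most seven entries, this is a handful of comparisons and copies per node.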

Just compute `empties & shift_north(occupied)'. Then extract all the bits that are set, using a loop like the one I showed you.


On an empty board, occupied will be 0, so according to the pseudo-code the result is 0, i.e. there are no available moves.
Am I missing something? I have adopted the Fhourstones structure for now.

[quote name='alvaro' timestamp='1341054536' post='4954278']
Just compute `empties & shift_north(occupied)'. Then extract all the bits that are set, using a loop like the one I showed you.


On an empty board, occupied will be 0, so according to the pseudo-code the result is 0, i.e. there are no available moves.
Am I missing something? I have adopted the Fhourstones structure for now.
[/quote]

Oops! You are right. It's easily fixed, though: empties & (shift_north(occupied) | FIRST_ROW)
Hi again,
I was busy with a few other things, so I had to put this code aside.
@alvaro: Apologies for the late reply, but your trick did its job and it's working nicely.

Thinking about this further, I think I can reduce the number of moves when a winning threat is present.
Say I have three in a row and the computer has 6 different moves: it's not practical to search every single move, since my next move is going to be on that square.
First, can you please tell me whether this approach is correct? I have added some code to my make-move method to do that.
It's buggy currently (I am not concentrating on that for now, only on the concept), but the point is that I am focusing on discarding as many search nodes as possible before adding more knowledge to the evaluation, since that would slow it down.
Currently it's getting about 400 knps.
Please help me with this.

long occupied = (xBits | yBits);
long empty = ~occupied;
long bitMoves = 0L;
bitMoves = bitBoard & empty & ((occupied >> 9) | lastrow);
//Find the forced moves
long xThreats = 0L;
long yThreats = 0L;

// Horizontal (step 1)
yThreats |= ((yBits << 1) & (yBits << 2) & (yBits << 3) & empty & bitBoard);//XXX_
yThreats |= ((yBits >> 2) & (yBits << 1) & (yBits >> 1) & empty & bitBoard);//X_XX
yThreats |= ((yBits << 2) & (yBits << 1) & (yBits >> 1) & empty & bitBoard);//XX_X
yThreats |= ((yBits >> 1) & (yBits >> 2) & (yBits >> 3) & empty & bitBoard);//_XXX
// Diagonal (step 10)
yThreats |= ((yBits << 10) & (yBits << 20) & (yBits << 30) & empty & bitBoard);//XXX_
yThreats |= ((yBits >> 20) & (yBits << 10) & (yBits >> 10) & empty & bitBoard);//X_XX
yThreats |= ((yBits << 20) & (yBits << 10) & (yBits >> 10) & empty & bitBoard);//XX_X
yThreats |= ((yBits >> 10) & (yBits >> 20) & (yBits >> 30) & empty & bitBoard);//_XXX
// Anti-diagonal (step 8)
yThreats |= ((yBits << 8) & (yBits << 16) & (yBits << 24) & empty & bitBoard);//XXX_
yThreats |= ((yBits >> 16) & (yBits << 8) & (yBits >> 8) & empty & bitBoard);//X_XX
yThreats |= ((yBits << 16) & (yBits << 8) & (yBits >> 8) & empty & bitBoard);//XX_X
yThreats |= ((yBits >> 8) & (yBits >> 16) & (yBits >> 24) & empty & bitBoard);//_XXX
// Vertical (step 9): the empty square directly above three stacked pieces
yThreats |= ((yBits >> 9) & (yBits >> 18) & (yBits >> 27) & empty & bitBoard);

// Horizontal (step 1)
xThreats |= ((xBits << 1) & (xBits << 2) & (xBits << 3) & empty & bitBoard);//XXX_
xThreats |= ((xBits >> 2) & (xBits << 1) & (xBits >> 1) & empty & bitBoard);//X_XX
xThreats |= ((xBits << 2) & (xBits << 1) & (xBits >> 1) & empty & bitBoard);//XX_X
xThreats |= ((xBits >> 1) & (xBits >> 2) & (xBits >> 3) & empty & bitBoard);//_XXX
// Diagonal (step 10)
xThreats |= ((xBits << 10) & (xBits << 20) & (xBits << 30) & empty & bitBoard);//XXX_
xThreats |= ((xBits >> 20) & (xBits << 10) & (xBits >> 10) & empty & bitBoard);//X_XX
xThreats |= ((xBits << 20) & (xBits << 10) & (xBits >> 10) & empty & bitBoard);//XX_X
xThreats |= ((xBits >> 10) & (xBits >> 20) & (xBits >> 30) & empty & bitBoard);//_XXX
// Anti-diagonal (step 8)
xThreats |= ((xBits << 8) & (xBits << 16) & (xBits << 24) & empty & bitBoard);//XXX_
xThreats |= ((xBits >> 16) & (xBits << 8) & (xBits >> 8) & empty & bitBoard);//X_XX
xThreats |= ((xBits << 16) & (xBits << 8) & (xBits >> 8) & empty & bitBoard);//XX_X
xThreats |= ((xBits >> 8) & (xBits >> 16) & (xBits >> 24) & empty & bitBoard);//_XXX
// Vertical (step 9): the empty square directly above three stacked pieces
xThreats |= ((xBits >> 9) & (xBits >> 18) & (xBits >> 27) & empty & bitBoard);


if((((yThreats|xThreats)&bitMoves)!=0))
{
bitMoves = bitMoves&(yThreats|xThreats);// play on the threatend empty square only.
}


My board structure is:
00 | 01 02 03 04 05 06 07 | 08
09 | 10 11 12 13 14 15 16 | 17
18 | 19 20 21 22 23 24 25 | 26
27 | 28 29 30 31 32 33 34 | 35
36 | 37 38 39 40 41 42 43 | 44
45 | 46 47 48 49 50 51 52 | 53
where | is the border.
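With this layout, "north" is 9 bits lower, so alvaro's corrected formula empties & (shift_north(occupied) | FIRST_ROW) becomes the expression used for bitMoves above. A self-contained C sketch follows; REPEAT9, PLAYABLE, LASTROW and generate_moves9 are illustrative stand-ins for the post's bitBoard and lastrow, built from the layout as described (index = 9 * row + col, row 0 at the top, columns 0 and 8 as borders):

```c
#include <assert.h>
#include <stdint.h>

/* One bit per row in column 0, repeated every 9 bits for rows 0..5. */
#define REPEAT9  (UINT64_C(1) | UINT64_C(1) << 9 | UINT64_C(1) << 18 | \
                  UINT64_C(1) << 27 | UINT64_C(1) << 36 | UINT64_C(1) << 45)
#define PLAYABLE (UINT64_C(0xFE) * REPEAT9)  /* columns 1..7 of rows 0..5 */
#define LASTROW  (UINT64_C(0xFE) << 45)      /* bottom row: bits 46..52 */

/* Square s is supported when the square below it (s + 9) is occupied;
   shifting right by 9 moves bit s + 9 onto s. */
uint64_t generate_moves9(uint64_t xBits, uint64_t yBits) {
    uint64_t occupied = xBits | yBits;
    assert((occupied & ~PLAYABLE) == 0);  /* border bits must stay empty */
    return PLAYABLE & ~occupied & ((occupied >> 9) | LASTROW);
}
```

On an empty board this returns exactly LASTROW, which is the case the FIRST_ROW fix addresses.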
You don't need two columns for padding, but that doesn't really matter.

I spent a lot of time in the early 90s writing a Connect 4 program. What I did at the time was make the move generator smart enough that, if a win was available, the only legal moves were winning moves, and if the opponent had a threat, the only legal moves were the ones that blocked it. My friends and I were playing a lot of Connect 4 at the time, and we actually used rules similar to chess, where it is considered illegal to expose your king. So (during the search) my move generator also didn't let a player play right under an opponent's threat, because that results in immediate victory for the opponent.

Oh, I also extended the depth for forced moves (where only one move is legal, with the definition above). This makes the program stronger in tactics, but nowadays I would prefer to test this potential improvement more scientifically, actually playing thousands of games to see if it really helps.

A matter of style:
long bitMoves = 0L;
bitMoves = bitBoard & empty & ((occupied >> 9) | lastrow);


Why is the code above not simply this?
long bitMoves = bitBoard & empty & ((occupied >> 9) | lastrow);

Actually, I think my compiler (gcc) would complain about initializing a variable to a value that is never used.
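The move filter alvaro describes (win if possible, otherwise block, otherwise never play directly under an opponent's threat) can be sketched with bitboards in the original poster's 9-bits-per-row layout. This is a C sketch, not alvaro's original code; threats and restricted_moves are hypothetical names, and the threat test covers all four directions, including vertical (step 9):

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the post: row 0 at the top, border columns 0 and 8. */
#define REPEAT9  (UINT64_C(1) | UINT64_C(1) << 9 | UINT64_C(1) << 18 | \
                  UINT64_C(1) << 27 | UINT64_C(1) << 36 | UINT64_C(1) << 45)
#define PLAYABLE (UINT64_C(0xFE) * REPEAT9)
#define LASTROW  (UINT64_C(0xFE) << 45)

/* Empty squares where `b` would complete four in a row. Steps: 1 is
   horizontal, 9 vertical, 8 and 10 the diagonals. A line that would wrap
   into the next row always hits an (empty) border square. */
static uint64_t threats(uint64_t b, uint64_t empty) {
    static const int dirs[4] = {1, 8, 9, 10};
    uint64_t t = 0;
    for (int i = 0; i < 4; i++) {
        int d = dirs[i];
        t |= (b << d) & (b << 2 * d) & (b << 3 * d); /* XXX_ */
        t |= (b << d) & (b << 2 * d) & (b >> d);     /* XX_X */
        t |= (b << d) & (b >> d) & (b >> 2 * d);     /* X_XX */
        t |= (b >> d) & (b >> 2 * d) & (b >> 3 * d); /* _XXX */
    }
    return t & empty;
}

/* `me` is the side to move. An empty result on a non-full board means
   every move loses, which the search can score as a loss. */
uint64_t restricted_moves(uint64_t me, uint64_t opp) {
    uint64_t occupied = me | opp;
    assert((occupied & ~PLAYABLE) == 0);
    uint64_t empty = PLAYABLE & ~occupied;
    uint64_t moves = empty & ((occupied >> 9) | LASTROW);

    uint64_t wins = moves & threats(me, empty);
    if (wins) return wins;                       /* take the win */

    uint64_t opp_threats = threats(opp, empty);
    uint64_t blocks = moves & opp_threats;
    if (blocks) return blocks;                   /* must block */

    /* The square below a threat t is t + 9, i.e. opp_threats << 9. */
    return moves & ~(opp_threats << 9);
}
```

The forced-move extension alvaro mentions could then check whether the result has exactly one bit set, `m && !(m & (m - 1))`, before extending the depth.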
@alvaro, thanks for the reply.
From your reply, I take it that I was on the right track.
Now, for a trial position after 8 plies, my program solves the game in about 5 minutes.
Later on I will implement iterative deepening and post an update on my progress.
Hi again,
I implemented iterative deepening with the history heuristic, but somehow the time taken with iterative deepening is far higher than without it.
I think this happens because of the high search depth (20+), since there is a lot of sorting for each move. I also read a few papers which said that the performance of the history heuristic decreases at greater depths, above 7 or 8.
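If the extra time really comes from sorting, note that a Connect 4 node has at most seven moves, so an insertion sort by history score should only cost a few comparisons. A minimal C sketch, assuming a history[side][square] table over the post's 54 bit indices and a depth*depth bonus on cutoffs; both are illustrative choices, not taken from the thread:

```c
#include <assert.h>
#include <stdint.h>

enum { SQUARES = 54 };  /* bit indices 0..53 of the 9x6 layout */

/* history[side][square]: grows each time that move causes a cutoff. */
static uint32_t history[2][SQUARES];

/* Call on a beta cutoff; depth*depth is an assumed weighting. */
void history_reward(int side, int square, int depth) {
    assert(square >= 0 && square < SQUARES);
    history[side][square] += (uint32_t)(depth * depth);
}

/* Stable insertion sort, highest history score first. */
void order_by_history(int side, int moves[], int n) {
    for (int i = 1; i < n; i++) {
        int m = moves[i];
        uint32_t s = history[side][m];
        int j = i;
        while (j > 0 && history[side][moves[j - 1]] < s) {
            moves[j] = moves[j - 1];
            j--;
        }
        moves[j] = m;
    }
}
```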

