Sign in to follow this  
sgalzin

Finding the N best moves in a Minimax tree

Recommended Posts

hi, i am currently working on a strategy game i've invented which is similar to chess or go. the best move is searched using a minimax tree. I want to be able, once the tree has been searched, to provide not just one but several move possibilities which have similar outcomes (in terms of values). What I have in mind is a "granularity" which could enable the computer to think in the following terms : "The best branches are A and D. If I choose branch A I get 22.45 points, if I choose branch D I get 22.20 points, therefore I may choose randomly between branches A and D since my granularity is of plus or minus 0.5". This is important because I wish to have the computer play against itself and I would like it to play diffferent games ... Once this method returns several good moves, the computer may choose the one to play randomly. thanks for your help, stephane.

Share this post


Link to post
Share on other sites
The way I'd do it is the same as my previous suggestion - have a parent pointer in each node. When the iterative deepening finishes a branch have it search its way back up to the original move that was spawned and save the move and the final score that was found at that depth in a table. Each time the search returns it checks against the value in the table (for that move) and replaces it if it's lower. At the end of your searching you can go through the table and do what you like with the answers. It may be useful to use a priority queue in place of simple map (table). The only downside to this is that it adds O(mlogn) to the runtime of the search, as it has to iterate up log n nodes m times.
Best of luck - I'll be keen to see the finished product. :)
Cam

Share this post


Link to post
Share on other sites
The problem of finding the top k moves is much harder than the problem of adding a little variety to your move selection so you don't play the same game over and over again. The first thing to do is using a wide opening book. That alone will make repeating the same game twice much less likely. If your program can ponder while the opponent is thinking, timing differences in how long the opponent thinks might be enough to not repeat the same game. In my own chess program, this is all I do.

If you really want to make more random moves, you can add a small random component to your evaluation function. There is some evidence that this might be actually a good way to capture some weird dynamic notion of mobility (a search with pure random evaluation has been shown to be superior to random move selection).

If you really want to find all the moves that are close to the optimum score, the problem is that alphabeta doesn't provide such information. One thing you can do is perform a regular alphabeta search (or PVS, or SSS*, or MTD(f) or whatever) and then make a second pass, searching every move that is not the best move with a null window a few points below the best score. This will tell you what moves were close to being the best. The extra searches are unavoidable.

Share this post


Link to post
Share on other sites
Hi,

Thanks again for those useful replies ...

Talyssan : your solution (if I understand it correctly) seems fairly easy to implement ... I could use the PVS (principal variation search described here) and insert the solution found when foundPV==true in an ordered set containing a fixed number of "best solutions". when inserting a new solution in a full set, the worst solution is deleted. Does this seem like a better solution than going back up the branches for every leaf ?

Alvaro : the first solution you suggest won't work for me :-( the reason is that, first of all, I don't have and cannot have an opening book : I've invented the game and I still don't know how to play it well. worse : no one knows how to play it any better since I'm the only one to know the game ! so I don't know any convenient openings (at least not now). So no randomnes can come from the opening. Second, the reason I am implementing the randomnes is that I want the computer to play against itself ... both oponents will therefore be using the same "intelligence" ... the only thing that could make things different during two games would be to force the computer to think for a random amount of time, which doesn't seem fair and won't garanty the outcome to be fair play.

Your second suggestion, however surprising, is very attractive ! And easy to implement :-) I'm surprised it works !! Do you suggest something like
value += (random*granularity)-(granularity/2)
? I'll try that.

Concerning your third suggestion, I have to say I didn't understand it ... that's because I'm new to AlphaBeta and search windows ... But do you think it will give better results than what I propose to do using Talyssan's solution with PVS ?

Thanks, I'm going back to my coding window for now ...

Cheers,

Stephane.

Share this post


Link to post
Share on other sites
Quote:
Alvaro : the first solution you suggest won't work for me :-( the reason is that, first of all, I don't have and cannot have an opening book : I've invented the game and I still don't know how to play it well. worse : no one knows how to play it any better since I'm the only one to know the game ! so I don't know any convenient openings (at least not now). So no randomnes can come from the opening. Second, the reason I am implementing the randomnes is that I want the computer to play against itself ... both oponents will therefore be using the same "intelligence" ... the only thing that could make things different during two games would be to force the computer to think for a random amount of time, which doesn't seem fair and won't garanty the outcome to be fair play.

I don't mean that writing an opening book doesn't require work. You need to learn how to play the game and then decide what are good opening moves. If the initial position of your game is always the same, it doesn't make sense to start thinking on that position when you are playing "live". You should have thought about what you want to do in that position beforehand, and store the results in an opening book.

Quote:
Your second suggestion, however surprising, is very attractive ! And easy to implement :-) I'm surprised it works !! Do you suggest something like
value += (random*granularity)-(granularity/2)
? I'll try that.

Yup, as simple as that.

Quote:
Concerning your third suggestion, I have to say I didn't understand it ... that's because I'm new to AlphaBeta and search windows ... But do you think it will give better results than what I propose to do using Talyssan's solution with PVS ?

Well, I didn't understand Talyssan's solution, so I guess we are even. :) I don't recommend following the method I described, since my "second suggestion" is so simple and works so well.

Share this post


Link to post
Share on other sites
Quote:
I don't mean that writing an opening book doesn't require work. You need to learn how to play the game and then decide what are good opening moves. If the initial position of your game is always the same, it doesn't make sense to start thinking on that position when you are playing "live". You should have thought about what you want to do in that position beforehand, and store the results in an opening book.


you're right ! that makes sense ... i'l write that book, but not in the immediate future : I first need the computer to play a few times and adjust the rules (yep ... they aren't final yet !) before feeding it the definitive game and searching patterns :-) once that's done, I guess I can let it think about the first move for say a day and feed its conclusions in an opening book.

thanks, I'll let you guys know how it's all coming along !

stephane.

Share this post


Link to post
Share on other sites
My bad, yes Alvaro is right my method wouldn't help you much as the algorithm ends at a depth, not at a total number of nodes searched.

Rather than trace back up the tree it might be better just send the name of the initial move down the tree - then you could return that move and the final score with it.

Sorry for any confusion. :(
-Cam

Share this post


Link to post
Share on other sites
Hi all,

I'm still having trouble with this "best move" thing ! I've tried to implement what's written on this page but there are still some things I don't understand :
http://www.seanet.com/~brucemo/topics/hashing.htm

Here is what is said :
Quote:

The main transposition table is an array of hash elements. Each hash element looks something like this:


#define hashfEXACT 0
#define hashfALPHA 1
#define hashfBETA 2

typedef struct tagHASHE {
U64 key;
int depth;
int flags;
int value;
MOVE best;
} HASHE;





Quote:

The final field in my hash element is the "best" move encountered last time I searched this position. Sometimes I don't get a best move, like if everything failed low (returned a score <= alpha), but other times there is a definite best move, like when something fails high (returns a score >= beta).

If a best move is found, it will be searched first.

Some sample code, overlaid onto an alpha-beta function :


int AlphaBeta(int depth, int alpha, int beta)
{
int hashf = hashfALPHA;

if ((val = ProbeHash(depth, alpha, beta)) != valUNKNOWN)
return val;
if (depth == 0) {
val = Evaluate();
RecordHash(depth, val, hashfEXACT);
return val;
}
GenerateLegalMoves();
while (MovesLeft()) {
MakeNextMove();
val = -AlphaBeta(depth - 1, -beta, -alpha);
UnmakeMove();
if (val >= beta) {
RecordHash(depth, beta, hashfBETA);
return beta;
}
if (val > alpha) {
hashf = hashfEXACT;
alpha = val;
}
}
RecordHash(depth, alpha, hashf);
return alpha;
}

int ProbeHash(int depth, int alpha, int beta)
{
HASHE * phashe = &hash_table[ZobristKey() % TableSize()];

if (phashe->key == ZobristKey()) {
if (phashe->depth >= depth) {
if (phashe->flags == hashfEXACT)
return phashe->val;
if ((phashe->flags == hashfALPHA) &&
(phashe->val <= alpha))
return alpha;
if ((phashe->flags == hashfBETA) &&
(phashe->val >= beta))
return beta;
}
RememberBestMove();
}
return valUNKNOWN;
}

void RecordHash(int depth, int val, int hashf)
{
HASHE * phashe = &hash_table[ZobristKey() % TableSize()];

phashe->key = ZobristKey();
phashe->best = BestMove();
phashe->val = val;
phashe->hashf = hashf;
phashe->depth = depth;
}



I don't understand where the code corresponding to "search the best move first" is. Actually, I'm not sure what to do with the RemeberBestMove() method : should I just save the current move into a variable, say bestCurrentMove ?

I hope someone will be able to help me on this ! Thanks,

Stephane.

Share this post


Link to post
Share on other sites
Quote:
Original post by sgalzinI don't understand where the code corresponding to "search the best move first" is. Actually, I'm not sure what to do with the RemeberBestMove() method : should I just save the current move into a variable, say bestCurrentMove ?


Yeah, that code is not very self-explanatory. It looks to me like RememberBestMove() stores the best move from the hash somewhere (in the board structure?), and then GenerateLegalMoves() will put that move first if it finds it. There are other ways you can do this. In particular, it might be interesting to explore the hash move before any other moves have been even generated. If that produces a beta cut-off, you won't need to generate moves at all.

As for not storing the best move in the case of a fail-low, that's not what I do. I just store whatever the best move found is, which might not be very good in the case of a fail-low, but when I tried Bruce's suggestion it didn't make much of a difference.

Share this post


Link to post
Share on other sites
Hi everybody,

I´m happy to find sgalzin´s latest posting here with all the code from Moreland´s excellent and helpful page, cause that´s very close to the reason I registered to this board - I somehow just don´t understand how storing best move information in the tt would work. So I don´t open a new thread and just throw my question in.
I´m quite new to game programming, but what I already have is a nice little iterative deepening alpha-beta algorithm with extensions, quiescence search and transposition tables. When I experimented with PVS, I realized that my move ordering can´t be too good, because it actually brought me a cost of additional 10-20% instead of any gain (in terms of nodes). Therefore I´d like to use my tt to enhance move ordering first, and now my question is: Which move exactly do I have to put in there in the various lines of code where "RecordHash" shows up? The mystery seems to be somewhere in "BestMove()". :)
Thanks and excuse my english errors, if you find any. :)

Hendrik

Share this post


Link to post
Share on other sites
When you explore a node in alpha-beta, there are basically three possible results:
- You find a move whose score is beta or better. In this case you can stop considering moves and return the score you found. You store the score as a lower bound in the TT and you also store the move that produced the beta cut.
- All your moves have a score that is less than beta, but at least one of them has a score above alpha. In this case you store the score in the TT as an exact score and you also store the move that gave you the best score.
- All your moves have scores that are alpha or worse. In this case you store the best score you find as an upper bound, and now whether you store the best move in the TT or not is not very important, in my experience. Moreland doesn't store it. I do. Moreland's position probably makes a little more sense, but in practice my program performed about the same with or without storing the move.

Moreland's pseudo-code is not very clear, but it shouldn't be hard for you to figure out how to keep track of the best move. If you need to see some pseudo-code, I can produce some.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this