Jump to content

  • Log In with Google      Sign In   
  • Create Account


Time Management in a game of Blokus


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
10 replies to this topic

#1 shishio64   Members   -  Reputation: 116

Like
0Likes
Like

Posted 21 December 2013 - 05:47 PM

Hello everyone,

I have got an assignment in my AI class to design a good time management plan for a player in the game of Blokus.

the player is given a fixed amount of time for the all game and he needs to distribute it wisely over his moves.

the player uses the alpha-beta search algorithm in an iterative deepening framework to find the best move.

im given an implementation of the game where the player statically distributes his time evenly over the max amount of moves (21) and i need to improve upon that.

my question is how should i dynamically distribute the time between moves using the iterative deepening framework to my advantage?

after a lot of searching this is the closest thing i could find about time management (in chess) but it is not explained:

http://chessprogramming.wikispaces.com/Time+Management

Iterative deepening in conjunction with its predictable effective branching factor allows a flexible time management either to terminate the current iteration and to fall back on best move and PV of the previous iteration, or to decide about termination prior to start a new iteration or to search a next root-move.

 

how do you calculate that predictable effective branching factor and why is it important for the decision of the ID termination?

 

any help would be greatly appreciated!



Sponsor:

#2 Álvaro   Crossbones+   -  Reputation: 11865

Like
1Likes
Like

Posted 21 December 2013 - 08:51 PM

The way I like to implement time control is by dividing the problem in two sub-problems.

The first part is a quick computation at the beginning of the search, where we determine two numbers: How much we would like to spend on this move in normal circumstances, and how much is the absolute maximum we can use.

The second part consists of trying to obey the indications given by the first part during the search. How this second part is achieved depends on the details of the algorithm being used by the AI. If you are using MCTS --or a similar algorithm that can be stopped at any time--, you simply run simulations until the normal time has elapsed, and then only continue running simulations if you determine that you are in an unstable situation (for instance, if the move with the best score and the move that has been examined the most times are not the same). If you ever get to the maximum time, you stop.

If your algorithm is something like alpha-beta search with iterative deepening (what is used in chess), you have to decide whether to start another iteration or not. This is where you may need to have an estimate of the effective branching factor (the time it takes to search depth d divided by the time it takes to search depth d-1). Let's say you always start a new iteration if the time spent is less than T, and your effective branching factor is 4. You will end up spending some random amount of time between T and 4*T, with an average of 2.5*T. So you should make T = normal_time / 2.5 so that you end up using the amount of time you want in the average. You would want to do that because interrupting an iteration of deepening is very undesirable: You will often not learn anything in a partial search like that, and you should only abort if you reach the maximum time. You can also try to spend more time if you have an indication that there is trouble (e.g., if the score for the best move drops below a certain amount (this can be implemented together with aspiration windows)), and perhaps spend less time if it is easy to determine that one move is obviously much better than the others.

With all that in mind, my recommendation is that you implement a very simplistic "first part", where you simply take the remaining time and divide it by the number of pieces you have in your hand that are not provably unplaceable, make that the normal time, and determine the maximum time so that you can always spend at least some fixed small amount of time per move in the future. Then do as good a job as you can in implementing the "second part".

Now you have a working time-control system and it's time to tweak it. By that I mean tweaking the "first part". Most engines benefit from spending more time in the early part of the game, so try to do that. An easy way to do this is to make the normal time be proportional to a power of the number of moves left. If the exponent is 0, that gets you back the basic behavior of evenly spending time through the game. If the exponent is 1, you'll try to spend 21 units of time initially of a total of 21+20+19+...+1 = 231. So instead of normal_time = total_time/21, you'll end up doing normal_time = total_time*(21/231) = total_time/11.

Now it's a matter of determining what the best setting for that exponent. The only way to do that is by playing lots of games and seeing what wins the most. You can play a match between several versions of your program, or you can have a reference program and try different versions of your program against it, or you can use something much fancier, like CLOP. In any case, you will need to run at least several thousand games to gain some statistical significance. Putting together a good environment to test changes to your program is essential for all engine development beyond the "putting something together" phase, so it's an investment that will pay off in future development.

That should keep you busy for a while. smile.png

#3 shishio64   Members   -  Reputation: 116

Like
0Likes
Like

Posted 22 December 2013 - 03:58 AM

Thanks a lot Alvaro!

 

I have a bit of trouble understanding this part:

 

 

If your algorithm is something like alpha-beta search with iterative deepening (what is used in chess), you have to decide whether to start another iteration or not. This is where you may need to have an estimate of the effective branching factor (the time it takes to search depth d divided by the time it takes to search depth d-1).

 

how do you estimate the effective branching factor? do you need to predict the time it will take for the next iteration to complete and divide that by the time of the current iteration? or to divide the time it took the current iteration to complete by the time it took the previous iteration to complete?

if it's the first, how do you predict something like that, if it's the latter it's easy (but not sure how practical).

 

 

Let's say you always start a new iteration if the time spent is less than T, and your effective branching factor is 4. You will end up spending some random amount of time between T and 4*T, with an average of 2.5*T. So you should make T = normal_time / 2.5 so that you end up using the amount of time you want in the average.

 

the random amount of time spent you mention here is the total amount of time for all the iterations up to the end of the next iteration, right?

if so, shouldn't you spend an average of 2.5*T only on the next iteration? and you still need to take into account the time spent up until the current iteration - 1*T (or less), so you should make T = normal_time/(1+2.5) so that you end up using the amount of time you want in the average,right?



#4 Álvaro   Crossbones+   -  Reputation: 11865

Like
0Likes
Like

Posted 22 December 2013 - 08:23 AM

The effective branching factor is not that far from constant in the case of chess, so I would just measure a bunch of test positions on different phases of the game and come up with a single number. In a game like Blokus, it is likely that the effective branching factor depends heavily on the phase of the game, so you may have to come up with a formula, but I would try to keep it simple.

By "time spent" I mean time spent searching all the iterations up to the current one, yes.

Here's a mathematical curiosity that might clear things up a bit. Let's define the "effective cumulative branching factor" as the expected value of the time spent searching all the depths up to d divided by the time spent searching all the depths up to d-1. If the time spent searching depth d by itself is x^d, then the ECBF is

(1+x+x^2+...+x^d)/(1+x+x^2+...+x^(d-1)) = ((x^(d+1)-1)/(x-1)) / ((x^d-1)/(x-1)) = (x^(d+1)-1) / (x^d-1) ~= x

So you see that the ECBF is roughly equal to x, so when we talk about effective branching factor you can think of it as being cumulative or not, and it doesn't make a huge difference, at least in situations where the time spent in a search as a function of depth is exponential.

By the way, you still haven't told us what kind of algorithm you are using. If you are using MCTS this whole discussion is kind of irrelevant.

Edited by Álvaro, 22 December 2013 - 08:24 AM.


#5 shishio64   Members   -  Reputation: 116

Like
0Likes
Like

Posted 23 December 2013 - 10:59 AM

 

The effective branching factor is not that far from constant in the case of chess, so I would just measure a bunch of test positions on different phases of the game and come up with a single number. In a game like Blokus, it is likely that the effective branching factor depends heavily on the phase of the game, so you may have to come up with a formula, but I would try to keep it simple.

 

what confuses me is how exactly do you measure the position's effective branching factor? do you run an iterative deepening search from that position and measure the ratio between the time it took for iteration d to complete and the time it took iteration d-1 to complete and you do that for each iteration of the deepening search? if so what do you do with that information, take the average of those ratios and consider this average the effective branching factor of that position?

 

this is an example of what i got in one of my experiments:

 

move no. 2
EBF of iter 1:  0.00018349495004   (the EBF here is not correct because iteration 1 has no previous iteration so i divided it by 1)
EBF of iter 2:  3.94405594404
EBF of iter 3:  2.80673758865
EBF of iter 4:  2.80227416298
EBF of iter 5:  8.09798617373
EBF of iter 6:  13.6630075718
EBF of iter 7:  4.82501298869
depth: 7

 

as you can see there are huge fluctuations in the EBF so i have no idea how to treat these numbers... (i calculated the EBF of iter d like so: time(iter d)/time(iter d-1))



#6 Álvaro   Crossbones+   -  Reputation: 11865

Like
1Likes
Like

Posted 23 December 2013 - 12:26 PM

Let's back up a bit. What you are interested in figuring out is whether it is worth it for you to start another iteration or not. For that you need to have some rough idea of how long the next iteration is likely to take. It turns out that chess programs that use reasonably good ordering heuristics, the time it takes to search up to depth d is well approximated by the time it took to search up to depth d-1 times a constant, which I call "effective branching factor". Of course there are situations where this goes completely haywire (e.g., when the move that was supposed to be the best all along turns out to be horrible and a lot of the information in the transposition tables is now useless, or when depth d-1 was almost free because the transposition tables had all the information).

 

It is possible that in your game things are less predictable than in modern chess engines. Still, you should be able to collect some data (from many positions, not just one) and make a histogram, which can help you decide where to cut off.

 

But the most important part of all my posts here is that you should test your strategy by playing lots of games. You'll probably have to play the games under fast time controls so the experiment can be finished in some reasonable amount of time.



#7 shishio64   Members   -  Reputation: 116

Like
0Likes
Like

Posted 25 December 2013 - 05:10 PM

Thanks for all your help Alvaro!

I implemented a time management system using your suggestions and it works nicely.

 

I have another improvement I would like to add to my alpha-beta search and would like to hear your suggestions:

Would it be a good idea to add a quiescence search to my cut-off test in an iterative deepening framework? At first I didn't even consider this option because I thought it could easily lead to an expensive aborted iteration because of time constraints.

If I do decide to add this improvement, should it be called recursively until a quite position is reached or only called once?

How should I determine the depth of the search?

And how should I decide if a position is noisy? I thought of calculating the difference in heuristic value between the current position and the previous position (the one we came from) and if it passes a certain threshold I would consider it noisy...



#8 Álvaro   Crossbones+   -  Reputation: 11865

Like
1Likes
Like

Posted 25 December 2013 - 09:45 PM

No, I don't think quiescence search makes much sense for Blokus. In chess you have lots of positions where the side to move can take a defended pawn with a queen at depth_left=1, and then you would evaluate an unreasonable position because the queen could be captured right away. So you need to do something about those positions, like give the side to move the opportunity to try some captures instead of just accepting the static evaluation. In Blokus I don't think an equivalent of this situation exists in, and I don't know what would be a move that changes things a lot (like a capture or a promotion do in chess).

 

Did you try MCTS instead of alpha-beta search? I really think it could be very good for Blokus, particularly in the 4-player version.



#9 shishio64   Members   -  Reputation: 116

Like
0Likes
Like

Posted 26 December 2013 - 03:24 AM

thanks, that's what i thought but wasn't sure...

 

this is an assignment given in my AI class after learning the alpha-beta algorithm and some improvements on it.

we haven't learned about MCTS and i don't think they expect us to use it istead of alpha-beta as an improvement to the player.

actually the question related to the improvements is:

"choose at least one improvement to the alpha-beta algorithm from the following that you think will be most effective in improving agent performance:

1. Time Management

2. Move Ordering

3. Selective Deepening (which includes Quiescence search - the only one we have learned in this category)

You are allowed to make additional improvements that do not appear on the above list."

 

I used Move Ordering, Transposition Table and Time Management because i thought they could improve the depth the player could look and also prevent himself from waisting time where he shouldn't. it turns out that they do allow the player to search deeper but not as much as i anticipated and even if he can look further he almost always returns the same move as the unimproved player (i guess it's heuristic dependent).

maybe there could be a better selective deepening scheme than quiesence search in my case?

maybe something that developes the search tree according to relevance (cutts off bad subtrees and developes promising subtrees)? or something like singular extensions that if he finds a really good move he keeps going deeper in it's subtree and maybe finds a winning position fast?



#10 Álvaro   Crossbones+   -  Reputation: 11865

Like
1Likes
Like

Posted 26 December 2013 - 08:47 AM

Move ordering is probably the best place to start. What heuristics have you tried? Without thinking about it too much, I would try the move from the transposition tables first, if there is one. Then two killer moves (if they are legal from the current position), then all moves sorted by history heuristic. At least that's probably a solid start. History heuristic will require tweaking because there are many details to determine.

 

I wouldn't try too hard to make the search selective, because that route rarely has been productive in my experience, and because the nature of Blokus is very combinatorial, so I don't expect good heuristics to determine good moves. Perhaps piece size? You probably don't want to be spending too much time thinking of the ways in which you could place the 1x1 piece early on. But alpha-beta with a few enhancements can often spend very little time rejecting such moves.

 

Singular extensions are hard to get right and can make your search space explode in some pathological positions if you are not careful. I wouldn't spend time with it either.

 

Something you haven't mentioned and which should be relatively easy to implement is late move reductions, or a similar scheme. You start by modifying your code to implement NegaScout, which is easy to do. Then change NegaScout so the zero-window search for moves other than the first move is performed at a reduced depth (reduced by one, probably), except for a selected class of moves that is too likely to be successful. That "selected class of moves" is typically something like captures, promotions, checks, check evasions... But for some games that I have tested, it is perfectly fine to reduce all moves other than the first, or perhaps you can try not reducing the killer moves. You should probably not reduce in nodes with depth<3 or depth<2.

 

If you want to make the game strong, once you have the basic search in place you should concentrate on finding good evaluation terms and tuning the evaluation function. But I don't know if that falls within the assignment.

 

EDIT: Do you have a mechanism to play matches to test different versions of your program? If you don't that's the first thing you should do, so you can measure progress.


Edited by Álvaro, 26 December 2013 - 08:48 AM.


#11 shishio64   Members   -  Reputation: 116

Like
0Likes
Like

Posted 26 December 2013 - 05:18 PM

in move ordering i only try the move from the transposition tables (the best move from previous iteration). i also use the transposition tables to store exact values or bounds in case we reach a state with depth lower or equal to the one stored in the TT (i used an old post of yours which was very helpful to do this :) ).

can you explain what killer moves are and how do they differ from moves stored by the history heuristic? i understand how the history heuristic works in general but i don't really know how to maintain it throughout the game. should i clear it after each move search or share the information between moves searches? i also don't really see how it could help here? in Blokus the same move cannot be applied more than once so wouldn't the information be useless? what tweaking would it require?

 

i will look into the late move reduction scheme you mentioned, but i think it's too advanced for me... i only know alpha-beta, i haven't even learned NegaMax so NegaScout might be a bit too much...

 

finding a good evaluation function is part of the assignment. the evaluation function im using takes into consideration square balance (which is actually the score balance computed at O(1)) and "corner balance". the problem with corner balance is that the value is computed at O(n) which slows search down a lot.

i was also thinking about "center control" (my player tries to go for the center) but it is also computed at O(n). do you have any idea for a simple yet powerful heuristic?

 

the first thing i did was to build a mechanism for matches and also a mechanism to run two players on the same states to really see the difference.






Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS