The way I like to implement time control is by dividing the problem in two sub-problems.
The first part is a quick computation at the beginning of the search, where we determine two numbers: How much we would like to spend on this move in normal circumstances, and how much is the absolute maximum we can use.
The second part consists of trying to obey the indications given by the first part during the search. How this second part is achieved depends on the details of the algorithm being used by the AI. If you are using MCTS
--or a similar algorithm that can be stopped at any time--, you simply run simulations until the normal time has elapsed, and then only continue running simulations if you determine that you are in an unstable situation (for instance, if the move with the best score and the move that has been examined the most times are not the same). If you ever get to the maximum time, you stop.
If your algorithm is something like alpha-beta search with iterative deepening (what is used in chess), you have to decide whether to start another iteration or not. This is where you may need to have an estimate of the effective branching factor (the time it takes to search depth d divided by the time it takes to search depth d-1). Let's say you always start a new iteration if the time spent is less than T, and your effective branching factor is 4. You will end up spending some random amount of time between T and 4*T, with an average of 2.5*T. So you should make T = normal_time / 2.5 so that you end up using the amount of time you want in the average. You would want to do that because interrupting an iteration of deepening is very undesirable: You will often not learn anything in a partial search like that, and you should only abort if you reach the maximum time. You can also try to spend more time if you have an indication that there is trouble (e.g., if the score for the best move drops below a certain amount (this can be implemented together with aspiration windows)), and perhaps spend less time if it is easy to determine that one move is obviously much better than the others.
With all that in mind, my recommendation is that you implement a very simplistic "first part", where you simply take the remaining time and divide it by the number of pieces you have in your hand that are not provably unplaceable, make that the normal time, and determine the maximum time so that you can always spend at least some fixed small amount of time per move in the future. Then do as good a job as you can in implementing the "second part".
Now you have a working time-control system and it's time to tweak it. By that I mean tweaking the "first part". Most engines benefit from spending more time in the early part of the game, so try to do that. An easy way to do this is to make the normal time be proportional to a power of the number of moves left. If the exponent is 0, that gets you back the basic behavior of evenly spending time through the game. If the exponent is 1, you'll try to spend 21 units of time initially of a total of 21+20+19+...+1 = 231. So instead of normal_time = total_time/21, you'll end up doing normal_time = total_time*(21/231) = total_time/11.
Now it's a matter of determining what the best setting for that exponent. The only way to do that is by playing lots of games and seeing what wins the most. You can play a match between several versions of your program, or you can have a reference program and try different versions of your program against it, or you can use something much fancier, like CLOP
. In any case, you will need to run at least several thousand games to gain some statistical significance. Putting together a good environment to test changes to your program is essential for all engine development beyond the "putting something together" phase, so it's an investment that will pay off in future development.
That should keep you busy for a while.