# How to account for position's history in transposition tables

This topic is 1416 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I'm currently developing a solver for a trick-based card game called Skat in a perfect information situation. Although most of the people may not know the game, please bear with me; my problem is of a general nature.

Short introduction to Skat:
Basically, each player plays one card alternatingly, and three cards form a trick. Every card has a specific value. The score that a player has achieved is the result of adding up the value of every card contained in the tricks that the respective player has won. I left out certain things that are unimportant for my problem, e.g. who plays against whom or when do I win a trick.
What we should keep in mind is that there is a running score, and who played what before when investigating a certain position (-> its history) is relevant to that score.

So that everyone can imagine better how the score develops thoughout a Skat game until all cards have been played, here's an example. The course of the game is displayed in the lower table, one trick per line. The actual score after each trick is on its left side, where +X is the declarer's score (-Y is the defending team's score, which is irrelevant for alpha beta). As I said, the winner of a trick (declarer or defending team) adds the value of each card in this trick to their score.

The card values are:

Card  J  A   10  K  Q  9  8  7
Value 2  11  10  4  3  0  0  0

If you have questions about Skat/its rules nonetheless, I would love to elaborate.

I have written an alpha beta algorithm in Java which seems to work fine, but it's way too slow. The first enhancement that seems the most promising is the use of a transposition table (TT). I read that when searching the tree of a Skat game, one will encounter a lot of positions that have already been investigated.

In theory, I figured, when I find a position that has already been investigated before, I need to substitute the "old" position's running score with the score of the current position. In other words: When I store a position in the TT, I take the optimal value that alpha beta returns, subtract the node's running score from it, and associate the difference with the position. To my understanding, this difference represents the maximum points that the declarer can achieve in this subtree, with the current position as the root. Analogously, when I find a previously investigated position, I return the sum of the previously stored value and the current position's running score.

Enough with the theory, here's how I implemented it in Java:

The storing routine of a node:

// "bestVal" is alpha or beta, depending on the player
// "tranpo" is a HashMap<Integer, int[]>
int val = bestVal - node.getScore();
int hash = logic.hashNode(node);
// "TT_VALID" is a flag constant
transpo.put(hash, new int[]{TT_VALID, val});

And the lookup:

int hash = logic.hashNode(node);
int[] sameState = transpo.get(hash);
if(sameState != null) {
int val = node.getScore() + sameState[1];
if(sameState[0] == TT_VALID) {
return val;
}
}

I'm only storing "VALID" nodes for now. Those are the nodes that haven't been pruned by alpha/beta.

The problem is, that it won't return the same (correct) results that the normal alpha beta algorithm returns. And the moves it suggests are obviously suboptimal, but not necessarily "absurd".

I've been debugging this for a while, without success. The hashing function seems to work correctly (I tested it by comparing allegedly identical nodes with their verbal represantations). More importantly, I don't suspect any error in the alpha beta algorithm itself, since it works perfectly without the TT. If there's a reason that my assumptions could be wrong, please let me know, and I'll post more code.

Given, my assumptions are correct, the problem has to be in the storing or lookup routine. What am I missing? Is my whole approach logically flawed? Or is it something technical I've overlooked?

Since I don't expect a magical solution to appear, I'm mainly asking about how I would debug this. Since the algorithm makes millions of recursive calls, I can't just let it run line by line and sit there with my calculator until I find the mysterious error. Where should I start debugging? Maybe some kind of back-to-back test with the working alpha beta function? Or what inconsistencies could I look for?

References:

There's almost no English sources out there that deal with AI in skat games, but I found this one: A Skat Player Based on Monte Carlo Simulation by Kupferschmid, Helmert. Unfortunately, the whole paper and especially the elaboration on transposition tables is rather compact.

Again, if there's something you would like to have clarified, don't hesitate to ask!

Edited by MCL

##### Share on other sites
I have played Skat before, actually. I don't see what part of Skat is a full-information game suitable for alpha-beta search, though...

Regarding transposition tables, it's very tricky to get right. Depending on whether the result of alpha-beta is alpha or below, beta or above, or between alpha and beta, the result is an upper bound, a lower bound, or the exact minimax score of the position, respectively. The transposition tables should store which case we found and when we get a hash hit that is not an exact score we should only update alpha or beta (whichever one is appropriate). Also, we should store the depth with which the node was searched, and only use the score if the depth from the hash tables is enough.

I am sure the chess programming wiki has detailed information about all of this.

##### Share on other sites

I don't see what part of Skat is a full-information game suitable for alpha-beta search, though...

Of course, a player is usually only able to see his own cards. I'm writing the program because I want to find out the optimal strategy for a certain deal. This comes in handy for instance for a post-mortem analysis of a game that was played before. Naturally, this won't be applicable for a real card playing engine where a player only knows his own cards. But that's not my goal anyway.

[...] the result is an upper bound, a lower bound, or the exact minimax score of the position [...]

Yes, I left out certain parts of my code. I do store the actual value of the subgame and a flag. In my code it's TT_VALID when the value is between alpha and bet. When a node gets pruned, the flag is TT_LBOUND or TT_UBOUND, respectively. Right now, I'm only storing valid nodes, but even this yields false results. Or are you suggesting that not working with bounds right now could be the cause for those results?

Also, we should store the depth with which the node was searched [...]

Actually, the depth is implicitely represented in the hash itself: I use a 32-bit integer, in which 30 bits represent the remaining cards (30 cards are in play, 2 are in the Skat). The other 2 bits represent the player to move. This means that if a hash is equal, every hand is the same; ergo, the depth must be the same, too. The way I see it, it only makes sense to hash only those nodes that conclude a trick (whenever three cards are on the table), because that's the earliest situation in which two nodes with different histories can represent the same situation.

You basically explained the concepts of transposition tables. Does this mean you suspect the problem in the implementation?

Edited by MCL

##### Share on other sites
The "depth" I am referring to here is the "depth left". Presumably you are not examining the whole tree exhaustively, but you have a depth limit and use a heuristic evaluation function at the leaves. If you are doing that, you might be using other common refinements, like extensions for forced moves, reductions for moves that show little promise, iterative deepening... All of these things can result in the same position being examined with different "depth left" at different times.

A partial implementation of transposition tables that only stores exact scores shouldn't break anything, but you have to be careful to only store at the right times. That is, whenever one of the successors has yielded a score better than alpha and none of them has yielded a score that is beta of better.

Yes, I suspect there is a problem with the implementation. As I said earlier, it is very tricky to get transposition tables right, even when you fully understand the theory behind them.

##### Share on other sites

I haven't considered any improvements other than a transposition table, and I'm not planning to consider them until the TT implementation is working. I currently am doing an exhaustive search, no heuristics involved. The same thing applies for your elaboration on "depth"; I'll consider that when I got the TT working in the first place.

[...] whenever one of the successors has yielded a score better than alpha and none of them has yielded a score that is beta of better.

Why is that? The way my implementation works (or should work) right now, is returning the exact score achieved in the respective subgame. Why not let the algorithm decide which node to pick when we return that value? Moreover, if we find a node that has been investigated before, alpha and beta can vary tremendously, depending on the history. Can we really effectively make assumptions for every other position that could come, when storing a position in the TT?

Yes, I suspect there is a problem with the implementation.

What do you suggest doing? Is there something wrong with the code? How can I debug such things?

##### Share on other sites

I haven't considered any improvements other than a transposition table, and I'm not planning to consider them until the TT implementation is working. I currently am doing an exhaustive search, no heuristics involved. The same thing applies for your elaboration on "depth"; I'll consider that when I got the TT working in the first place.

If you are doing exhaustive search, everything I said about depth doesn't matter.

[...] whenever one of the successors has yielded a score better than alpha and none of them has yielded a score that is beta of better.

Why is that? The way my implementation works (or should work) right now, is returning the exact score achieved in the respective subgame.

If that's the case, you are not using alpha-beta search: You are doing straight minimax search. Alpha-beta doesn't always return exact scores.

Why not let the algorithm decide which node to pick when we return that value?

I don't know what that sentence means.

Moreover, if we find a node that has been investigated before, alpha and beta can vary tremendously, depending on the history.

Yes, of course.

Can we really effectively make assumptions for every other position that could come, when storing a position in the TT?

You can store what you have discovered, so it can be used in the future. You are not assuming anything.

Yes, I suspect there is a problem with the implementation.

What do you suggest doing? Is there something wrong with the code? How can I debug such things?

Start by only storing an exact score when it is actually the case that the value obtained by the search is an exact score.

Why are you storing the difference between bestEval and node.getScore() instead of just storing bestEval? What's the point of node.getScore() anyway if you are doing exhaustive search?

##### Share on other sites

If that's the case, you are not using alpha-beta search: You are doing straight minimax search. Alpha-beta doesn't always return exact scores.

I am currently only storing nodes that haven't been pruned/cut off. In this case we do get the minimax value, don't we?

Why are you storing the difference between bestEval and node.getScore() instead of just storing bestEval? What's the point of node.getScore() anyway if you are doing exhaustive search?

I am doing the subtraction in order to "decouple" the subtree from its history. In my opinion, this difference will express: "How many points can the declarer achieve, starting from this position?"

If I don't do the subtraction, what am I supposed to do with the value I retrieve from the TT? Simply return that value in case it's valid? In all likelihood, the running score of those nodes will be different due to their different histories. Consequentially, the points the declarer achieved in the other situation won't be the same he will achieve in this situation. That's why I store the "points achievable by the declarer, starting from the current position"; and when I find this position later in the TT, I take the current score and add those "achievable points from this position"; which should result in the minimax value, if the flag is VALID.

Just to be sure, I tested it by only storing bestVal in the TT and returning this value in the lookup routine, and both the resulting values and strategy were far off from the optimal course of play. More importantly, the final value that alpha beta returned wasn't consistent with the real score suggested by the optimal strategy.

Storing routine:

int hash = logic.hashNode(node);
transpo.put(hash, new int[]{TT_VALID, bestVal});

Lookup routine:

int hash = logic.hashNode(node);
int[] sameState = transpo.get(hash);
if(sameState != null) {
if(sameState[0] == TT_VALID) {
return sameState[1];
}
}
Edited by MCL

##### Share on other sites

If that's the case, you are not using alpha-beta search: You are doing straight minimax search. Alpha-beta doesn't always return exact scores.

I am currently only storing nodes that haven't been pruned/cut off. In this case we do get the minimax value, don't we?

No, if you didn't find a successor that was better than alpha, you only know that the minimax score of this node is alpha or worse.

Why are you storing the difference between bestEval and node.getScore() instead of just storing bestEval? What's the point of node.getScore() anyway if you are doing exhaustive search?

I am doing the subtraction in order to "decouple" the subtree from its history. In my opinion, this difference will express: "How many points can the declarer achieve, starting from this position?"

Yes, you are right about storing the difference between search result and current score. Sorry, I forgot you described this in your first post. Edited by Álvaro

##### Share on other sites

No, if you didn't find a successor that was better than alpha, you only know that the minimax score of this node is alpha or worse.

I taxed my brain, and I think i got it. Only because the node itself wasn't pruned, doesn't neccessarily mean that subsequent nodes further down the tree haven't been pruned, right?

To your suggestion of only storing the successors better than alpha: Thanks! It works when I do it this way!

The function returns the same value and strategy as normal alpha beta, but nearly 2.5 times faster, and depending on the deal, there are between 80 and 500 million positive lookups. I still don't quite get why that is. Could you please elaborate a bit why I should only store values better than alpha?

Also: How would I go about storing pruned nodes? I understand the concept of bounds, but should I make similar constraints as you suggested for VALID nodes?

And last but not least a basic problem: As I said earlier, I only store nodes that conclude a trick, since that's the earliest moment, I could possibly find a match. Do you agree with that? If yes, can I make the same assumption for prund nodes?

##### Share on other sites
My intuition is that it would be better to store all nodes, not just the ones that conclude a trick. But these things can only be settled by testing.

If you want to store and use bounds, I'll try to explain the technique as simply as I can. Let's start with alpha-beta without transposition tables:
int negamax(Position P, int alpha, int beta) {
if (P.is_terminal())
return P.score_from_the_point_of_view_of_player_to_move();

for (Move move : P.legal_moves()) {
P.make_move(move);
int score = -negamax(P, -beta, -alpha);
P.undo_move(move);
if (score > alpha) {
alpha = score;
if (score >= beta)
break;
}
}

return alpha;
}

Now we have to modify in two ways: To retrieve and use data from the transposition table, and to store it.

int negamax(Position P, int alpha, int beta) {
if (P.is_terminal())
return P.score_from_the_point_of_view_of_player_to_move();

int hash_lower_bound;
int hash_upper_bound;
bool hash_hit = TT.retrieve(P.get_hash(), &hash_lower_bound, &hash_upper_bound);
if (hash_hit) {
if (alpha < hash_lower_bound)
alpha = hash_lower_bound;
if (beta > hash_upper_bound)
beta = hash_upper_bound;
if (alpha >= beta)
return alpha;
}

BoundType bound_type = UpperBound;

for (Move move : P.legal_moves()) {
P.make_move(move);
int score = -negamax(P, -beta, -alpha);
P.undo_move(move);
if (score > alpha) {
alpha = score;
bound_type = Exact;
if (score >= beta) {
bound_type = LowerBound;
break;
}
}
}

TT.store_hash(P.get_hash(), bound_type, alpha);

return alpha;
}

[EDIT: In case the pseudo-C++ above doesn't make it clear enough, I suggest you store a lower bound and an upper bound in the transposition table. These can possibly be -infinity or +infinity, of course.]

Let me know if anything is not clear or doesn't make sense to you.

You should also spend some effort working on move ordering. Trying promising moves first can effect a dramatic reduction in the time it takes to explore a tree using alpha-beta search. In particular, you should probably store the best move in the transposition table, because you can explore that move first the next time you explore this node again, even if the bounds you found aren't good enough to stop the search. Some sort of killer heuristic is probably also very effective. Edited by Álvaro

##### Share on other sites

Thanks for your detailed answer. I just came home from an exhausting day at work, and it seems that there are a whole lot more to come; so I'll have to look into it as soon as I get some "room to breathe".

By a quick look, I noticed one thing that's unclear:

You store a position in the TT by storing the bound type and alpha:

TT.store_hash(P.get_hash(), bound_type, alpha);

But later, you retrieve what seems to be the lower bound and upper bound:

TT.retrieve(P.get_hash(), &hash_lower_bound, &hash_upper_bound);

Where has the bound type gone? And where is the lower bound coming from?

Edited by MCL

##### Share on other sites

This is roughly what I had in mind:

void TranspositionTable::store_hash(HashKey hash_key, BoundType bound_type, int score) {
TT_Entry &entry = find_entry(hash_key); // If not found, this sets up an entry with lower_bound=-Infinity, upper_bound=+Infinity

switch (bound_type) {
case Exact:
entry.lower_bound = entry.upper_bound = score;
break;
case LowerBound:
entry.lower_bound = score;
break;
case UpperBound:
entry.upper_bound = score;
}
}

Edited by Álvaro

##### Share on other sites

Okay, I made some progress. I designed the TT to store nodes from all depths (not just the beginning of a trick), resulting in a speed improvement between the factor 1.5 and 2.5, increasing JVM memory usage to about 780 MB in average.

The example you provided I believe I already have implemented basically.

I am currently giving move ordering a try, but surprisingly, it slows down the execution dramatically. I took the following suggestion from the reference in my first post:

If there is currently no card on the table, we prefer playing a suit of which the other players hold at least one card (so that they must follow suit), but only few cards (so that their choice is limited).More to the point, for each card we multiply the number of allowed answers for the other two players, preferring cards which minimize this value. Within a suit, cards of higher rank are preferred.

I now also store the index of the move (from the respective order that the move ordering function returns) in the TT, that has led to a cutoff. When I investigate a corresponding node, I start at this index (I simply ignore every move with a lower index). In other words, when a node has been pruned before, I only investigate the move that caused the cutoff and every subsequent move, as suggested by the move ordering function.

I believe, my move ordering function itself is pretty efficient, so I'd like to rule that out as a problem. But something must really go wrong if it not just doesn't improve things, but slows them down so severely. Any ideas to what the problem may be?

##### Share on other sites
Does it slow down the number of nodes visited per second, or does it increase the number of nodes it takes to finish a particular search?

##### Share on other sites

I have to put this into perspective. First of all, I improved the move ordering function, which speeds it up a couple percent. Second of all, I noticed that move ordering will be more efficient if the analyzed game is "complex" (I believe extreme distributions of suits/trumps could play a big role). Generally, when alpha beta without move ordering takes about 6-8 seconds of CPU time, move ordering will "break even"; that is, move ordering is faster when the game takes longer to be analyzed.

To your question: The number of visited nodes seems to be reduced by the factor of 1.5, the number of TT lookups is reduced to almost 50%. Interestingly enough, the more optimization techniques I implement, the faster the CPU runtime (the execution time of the main thread) gets, but the bigger the ratio of real time / CPU time will be. I believe this is due to the JVM having other threads running (e.g. garbabe collection); especially since I can see the CPU load (quad-core) peak to about 50%, which means that secondary threads fully utilize their own core.

Anyway, another thing I've tried is to store the move ordering in the transposition table (not just the best move itself). Unfortunately, that will not just blow up RAM consumption, but also leads to inferior execution speeds, although memory wasn't the bottleneck. Weird! Maybe it's JVM's fault ;)

Currently, I'm trying to find a good way to reproduce the "optimal strategy". In other words, I want to keep track of the path from the root to the leaf that represents the optimal course of play. Since every succesful TT lookup will lead to a "shortcut", the algorithm will eventually just cut off the optimal path and I'm left with a lonely position somewhere in the middle of the tree. To my understanding, every move along the optimal path will be stored in the TT as a valid node. Consequentially, I store the best move for VALID nodes. When alphaBetaTT has finished, I go as far down as possible (that is, as long as I could keep track of the optimal moves, until a TT cutoff occured). From there, I just look up all the optimal moves in the TT until I reach a leaf.

I hope I made myself clear enough! If yes, does that plan make sense to you?

I'll post my results asap, just wanted to let you know what I've achieved in the meantime and what I've been thinking about ;)

Edited by MCL

##### Share on other sites
A reasonable alpha-beta searcher should not perform any dynamic memory allocation at all, so garbage collection should not even enter the picture. However, I understand that you may not have enough control of memory management to achieve a no-dynamic-memory implementation. I don't know why anyone puts up with the idiosyncrasies of programming in Java. My advice, switch to C++.

The "optimal strategy" you are trying to recover is usually called the "principal variation". There is some good information here. You can simply return the first move, then make it and search again. If your transposition tables are working properly, the next search should take almost no time.

##### Share on other sites

I spent the last days reading in C++, and I think I know enough to port my project. But there's one thing I just can't get to work. In order to import Skat games, I want to parse HTML documents of previously played games, like in the link from my first post. And I thought this would be a good starting point to learn the language, since I'm very fond of HTML parsing. I know this question doesn't belong here, but I searched the whole web, and still didn't succeed.

All I want to do is import the library htmlcxx and use it. But how do I do that? I'm using Visual Studio Express 2013 on Windows. I've tried adding the library directory to the include directories, importing it into my project etc. but I'm still getting weird Linker errors. It can't be that hard, can it? Maybe you could give me a hint ;[

##### Share on other sites
I don't use Visual Studio or even Windows, so I can't help you there. You'll get better help if you start a separate thread in either For Beginners or in General Programming. And please post the error messages you are getting. They might be cryptic to you, but they might mean something to others in the forum.