How to account for position's history in transposition tables

MCL · 2014-03-09T14:12:15

I'm currently developing a solver for a trick-based card game called Skat in a perfect-information setting. Although most people may not know the game, please bear with me; my problem is of a general nature.

Short introduction to Skat: Basically, the players take turns playing one card each, and three cards form a trick. Every card has a specific value. A player's score is the sum of the values of all cards in the tricks that player has won. I left out certain things that are unimportant for my problem, e.g. who plays against whom or how a trick is won. What we should keep in mind is that there is a running score, and when investigating a certain position, who played what before (-> its history) is relevant to that score.

So that everyone can better imagine how the score develops throughout a Skat game until all cards have been played, here's an example. The course of the game is displayed in the lower table, one trick per line. The actual score after each trick is on its left side, where +X is the declarer's score (-Y is the defending team's score, which is irrelevant for alpha beta). As I said, the winner of a trick (declarer or defending team) adds the value of each card in this trick to their score.

The card values are:

Card   J   A   10  K   Q   9   8   7
Value  2   11  10  4   3   0   0   0

If you nonetheless have questions about Skat or its rules, I would love to elaborate.

I have written an alpha beta algorithm in Java which seems to work fine, but it's way too slow. The first enhancement that seems the most promising is the use of a transposition table (TT). I read that when searching the tree of a Skat game, one will encounter a lot of positions that have already been investigated. In theory, I figured, when I find a position that has already been investigated before, I need to substitute the "old" position's running score with the score of the current position.
In other words: When I store a position in the TT, I take the optimal value that alpha beta returns, subtract the node's running score from it, and associate the difference with the position. To my understanding, this difference represents the maximum points that the declarer can achieve in this subtree, with the current position as the root. Analogously, when I find a previously investigated position, I return the sum of the previously stored value and the current position's running score.

Enough with the theory, here's how I implemented it in Java.

The storing routine of a node:

// "bestVal" is alpha or beta, depending on the player
// "transpo" is a HashMap<Integer, int[]>
int val = bestVal - node.getScore();
int hash = logic.hashNode(node);
// "TT_VALID" is a flag constant
transpo.put(hash, new int[]{TT_VALID, val});

And the lookup:

int hash = logic.hashNode(node);
int[] sameState = transpo.get(hash);
if(sameState != null) {
    int val = node.getScore() + sameState[1];
    if(sameState[0] == TT_VALID) {
        return val;
    }
}

I'm only storing "VALID" nodes for now. Those are the nodes that haven't been pruned by alpha/beta.

The problem is that it doesn't return the same (correct) results that the normal alpha beta algorithm returns. The moves it suggests are obviously suboptimal, but not necessarily "absurd".

I've been debugging this for a while, without success. The hashing function seems to work correctly (I tested it by comparing allegedly identical nodes with their verbal representations). More importantly, I don't suspect any error in the alpha beta algorithm itself, since it works perfectly without the TT. If there's a reason that my assumptions could be wrong, please let me know, and I'll post more code. Given that my assumptions are correct, the problem has to be in the storing or lookup routine. What am I missing? Is my whole approach logically flawed? Or is it something technical I've overlooked?
Since I don't expect a magical solution to appear, I'm mainly asking how I would debug this. Since the algorithm makes millions of recursive calls, I can't just step through it line by line and sit there with my calculator until I find the mysterious error. Where should I start debugging? Maybe some kind of back-to-back test with the working alpha beta function? Or what inconsistencies could I look for?

References: There are almost no English sources out there that deal with AI in Skat games, but I found this one: A Skat Player Based on Monte Carlo Simulation by Kupferschmid, Helmert. Unfortunately, the whole paper, and especially the elaboration on transposition tables, is rather compact.

Again, if there's something you would like to have clarified, don't hesitate to ask!
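One way to make the question above concrete is a toy search whose return value is only the points still to be won from a position, so that the running score never enters the table and the caller adds it back afterwards. The sketch below is not Skat: it assumes a made-up game in which two players alternately take cards from a shared pool and only the maximizer scores. All names (`RelativeTT`, `search`, `VALUES`) are hypothetical. It also stores a bound flag with every entry, on the assumption that values coming out of a cutoff are only bounds, and it has a `useTT` switch so the same search can be run back to back with and without the table.

```java
import java.util.HashMap;
import java.util.Map;

// Toy game (an assumption, not Skat): players alternately take one card from a
// shared pool; only the maximizer scores the value of the card it takes.
final class RelativeTT {
    static final int EXACT = 0, LOWER = 1, UPPER = 2;
    static final int[] VALUES = {11, 10, 4, 3, 2, 0};

    // key -> {flag, value}; the value is history-independent ("points still to win").
    final Map<Integer, int[]> table = new HashMap<>();

    // Returns the points the maximizer can still collect from this position.
    // The caller adds the running score: total = runningScore + search(...).
    int search(int mask, boolean maxToMove, int alpha, int beta, boolean useTT) {
        if (mask == 0) return 0;                       // no cards left
        int key = mask * 2 + (maxToMove ? 1 : 0);      // position = pool + side to move
        int alphaOrig = alpha, betaOrig = beta;
        if (useTT) {
            int[] e = table.get(key);
            if (e != null) {
                if (e[0] == EXACT) return e[1];
                if (e[0] == LOWER) alpha = Math.max(alpha, e[1]);
                if (e[0] == UPPER) beta = Math.min(beta, e[1]);
                if (alpha >= beta) return e[1];        // stored bound already decides
            }
        }
        int best = maxToMove ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int i = 0; i < VALUES.length; i++) {
            if ((mask & (1 << i)) == 0) continue;      // card i is not in the pool
            int gain = maxToMove ? VALUES[i] : 0;
            // Shift the window by the points gained, so the child also works
            // in "points still to win" terms.
            int v = gain + search(mask & ~(1 << i), !maxToMove,
                                  alpha - gain, beta - gain, useTT);
            if (maxToMove) { best = Math.max(best, v); alpha = Math.max(alpha, best); }
            else           { best = Math.min(best, v); beta = Math.min(beta, best); }
            if (alpha >= beta) break;                  // cutoff
        }
        if (useTT) {
            int flag = best <= alphaOrig ? UPPER : best >= betaOrig ? LOWER : EXACT;
            table.put(key, new int[]{flag, best});
        }
        return best;
    }
}
```

A back-to-back check then amounts to calling `search` on the same start position once with `useTT` set to false and once with it set to true, using a finite window such as (-1000, 1000), and comparing the two results over many start positions. In a real Skat solver the key would also have to cover anything else that affects future play, such as the cards already lying in the current trick.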

Artificial Intelligence Programming

Started by MCL February 22, 2014 03:15 PM

16 comments, last by alvaro 10 years, 1 month ago

MCL

February 24, 2014 08:32 PM

Thanks for your detailed answer. I just came home from an exhausting day at work, and it seems that there are a whole lot more to come; so I'll have to look into it as soon as I get some "room to breathe".

By a quick look, I noticed one thing that's unclear:

You store a position in the TT by storing the bound type and alpha:


TT.store_hash(P.get_hash(), bound_type, alpha);

But later, you retrieve what seems to be the lower bound and upper bound:


TT.retrieve(P.get_hash(), &hash_lower_bound, &hash_upper_bound);

Where has the bound type gone? And where is the lower bound coming from?

alvaro

February 24, 2014 08:54 PM

This is roughly what I had in mind:


void TranspositionTable::store_hash(HashKey hash_key, BoundType bound_type, int score) {
  // If not found, this sets up an entry with lower_bound=-Infinity, upper_bound=+Infinity
  TT_Entry &entry = find_entry(hash_key);

  switch (bound_type) {
    case Exact:
      entry.lower_bound = entry.upper_bound = score;
      break;
    case LowerBound:
      entry.lower_bound = score;
      break;
    case UpperBound:
      entry.upper_bound = score;
      break;
  }
}
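Since the solver discussed in this thread is written in Java, the same idea might look roughly like the sketch below. It assumes a `HashMap`-backed table and adds a matching `retrieve`, which shows where the bound type goes: it is folded into the stored pair of bounds. The class and method names (`BoundsTable`, `store`, `retrieve`) and the `INFINITY` constant are hypothetical, not taken from any code posted here.

```java
import java.util.HashMap;
import java.util.Map;

final class BoundsTable {
    enum BoundType { EXACT, LOWER_BOUND, UPPER_BOUND }

    // Assumed to be larger than any reachable score.
    static final int INFINITY = 1_000_000;

    // Each entry is {lowerBound, upperBound}.
    private final Map<Long, int[]> entries = new HashMap<>();

    void store(long hashKey, BoundType type, int score) {
        // A missing entry starts out as "nothing known": (-INFINITY, +INFINITY).
        int[] e = entries.computeIfAbsent(hashKey, k -> new int[]{-INFINITY, INFINITY});
        switch (type) {
            case EXACT:
                e[0] = score;
                e[1] = score;
                break;
            case LOWER_BOUND:
                e[0] = score;
                break;
            case UPPER_BOUND:
                e[1] = score;
                break;
        }
    }

    // Returns a copy of {lowerBound, upperBound}; unknown positions yield the full range.
    int[] retrieve(long hashKey) {
        int[] e = entries.get(hashKey);
        return e == null ? new int[]{-INFINITY, INFINITY} : new int[]{e[0], e[1]};
    }
}
```

A search could then narrow its window with the retrieved pair and return immediately when the two bounds coincide.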

MCL

March 02, 2014 12:59 PM

Okay, I made some progress. I designed the TT to store nodes from all depths (not just the beginning of a trick), resulting in a speed improvement by a factor of 1.5 to 2.5, while increasing JVM memory usage to about 780 MB on average.

I believe I have essentially already implemented the example you provided.

I am currently giving move ordering a try, but surprisingly, it slows down the execution dramatically. I took the following suggestion from the reference in my first post:

If there is currently no card on the table, we prefer playing a suit of which the other players hold at least one card (so that they must follow suit), but only few cards (so that their choice is limited). More to the point, for each card we multiply the number of allowed answers for the other two players, preferring cards which minimize this value. Within a suit, cards of higher rank are preferred.
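The quoted heuristic can be sketched as below. This is a simplification under stated assumptions: each opponent's hand is reduced to a count of cards per suit, trumps are treated as just another suit index, and a player who cannot follow suit may play any card. The names `LeadOrdering`, `allowedAnswers` and `leadScore` are hypothetical.

```java
final class LeadOrdering {
    // Number of legal replies to a lead: the cards of the led suit if the player
    // holds any, otherwise the whole hand (simplified follow-suit rule).
    static int allowedAnswers(int[] suitCounts, int ledSuit) {
        int inSuit = suitCounts[ledSuit];
        if (inSuit > 0) return inSuit;
        int total = 0;
        for (int count : suitCounts) total += count;
        return total;
    }

    // Product of both opponents' allowed answers; smaller values are tried first.
    static int leadScore(int[] leftHand, int[] rightHand, int ledSuit) {
        return allowedAnswers(leftHand, ledSuit) * allowedAnswers(rightHand, ledSuit);
    }
}
```

Candidate leads would then be sorted by ascending `leadScore`, with ties inside a suit broken in favor of the higher rank.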

I now also store in the TT the index of the move that led to a cutoff (its index in the order returned by the move ordering function). When I investigate a corresponding node, I start at this index and simply ignore every move with a lower index. In other words, when a node has been pruned before, I only investigate the move that caused the cutoff and every subsequent move, in the order suggested by the move ordering function.
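A commonly described alternative to skipping the lower-indexed moves is to try the stored move first and then all remaining moves in their usual order. Every move stays in the search, which matters when the node is revisited with a different alpha-beta window. A minimal sketch, assuming moves are comparable with `equals`; the names `HashMoveOrdering` and `withHashMoveFirst` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class HashMoveOrdering {
    // Returns the moves with the stored (hash) move in front, followed by the
    // other moves in their original order. A null or unknown hash move leaves
    // the order unchanged.
    static <M> List<M> withHashMoveFirst(List<M> ordered, M hashMove) {
        List<M> result = new ArrayList<>(ordered.size());
        if (hashMove != null && ordered.contains(hashMove)) {
            result.add(hashMove);
        }
        for (M move : ordered) {
            if (!move.equals(hashMove)) {
                result.add(move);
            }
        }
        return result;
    }
}
```

With this variant the table only needs to remember the move itself, not its position in a particular ordering.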

I believe my move ordering function itself is pretty efficient, so I'd like to rule that out as a problem. But something must be going really wrong if it not only fails to improve things but slows them down so severely. Any ideas as to what the problem may be?

alvaro

March 02, 2014 05:34 PM

Does it slow down the number of nodes visited per second, or does it increase the number of nodes it takes to finish a particular search?

MCL

March 02, 2014 07:59 PM

I have to put this into perspective. First of all, I improved the move ordering function, which speeds it up a couple percent. Second of all, I noticed that move ordering will be more efficient if the analyzed game is "complex" (I believe extreme distributions of suits/trumps could play a big role). Generally, when alpha beta without move ordering takes about 6-8 seconds of CPU time, move ordering will "break even"; that is, move ordering is faster when the game takes longer to be analyzed.

To your question: The number of visited nodes seems to be reduced by a factor of 1.5, and the number of TT lookups is reduced to almost 50%. Interestingly enough, the more optimization techniques I implement, the lower the CPU time (the execution time of the main thread) gets, but the bigger the ratio of real time / CPU time becomes. I believe this is due to the JVM having other threads running (e.g. garbage collection); especially since I can see the CPU load (quad-core) peak at about 50%, which means that secondary threads fully utilize their own core.

Anyway, another thing I've tried is to store the move ordering in the transposition table (not just the best move itself). Unfortunately, that will not just blow up RAM consumption, but also leads to inferior execution speeds, although memory wasn't the bottleneck. Weird! Maybe it's JVM's fault ;)

Currently, I'm trying to find a good way to reproduce the "optimal strategy". In other words, I want to keep track of the path from the root to the leaf that represents the optimal course of play. Since every successful TT lookup leads to a "shortcut", the algorithm will eventually just cut off the optimal path and I'm left with a lonely position somewhere in the middle of the tree. To my understanding, every move along the optimal path will be stored in the TT as a valid node. Consequently, I store the best move for VALID nodes. When alphaBetaTT has finished, I go as far down as possible (that is, as long as I could keep track of the optimal moves, until a TT cutoff occurred). From there, I just look up all the optimal moves in the TT until I reach a leaf.

I hope I made myself clear enough! If yes, does that plan make sense to you?

I'll post my results asap, just wanted to let you know what I've achieved in the meantime and what I've been thinking about ;)

alvaro

March 02, 2014 10:53 PM

A reasonable alpha-beta searcher should not perform any dynamic memory allocation at all, so garbage collection should not even enter the picture. However, I understand that you may not have enough control of memory management to achieve a no-dynamic-memory implementation. I don't know why anyone puts up with the idiosyncrasies of programming in Java. My advice, switch to C++.

The "optimal strategy" you are trying to recover is usually called the "principal variation". There is some good information here. You can simply return the first move, then make it and search again. If your transposition tables are working properly, the next search should take almost no time.
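The "return the first move, then make it and search again" suggestion can be sketched as a loop. The sketch assumes the solver can be wrapped in three hypothetical callbacks: `isTerminal` (no cards left), `bestMove` (runs the search from a position and returns its best first move) and `play` (applies a move). None of these names come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

final class PrincipalVariation {
    // Rebuilds the principal variation by repeatedly searching, recording the
    // best first move, and playing it, until a terminal position is reached.
    static <P, M> List<M> extract(P root,
                                  Predicate<P> isTerminal,
                                  Function<P, M> bestMove,
                                  BiFunction<P, M, P> play) {
        List<M> line = new ArrayList<>();
        P position = root;
        while (!isTerminal.test(position)) {
            M move = bestMove.apply(position);   // one search per step
            line.add(move);
            position = play.apply(position, move);
        }
        return line;
    }
}
```

If the transposition table is kept between the calls to `bestMove`, each follow-up search should mostly hit stored entries, which is why the repeated searches are expected to be cheap.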

MCL

March 09, 2014 01:06 PM

My advice, switch to C++.

I spent the last few days reading up on C++, and I think I know enough to port my project. But there's one thing I just can't get to work. In order to import Skat games, I want to parse HTML documents of previously played games, like the one in the link from my first post. I thought this would be a good starting point to learn the language, since I'm very fond of HTML parsing. I know this question doesn't belong here, but I searched the whole web and still didn't succeed.

All I want to do is import the library htmlcxx and use it. But how do I do that? I'm using Visual Studio Express 2013 on Windows. I've tried adding the library directory to the include directories, importing it into my project etc., but I'm still getting weird linker errors. It can't be that hard, can it? Maybe you could give me a hint ;[

alvaro

March 09, 2014 02:12 PM

I don't use Visual Studio or even Windows, so I can't help you there. You'll get better help if you start a separate thread in either For Beginners or in General Programming. And please post the error messages you are getting. They might be cryptic to you, but they might mean something to others in the forum.
