Advice needed for an enjoyable AI for the Buraco card game

5 comments, last by Luigi Lescarini 6 years, 6 months ago

Hi,

I'm trying to build an effective AI for the Buraco card game (2 and 4 players).

I want to avoid the heuristic approach: I'm not an expert at the game, and the last few games I developed that way gave mediocre results.

I know the Monte Carlo tree search (MCTS) algorithm; I've used it for a checkers game with decent results, but I'm really confused by the recent success of other machine learning options.

For example, I found this answer on Stack Overflow that really puzzles me. It says:
"So again: build a bot which can play against itself. One common basis is a function Q(S,a) which assigns to any game state and possible action of the player a value -- this is called Q-learning. And this function is often implemented as a neural network ... although I would think it does not need to be that sophisticated here."

I'm very new to machine learning (this should be reinforcement learning, right?) and I only know a little about Q-learning, but it sounds like a great idea: I take my bot, make it play against itself, and it learns from its results... the problem is that I have no idea how to start! (Nor whether this approach is a good fit at all.)

Could you point me in the right direction?
Is Q-learning a good strategy for my domain?
Is Monte Carlo tree search still the best option for me?
Would it work well in a 4-player game like Buraco (2 opponents and 1 teammate)?
Is there any other method I'm ignoring?

PS: My goal is to develop an enjoyable AI for a casual application; I can even consider letting the AI cheat, for example by looking at the players' hands or the deck. Even with this, ehm, permission, I don't think I could build a good heuristic :/

Thank you guys for your help!

  1. Don't do machine learning.
  2. If your results with heuristic approaches have been mediocre, then your heuristics were mediocre. This is a viable option when done right.
  3. MCTS is not machine learning... it is brute-force search. It will certainly be a viable option since it can handle hidden information.

As always, however, my initial question to you is... how do you play the game? What information do you take into account when playing? How do you use that information to make your decisions? Therein lies your AI.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10-year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

I only read the vague initial description of the game, but I think I got enough details to tell you what I think.

I think using reinforcement learning is quite reasonable for games with randomness and hidden state. I don't have much experience here, but I think I know how I would do it. First you need a fast simulator for the game. You can start with a strategy that picks an action at random to make sure that the simulator is working.

You then define a "policy" to be a mapping from the state known to the agent (including the history of previous actions by all players) to a probability distribution over actions. This policy can be implemented as some sort of neural network, with a final SoftMax operation to make sure what is produced is indeed a probability distribution. If you initialize the weights so the neural network produces small outputs before the SoftMax, the SoftMax will in turn produce close to a uniform distribution.

Start playing games. After each game, you can tweak the weights of the neural network to increase the probability of the actions taken by the winners and decrease the probability of the actions taken by the losers. In order to do this, you would need to be able to compute the derivative of the probability of producing a move with respect to each of the weights in the neural network, which is what backpropagation does.
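To make the mechanics concrete, here is a minimal sketch of that idea in TypeScript (since you mentioned a JavaScript engine). A single linear layer stands in for the neural network so the gradient stays visible; the feature encoding, learning rate, and number of actions are all placeholders you would have to choose yourself:

```typescript
// Numerically stable softmax: turns raw scores into a probability distribution.
function softmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map(z => Math.exp(z - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

class LinearPolicy {
  // weights[a][i]: weight of feature i for action a. Start near zero so the
  // softmax output is close to uniform, as described above.
  weights: number[][];

  constructor(numActions: number, numFeatures: number) {
    this.weights = Array.from({ length: numActions }, () =>
      Array.from({ length: numFeatures }, () => (Math.random() - 0.5) * 0.01));
  }

  probabilities(features: number[]): number[] {
    const logits = this.weights.map(w =>
      w.reduce((sum, wi, i) => sum + wi * features[i], 0));
    return softmax(logits);
  }

  sampleAction(features: number[]): number {
    const p = this.probabilities(features);
    let r = Math.random();
    for (let a = 0; a < p.length; a++) { r -= p[a]; if (r <= 0) return a; }
    return p.length - 1;
  }

  // REINFORCE-style update for one decision of a finished game.
  // For a softmax, d log p(a|s) / d logit_k = [k === a] - p(k|s), so the
  // gradient w.r.t. weights[k][i] is that times features[i]. `ret` is +1
  // for a win and -1 for a loss: the "tweak" described above.
  reinforce(features: number[], action: number, ret: number, lr = 0.01): void {
    const p = this.probabilities(features);
    for (let k = 0; k < this.weights.length; k++) {
      const grad = (k === action ? 1 : 0) - p[k];
      for (let i = 0; i < features.length; i++) {
        this.weights[k][i] += lr * ret * grad * features[i];
      }
    }
  }
}
```

After each game you would call `reinforce` once per decision the player made, with `ret` set from the final result. A real neural network just replaces the linear layer; backpropagation computes the same kind of derivative through more layers.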

If you had a database of games played by experts, you could train a policy to simply imitate their playing. In that case you would be using supervised learning, which is easier. You could even use supervised learning first and then tweak the resulting network using reinforcement learning. At some stage AlphaGo used this (although it's not the primary mechanism of how AlphaGo works; they just used this to generate a database of labeled positions that could be used to train their value network).
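If you did have such a database, the training loop could look roughly like this, reusing the `LinearPolicy` sketch above (the `Example` shape and `trainOnExpertGames` are hypothetical names, just to show the idea):

```typescript
// Supervised imitation: raise the log-probability of the expert's action.
// With a softmax policy this is the same update as REINFORCE with return = +1.
type Example = { features: number[]; expertAction: number };

function trainOnExpertGames(
  policy: { reinforce(f: number[], a: number, ret: number, lr?: number): void },
  data: Example[],
  epochs = 10,
  lr = 0.01
): void {
  for (let e = 0; e < epochs; e++) {
    for (const { features, expertAction } of data) {
      policy.reinforce(features, expertAction, 1, lr); // cross-entropy step
    }
  }
}
```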

Monte Carlo search would be difficult to get to work, but not impossible. I can give you more details if you want to go down this route, but I don't recommend it. I will just say that using Monte Carlo search would still require a "policy", as described above. So you have to start there anyway.
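Just to show where the policy would plug in, the simplest "flat" version looks roughly like this: sample a plausible assignment of the hidden cards (a "determinization"), play the game out with the policy, and average the results per candidate action. The `Engine` interface below is hypothetical, only there to give the code a shape:

```typescript
// Flat Monte Carlo with determinization (a sketch, not full MCTS).
interface Engine<S, A> {
  legalActions(state: S): A[];
  apply(state: S, action: A): S;
  isOver(state: S): boolean;
  resultFor(state: S, player: number): number; // e.g. +1 win, -1 loss
  determinize(state: S, player: number): S;    // randomly fill in hidden cards
}

function chooseAction<S, A>(
  engine: Engine<S, A>,
  rolloutPolicy: (state: S, actions: A[]) => A, // e.g. sampled from the policy above
  state: S,
  player: number,
  samplesPerAction = 200
): A {
  const actions = engine.legalActions(state);
  let best = actions[0];
  let bestValue = -Infinity;
  for (const action of actions) {
    let total = 0;
    for (let i = 0; i < samplesPerAction; i++) {
      // One possible world consistent with what `player` can see...
      let s = engine.apply(engine.determinize(state, player), action);
      // ...played out to the end with the rollout policy.
      while (!engine.isOver(s)) {
        s = engine.apply(s, rolloutPolicy(s, engine.legalActions(s)));
      }
      total += engine.resultFor(s, player);
    }
    const value = total / samplesPerAction;
    if (value > bestValue) { bestValue = value; best = action; }
  }
  return best;
}
```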

This sounds like a fun project. I should do something like this with some other card game.

 

31 minutes ago, IADaveMark said:
  1. ...
  2. If your results with heuristic approaches have been mediocre, then your heuristics were mediocre. This is a viable option when done right.
  3. ...

As always, however, my initial question to you is... how do you play the game? What information do you take into account when playing? How do you use that information to make your decisions? Therein lies your AI.

Thanks for your reply.

I know that the problem with the heuristic approach is... my heuristics :)

I developed 4-5 heuristics for other card games and I would discourage anyone from taking that route: for any non-trivial game this approach requires a LOT of tuning and a LOT of expert knowledge of the game, and you usually end up with code that is very difficult to maintain or refactor. Add to this that the AI has to play in a cooperative environment with another AI or a human...

When you ask me, "As always, however, my initial question to you is... how do you play the game? What information do you take into account when playing? How do you use that information to make your decisions? Therein lies your AI.", I suppose you are suggesting that I work through those considerations to build a good heuristic. Well, I think I already took them into account... but they run into the issues I outlined above.

I'm a beginner at this game: I know the rules perfectly and I've played about 100 games, but I will never be an expert and I do NOT want to become an expert; I want to be an expert developer, not an expert player.
Having expert friends helps, but not that much: they play as... humans. Most of their strategies come from the unconscious part of their brains, and they will surely forget to cover some game scenarios when they try to teach you. I've already been through this with other games.

1 hour ago, alvaro said:

I only read the vague initial description of the game, but I think I got enough details to tell you what I think.

[...]

 
Hi Alvaro,
 
thanks for your reply!
 
As I said before, I'm very new to ML, so I need guidance on the steps that involve it.
I'm quite motivated for this project and I'm not afraid to learn something new, so please point me at the resources you think I need.
 
Quote
> I only read the vague initial description of the game, but I think I got enough details to tell you what I think.
>
> I think using reinforcement learning is quite reasonable for games with randomness and hidden state. I don't have much experience here, but I think I know how I would do it. First you need a fast simulator for the game. You can start with a strategy that picks an action at random to make sure that the simulator is working.
 
I've already built a library/engine (in JavaScript) that plays complete matches (a match is a set of rounds) for 2 or 4 players; it has all the rules, the scoring, etc.

Instead of picking an action at random (as you suggest), I've built a very basic heuristic to drive the players.
If you want to know more about it, I think you'd need a better knowledge of the game.
I take this opportunity to underline that in this game the player has to make several distinct decisions: on his turn he first has to decide whether to draw the top card of the deck or take all the cards discarded up to that point.
Then he can make multiple plays until he decides to discard a card and pass the turn.
As you can see, even this adds a bit of complexity.
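To give an idea, here is a simplified sketch of a turn's decision points as a TypeScript union (the names are illustrative, not my engine's actual API):

```typescript
// One turn = one draw decision, any number of plays, then one discard.
type TurnAction =
  | { kind: "drawFromDeck" }                           // take the top card of the deck
  | { kind: "takeDiscardPile" }                        // take all cards discarded so far
  | { kind: "meld"; cardIds: number[] }                // lay down a new combination
  | { kind: "attach"; cardId: number; meldId: number } // add a card to an existing combination
  | { kind: "discard"; cardId: number };               // discard one card and end the turn
```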
 
 
Quote
> You then define a "policy" to be a mapping from the state known to the agent (including the history of previous actions by all players) to a probability distribution over actions. This policy can be implemented as some sort of neural network, with a final SoftMax operation to make sure what is produced is indeed a probability distribution. If you initialize the weights so the neural network produces small outputs before the SoftMax, the SoftMax will in turn produce close to a uniform distribution.
Can you elaborate on this? Unfortunately I don't understand it (my ML gaps! :/)

The only thing I understand is that I need to store the history of the game state. At the moment my library has a state object that stores all the information needed to carry the game forward at the current step, and it deletes what is no longer needed. I can easily adjust this.
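For instance, I imagine extending it roughly like this (a sketch; the names are illustrative, not my engine's real fields):

```typescript
// Keep an append-only log of every action instead of deleting old info.
interface RecordedAction {
  player: number;
  action: unknown; // e.g. the TurnAction union sketched earlier
}

interface StateWithHistory {
  state: unknown;            // the per-step state object I already have
  history: RecordedAction[]; // previous actions by all players, never pruned
}

function record(s: StateWithHistory, player: number, action: unknown): void {
  // The policy may condition on any of this, so keep everything.
  s.history.push({ player, action });
}
```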

 

Quote
> Start playing games. After each game, you can tweak the weights of the neural network to increase the probability of the actions taken by the winners and decrease the probability of the actions taken by the losers. In order to do this, you would need to be able to compute the derivative of the probability of producing a move with respect to each of the weights in the neural network, which is what backpropagation does.
 

 

I think/hope this will become clearer once I understand the step above :)
 
Quote
> If you had a database of games played by experts, you could train a policy to simply imitate their playing. In that case you would be using supervised learning, which is easier. You could even use supervised learning first and then tweak the resulting network using reinforcement learning. At some stage AlphaGo used this (although it's not the primary mechanism of how AlphaGo works; they just used this to generate a database of labeled positions that could be used to train their value network).

No, I have not.
 
Quote
> Monte Carlo search would be difficult to get to work, but not impossible. I can give you more details if you want to go down this route, but I don't recommend it. I will just say that using Monte Carlo search would still require a "policy", as described above. So you have to start there anyway.

If you think reinforcement learning is a viable option, I'm very excited to try it. I'm eager to learn it, so let's focus on it. That said, more details on the Monte Carlo approach interest me as well :)
 
Quote
> This sounds like a fun project. I should do something like this with some other card game.

You are very welcome to help me with this :)

Just an update.

I spent the last few days trying to learn as much as possible about the ML world: I've watched a ton of videos, read lots of articles, and got my hands dirty (just a little bit) with two practical courses (https://www.udemy.com/deeplearning/learn/v4/content and https://www.udemy.com/artificial-intelligence-az/learn/v4/content).
I have not finished them yet (of course!) and I have to say that I've only scratched the surface of this very intriguing world. Still, I now feel more comfortable with the terms and concepts used in this thread (they looked so obscure to me only a few days ago).

Thanks to some replies in this thread and in a parallel thread I opened on Stack Overflow, I now see that my project requires a very solid understanding of a big chunk of ML, and it could still lead to unpleasant results.

As far as I understand, the best strategy (meaning a mix of "easy to implement" and "acceptable results") falls somewhere between a heuristic and MCTS.

Please feel free to comment on my conclusion or offer any other kind of advice.
Thanks!

