Luigi Lescarini

R&D: Advice needed for an enjoyable AI in the Buraco card game

Recommended Posts

Hi,

I'm trying to build an effective AI for the Buraco card game (2 and 4 players).

I want to avoid the heuristic approach: I'm not an expert at the game, and the last few games I developed that way gave me mediocre results.

I know the Monte Carlo tree search (MCTS) algorithm; I used it for a checkers game with decent results, but I'm really confused by the recent success of other machine learning options.

For example, I found this answer on Stack Overflow that really puzzles me. It says:
"So again: build a bot which can play against itself. One common basis is a function Q(S,a) which assigns to any game state and possible action of the player a value -- this is called Q-learning. And this function is often implemented as a neural network ... although I would think it does not need to be that sophisticated here.”

I'm very new to machine learning (this would be reinforcement learning, right?) and I only know a little about Q-learning, but it sounds like a great idea: I take my bot, make it play against itself, and then it learns from its results... The problem is that I have no idea how to start! (Nor whether this approach would be good or not.)
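
From the little Q-learning I know, the basic (tabular) update would look roughly like the sketch below. The names are made up, and for a game as big as Buraco I guess the table would have to be replaced by some function approximator (a neural network?):

// My rough understanding of the tabular Q-learning update (names are made up).
// Q maps "stateKey|action" to an estimate of how good that action is in that state.
const Q = new Map();
const alpha = 0.1;   // learning rate
const gamma = 0.95;  // discount factor

function qValue(stateKey, action) {
  return Q.get(stateKey + "|" + action) || 0;
}

// Called after observing: in stateKey I took action, got reward, and ended up
// in nextStateKey where nextActions are the legal actions.
function qUpdate(stateKey, action, reward, nextStateKey, nextActions) {
  const bestNext = nextActions.length
    ? Math.max(...nextActions.map(a => qValue(nextStateKey, a)))
    : 0; // terminal state: no future value
  const target = reward + gamma * bestNext;
  const old = qValue(stateKey, action);
  Q.set(stateKey + "|" + action, old + alpha * (target - old));
}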

Could you help me find the right direction?
Is the Q-learning strategy a good one for my domain?
Is Monte Carlo still the best option for me?
Would it work well in a 4-player game like Buraco (2 opponents and 1 teammate)?
Is there any other method I'm overlooking?

PS: My goal is to develop an enjoyable AI for a casual application, so I can even consider letting the AI cheat, for example by looking at the players' hands or at the deck. Even with this, ehm, permission, I don't think I would be able to build a good heuristic :/

Thank you guys for your help!

IADaveMark
  1. Don't do machine learning.
  2. If your results with heuristic approaches have been mediocre, then your heuristics were mediocre. This is a viable option when done right.
  3. MCTS is not machine learning... it is brute-force search. It is certainly a viable option here, since it can handle hidden information (see the sketch at the end of this post).

As always, however, my initial question to you is... how do you play the game? What information do you take into account when playing? How do you use that information to make your decisions? Therein lies your AI.
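
To make the hidden-information point concrete, here is a rough sketch of one common approach, determinization: sample plausible deals of the unseen cards and score each legal action with plain rollouts. This is flat Monte Carlo rather than full MCTS (full MCTS would replace the inner rollout loop with tree search), and every engine call in it is a placeholder for whatever your Buraco library actually exposes.

// Determinization sketch (flat Monte Carlo for brevity, not full MCTS).
// "engine" is hypothetical; it is assumed to provide:
//   sampleHiddenState(observation) -> a full state consistent with what we can see
//   legalActions(state)            -> legal actions for the player to move
//   applyAction(state, action)     -> the resulting state
//   playOut(state)                 -> finishes the game with a cheap policy and
//                                     returns the score from our point of view
function chooseAction(engine, observation, samples = 50, rolloutsPerAction = 20) {
  const totals = new Map(); // action id -> accumulated score
  const byId = new Map();   // action id -> action object

  for (let s = 0; s < samples; s++) {
    // "Determinize": deal the unseen cards at random, consistent with the observation.
    const fullState = engine.sampleHiddenState(observation);

    for (const action of engine.legalActions(fullState)) {
      byId.set(action.id, action);
      let score = 0;
      for (let r = 0; r < rolloutsPerAction; r++) {
        score += engine.playOut(engine.applyAction(fullState, action));
      }
      totals.set(action.id, (totals.get(action.id) || 0) + score);
    }
  }

  // Play the action with the best total score across all sampled deals.
  let bestId = null, bestScore = -Infinity;
  for (const [id, score] of totals) {
    if (score > bestScore) { bestScore = score; bestId = id; }
  }
  return byId.get(bestId);
}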

alvaro

I only read the vague initial description of the game, but I think I got enough details to tell you what I think.

I think using reinforcement learning is quite reasonable for games with randomness and hidden state. I don't have much experience here, but I think I know how I would do it. First you need a fast simulator for the game. You can start with a strategy that picks an action at random to make sure that the simulator is working.

You then define a "policy" to be a mapping from the state known to the agent (including the history of previous actions by all players) to a probability distribution over actions. This policy can be implemented as some sort of neural network, with a final SoftMax operation to make sure what is produced is indeed a probability distribution. If you initialize the weights so the neural network produces small outputs before the SoftMax, the SoftMax will in turn produce close to a uniform distribution.

Start playing games. After each game, you can tweak the weights of the neural network to increase the probability of the actions taken by the winners and decrease the probability of the actions taken by the losers. In order to do this, you would need to be able to compute the derivative of the probability of producing a move with respect to each of the weights in the neural network, which is what backpropagation does.
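
To make that less abstract, here is a very rough sketch. For brevity the "network" is a single linear layer (one weight vector per action), so the backpropagation step reduces to one explicit gradient formula; the feature and action counts, and the shape of the episode records, are placeholders, not anything specific to Buraco:

// Sketch of a softmax policy with a REINFORCE-style update after each game.
// NUM_FEATURES / NUM_ACTIONS are placeholders for the state encoding and action space.
const NUM_FEATURES = 200;
const NUM_ACTIONS = 60;

// All-zero weights => all logits are 0 => the softmax starts out uniform.
const weights = Array.from({ length: NUM_ACTIONS }, () => new Float64Array(NUM_FEATURES));

function softmax(logits) {
  const m = Math.max(...logits);
  const exps = logits.map(v => Math.exp(v - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(v => v / sum);
}

// Probability distribution over the legal actions in the current state.
// features: numeric encoding of the known state; legalActions: array of action indices.
function policy(features, legalActions) {
  const logits = legalActions.map(a =>
    weights[a].reduce((sum, w, i) => sum + w * features[i], 0));
  return softmax(logits);
}

// After a finished game: episode is the list of decisions one player made, each
// stored as { features, legalActions, chosenIndex }; reward is e.g. +1 for a win
// and -1 for a loss, so winners' actions become more likely and losers' less likely.
function reinforceUpdate(episode, reward, learningRate = 0.01) {
  for (const { features, legalActions, chosenIndex } of episode) {
    const probs = policy(features, legalActions);
    legalActions.forEach((a, k) => {
      // Gradient of log P(chosen action) w.r.t. action a's weights:
      // features * ((a is the chosen action ? 1 : 0) - probs[k])
      const grad = (k === chosenIndex ? 1 : 0) - probs[k];
      for (let i = 0; i < NUM_FEATURES; i++) {
        weights[a][i] += learningRate * reward * grad * features[i];
      }
    });
  }
}

With a real neural network you would let a library do the backpropagation for you, but the training loop is the same: play a batch of games, then call the update with +1 for the winners and -1 for the losers.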

If you had a database of games played by experts, you could train a policy to simply imitate their playing. In that case you would be using supervised learning, which is easier. You could even use supervised learning first and then tweak the resulting network using reinforcement learning. At some stage AlphaGo used this (although it's not the primary mechanism of how AlphaGo works; they just used this to generate a database of labeled positions that could be used to train their value network).
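
The supervised variant is nearly the same code: with a softmax policy, maximizing the log-probability of the expert's move is exactly the gradient used above with the "reward" fixed to +1. Reusing policy()/reinforceUpdate() from the sketch above, and with a hypothetical expertMoves array:

// Imitation-learning variant: train the same policy on recorded expert moves.
// Each record is { features, legalActions, expertIndex } taken from a database
// of expert games (hypothetical; such a database would have to exist first).
function imitationUpdate(expertMoves, learningRate = 0.01) {
  for (const { features, legalActions, expertIndex } of expertMoves) {
    // One supervised step is one REINFORCE step with the reward fixed to +1.
    reinforceUpdate([{ features, legalActions, chosenIndex: expertIndex }], 1, learningRate);
  }
}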

Monte Carlo search would be difficult to get to work, but not impossible. I can give you more details if you want to go down this route, but I don't recommend it. I will just say that using Monte Carlo search would still require a "policy", as described above. So you have to start there anyway.

This sounds like a fun project. I should do something like this with some other card game.

 

31 minutes ago, IADaveMark said:
  1. ...
  2. If your results with heuristic approaches have been mediocre, then your heuristics were mediocre. This is a viable option when done right.
  3. ...

As always, however, my initial question to you is... how do you play the game? What information do you take into account when playing? How do you use that information to make your decisions? Therein lies your AI.

Thanks for your reply.

I know that the problem with the heuristic approach is... my heuristic :)

I've developed 4-5 heuristics for other card games and I would discourage anyone from taking that route: for any non-trivial game this approach requires a LOT of tuning and a LOT of (expert) knowledge of the game, and you usually end up with code that is very difficult to maintain or refactor. Add to this that the AI has to play in a cooperative environment with another AI or a human...

When you ask me "As always, however, my initial question to you is... how do you play the game? What information do you take into account when playing? How do you use that information to make your decisions? Therein lies your AI.", I suppose you are suggesting that I think those things through in order to build a good heuristic. Well, I believe I already took them into account... but they run into the issues I outlined above.

I'm a beginner at this game: I know the rules perfectly and I've played maybe 100 games, but I will never be an expert and I do NOT want to become an expert. I want to be an expert developer, not an expert player.
Having expert friends can help, but not that much: they play as... humans. Most of their strategies come from the unconscious part of their brain, and they will surely forget to cover some game scenarios when they try to teach you something. I've already been through this with other games.

1 hour ago, alvaro said:

I only read the vague initial description of the game, but I think I got enough details to tell you what I think.

[...]

 
Hi Alvaro,
 
thanks for your reply!
 
As I said before, I'm very new to ML, so I'll need some guidance for the steps that relate to it.
I'm quite motivated for this project and I'm not afraid of learning something new, so please point me to the resources you think I need.
 
> I think using reinforcement learning is quite reasonable for games with randomness and hidden state. I don't have much experience here, but I think I know how I would do it. First you need a fast simulator for the game. You can start with a strategy that picks an action at random to make sure that the simulator is working.
I've already built a library/engine (in JavaScript) that plays complete matches (a match is a set of rounds) for 2 or 4 players; it has all the rules, the scoring, etc.

Instead of picking an action at random (as you suggest), I built a very basic heuristic to make the players play.
If you want to know more about it, I think you'd need a better knowledge of the game.
I'll take this opportunity to underline that in this game the player has to make several kinds of decisions: on his turn he first has to decide whether to draw the card on top of the deck or take all the cards that have been discarded up to that point.
Then he can make multiple plays until he decides to discard a card and pass the turn.
As you can see, even this adds a bit of complexity.
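
If it helps, the way I imagine exposing this to an agent is to treat every atomic choice as its own decision, so the same policy would handle the draw/take choice, each meld, and the final discard. This is only a sketch, not my real API; engine and chooseAction are placeholders:

// Sketch (not my real API): one turn as a sequence of atomic decisions.
function playTurn(engine, state, player, chooseAction, history) {
  // 1. Draw phase: the top card of the deck, or the whole pile of discards.
  let action = chooseAction(engine.observationFor(state, player),
                            engine.legalDrawActions(state, player));
  history.push({ player, action });
  state = engine.applyAction(state, action);

  // 2. Play phase: keep melding/attaching until the chosen action is a discard,
  //    which ends the turn.
  do {
    action = chooseAction(engine.observationFor(state, player),
                          engine.legalPlayActions(state, player));
    history.push({ player, action });
    state = engine.applyAction(state, action);
  } while (action.type !== "discard");

  return state;
}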
 
 
> You then define a "policy" to be a mapping from the state known to the agent (including the history of previous actions by all players) to a probability distribution over actions. This policy can be implemented as some sort of neural network, with a final SoftMax operation to make sure what is produced is indeed a probability distribution. If you initialize the weights so the neural network produces small outputs before the SoftMax, the SoftMax will in turn produce close to a uniform distribution.
Can you elaborate on this? Unfortunately I don't understand it (my ML gaps! :/)

The only thing I do understand is that I need to store the history of the game states. Right now my library has a state object that stores all the information needed to carry the game on at the current step and deletes what is no longer needed. I can easily adjust this.
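
For example (just a guess at what the encoding could look like, not my current state object), I imagine turning what a player can see into a fixed-length numeric vector like the sketch below; the history of previous actions would be appended in a similar way:

// Sketch of encoding what one player can see as a fixed-length vector for the
// policy. All field names are guesses; Buraco uses two decks plus jokers, so I
// store card counts rather than 0/1 flags (jokers would need extra slots).
const NUM_RANKS = 13, NUM_SUITS = 4;
const NUM_CARDS = NUM_RANKS * NUM_SUITS; // 52 distinct rank/suit combinations

function cardIndex(card) {
  // card = { rank: 0..12, suit: 0..3 } (placeholder representation)
  return card.suit * NUM_RANKS + card.rank;
}

// obs = { hand, discardPile, myMelds, opponentMelds, deckSize } (placeholders)
function encodeObservation(obs) {
  const groups = [obs.hand, obs.discardPile, obs.myMelds, obs.opponentMelds];
  const features = new Float64Array(NUM_CARDS * groups.length + 1);
  groups.forEach((cards, g) => {
    for (const card of cards) features[g * NUM_CARDS + cardIndex(card)] += 1;
  });
  features[NUM_CARDS * groups.length] = obs.deckSize / 108; // 108 cards in two decks + jokers
  return features;
}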

 

> Start playing games. After each game, you can tweak the weights of the neural network to increase the probability of the actions taken by the winners and decrease the probability of the actions taken by the losers. In order to do this, you would need to be able to compute the derivative of the probability of producing a move with respect to each of the weights in the neural network, which is what backpropagation does.

I think/hope this will become clearer once I understand the step above :)
 
> If you had a database of games played by experts, you could train a policy to simply imitate their playing. In that case you would be using supervised learning, which is easier. You could even use supervised learning first and then tweak the resulting network using reinforcement learning. At some stage AlphaGo used this (although it's not the primary mechanism of how AlphaGo works; they just used this to generate a database of labeled positions that could be used to train their value network).
No, I don't have one.
 
> Monte Carlo search would be difficult to get to work, but not impossible. I can give you more details if you want to go down this route, but I don't recommend it. I will just say that using Monte Carlo search would still require a "policy", as described above. So you have to start there anyway.
If you think reinforcement learning can be a viable option, I'm very excited to try it. I'm eager to learn it, so let's focus on that. That said, more details on the Monte Carlo approach interest me as well :)
 
> This sounds like a fun project. I should do something like this with some other card game.
You are very welcome to help me on this 😃 😃 😃 😃


Just an update.

I've spent the last few days trying to learn as much as possible about the ML world: I've watched a ton of videos, read lots of articles, and got my hands dirty (just a little bit) with two practical courses (https://www.udemy.com/deeplearning/learn/v4/content and https://www.udemy.com/artificial-intelligence-az/learn/v4/content).
I haven't finished them yet (of course!) and I have to say I've only scratched the surface of this very intriguing world. That said, I already feel more comfortable with the terms and concepts used in this thread (they looked so obscure to me only a few days ago).

Thanks to some replies in this thread and in a parallel thread I opened on Stack Overflow, I now see that my project requires a very solid understanding of a big chunk of ML, and that it could still lead to disappointing results.

As far as I understand, the best strategy (meaning a good trade-off between "easy to implement" and "acceptable results") would fall somewhere between a heuristic and MCTS.

Please feel free to comment on that conclusion; any other advice is welcome too.
Thanks!
