@Kyltotan: Rather than test for error, I could abort after a fixed number of games, or once the win/loss ratio gets high enough. Indeed, I'm having trouble knowing when to stop the learning process. Millions of games would have to be played, for sure, seeing how the odds against being dealt a royal flush in five cards are 649,739:1 (the seven cards available in Texas Hold 'Em improve on that, but it stays a rare hand).
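To make the stopping idea concrete, here is a rough sketch of what I have in mind. The function name, the thresholds (max_games, target_ratio, min_games) and their default values are all placeholders I made up for illustration, not tuned numbers.

```python
def should_stop(games_played, wins, losses,
                max_games=1_000_000, target_ratio=3.0, min_games=1_000):
    """Decide whether training should end (hypothetical thresholds)."""
    # Hard cap: always stop once the game budget is used up.
    if games_played >= max_games:
        return True
    # Too few games for the win/loss ratio to mean anything yet.
    if games_played < min_games:
        return False
    # No losses at all: stop only if there is at least one win.
    if losses == 0:
        return wins > 0
    # Otherwise stop once the win/loss ratio reaches the target.
    return wins / losses >= target_ratio
```

The min_games floor is there so that a lucky streak in the first few hands doesn't end training early.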
@alvaro: I think I see what was meant by knowledge in advance: the training process must check whether the state change proposed by the ANN is even valid. To do that, all of the possible state changes must be enumerated, so that there is something to check the proposal against. If the proposed state change is invalid, then one valid state change is picked pseudorandomly from that enumeration. Anyway, the main point is that the possible state changes must be known to the AI in advance. That's cheating, I suppose, but it's the only way that I can see it working. The same check would also be needed on the socket server, which has to verify that a player's proposed state change is valid.
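Roughly, the check I'm describing would look like the sketch below. The name choose_state_change and the idea of representing state changes as plain comparable values (strings here) are my own assumptions for illustration; the real representation would depend on the game.

```python
import random

def choose_state_change(proposed, valid_changes, rng=random):
    """Return the ANN's proposal if it is legal, else a random legal one."""
    valid = list(valid_changes)  # enumeration of every legal state change
    if not valid:
        raise ValueError("no valid state changes available")
    # Accept the proposal only if it appears in the enumeration.
    if proposed in valid:
        return proposed
    # Otherwise fall back to a pseudorandom legal state change.
    return rng.choice(valid)

# Example: a legal proposal is kept, an illegal one is replaced.
legal = ["fold", "call", "raise"]
kept = choose_state_change("call", legal)
replaced = choose_state_change("check", legal, random.Random(0))
```

The server side could run the same membership test, except that it would reject an invalid proposal instead of substituting a random one.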