# Reinforcement Learning for Games

By Eric Laukien | Published Jul 14 2013 11:08 PM in Artificial Intelligence


If you do not know what neural networks are, I recommend trying out the great tutorials at AI Junkie.

Neural networks are often overlooked when considering game AI. This is because they once received a lot of hype but the hype didn't amount to much. However, neural networks are still an area of intense research, and numerous learning algorithms have been developed for each of the three basic types of learning: supervised, unsupervised, and reinforcement learning.

Reinforcement learning allows an agent to learn from its environment and improve itself on its own. This is the class of learning algorithms we will focus on in this article, which discusses genetic algorithms as well as an algorithm the author has researched for single-agent reinforcement learning. Throughout, the neural networks are assumed to be simple weighted-sum ("integrate and fire"), non-spiking networks with sigmoidal activation functions.

# Genetic Algorithms

## The concept

Genetic algorithms are among the simplest and yet most effective reinforcement learning methods. They do have one key limitation, though: they must operate on multiple agents (AIs) at once. Nevertheless, genetic algorithms can be a great tool for creating neural networks via a process of evolution.

Genetic algorithms are part of a broader range of evolutionary algorithms. Their basic operation proceeds as follows:

1. Initialize a set of genes
2. Evaluate the fitnesses of all genes
3. Mate genes based on how well they performed (performing crossover and mutation)
4. Replace old genes with the new children
5. Repeat steps 2 - 4 until a termination criterion is met
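The steps above can be sketched in C++. This is a toy illustration rather than the article's package: the fitness function, the elitist truncation selection, and all names are hypothetical.

```cpp
#include <vector>
#include <cstdlib>
#include <cmath>

using Genome = std::vector<float>;

// Toy fitness: how close the genome's weights sum to a target value.
// A real evaluation would simulate the agent built from the genome.
float Evaluate(const Genome &g) {
    float sum = 0.0f;
    for (float w : g) sum += w;
    return -std::fabs(10.0f - sum); // 0 is perfect, more negative is worse
}

// One run of the genetic algorithm: evaluate, breed from the best, repeat.
Genome Evolve(size_t populationSize, size_t genomeSize, int generations) {
    std::vector<Genome> population(populationSize, Genome(genomeSize, 0.0f));
    for (int gen = 0; gen < generations; gen++) {
        // Step 2: evaluate fitnesses, remembering the best genome (elitism)
        size_t best = 0;
        for (size_t i = 1; i < population.size(); i++)
            if (Evaluate(population[i]) > Evaluate(population[best])) best = i;
        // Steps 3-4: replace everyone except the best with mutated copies
        // of it (truncation selection; roulette-wheel selection and real
        // crossover would be used in practice)
        for (size_t i = 0; i < population.size(); i++) {
            if (i == best) continue;
            Genome child = population[best];
            for (float &w : child)
                w += 0.5f * ((std::rand() / (float)RAND_MAX) * 2.0f - 1.0f);
            population[i] = child;
        }
    }
    // Step 5 ended; return the fittest survivor
    size_t best = 0;
    for (size_t i = 1; i < population.size(); i++)
        if (Evaluate(population[i]) > Evaluate(population[best])) best = i;
    return population[best];
}
```

Because the best genome is never discarded, the best fitness can only improve from one generation to the next.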

The genes can be either a direct encoding of the traits of the AI (the neuron weights, in the neural network case) or an indirect, "generative" encoding. Evaluating the fitnesses is where most of the processing time is spent, since it involves simulating a phenotype built from each genotype to measure how well it performs the task. The recorded fitnesses then drive the mating step, where parent genes are chosen using a selection function. A popular choice is the fitness-proportional "roulette wheel" selection function, which randomly chooses genes with likelihoods proportional to their fitnesses. Selection functions like this one require non-negative fitnesses, which is why fitnesses are typically rescaled so that the lowest is always 0.
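A minimal sketch of the rescaling and roulette-wheel selection described above (function names are hypothetical):

```cpp
#include <vector>
#include <cstdlib>
#include <cstddef>

// Rescale fitnesses so the lowest is 0, as fitness-proportional
// selection requires non-negative values.
void RescaleFitnesses(std::vector<float> &fitnesses) {
    float minFit = fitnesses[0];
    for (float f : fitnesses) if (f < minFit) minFit = f;
    for (float &f : fitnesses) f -= minFit;
}

// Roulette-wheel selection: pick an index with probability
// proportional to its (rescaled, non-negative) fitness.
size_t RouletteSelect(const std::vector<float> &fitnesses) {
    float total = 0.0f;
    for (float f : fitnesses) total += f;
    if (total <= 0.0f) return std::rand() % fitnesses.size(); // all equal
    float spin = (std::rand() / (float)RAND_MAX) * total;
    float accum = 0.0f;
    for (size_t i = 0; i < fitnesses.size(); i++) {
        accum += fitnesses[i];
        if (spin < accum) return i;
    }
    return fitnesses.size() - 1;
}
```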

When a pair of parents has been selected, their genes are combined using a crossover function. The exact function depends on your encoding, but in general you want some traits of each parent to make it into the child without being destructive. After crossover, the child's genes are also mutated (randomly altered slightly) to force the algorithm to explore more of the space of possible solutions.
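For a direct weight-vector encoding, one possible crossover/mutation pair looks like this (single-point crossover is shown; the right choice depends on your encoding, and all names are hypothetical):

```cpp
#include <vector>
#include <cstdlib>

using Genome = std::vector<float>;

// Single-point crossover: the child takes the first part of one parent
// and the rest of the other, so contiguous runs of traits survive intact.
Genome Crossover(const Genome &a, const Genome &b) {
    size_t point = std::rand() % (a.size() + 1);
    Genome child(a.size());
    for (size_t i = 0; i < a.size(); i++)
        child[i] = (i < point ? a[i] : b[i]);
    return child;
}

// Mutation: each gene has a small chance of a small random perturbation,
// which forces exploration of nearby solutions.
void Mutate(Genome &g, float rate, float amount) {
    for (float &w : g) {
        if ((std::rand() / (float)RAND_MAX) < rate)
            w += amount * ((std::rand() / (float)RAND_MAX) * 2.0f - 1.0f);
    }
}
```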

Over time, the genotypes will improve, and will often even learn to exploit faults in whatever system they operate in!

Next, we will discuss a particular gene encoding method for neural networks.

## NEAT

The accompanying code implements the genetic algorithm following the NEAT (NeuroEvolution of Augmenting Topologies) methodology created by Kenneth Stanley. The neural network genes therefore encode connections between neurons as index pairs (indices into the neuron array) together with the associated weight, as well as a bias for each neuron.

This is all the information needed to construct a fully functional neural network from the genes. In addition, both the neuron biases and the connection genes store a special "innovation number". These numbers are unique: a global counter is incremented each time one is assigned. When network genes are mated, we can then tell whether two connections share a heritage by checking whether their innovation numbers match. Matching genes can be crossed over directly, while genes without a match are assigned randomly to the child networks.
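The innovation-number bookkeeping might look like this. This is a simplified sketch rather than the package's actual code: full NEAT also distinguishes disjoint and excess genes and prefers the fitter parent for them, while here the unmatched genes simply come from the first parent.

```cpp
#include <vector>
#include <cstdlib>

// NEAT-style connection gene: neuron indices, weight, and the innovation
// number identifying the gene's historical origin.
struct ConnectionGene {
    int from, to;
    float weight;
    int innovation;
};

// Crossover by innovation number: matching genes (shared heritage) are
// inherited from either parent at random; unmatched genes are taken
// from parent 'a' in this simplified version.
std::vector<ConnectionGene> CrossoverNEAT(
    const std::vector<ConnectionGene> &a,
    const std::vector<ConnectionGene> &b)
{
    std::vector<ConnectionGene> child;
    for (const ConnectionGene &ga : a) {
        const ConnectionGene *match = nullptr;
        for (const ConnectionGene &gb : b)
            if (gb.innovation == ga.innovation) { match = &gb; break; }
        if (match != nullptr && (std::rand() % 2))
            child.push_back(*match); // shared heritage: pick either copy
        else
            child.push_back(ga);
    }
    return child;
}
```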

This description is intentionally brief; it is only meant to give an overview of how the genetic algorithm included in the software package works.

While this genetic algorithm works well for many problems, it requires simulating many agents at a time rather than letting a single agent learn by itself. So, we will briefly cover another method of neural network training.

# Local Dopamine Weight Update Rule with Weight and Output Traces

## The concept

This method may well have been invented already, but I have not yet found a paper describing it. It concerns how neuron weights are updated when a single agent learns, and is entirely separate from network topology selection. The included software package therefore uses a genetic algorithm to evolve a topology for use with the single-agent reinforcement learning system. Of course, one could also simply grow a neural network by randomly attaching new neurons over time.

Anyway, I discovered this technique after a lot of trial and error while searching for a weight update rule that operates only on information available at the neuron/synapse level, and that could therefore be biologically plausible. The method uses a reward signal, dopamine, to determine when the network should be rewarded or punished. To make it work, one adds an output trace (a floating-point variable) to each neuron, and a weight trace to each neuron weight except the bias. Beyond that, one only needs the dopamine reward signal, which can take any value: positive means reward the network, negative means punish it, and 0 is neutral and does nothing. With this information in place, the neural network weights are updated after every normal update cycle using the following code (here in C++):

```cpp
m_output = Sigmoid(potential * activationMultiplier);

m_outputTrace += -traceDecay * m_outputTrace + 2.0f * m_output - 1.0f;

// Weight update
for (size_t i = 0; i < numInputs; i++)
{
    m_inputs[i].m_trace += -traceDecay * m_inputs[i].m_trace +
        m_outputTrace * (m_inputs[i].m_pInput->m_outputTrace * 0.5f + 0.5f);
    m_inputs[i].m_weight += alpha * m_inputs[i].m_trace * dopamine;
}

// Bias update
m_bias += alpha * m_outputTrace * dopamine;
```


where `m_output` is the output of the neuron, `traceDecay` is a value in [0, 1] that defines how quickly the network forgets, `alpha` is the learning rate, and `m_inputs` is an array of connections.

This code works as follows:

The output trace is simply a decaying average of the neuron's output over time. Whenever dopamine is nonzero (i.e. the network does not yet have perfect fitness), the weight update moves each weight in the direction that makes the neuron produce its average output more often if dopamine is positive, or less often if dopamine is negative. It does so in proportion to the weight trace, which measures how much a connection has contributed to the firing of the neuron over time, and therefore how eligible that weight is for an update. The bias has no weight trace, since it is always eligible for an update.
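To see the rule in action, here is a self-contained single-neuron version of the snippet above. The struct layout and the always-on sensor input (presynaptic trace fixed at 1) are assumptions for the sketch. With steady positive dopamine and an active input, the weight trace builds up and the weight grows; with negative dopamine, it shrinks:

```cpp
#include <cmath>

float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

struct Neuron {
    float m_weight = 1.0f;      // single input connection weight
    float m_bias = 0.0f;
    float m_outputTrace = 0.0f; // decaying average output
    float m_inputTrace = 0.0f;  // eligibility trace of the connection
    float m_output = 0.0f;

    // One normal update cycle followed by the dopamine-modulated update.
    // 'inputOutputTrace' is the presynaptic output trace (1 for an
    // always-on sensor input).
    void Update(float input, float inputOutputTrace,
                float dopamine, float traceDecay, float alpha) {
        float potential = m_bias + m_weight * input;
        m_output = Sigmoid(potential);
        m_outputTrace += -traceDecay * m_outputTrace + 2.0f * m_output - 1.0f;

        // Weight update
        m_inputTrace += -traceDecay * m_inputTrace +
            m_outputTrace * (inputOutputTrace * 0.5f + 0.5f);
        m_weight += alpha * m_inputTrace * dopamine;

        // Bias update
        m_bias += alpha * m_outputTrace * dopamine;
    }
};
```

Running a few cycles with a constant input of 1 and dopamine of +1 drives the weight up; the same cycles with dopamine of -1 drive it down.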

This method solves the XOR problem with ease. I tested it with simple feed-forward neural networks (one hidden layer with two neurons), a growing neural cloud, and networks produced by the NEAT algorithm.

# Use in Games?

These methods may seem like total overkill for games, but they can do things that traditional methods cannot. For instance, with the genetic algorithm you can create a physics-based character controller like this one.

The animation was not made by an animator; rather, the AI learned how to walk by itself. This results in an animation that can react to the environment directly. The AI in the video was created using the same software package this article revolves around (linked below).

The second technique can be used to let game characters or enemies learn from experience. Enemies can, for instance, be rewarded based on how close they get to the player, so that they learn to approach the player given a few sensory inputs. The method also suits virtual-pet games, where you reward or punish a pet to shape its behavior.
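That enemy reward can be as simple as the distance closed toward the player each step (a hypothetical sketch; the function name and scale parameter are illustrative):

```cpp
// Reward shaping for a chasing enemy: positive dopamine when the enemy
// closed distance to the player this step, negative when it lost ground,
// zero when nothing changed.
float ChaseDopamine(float prevDistance, float currentDistance, float scale) {
    return scale * (prevDistance - currentDistance);
}
```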

# Using the Code

The software package accompanying this article contains a manual on how to use the code.

The software package can be found at: https://sourceforge.net/projects/neatvisualizers/

# Conclusion

I hope this article has provided some insight and inspiration for the use of reinforcement learning neural network AI in games. Now get out there and make some cool game AI!

# Article Update Log

2 July 2013: Initial release
14 July 2013: Heavy edits


> Neural networks are often overlooked when considering game AI. This is because they once received a lot of hype but the hype didn't amount to much.

I don't think that's correct. The reason presented here explains why a lot of people in the research community in general overlook ANNs, or why they look at them suspiciously. It might also apply to game AI practitioners, but it's not the main objection.

ANNs and learning in general are undesirable for game AI for other reasons, when compared with more traditional approaches. Off the top of my head:

* They are harder to test.

* They are harder to tweak by a game designer.

Those are two lethal wounds, if you ask me.

As an example, in the linked video of a learned controller for a four-legged robotic enemy, you can see the feet often slide in an awkward manner. How would one go about tweaking the ANN to fix that?

The true objective of game AI is providing the player with a fun experience. Since we don't have a quantitative definition of fun, automated learning is unlikely to provide very satisfactory results. In some genres you may also have believability as an objective, which has similar problems.

Trying to help the case for the article, there might be a place for learning techniques in game AI for very specific tasks (learning how to drive in a racetrack comes to mind).

If the article is deemed relevant to game AI at all, I believe it should be much more honest about the objections raised above.

I feel the article didn't go far enough in support of its title - Reinforcement Learning for Games.  There are only 8 sentences dedicated to the discussion of applying the algorithms to games and those are somewhat superficial.

The article just seems incomplete with so little content related to the title.


I understand your criticisms. I am not exactly the best writer; I mostly just wanted to share a new technique I researched and used in a game. That said, the point of these techniques isn't to replace existing methods that work perfectly well, but rather to allow behaviors that are either much more difficult or impossible to implement with traditional methods. The idea isn't to force the AI into a behavior easily achieved by other means. For example, the neural-network-based controller allows the robot to feel the ground under its feet and make adjustments based on the terrain. It might look a little sloppy, but I believe this is easy to fix (more generations would probably do it; I only ran 50 generations, which is quite little!). Also, producing additional animations without an artist suddenly becomes a fast automated process, so things like procedurally generated enemies can be animated on the fly.

As for whether these neural-network-only behaviors are fun in a game: nobody has really tried them, at least not with the latest in neural network research. I don't think you can deem neural networks to make for boring games simply because there are no games that do it well, since there are also no games that do it poorly.

In case it interests you, I am currently working on what I think would be a fun neural-network-based game, one that absolutely requires these kinds of neural networks. The idea is a procedurally generated world filled with procedurally generated plants and creatures. The player must hunt for a specific creature using a tool that, for any animal, reports its genetic similarity to the target. Using this, one can track the target down by noticing common features among sampled creatures. Due to the random nature of the environment and creatures, the AI must learn how to walk and interact with its environment on its own. I already have the random creatures and environments in separate projects; I simply need to merge them and polish the result. I will probably add some survival elements as well.

