Unsupervised neural networks

9 comments, last by GameDev.net 18 years, 7 months ago
I want to learn neural networks. I kinda understand how they work, but I can't get my head around how they learn if there isn't a known "optimal" value, i.e. a teacher. Example: I want a neural network to keep a helicopter in midair. One input is the helicopter's weight, or gravity's pull (m*g), and the second is the vertical speed. The output is 1 or 0: should the engine be running or not.


1. vertical speed  0 >-w1->
                           | T | -- 0> engine
2. m * g           0 >-w2-> 

If I understand it right, the calculation goes as follows: first input (vertical speed) * first input's weight (w1) + second input (m*g) * second input's weight (w2). If that is bigger than the threshold (T), then the output fires (the engine starts). Now, after each execution the net is supposed to evolve (the weights change value). But how do they change? Since I don't know when I should go up or down, I can't tell the net if it did the right or wrong thing. This is what's called an unsupervised neural network, right? I don't want to write any more or no one will care to read. :P Please enlighten and/or correct me. Cheers.
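
To make that calculation concrete, here is a minimal sketch of the threshold unit in Python; the weight and threshold values are made up purely for illustration.

# Sketch of the threshold unit described above. The weights and the
# threshold are arbitrary example values, not tuned for a real helicopter.
def engine_on(vertical_speed, weight_force, w1=0.5, w2=0.5, threshold=1.0):
    activation = vertical_speed * w1 + weight_force * w2
    return 1 if activation > threshold else 0   # 1 = engine running, 0 = off

# Example: falling at 2 m/s with m*g = 9.81 N
print(engine_on(-2.0, 9.81))
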
I assume your evolutionary computation step will have some sort of fitness function. In this case, the time the helicopter stayed in mid-air. Given that you have some sort of fitness function which provides the selection step for the evolutionary step, that becomes your teacher and you have a supervised network.

The changes to the network's weights will likely involve some random mutations and perhaps crossover between networks, which will help to promote favorable traits through the population.
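
In Python, that evolutionary loop might look something like the sketch below. The fitness function is a placeholder for running the helicopter simulation and returning how long it stayed in mid-air; the population size, selection scheme, and mutation rate are arbitrary choices for illustration.

import random

def mutate(weights, rate=0.1):
    # Perturb each weight with a little Gaussian noise.
    return [w + random.gauss(0, rate) for w in weights]

def evolve(population, fitness_fn, generations=100):
    # population: list of weight lists; fitness_fn: weights -> time in mid-air.
    for _ in range(generations):
        scored = sorted(population, key=fitness_fn, reverse=True)
        survivors = scored[:len(scored) // 2]        # selection
        children = [mutate(w) for w in survivors]    # mutation
        population = survivors + children
    return max(population, key=fitness_fn)
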

-Kirk

Hmm... you mean I make ten individual neural nets with random weights, let them all run, take the best ones, and evolve and mutate? With some sort of genetic algorithm? Is it possible to make this stay inside a single neural net? I want one mind (or single individual) that learns, not generations of a population. Or am I completely off track here?
Yeah, a genetic algorithm is not really required here, although it might help.

I don't think the algorithm you're looking for is an unsupervised one. What unsupervised learning does is cause your network to discover correlations in the data you send it. But that's not really what's going on here. Your network shouldn't really care if there are any correlations between w1 and w2, all that your net should do is compute the correct function to keep the thing in the air.

So you want a supervised algorithm. I think the way to go is to actually run your helicopter simulation. Then while the thing is flying, if the helicopter starts to fall, give it a training case that tells it to go heavier on the gas. If the helicopter goes too high, train it to ease up on the gas. If all goes well, the network will settle on the correct middle ground.
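
As a rough Python sketch of that training idea, here is one common way to make the correction, using the classic perceptron update rule; the learning rate and the choice of target value are illustrative, not prescribed.

def train_step(weights, threshold, inputs, target, lr=0.05):
    # target is 1 ("more gas") when the helicopter is falling,
    # 0 ("ease up") when it is climbing too fast.
    activation = sum(w * x for w, x in zip(weights, inputs))
    output = 1 if activation > threshold else 0
    error = target - output                          # +1, 0, or -1
    weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    threshold -= lr * error                          # threshold acts like a bias
    return weights, threshold
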
In your original post you used the word "evolve" and I thought you were implying the use of evolutionary computation. If you don't want to go through populations, then that makes the problem somewhat more difficult.

Essentially, the net will have to go through a process of figuring out what it needs to do, what it should not do, and how to relate the inputs to those effects. This becomes a process of trial and error, but you still need something similar to a fitness function. You will inevitably have a measure of how well it is performing that gives you (or the net) reason to believe it is doing well.

Now that I think of it, this might work as a simulated annealing project. Start off with a random neural net and see how it performs. Make a random change and then test the new net. If it does better, keep it. If it does worse, there is a probability that it will be kept. The probability that it is kept starts off high and decreases over time so that late in the process there is a very low probability it is retained. Usually this is an exponential decay function such as:

p = exp(-df / T)

where exp is the exponential function, df is the change in fitness between this net and the previous net, and T is a temperature factor that controls the probability of accepting worse nets; T decreases over time, effectively approaching 0.
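
In Python, the accept/reject step might look like the sketch below; the fitness and perturb functions are hypothetical stand-ins for scoring a net and randomly changing its weights, and the cooling schedule is an arbitrary choice.

import math, random

def anneal(weights, fitness, perturb, T=1.0, cooling=0.995, steps=10000):
    best = current = weights
    for _ in range(steps):
        candidate = perturb(current)
        df = fitness(current) - fitness(candidate)   # positive when candidate is worse
        if df <= 0 or random.random() < math.exp(-df / T):
            current = candidate                      # keep better nets, sometimes worse ones
            if fitness(current) > fitness(best):
                best = current
        T *= cooling                                 # temperature decays toward 0
    return best
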

I hope I haven't overinterpreted your problem. 8^) Let me know if this helps or hinders.

-Kirk



Simulated annealing and genetic algorithms are both basically random searches with heuristics. The space you are searching is the weights of your neural network, and the function is some indicator of how well the neural network performs your task. You could also try using parameterized functions other than neural networks and search over the space of their parameters. Neural networks just happen to form parameterized functions that can approximate many different types of functions, which is useful if you have no idea what your function should look like.

There is another category of algorithms called reinforcement learning algorithms. Reinforcement learning techniques attempt to associate inputs and actions with a potential reward. In addition to the inputs your AI receives, it also receives a reward signal. The reward signal is like giving mice a treat at the end of a maze: mice learn to make it to the end of the maze to get the treat. A reinforcement learning algorithm might simulate this by giving the AI a small, constant negative reward (punishment) as long as it stays in the maze and a positive reward when it reaches the end (actually, just the punishment may be enough; the AI would learn to escape the maze more quickly in order to receive less punishment).

Reinforcement learning algorithms attempt to learn which actions result in greater rewards. These algorithms are complicated and may or may not perform better than genetic algorithms, and there are many, many variations. But instead of just randomly trying things, they make a more systematic effort to learn using the knowledge available to them. Instead of each trial involving a new generation with random differences, each trial uses the same agent applying the knowledge it gained from its previous trials.
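
For the helicopter, the reward signal might be something like this hypothetical sketch, paralleling the maze example: a small constant punishment whenever the craft is away from the altitude you want, and a positive reward when it is hovering near it (the tolerance and reward values are arbitrary).

def reward(altitude, target_altitude, tolerance=0.5):
    if abs(altitude - target_altitude) <= tolerance:
        return 1.0    # hovering where we want it
    return -0.1       # mild punishment for every step spent elsewhere
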
Thanks for the posts, everyone. Still got some questions though; I'll reread the posts once I get home, I've probably missed something.

The only thing (in terms of AI) I've done so far is bombing planes raiding a city: they guess the coordinates, and the ones that did best make up the next population (with mutation). The only things in the gene were the coordinates (x and y), and it was very easy to work out a fitness value.

Say I give that little helicopter (which is supposed to stay in midair) negative signals constantly when it's not in midair and positive ones when it is. How will it react to the negative signals? Say it's rushing towards the ground very fast and I give it -9 (much punishment); what will it do with that number? OK, it knows it's being punished heavily, but it doesn't know whether to increase or decrease the weights. Is this just a case of trial and error? Try one direction, and if it gets more punishment, choose the other.

Am I supposed to store this in some kind of database? If it's going down, decrease the weights, or something like that? And then some numbers to help it. I thought everything was supposed to stay in the neural net and not in some kind of outside database.

I guess I could either apply some reinforcement learning where I really tell it what to do and what not to do, or, since I don't want that, the only other thing I'm really able to do is the trial-and-error thing.

It's hard to apply that to a helicopter; it's not like it needs to survive or anything. Oh, and by the way, I'm constantly googling for more documentation on neural nets, but if someone has a great site, please feel free to tell me. ;)
Reinforcement learning is really something completely separate from neural networks but they can be used together. Whether or not this is better than using other knowledge representations is debatable. One of the ideas of reinforcement learning is that you give the agent the reward based only on what you know you want it to try to do, not what you think a good strategy is, and then leave it up to the agent to determine how successful a given action will be at arriving at greater future rewards. There are a lot of reinforcement learning algorithms that attempt to accomplish this sort of thing. The idea is that this is sort of how living beings learn things, by being able to learn what to do in order to get future rewards. The techniques don't directly involve neural networks and in fact combining them with neural networks may be even more complicated.

Anyway, with the helicopter problem, the AI could try to learn the reward associated with its observations and possible actions. The action becomes an input to the neural network, and the output is the expected reward. With reinforcement learning, the expected reward is actually the immediate reward that is received plus the expected reward that will be received from the next action. There are lots of little details here; try searching the internet for more information. It gets somewhat involved.
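
In its simplest tabular form, that update might look like the sketch below; with a neural network, Q would be the net's output for a (state, action) pair rather than a dictionary entry, and the learning rate and discount factor shown are arbitrary.

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    # Pull the expected reward of (state, action) toward the immediate reward
    # plus the discounted best expected reward from the next state.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (target - old)
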
How does the learner make changes to its approach or strategy in Reinforcement Learning? The way that AP describes it, it sounds like simulated annealing. I have no experience with Reinforcement Learning, so my question is a bit naive.

-Kirk

I'm not the original AP, but a really good reference for reinforcement learning is the textbook "Reinforcement Learning: An Introduction", which also happens to be free online: http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html You really can't beat this book.

Another shorter reference is the review paper http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/rl-survey.html
