My project plan

Started by ideasmichael
5 comments, last by IADaveMark 14 years, 11 months ago
I would like to explain and get feedback on my proposed project idea for discovering the brain's learning algorithm using genetic programming and reinforcement learning. I have worked on many AI projects using my own ideas in the past, but for this one I want to write down a real plan first and listen to other people's opinions of it. All my past AI projects were left unfinished because they were ill-conceived and fundamentally flawed; I did not take the time to write down a proper plan and share my ideas with others for criticism. This one is going to be different, that is, if it ever leaves the drawing board. The paper I cite below is available here, if you need it: http://www.karlsims.com/papers/alife94.pdf Here is my project proposal (it's a bit long, but bear with me; a rough code sketch of the fitness evaluation follows it):
Quote: The past 60 years of Artificial Intelligence research have shown, if nothing else, that truly flexible learning and genuinely intelligent behavior are problems of great difficulty, and they remain unsolved. Many optimistic predictions made in the field of AI have turned out to be false, leading to disappointment and skepticism. With the rapid growth in easily available computing power, much work has focused on using search algorithms to automatically find novel solutions to problems not easily solved by more traditional means.

Intelligent behavior for specific tasks has often been the goal, using many different biologically inspired neural network models. In Karl Sims' work "Evolving 3D Morphology and Behavior by Competition", genetic algorithms were used to create the morphologies of creatures and the neural control systems that drive their muscle forces. Fitness evaluation functions were defined to direct the evolved creatures towards specific behaviors such as swimming, walking and jumping. The creatures' "brains" consisted of networks of directed graph nodes that performed common mathematical functions such as sum, product, divide, greater-than, sin and cos. After many generations of evolution, each virtual creature was very proficient at the specific task the fitness function required of it, but no individual had any further ability to learn or adapt if the fitness requirements or the environment were to change. In that case the evolutionary algorithm would have to resume re-evaluating and refining the entire population of genomes.

For any individual member of the population to learn independently, the agent should act in its environment through reinforcement learning: it learns the results of state-action combinations, their associated rewards and hazards, and improves its own behavior from past experience. Aspects of the human brain's function are akin to reinforcement learning: the drive for self-preservation, the need for emotional happiness and the ability to make plans to achieve goals. Yet the brain is obviously far more complex than a Markov decision process, the typical formulation used for reinforcement learning. The brain possesses an innate ability to learn the trends and relationships of complex and diverse data, using its vast self-organizing predictive memory. This allows it to react to situations never encountered before in an intelligent and self-preserving manner, more effectively than any reinforcement learning algorithm so far devised.

We propose that the brain's learning algorithm itself should be the target of an evolutionary search. This requires a fitness function defined as the combined result of learning to perform a range of different simple tasks, each with a reward signal and the same state-action dimensionality. For each fitness evaluation, the agent would work on each task for a period long enough to allow at least some reinforcement learning to take place. The effectiveness of learning on each task would be recorded as the difference between the task fitness at the beginning of learning and the task fitness at the end.
This would force the search algorithm to find solutions where the agent actually learns the correct actions from the reward signal, as opposed to evolving agents that instinctively know the task solutions from their genes.

The kind of directed graph control network used in Karl Sims' work proved very effective for controlling arbitrary behaviors. Provided the right node functions were supported, a network could be evolved to support the online weight adaptation an agent needs in order to learn the results of state-action pairs in its environment. Directed graph networks also operate in a highly parallel manner, like real neurons, whereas processing information serially is biologically very unrealistic. However, the emergent processes arising from many interconnected nodes working in parallel, and the potential size of the evolved graph networks, would make them nearly impossible to understand in terms of any functional logic. This is analogous to real brains, where much is understood about the operation of individual neurons and synapses, but the way they cooperate to form high-level brain functions remains unknown.

Understanding the evolved learning algorithm is the final goal, so we propose that genetic programming be used to evolve syntax-tree-like programs whose job is to build the graph networks that make up the control system. A genotype syntax tree would be much easier to understand than the parallel graph-network phenotype it builds. The syntax tree would also form a compressed representation of the graph network, allowing repeated functionality and structure in the phenotype to be expressed as a loop in the genome, and reducing the space needed to store genome solutions. To speed up evolution we intend to run the fitness function on client computers, with the results collected on a server where the crossover and mutation operators are performed. The server need only send the genome for an evaluation, so the compact nature of the syntax tree is very beneficial.

The choice of tasks for the agent to learn in the fitness function is crucial. If the tasks are too similar, or if there are too few of them, the evolved agents will specialize on those tasks without possessing any ability to learn new ones, just like the virtual creatures. The more tasks the fitness function requires, the more flexible the evolved agents will be, but a compromise must be made to keep the evolution practical on real hardware. It is inevitable that at least some instinctive specialisation will occur to the extent that the tasks are similar.

The initial set of tasks is described below, and can be expanded after the initial proof of concept. Each task is a game with a state vector of six values and an action vector of two. For each unit of game time, the agent's control network executes a fixed number of cycles on the current state to produce its output action. The first game is Pong, where two state values record the vertical positions of the paddles and four values record the positions of two balls. One action value controls moving up, the other moving down. The reward signal goes positive when the agent scores by hitting a ball past its programmed opponent, and negative when it concedes.
The second game is a cat-and-mouse game in which the agent moves in a two-dimensional space to find food while avoiding an enemy that tries to catch it. The six state values represent the positions of the agent, the food and the enemy in that space, and the two action values control the agent's vertical and horizontal movement. The reward signal goes positive when the agent finds the food and negative when it is caught. The third game is similar to Space Invaders: one action value controls moving left and right, the other shoots. Up to three aliens can exist at a time; they start at the top and move towards the bottom, where they score against the agent if they are not shot first.
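To make the fitness measure a bit more concrete, here is a rough Python sketch of how I imagine one evaluation working. Nothing here is final: the Task and Agent interfaces, the episode count and the averaging window are all placeholder names of mine, and the real control network would be built from an evolved genome rather than hand-written.

```python
# A minimal sketch (placeholder names throughout) of the proposed fitness
# evaluation: the agent plays each task long enough for some reinforcement
# learning to happen, and its fitness on that task is how much its score
# improved between the start and the end of the evaluation.

from typing import List, Protocol, Sequence, Tuple


class Task(Protocol):
    """A game with a six-value state vector and a two-value action vector."""
    def reset(self) -> Sequence[float]: ...
    def step(self, action: Sequence[float]) -> Tuple[Sequence[float], float, bool]:
        """Advance one time unit; returns (next_state, reward, done)."""
        ...


class Agent(Protocol):
    """An evolved control network that can adapt its weights online."""
    def act(self, state: Sequence[float]) -> Sequence[float]: ...
    def learn(self, state, action, reward, next_state) -> None: ...


def run_episode(agent: Agent, task: Task, max_steps: int = 500) -> float:
    """Play one episode, letting the agent learn from the reward signal."""
    state = task.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = task.step(action)
        agent.learn(state, action, reward, next_state)
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward


def learning_gain(agent: Agent, task: Task, episodes: int = 50, window: int = 5) -> float:
    """Task fitness = average score late in learning minus average score early on,
    so an agent that already 'knows' the task from its genes scores no better
    than one that genuinely learns it during the evaluation."""
    scores = [run_episode(agent, task) for _ in range(episodes)]
    early = sum(scores[:window]) / window
    late = sum(scores[-window:]) / window
    return late - early


def fitness(agent: Agent, tasks: List[Task]) -> float:
    """Combined fitness over all tasks (Pong, cat-and-mouse, invaders, ...)."""
    return sum(learning_gain(agent, task) for task in tasks)
```

On the distributed side, only the genome would travel from the server to a client; the client would build the network, run an evaluation along the lines of fitness() above, and send the resulting number back.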
I am grateful to anyone who managed to read all of that, and I would greatly appreciate any comments or criticism you have of my ideas.
What's the context of your project? For fun, research, a student project?

The thing about this particular research is that there hasn't been much progress on the pure evolutionary front since 1994. Most of the improvements have come from better understanding of biomechanics, and from creating higher-quality underlying models.

On the reinforcement learning front, there's lots to do -- and that's cutting edge. See "Feedback Error Learning." But typically that's used for integrating motion capture with physical simulations rather than being combined with EA.


Also see my roundup of research for reference:
Evolving Virtual Creatures: The Definitive Guide


In summary, be prepared for a fun project -- but don't set your expectations very high for innovative results.

Alex
AiGameDev.com


Quote: Original post by alexjc
What's the context of your project? For fun, research, a student project?

The thing about this particular research is that there hasn't been much progress on the pure evolutionary front since 1994. Most of the improvements have come from better understanding of biomechanics, and from creating higher-quality underlying models.

On the reinforcement learning front, there's lots to do -- and that's cutting edge. See "Feedback Error Learning." But typically that's used for integrating motion capture with physical simulations rather than being combined with EA.

...


This is a research project. The point of it is to be able to look at the syntax tree from the genetic programming and explain how a creature actually learns. You're absolutely right that reinforcement learning isn't usually combined with evolutionary algorithms. But my thesis is that it should be, because nature has evolved human beings that are able to learn new tasks, not just animals that only behave according to instinct hard-wired by their genes.

Thanks for that creature roundup - it certainly shows that there hasn't been much evolution since 1994 :) I'm convinced that evolving reinforcement learning is the next advancement.

I will read up on Feedback Error Learning.

[Edited by - ideasmichael on May 2, 2009 9:02:12 AM]
Do you know about NEAT?
Quote: Original post by Yvanhoe
Do you know about NEAT?


I hadn't heard of that project, but after glancing over it, it seems to be more of the same - evolving genomes/networks for a specific task. Intelligence and learning itself should be the goal of evolution (as it is for humans in nature), not evolution towards solving specific tasks.
I think you're severely limiting your research opportunities by attempting to sum up the human brain as a single algorithm. It would take hundreds, if not thousands or even millions, of algorithms all running simultaneously to describe or emulate the human brain.

Additionally, you assume that non-human animals lack the ability to learn. This is untrue; it is well established that animals do indeed learn. Again, you'd be limiting your research opportunities by not observing how other creatures learn.

I guess what I'm trying to say is that if you don't study the entire picture, you'll never be able to accomplish your goal, as there will always be some part of the puzzle that's missing.

-Lead developer for OutpostHD

http://www.lairworks.com

Read Marvin Minsky's "The Society of Mind". He posits that the human mind is a huge collection of relatively independent, specialized parts. Some of those parts are managers that know nothing about how the other parts do their work, only how to deal with their inputs and outputs... even coordinating that information with other parts.

If you are looking at replicating human (or whatever) thought, you need to be able to break things down into specialized pieces and parts as well.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

This topic is closed to new replies.
