JimPrice

Question about genetic algorithms


Hi,

Continuing my reading on AI, I've come across genetic programming. As I understand it, it works as follows:

a) Loop over n generations.
b) Each generation has m members. Each member of a generation has a single realisation of each of a number of characteristics.
c) Calculate a fitness function for each member; keep the top few members and evolve the next generation from pairs drawn from this set. Also allow for the possibility of mutation, to keep from settling on local features of the fitness-function response surface.

The final 'best' members of the final generation provide our genetically superior, trained set. I would assume that if you wanted an 'easier' set, you could just draw realisations from an earlier generation.

This seems very similar to problems that are encountered very frequently in statistical analysis (i.e. a multi-dimensional characteristic set with an evaluable response). In statistical analysis, I would solve this problem using a response-surface analysis design (a linear model or generalised additive model, built using splines). Essentially you generate members with given characteristics and use them to build up a mathematical profile of the fitness function, allowing characterisation both of where the 'optimal' region is and of how changing individual (or sets of) characteristics changes the fitness function.

At this stage (and admittedly, given my still limited knowledge of the AI field), I can't see the advantages that a genetic algorithm gives - what am I missing? Is anyone aware of any literature comparing techniques? Would anyone be interested in results from this if I played with it for a bit?

Regards,

Jim.
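For concreteness, the loop in (a)-(c) could be sketched like this (a toy sketch only; the population size, real-valued gene encoding, Gaussian mutation, and quadratic fitness function are all illustrative choices of mine, not anything from the post):

```python
import random

def evolve(fitness, n_generations=50, pop_size=20, n_genes=8,
           elite=4, mutation_rate=0.05):
    """Toy GA: real-valued genes, truncation selection, uniform crossover."""
    pop = [[random.uniform(-10, 10) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(n_generations):
        # (c) Rank by fitness and keep the top few members.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]
        children = []
        while len(children) < pop_size - elite:
            a, b = random.sample(parents, 2)
            # Evolve children from pairs drawn from the elite set.
            child = [random.choice(pair) for pair in zip(a, b)]
            # Mutation keeps the search from settling on local features
            # of the fitness-function response surface.
            child = [g + random.gauss(0, 1) if random.random() < mutation_rate
                     else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness function: members whose genes all sit near 3 score best.
best = evolve(lambda m: -sum((g - 3) ** 2 for g in m))
```

Because the elite members are carried over unchanged, the best fitness seen can never decrease from one generation to the next.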

Genetic algorithms are a useful tool where the solution space is massive, poorly understood, or known to be non-smooth with several optima. They are also pretty good at finding solutions where the fitness function is noisy.

PS. genetic algorithms are not the same thing as genetic programming, although they do share similarities.

Quote:
Original post by JimPrice, regarding genetic algorithms
This seems very similar to problems that are encountered very frequently in statistical analysis (i.e. a multi-dimensional characteristic set with an evaluable response). In statistical analysis, I would solve this problem using a response-surface analysis design (a linear model or generalised additive model, built using splines). Essentially you generate members with given characteristics and use them to build up a mathematical profile of the fitness function, allowing characterisation both of where the 'optimal' region is and of how changing individual (or sets of) characteristics changes the fitness function.

At this stage (and admittedly, given my still limited knowledge of the AI field), I can't see the advantages that a genetic algorithm gives - what am I missing? Is anyone aware of any literature comparing techniques?



Genetic algorithms (GAs) and response surfaces (and related design-of-experiments techniques) are optimizers. The difference boils down to this:


GA Pros:
- Can handle a wide variety of solution representations and data types
- Can handle a very large number of solution parameters
- Can deal with very complex objective functions
- Searches for a globally near optimal solution


GA Cons:
- Being generic, can be very slow to operate


Conventional Optimizer Pros:
- For appropriate problems, are very efficient


Conventional Optimizer Cons:
- Often quite limited in the number of parameters which can be accommodated
- Often quite limited in the complexity of objective function which can be accommodated
- Many (most) search for a locally optimal solution


There is a substantial area of overlap, but response-surface techniques would certainly have a difficult (impossible?) time with the kind of highly deceptive problems which GAs are typically thrown at. For a solid introduction, see Genetic Algorithms + Data Structures = Evolution Programs, by Zbigniew Michalewicz (ISBN: 3540606769).


-Will Dwinnell
http://will.dwinnell.com


[Edited by - Predictor on September 6, 2004 6:02:18 AM]

Thanks for the replies - very useful.


So, it seems the key points are the potential scale of the covariate space (probably the wrong terminology!) and the form of the outcome space. I think I was getting hung up on two points:

Firstly, most of the examples I had looked at so far were relatively simple, and I hadn't extrapolated to more detailed ideas.
Secondly, I was getting hung up on the lack-of-speed issue highlighted by Will.

I think one of the best things that helped was the example exercise on the ai-junkie website (the 'finding the largest circle') - I couldn't immediately see a solution to this using techniques that I have been using over the last few years, but a GA solution just jumped out at me straight away.

An analogy for me is probably from my own pet field. Many problems in Bayesian analysis can be solved using conjugate families of distributions (potentially making simplifying distributional assumptions about the form of the data and parameters) to arrive at an analytical solution. However, using MCMC (numerical) techniques allows expansion to a wider range of problems, at the cost of speed.

So, basically, a few more tools to add to the toolbox.

I love learning new stuff!

Jim.

Quote:
Original post by fup
PS. genetic algorithms are not the same thing as genetic programming, although they do share similarities.


Why not? What's the difference? IMHO, the difference is only in the fitness function. But tasks always differ by fitness function, so... the main ideas are the same, and the main operations are the same. ;)

Quote:
Original post by Yuri Burger
Quote:
Original post by fup
PS. genetic algorithms are not the same thing as genetic programming, although they do share similarities.

Why not? What's the difference?

Your GA can't evolve complexity or simplicity to solve the problem at hand.

Quote:
Original post by Yuri Burger
Quote:
Original post by fup
PS. genetic algorithms are not the same thing as genetic programming, although they do share similarities.


Why not? What's the difference? IMHO, the difference is only in the fitness function. But tasks always differ by fitness function, so... the main ideas are the same, and the main operations are the same. ;)


Why is programming in assembler any different from programming in ML? It's the same hardware, and it all gets converted to the same bytes and runs on the same OS...

To offer some example answers ;) ...

- look at how complex high-level algorithms are implemented differently in different languages. There can be very little similarity between the two implementations, because the languages are so dissimilar that you actually need literally different algorithms to achieve the same end result.

- look at how much change there is in the generated code when a small change is made in a GA compared to in a GP (you'll need to sketch out example designs to see this). Sometimes, the GA encoding is actually just a GP - in which case there's little difference - but the GA encoding could be radically different, and so a single bit-flip in the GA will have an effect that the equivalent GP could only achieve by doing, say, 4 complex mutations at precise points in the tree.

- (if I understand Woodsman correctly, this is the one he's referring to) because GP works at a higher level (using prog-lang terminology: i.e. more abstracted, more structured), it's been easy for GP'ers to add modern high-level concepts as trivial features of the GP. For instance, mutators that create arbitrary non-terminals (IIRC called "Automatically Defined Functions" (ADFs) in Koza and in the GADS=EP book) that didn't exist in the original set, by iterating over a tree, picking a random sub-tree, declaring it a new NT, and replacing the sub-tree with the symbol / reference for that new NT. Doing the same thing in GAs is a lot less trivial. In fact, it's pretty easy to take a standard OOP code-refactoring library and hook it up to a GP simulation directly, exposing all the types of modern refactoring that humans have identified as useful as high-level mutation techniques that the GP can utilise!

PS GADS=EP is a great starter book - it's nice and small but has a lot of info. It's close to being an academic text, though: very terse on many things, and one paragraph might contain concepts that other books would spread over two pages. A lot of stuff is skimmed on the assumption that the reader is already well versed in related disciplines.

"Why not? What's the difference?"

Well, technically you're right: GP is just a form of GA. In practice, though, GP is a very different tool... which is why it has become a separate branch of evolutionary computing.

GAs typically encode solution candidates as strings of binary or floating point numbers. GPs encode solution candidates as trees of operators consisting of function nodes and terminal nodes. Each tree represents a program, which may be run directly in a language such as Lisp or inside a virtual machine.
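The tree encoding described above can be sketched as follows (the tuple representation, the small function set, and the `evaluate` helper are my own illustrative choices, not from any particular GP system):

```python
import operator

# A candidate program is a tuple: (function-name, child, child); terminals
# are strings naming inputs, or plain numeric constants.
FUNCS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate(tree, env):
    """Run a tree-encoded program against a dict of input values."""
    if isinstance(tree, tuple):                    # function node
        return FUNCS[tree[0]](*(evaluate(c, env) for c in tree[1:]))
    if isinstance(tree, str):                      # terminal: named input
        return env[tree]
    return tree                                    # terminal: constant

# Squared distance between two points on a line: (x1 - x0) * (x1 - x0)
dist_sq = ('*', ('-', 'x1', 'x0'), ('-', 'x1', 'x0'))
```

For example, `evaluate(dist_sq, {'x0': 1, 'x1': 4})` gives 9.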

"Your GA can't evolve complexity or simplicity to solve the problem at hand."

They can actually. Here's an example:

http://www.cs.utexas.edu/users/kstanley/

Quote:
Original post by fup
"Your GA can't evolve complexity or simplicity to solve the problem at hand."

They can actually. Here's an example:

http://www.cs.utexas.edu/users/kstanley/

I stand corrected! Thanks for the link, that's pretty neat.

Quote:
Original post by fup
GPs encode solution candidates as trees of operators consisting of function nodes and terminal nodes.


GPs can also encode programs in other ways, such as sequential programs. See, for instance:

Genetic Programming: An Introduction


-Predictor
http://will.dwinnell.com


[Edited by - Predictor on September 11, 2004 12:33:11 PM]

Quote:
Original post by Yuri Burger
Is there somebody to help me with a free GA library?

There is a C++ GPL library of genetic algorithms: http://sourceforge.net/projects/cpplibga/

Help with the development of the library is needed...



I think that by the time one understands a GA library, it would have been just as easy to create one's own, which would, of course, be completely customizable and free of legal entanglements. GAs are not that complicated. Additionally, many optimization problems in my experience have special requirements that are never well served by generic libraries.

-Predictor
http://will.dwinnell.com

I have programmed my own genetic program. To test it, I had it solve for the distance between two points. On my first trial it got it by the tenth generation. On my second trial, I had it run for 224 generations before I decided to quit. It was still off by more than 1,000. It also got slower and slower as the program trees evolved more and more complexity trying to solve such a simple problem.

On my third trial it got it on the 31st generation. I looked at the code that the third trial generated and it was essentially If(false && a whole lot of shit, more shit, more if statements that eventually led to the algorithm). So these if statements were sort of like dominant and recessive genes. I guess that's a good thing, because some negative genes can carry over until they're needed, but it means there is a lot of code just to get sqrt((L-O) dot (L-O)).

I'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case? I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.

Quote:
Original post by Nathaniel Hammen
I have programmed my own genetic program. To test it, I had it solve for the distance between two points. On my first trial it got it by the tenth generation. On my second trial, I had it run for 224 generations before I decided to quit. It was still off by more than 1,000. It also got slower and slower as the program trees evolved more and more complexity trying to solve such a simple problem.

If you can, enforce a maximum depth of your trees, whether as a result of crossover or otherwise. When one child is too large, copy a random parent instead.
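A sketch of that depth cap (tuple-encoded trees and all names are my own; the default limit of 17 is a commonly cited choice in the GP literature, not from the post):

```python
import random

def depth(tree):
    """Depth of a tuple-encoded tree; a lone terminal counts as depth 1."""
    if isinstance(tree, tuple):
        return 1 + max(depth(c) for c in tree[1:])
    return 1

def crossover_with_limit(a, b, crossover, max_depth=17):
    """Run any supplied crossover, but reject oversized children as
    suggested above, returning a copy of a random parent instead."""
    child = crossover(a, b)
    return random.choice([a, b]) if depth(child) > max_depth else child
```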
Quote:
Original post by Nathaniel Hammen
On my third trial it got it on the 31st generation. I looked at the code that the third trial generated and it was essentially If(false && a whole lot of shit, more shit, more if statements that eventually led to the algorithm). So these if statements were sort of like dominant and recessive genes. I guess that's a good thing, because some negative genes can carry over until they're needed, but it means there is a lot of code just to get sqrt((L-O) dot (L-O)).

Look up 'editing'. Basically, you define patterns that go in and edit the results. Examples would be:
'(anything) | True' can be replaced with True;
or 'iflessthanzero (some constant) x y' can be replaced with x or y, depending on the constant.
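Those editing patterns might look like this (a hypothetical sketch over tuple-encoded trees; the pattern set is illustrative only, not a standard editing library):

```python
def edit(tree):
    """Apply simplification patterns bottom-up to a tuple-encoded tree,
    e.g. ('or', anything, True) -> True, per the examples above."""
    if not isinstance(tree, tuple):
        return tree
    op, *args = tree
    args = [edit(a) for a in args]              # simplify children first
    if op == 'or' and True in args:
        return True
    if op == 'and' and False in args:
        return False
    if op == 'if' and isinstance(args[0], bool):
        return args[1] if args[0] else args[2]  # branch decided statically
    return (op, *args)
```

Note that editing like this prunes the `if(false && ...)` bloat described earlier, at the cost of destroying any dormant "recessive" material in the pruned branches.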
Quote:
Original post by Nathaniel Hammen
I'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case. I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.

What I've generally seen is multiple runs, rather than larger populations or number of generations. There is no guarantee that any particular run will find a solution. Increasing the number of runs will increase the chance.

You may want to save, say, the ten best programs generated from one complete run, and introduce them into the next run either in the initial generation or later on. Perhaps even take these best 10 from several runs and then run them all together.

Quote:
Original post by Nathaniel Hammen
On my third trial it got it on the 31st generation. I looked at the code that the third trial generated and it was essentially If(false && a whole lot of shit, more shit, more if statements that eventually led to the algorithm). So these if statements were sort of like dominant and recessive genes. I guess that's a good thing, because some negative genes can carry over until they're needed, but it means there is a lot of code just to get sqrt((L-O) dot (L-O)).


You may want to remove control statement-related alleles (the if statement, true/false, etc), as you're just looking for a plain mathematical formula, not a computer program per se.

Quote:
Original post by Nathaniel Hammen
I'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case? I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.


There are other elements that may be problems, also. There are a number of parameters to tune before you should expect your algorithm to work well. Depending on the algorithm and representation you use, these can include mutation rate, crossover rate, degree of mutation, population size, fitness function, selection method, etc. Depending on the behavior of your population, you can tweak these and related values back and forth.

Could you give some specifics about your algorithm? For example, what crossover method you're using (if any), selection method, population size, and so on.

Guest Anonymous Poster
Quote:
Original post by Nathaniel Hammen
I'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case. I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.


There is lots of literature on this. GP is not a simple process (although that doesn't stop lots of people from just diving in, naively thinking it'll work first time). The best advice is simply to use Google and read the 4 or 5 main conferences (and their papers and journals) devoted to e.g. "machine learning". There are hundreds of academic papers; many present just individual techniques, so they're kind of like a shopping list of "if you want to do X, read this paper".

You should especially look at:

- ADF == automatically defined functions.
- "bushiness" vs "depth"

There are many papers on each, and most contain algorithms you can copy and use in your own work.

You're probably best off just buying a recent book on GP, though, since a good one ought to cover a variety of techniques and algorithms...

Guest Anonymous Poster
Be skeptical of the genetic algorithm. There is no guarantee that after n generations it will perform better. Genetic algorithms are a kind of educated random step. If you want an algorithm closer to what I think you were talking about, check out this paper: http://www.debonet.com/Research/Publications/1996/DeBonet-NIPS96-MIMIC.pdf. Basically, this algorithm keeps track of the history of generations to better forecast the best next generation.

Quote:
Original post by Woodsman
If you can, enforce a maximum depth of your trees, whether as a result of crossover or otherwise. When one child is too large, copy a random parent instead.

I already enforce a maximum depth on creation, so adding that to mutation as well will be no problem.


Quote:
Original post by Woodsman
Look up 'editing'. Basically, you define patterns that go in and edit the results. Examples would be:
'(anything) | True' can be replaced with True;
or 'iflessthanzero (some constant) x y' can be replaced with x or y, depending on the constant.

I already thought of doing this, and looking up previous work will probably be useful. I'm a little leery of destroying a 'gene' that might be useful in a future generation, but I also don't want my programs to take up too much room. I'll look it up and see if anyone has a good solution. Off the top of my head, I can only think of randomly deciding whether to do it or not; possibly weighted to occur more often in larger trees.


Quote:
Original post by Woodsman
What I've generally seen is multiple runs, rather than larger populations or number of generations. There is no guarantee that any particular run will find a solution. Increasing the number of runs will increase the chance.

You may want to save, say, the ten best programs generated from one complete run, and introduce them into the next run either in the initial generation or later on. Perhaps even take these best 10 from several runs and then run them all together.

Good idea! I'll probably use this.


Quote:
Original post by kwatz
You may want to remove control statement-related alleles (the if statement, true/false, etc), as you're just looking for a plain mathematical formula, not a computer program per se.

The mathematical formula was only for a test to see whether my GP worked, and could evolve the answer. After I work out all of the kinks, I'm going to run my GP on a little battle arena type thing...


Quote:
Original post by kwatz
There are other elements that may be problems, also. There are a number of parameters to tune before you should expect your algorithm to work well. Depending on the algorithm and representation you use, these can include mutation rate, crossover rate, degree of mutation, population size, fitness function, selection methodetc. Depending on the behavior of your population, you can tweak these and related values back and forth.
Could you give some specifics about your algorithm? For example, what crossover method you're using (if any), selection method, population size, and so on.

Tree Selection
I have a struct called ReproductionRate that I plug numbers into at the start of a run. This decides how many trees will be copied, how many will be crossed over, and how many will be mutated. The ToCopy trees with the best fitness scores are copied to the next generation. To select the trees that will be mutated, I pick ToMutat trees randomly from all of the trees except the ToCopy trees with the lowest fitness scores. For crossing over, I use each of the best ToCross trees exactly once, and randomly select which ones are crossed with which.
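A sketch of that scheme (my reconstruction from the description above, not the author's code; I assume the three counts sum to the population size and that ToCross is even):

```python
import random

def next_generation(pop, fitness, to_copy=10, to_cross=80, to_mutate=10,
                    crossover=None, mutate=None):
    """One generation of the ReproductionRate scheme described above."""
    ranked = sorted(pop, key=fitness, reverse=True)
    new = list(ranked[:to_copy])                 # best trees copied over
    # Mutate trees drawn from everyone except the worst to_copy members.
    candidates = ranked[:-to_copy] if to_copy else ranked
    new += [mutate(t) for t in random.sample(candidates, to_mutate)]
    # Cross over the best to_cross trees, each used exactly once,
    # paired off at random.
    parents = list(ranked[:to_cross])
    random.shuffle(parents)
    for a, b in zip(parents[0::2], parents[1::2]):
        new += list(crossover(a, b))             # crossover yields two children
    return new
```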

Node Selection
For both crossover and mutation, I select the node to apply the operation to completely at random. I plan to change this in the future, because there are obviously more nodes near the bottom of the tree, so the likelihood of a specific node being selected increases with depth. This means that a lot of the time the selected node is on the bottom row, or just above it, so only three or so nodes are being crossed over or mutated.

Crossing Over
For crossing over, all I do is take the selected nodes and switch them. Also, of course, I update the bookkeeping so that NodeCount is correct and ParentPtr points to the right place.

Mutation
For mutation, I delete the selected node and just build random nodes as I do when creating the tree in the first place.
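That delete-and-regrow mutation might be sketched like this (tuple-encoded trees; the function and terminal sets are illustrative, and the descent is a simple random walk rather than an exactly uniform node pick):

```python
import random

FUNCS = ('+', '-', '*')
TERMINALS = ('x', 'y', 1)

def random_tree(max_depth):
    """Grow a random tuple-encoded tree, as at initial creation."""
    if max_depth <= 1 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(FUNCS),
            random_tree(max_depth - 1), random_tree(max_depth - 1))

def subtree_mutate(tree, max_depth=4):
    """Replace one randomly reached node with a freshly grown subtree."""
    if not isinstance(tree, tuple) or random.random() < 0.5:
        return random_tree(max_depth)
    op, *args = tree
    i = random.randrange(len(args))
    args[i] = subtree_mutate(args[i], max_depth - 1)
    return (op, *args)
```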


Quote:
Original post by Anonymous Poster
There is lots of literature on this. GP is not a simple process (although that doesn't stop lots of people from just diving in, naively thinking it'll work first time).

Like me! Except that I didn't think it would work perfectly the first time, and, "Ta da!" it didn't.


Quote:
Original post by Anonymous Poster
The best advice is simply to use google and read the 4 or 5 main conferences (and their papers and journals) which are devoted to e.g. "machine learning". There are hundreds of academic papers; many are just individual techniques, so they're kind of like a shopping list of "if you want to do X, read this paper".

You should especially look at:

- ADF == automatically defined functions.
- "bushiness" vs "depth"

there are many papers on each, and most contain algorithms you can copy and use in your own work.

Yes, google is always good. I'll look up those things.


Quote:
Original post by Anonymous Poster
You're probably best off just buying a recent book on GP, though, since a good one ought to cover a variety of techniques and algorithms...

A book... Paying money... Probably not. But, maybe. After all, I did buy "AI Game Programming Wisdom".


Quote:
Original post by Anonymous Poster
Be skeptical of the genetic algorithm. There is no guarantee that after n generations it will perform better. Genetic algorithms are a kind of educated random step. If you want an algorithm closer to what I think you were talking about, check out this paper: http://www.debonet.com/Research/Publications/1996/DeBonet-NIPS96-MIMIC.pdf. Basically, this algorithm keeps track of the history of generations to better forecast the best next generation.

Reading it now... literally.

[Edited by - Nathaniel Hammen on September 15, 2004 6:42:39 PM]

Just to let you know, most of my advice came from Koza's Genetic Programming.
Quote:
Original post by Nathaniel Hammen
Quote:
Original post by Woodsman
Look up 'editing'. ...

I already thought of doing this. Looking up previous ways of doing this will probably be useful. I'm a little leery of just destroying a "gene" that might be useful in a future generation. However, I also don't want to have my programs take up too much room. I'll look it up and see if anyone has a good solution to this. Off of the top of my head, I can only think of randomly deciding whether to do this or not; possibly, this random would be weighted to occur more often in larger trees.

Yeah, I meant it primarily in terms of the final result. It may be interesting to see the results of doing this at breeding time.

Guest Anonymous Poster
Quote:
Original post by Nathaniel Hammen
I have programmed my own genetic program. To test it, I had it solve for the distance between two points. On my first trial it got it by the tenth generation. On my second trial, I had it run for 224 generations before I decided to quit. It was still off by more than 1,000. It also got slower and slower as the program trees evolved more and more complexity trying to solve such a simple problem.

On my third trial it got it on the 31st generation. I looked at the code that the third trial generated and it was essentially If(false && a whole lot of shit, more shit, more if statements that eventually led to the algorithm). So these if statements were sort of like dominant and recessive genes. I guess that's a good thing, because some negative genes can carry over until they're needed, but it means there is a lot of code just to get sqrt((L-O) dot (L-O)).

I'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case. I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.


Run in two stages.

First stage: find a valid solution using full-featured GP.

Second stage: find the shortest equivalent by using only "crop" operators during crossover/mutation.

Stage 2 only guarantees that irrelevant parts of the program tree are removed.
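A "crop" operator of that sort might look like this (hypothetical sketch over tuple-encoded trees; it hoists a random subtree up over its parent, so the tree can only shrink or stay the same size):

```python
import random

def size(tree):
    """Node count of a tuple-encoded tree."""
    if isinstance(tree, tuple):
        return 1 + sum(size(c) for c in tree[1:])
    return 1

def crop(tree):
    """Shrinking-only mutation: with some probability, replace a function
    node by one of its own subtrees; otherwise descend and retry."""
    if not isinstance(tree, tuple):
        return tree
    if random.random() < 0.5:
        return crop(random.choice(tree[1:]))     # child replaces parent
    op, *args = tree
    i = random.randrange(len(args))
    args[i] = crop(args[i])
    return (op, *args)
```

Because every step either keeps a node or drops it along with nothing new added, `size(crop(t)) <= size(t)` always holds, which is what makes stage 2 safe to run to convergence.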


Look up academic papers by Terry Soule if you are concerned with the if(false && false) type of bloat. He studies that stuff - PhD dissertation, etc. Look for it on CiteSeer.

The main problem I have with genetic techniques is that what you get is a solution - not the solution, nor a solution guaranteed to be within epsilon of the optimum. This becomes important when you are working with functions where you have no idea how they behave in terms of minima, optima, or continuity.

But yeah.

Look for GECCO papers - it's a big GA/GP conference.

Quote:
Original post by Nathaniel Hammen
The mathematical formula was only for a test to see whether my GP worked, and could evolve the answer. After I work out all of the kinks, I'm going to run my GP on a little battle arena type thing...


Gotcha.

Quote:
Original post by Nathaniel Hammen
Tree Selection
I have a struct called ReproductionRate that I plug numbers into at the start of a run. This decides how many trees will be copied, how many will be crossed over, and how many will be mutated. The ToCopy trees with the best fitness scores are copied to the next generation. To select the trees that will be mutated, I pick ToMutat trees randomly from all of the trees except the ToCopy trees with the lowest fitness scores. For crossing over, I use each of the best ToCross trees exactly once, and randomly select which ones are crossed with which.


Can you give these numbers in terms relative to the population size? For instance, these values are often expressed as percentages; that is, on average, X% of the total population is mutated each generation, or Y% are candidates for crossover. Expressing them this way may help tell whether they may be too high or too low.

Quote:
Original post by Nathaniel Hammen
Node Selection
For both crossover and mutation, I select the node to apply the operation to completely at random. I plan to change this in the future, because there are obviously more nodes near the bottom of the tree, so the likelihood of a specific node being selected increases with depth. This means that a lot of the time the selected node is on the bottom row, or just above it, so only three or so nodes are being crossed over or mutated.


Regarding mutation, keep in mind that it generally best serves its purpose when it has small, incremental effects (note "generally" - this isn't always the case). So it may not be a bad thing that it tends to choose nodes with relatively few children. Changing those nodes will have fairly small effects, while changing nodes higher up in the tree will have broader effects.

On the other hand, you may or may not want the same behaviour for crossover. While I'm not convinced that your current method is better or worse, one alternative would be to choose a level in the tree with equal probability, then choose a node from within that level. This has the opposite problem, in a way - the root node has a much greater probability of being chosen than any one leaf node. Just something to think about.


Quote:
Original post by Nathaniel Hammen
Mutation
For mutation, I delete the selected node and just build random nodes as I do when creating the tree in the first place.


Something to consider that draws on my note above would be to change only the node itself instead of removing the entire subtree and replacing it. This would require the new node to take the same number of arguments as the old node, but it's arguably a smaller change.
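That arity-preserving point mutation might be sketched as follows (the function table grouped by arity is my own illustrative device, so a replacement always takes the same number of arguments as the node it replaces):

```python
import random

# Hypothetical function table grouped by arity; all functions listed under
# a given key accept exactly that many arguments.
FUNCS_BY_ARITY = {2: ['+', '-', '*']}
TERMINALS = ['x', 'y', 1]

def point_mutate(tree):
    """Swap a single node for another of matching arity, leaving every
    subtree exactly where it was."""
    if not isinstance(tree, tuple):
        return random.choice(TERMINALS)          # terminal -> terminal
    op, *args = tree
    if random.random() < 0.3:                    # mutate this node itself
        op = random.choice(FUNCS_BY_ARITY[len(args)])
    else:                                        # or descend into one child
        i = random.randrange(len(args))
        args[i] = point_mutate(args[i])
    return (op, *args)
```

Because only labels change, the shape of the tree is untouched - which also illustrates the objection raised in the reply below: if a function is the only one with its arity and argument types, this mutation has nothing to swap it with.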

Quote:
Original post by Nathaniel Hammen
A book... Paying money... Probably not. But, maybe. After all, I did buy "AI Game Programming Wisdom".


If you're attending college, you should look into journal article availability. Other alternatives are memberships to the ACM or IEEE - both offer digital access to certain publications. It'll be harder to find tutorial-like information, but it'll also have a broader wealth of topics.


Are you able to characterize the problem you're having with your results at all? For example, is the population converging to a bad solution, or is it running for many, many generations and never producing a good solution, or perhaps some other problem?

Quote:
Original post by Vlion
Look up academic papers by Terry Soule if you are concerned with the if(false && false) type of bloat. He studies that stuff - PhD dissertation, etc. Look for it on CiteSeer.

The main problem I have with genetic techniques is that what you get is a solution - not the solution, nor a solution guaranteed to be within epsilon of the optimum. This becomes important when you are working with functions where you have no idea how they behave in terms of minima, optima, or continuity.

But yeah.

Look for GECCO papers - it's a big GA/GP conference.

I'll look his stuff up.

Quote:
Original post by kwatz
Quote:
Original post by Nathaniel Hammen
Tree Selection
I have a struct called ReproductionRate that I plug numbers into at the start of a run. This decides how many trees will be copied, how many will be crossed over, and how many will be mutated. The ToCopy trees with the best fitness scores are copied to the next generation. To select the trees that will be mutated, I pick ToMutat trees randomly from all of the trees except the ToCopy trees with the lowest fitness scores. For crossing over, I use each of the best ToCross trees exactly once, and randomly select which ones are crossed with which.


Can you give these numbers in terms relative to the population size? For instance, these values are often expressed as percentages; that is, on average, X% of the total population is mutated each generation, or Y% are candidates for crossover. Expressing them this way may help tell whether they may be too high or too low.

Well, as I said, I can input different numbers for these values at run-time. Usually I use 10 for Copy, 80 for Cross, and 10 for Mutat.

Quote:
Original post by kwatz
Quote:
Original post by Nathaniel Hammen
Node Selection
For both crossover and mutation, I select the node to apply the operation to completely at random. I plan to change this in the future, because there are obviously more nodes near the bottom of the tree, so the likelihood of a specific node being selected increases with depth. This means that a lot of the time the selected node is on the bottom row, or just above it, so only three or so nodes are being crossed over or mutated.


Regarding mutation, keep in mind that it generally best serves its purpose when it has small, incremental effects (note "generally" - this isn't always the case). So it may not be a bad thing that it tends to choose nodes with relatively few children. Changing those nodes will have fairly small effects, while changing nodes higher up in the tree will have broader effects.

On the other hand, you may or may not want the same behaviour for crossover. While I'm not convinced that your current method is better or worse, one alternative would be to choose a level in the tree with equal probability, then choose a node from within that level. This has the opposite problem, in a way - the root node has a much greater probability of being chosen than any one leaf node. Just something to think about.

Well then, I'll keep using my current method for mutation, but for crossover I'll use some sort of curve, so that nodes at a middle depth are more likely to be selected.

Quote:
Original post by kwatz
Quote:
Original post by Nathaniel Hammen
Mutation
For mutation, I delete the selected node and just build random nodes as I do when creating the tree in the first place.


Something to consider that draws on my note above would be to change only the node itself instead of removing the entire subtree and replacing it. This would require the new node to take the same number of arguments as the old node, but it's arguably a smaller change.

For some functions, such as an if statement, that function is the only one with the same number and types of inputs... No change would occur.

Quote:
Original post by kwatz
Are you able to characterize the problem you're having with your results at all? For example, is the population converging to a bad solution, or is it running for many, many generations and never producing a good solution, or perhaps some other problem?

Hmmm... I never tried to characterize it. All I know is that it either got the answer in under 50 generations or never came up with anything useful. The latter was pretty rare; it's just that, when it happened, it ran for a long time before I decided to make it give up.

