# Genetic Algorithm and Sudoku

## Recommended Posts

Quote:
Original post by Timkin
Quote:
 Original post by RPGeezus: For cross-over to work we must satisfy some not-so-trivial criteria. I thought we were all in agreement in this regard, at least semantically.

No, I don't see agreement on this point. From my experience, crossover will always be effective on problems that are solvable by a directed random walk, and crossover will perform better than said random walk.

"your attributes should be as orthogonal as possible. This has implications on how effectively you can search the objective function surface and how effective your operators are in the GA."

and proceeded to highlight proper schema encoding.

I consider this non-trivial.

Quote:

Quote:
 Original post by Timkin: Because crossover between two parents preserves as many schemata presently in the population as possible, while possibly forming new schemata which can be evaluated during selection.

That doesn't make any sense to me, Tim. Why would 3 parents be less effective than 2? If I cut a pie in 2 or in 10, it's still all part of the same pie. The number '2' would be suspicious to someone looking for evidence of anthropomorphic biases...

Quote:
 Original post by Timkin: Okay, a brief introduction to Schema Theory......

Sorry, I can't say I fully understand your example. The masking you describe is only masking if we use binary states.

i.e.

cat dog    bear  rat
cat monkey owl   rat

The child could be:

cat dog/monkey bear/owl rat

but never

cat turkey rabbit rat

Doesn't this satisfy "GAs preserve good quality schema"? Adding a third parent wouldn't change that.

When, earlier, I said that the optimal number of parents might be related to dimensions in solution space I was considering the following:

a = f( x, y, z);

where x, y and z are numbers between -1 and 1.

I want x, y, and z so that a = some number.

Member 1 has the x part solved (magically), member 2 has the y part solved, and member 3 has the z part solved. Allowing 3 parents I could actually get the correct solution right away. Given two parents I would require at least two generations.
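To make the idea concrete, here is a minimal sketch (hypothetical, assuming a per-gene uniform crossover that draws each gene from a randomly chosen parent; none of these names come from the thread):

```python
import random

def multi_parent_crossover(parents, rng):
    """Build a child by copying each gene from a randomly chosen parent."""
    return [rng.choice(parents)[i] for i in range(len(parents[0]))]

# Each parent has one component "solved" (say the target value is 0.5):
p1 = [0.5, -0.9, 0.2]   # x solved
p2 = [0.1, 0.5, -0.3]   # y solved
p3 = [-0.7, 0.8, 0.5]   # z solved

# With three parents a single crossover *can* produce [0.5, 0.5, 0.5];
# with only two parents, no single child can combine all three parts.
child = multi_parent_crossover([p1, p2, p3], random.Random(0))
```

Whether it actually does so in one step is of course down to chance, which is the point Timkin raises below.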

Quote:
Original post by Timkin
Quote:
 Will cross-over, without mutation, ever find a solution?

It will find the best solution available given the initial information contained in the population. See above.

I take it you mean 'no, unless.......'.

Quote:
Original post by Timkin
Quote:
 Will mutation, without cross-over, ever find a solution?

Possibly... and that is the best that can be said.

The same can be said of using mutation AND cross-over. I'll take your answer as 'regretfully yes'.

Quote:
Original post by Timkin
Quote:
 We're talking about a problem with more than one global maxima. In the hypothetical example there is more than one true global maxima.

If they are both (all) global maxima, then you cannot care which global maximum you attain.

I do care, because if we're talking about swapping components of solutions, and those components deal with solutions located at different global maxima, then they will be incompatible. By 'will' I mean 'will likely'.

Will

##### Share on other sites
Wow, I was actually thinking about making a Sudoku solver program today and was just about to start writing it. I just wanted to check the active topics real quick and saw other people making programs around it too.

##### Share on other sites
Sorry if I'm missing something (I did skip over a lot of what I thought was irrelevant to what I'm about to ask), but if you MAKE the puzzle by starting with a completed one and taking away some values, what prevents you from creating a puzzle with more than one solution? Surely there has to be a certain way you have to make them blank, right? I haven't tried this yet so I don't know if it's true, it just seems like a possibility to me.

##### Share on other sites
Quote:
 Original post by RPGeezus: You had said earlier that: "your attributes should be as orthogonal as possible."

Again you're not taking that in the context of what I wrote. Any state space can be mapped into an orthogonal space (or locally orthogonal space) prior to encoding... and in these spaces, the rate of convergence of the GA will be higher (so crossover could be said to be more effective).

Quote:
 Sorry, I can't say I fully understand your example. The masking you describe is only masking if we use binary states.

...a common misconception regarding GAs. Any string made up of characters of cardinality >2 can be mapped into a string with characters of cardinality 2. That is, I can trivially represent the gene 'cat' and its alleles ('tabby','moggy' and 'tom', for example ;) ) using the binary sequences '10','01' and '11' (or any other sequence I choose). Any set of tokens that you wish to use for your domain can be mapped into binary strings and thus I can still discuss the merits of the search algorithm applied to your example, but do so in the space of binary encoding strings.
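As a sketch of the 1-1 mapping Timkin describes (the labels and the code width here are made up for illustration):

```python
# Assign each allele label a fixed-width binary code; because the mapping
# is 1-1, the search can be discussed in either representation.
alleles = ['tabby', 'moggy', 'tom']
width = (len(alleles) - 1).bit_length()          # bits needed to cover 3 labels: 2
encode = {a: format(i, f'0{width}b') for i, a in enumerate(alleles)}
decode = {b: a for a, b in encode.items()}
# encode == {'tabby': '00', 'moggy': '01', 'tom': '10'}
```

Note the binary space here has one unused code ('11'), which is exactly the kind of detail RPGeezus objects to below.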

Quote:
 Doesn't this satisfy "GAs preserve good quality schema"? Adding a third parent wouldn't change that.

Yes, it would. Consider the schema you get if the third parent you select for the tuple is 'goat turkey fox elephant'. What are the set of offspring now? And what values of the objective function would be found in that set? The variance of the objective function over the offspring set is a measure of the quality of the schema represented in the parents. The larger the offspring set, the more likely a higher variance (and vice versa).

Quote:
 When, earlier, I said that the optimal number of parents might be related to dimensions in solution space I was considering the following: a = f(x, y, z); where x, y and z are numbers between -1 and 1. I want x, y, and z so that a = some number. Member 1 has the x part solved (magically), member 2 has the y part solved, and member 3 has the z part solved. Allowing 3 parents I could actually get the correct solution right away.

That's simply not true, since you can make no assumptions about whether the correct (solved) part of each parent chromosome will end up in the offspring. Furthermore, you're assuming that you are performing multi-site crossover, which is not a canonical operator.

Quote:
 I take it you mean 'no, unless.......'.

No, that's not what I'm saying. What I said was that crossover will find the optimal solution represented by the initial population. To borrow from your example earlier... if your initial population didn't contain the allele 'hound' for the dog gene, but this was the value of the allele in the global maxima, then obviously crossover cannot find that global maxima. However, it will allow you to find the string of highest fitness that IS contained within the set defined by the combinations of alleles in the initial population. Make sense?

Quote:
 The same can be said of using mutation AND cross-over. I'll take your answer as 'regretfully yes'.

You're doing it again! No, that's NOT what I said. In the absence of selection pressure, neither method is anything better than randomly generating guesses at the answer. With the above caveat regarding crossover, they both are unlikely to find the global maxima. If we permit selection pressure, then mutation is still just randomly generating solutions with no regard to the solutions already tried. Crossover, however, makes use of the fact that you've already done testing and work in previous generations.

Quote:
 I do care, because if we're talking about swapping components of solutions, and those components deal with solutions located at different global maxima, then they will be incompatible. By 'will' I mean 'will likely'.

They're not incompatible at all, just less 'fit', in which case they'll be discarded through selection, just as they would be if the mutation were unfit. No one ever claimed that a GA (and in particular crossover) only ever produced candidate solutions at each generation that had higher fitness than the previous generation. The only claim made by a GA is that the expected fitness of the population at each iteration is monotonically increasing.

Cheers,

Timkin

##### Share on other sites
Quote:
 Original post by Timkin: ...and in these spaces, the rate of convergence of the GA will be higher (so crossover could be said to be more effective).

If I understand correctly, the degree to which the encoding is orthogonal is proportional to the effectiveness of cross-over.

Quote:
 ...a common misconception regarding GAs. Any string made up of characters of cardinality >2 can be mapped into a string with characters of cardinality 2. That is, I can trivially represent the gene 'cat' and its alleles ('tabby','moggy' and 'tom', for example ;) ) using the binary sequences '10','01' and '11' (or any other sequence I choose). Any set of tokens that you wish to use for your domain can be mapped into binary strings and thus I can still discuss the merits of the search algorithm applied to your example, but do so in the space of binary encoding strings.

I don't see it... Treating my operands like binary strings at a binary level changes the entire behaviour.

cat dog rat .... octopus
1011 1000 0100 .... 1111

If I say cat/rat, I'm not saying all values between 0100 and 1011. I'm not saying 0/1 0/1 0/1 0/1... I am specifically saying nibble 1011 _or_ nibble 0100, not a combination thereof.

If my encoding is specific, as in cat / dog / rat / octopus, then I literally mean I want crossover to choose ONE OF, but not a mixture of or a range of values in between. The high-level representation serves to eliminate options. Just representing it as binary, and allowing the GA to work with the binary, does not let us eliminate options.
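The distinction being drawn can be made concrete. In this sketch (using the codes from the table above, where 'cat' is 1011, 'dog' is 1000 and 'rat' is 0100), a cut that falls inside a gene can produce an allele neither parent carried, while gene-boundary crossover cannot:

```python
def bit_crossover(a, b, point):
    """Single-point crossover on raw bit strings; the cut may fall inside a gene."""
    return a[:point] + b[point:]

def gene_crossover(genes_a, genes_b, pick_from_b):
    """Crossover restricted to gene boundaries: each gene comes whole from one parent."""
    return [gb if take else ga for ga, gb, take in zip(genes_a, genes_b, pick_from_b)]

cat, rat = '1011', '0100'
mid_gene = bit_crossover(cat, rat, 2)              # '1000' -- 'dog', which neither parent had
whole_gene = gene_crossover([cat], [rat], [True])  # can only ever yield 'cat' or 'rat' whole
```

This is the "selective mutation" effect Timkin describes in his reply further down.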

Quote:
Original post by Timkin
Quote:
 Doesn't this satisfy "GAs preserve good quality schema"? Adding a third parent wouldn't change that.

Yes, it would. Consider the schema you get if the third parent you select for the tuple is 'goat turkey fox elephant'. What are the set of offspring now? And what values of the objective function would be found in that set? The variance of the objective function over the offspring set is a measure of the quality of the schema represented in the parents.

The larger the offspring set, the more likely a higher variance (and vice versa).

I still do not see how this changes the schema. All of the same operands exist, in the same order they appeared in the original three parents, but in a different combination.

Quote:
Original post by Timkin
Quote:
 When, earlier, I said that the optimal number of parents might be related to dimensions in solution space I was considering the following: a = f(x, y, z); ...

That's simply not true, since you can make no assumptions about whether the correct (solved) part of each parent chromosome will end up in the offspring.

This is no different than with two parents. We can make no assumptions about whether the correct (solved) part of each parent chromosome will end up in the offspring.

Quote:
 Original post by Timkin: Furthermore, you're assuming that you are performing multi-site crossover, which is not a canonical operator.

I don't know what you mean by this.

Quote:
Original post by Timkin
Quote:
 I take it you mean 'no, unless.......'.

No, that's not what I'm saying. [...] it will allow you to find the string of highest fitness that IS contained within the set defined by the combinations of alleles in the initial population. Make sense?

That's what I understood the first time. I was trying to be funny, but I guess it didn't come across well via text. :)

We'll get the best solution possible given the current operands that exist. If we're missing an operand then...... At which point crossover will just be working to randomize the order of the operands.

Quote:
Original post by Timkin
Quote:
 The same can be said of using mutation AND cross-over. I'll take your answer as 'regretfully yes'.

In the absence of selection pressure ....

What, exactly, do you mean by this? We always have selection pressure-- the fit stay and multiply.

Quote:
 Original post by Timkin: they both are unlikely to find the global maxima. If we permit selection pressure, then mutation is still just randomly generating solutions with no regard to the solutions already tried.

With mutation, a child is still mostly its parent. Doesn't this mean that it is an attempt to refine an already-tried solution?

If we permit elitism (which is a pretty common practice I think) then we establish a nice baseline for testing out these new changes.

Quote:
Original post by Timkin
Quote:
 I do care, because if we're talking about swapping components of solutions, and those components deal with solutions located at different global maxima, then they will be incompatible. By 'will' I mean 'will likely'.

They're not incompatible at all, just less 'fit', in which case they'll be discarded through selection, just as they would be if the mutation were unfit. No one ever claimed that a GA (and in particular crossover) only ever produced candidate solutions at each generation that had higher fitness than the previous generation. The only claim made by a GA is that the expected fitness of the population at each iteration is monotonically increasing.

No problem-- maybe I'm looking at it from a different perspective. You say less fit, I say not compatible, which means less fit to the evaluation function.

Crossing over a member at peak A with a member at peak B is very unlikely to produce a good candidate, as they are both set on very different goals. They have a high likelihood of being incompatible. Cross a BMP with an EXE and you'll likely get neither a pretty picture nor a working application. This is all I'm saying.

If the members were clustered around the same peak then I could see crossover working well.

Will

##### Share on other sites
Quote:
 Original post by RPGeezus: If I understand correctly, the degree to which the encoding is orthogonal is proportional to the effectiveness of cross-over.

As a general rule, I would say yes... but I'm not certain that it's an absolute.

Quote:
 I don't see it... Treating my operands like binary strings at a binary level changes the entire behaviour.

Only if you fail to encode correctly, so that there is a 1-1 mapping from your 'label' encoding to the binary encoding. 'cat' and 'dog' are just labels, as are '00' and '010'. So long as there are no binary alleles without corresponding 'string' alleles, then which I use is irrelevant.

Quote:
 If my encoding is specific, as in cat / dog / rat / octopus, then I literally mean I want crossover to choose ONE OF, but not a mixture of or a range of values in between.

...and obviously crossover can be chosen only to apply at gene boundaries. If you crossover between boundaries (and meet the stipulation of a 1-1 mapping), then crossover is performing selective mutation (i.e. it's mapping the 'split' allele to another allele, depending on the other part of the allele string it receives from the parent, as well as crossing the tails of the two chromosomes). This is where your confusion lies. It's difficult to imagine this working if my encoding is 'tabby', 'moggy' and 'tom', but not difficult if my encoding is '00', '01', etc.

Quote:
 The high-level representation serves to eliminate options. Just representing it as binary, and allowing the GA to work with the binary, does not let us eliminate options.

Quote:
 I still do not see how this changes the schema. All of the same operands exist, in the same order they appeared in the original three parents, but in a different combination.

Remember that the schema represents the possible set of allele values for that set of chromosomes. So, if each of the parents has the same allele for the gene 'cat', then the schema will have that allele. If any one parent has a different allele than the others, then the schema will have a wildcard. The larger the set of chromosomes considered, the more likely that the schema representing that set will have more wildcards. Subsequently, my comments about the type of offspring that set will produce stand. Indeed, this is the whole point of GAs. As the population nears convergence, the population schema is refined (the number of wildcards is reduced) until, at convergence, the only possible offspring (ignoring mutation during reproduction) is a copy of the parents.
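The schema construction described here is easy to sketch: positions where all members agree keep their value, and positions where they differ become the wildcard '#'. The parent strings below are illustrative:

```python
def schema(chromosomes):
    """Return the schema covering a set of equal-length chromosome strings."""
    return ''.join(vals[0] if len(set(vals)) == 1 else '#'
                   for vals in zip(*chromosomes))

# Two parents give a tighter schema than three:
assert schema(['1101', '1000']) == '1#0#'
assert schema(['1101', '1000', '0101']) == '##0#'   # a third parent adds a wildcard
```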

Obviously, all pairwise schema still exist if you have 3 parents... but if you have 3 parents, then pairwise schema don't count, because presumably you're using all 3 parents in reproduction at the same time (the GA-orgy!). Otherwise, you're just doing pairwise reproduction. See my point?

Quote:
 This is no different than with two parents. We can make no assumptions about whether the correct (solved) part of each parent chromosome will end up in the offspring.

Absolutely. But with two parents the probability of it happening is higher (1/2 vs. 1/3).

Quote:
 Crossing over a member at peak A with a member at peak B is very unlikely to produce a good candidate, as they are both set on very different goals.

I don't think that's absolutely true. In certain problems that have multiple global maxima (for example, those where any 'solution' is a global maxima, such as N-queens), there may indeed be properties common to every solution. Thus, crossing over any two solutions may preserve those properties. I can't back this up with hard evidence, but Terry Kwok's work seems to suggest that, at least for problems like N-queens, there is a sort of subspace made up of all solutions that his algorithm can work on. Certainly, the rate of finding solutions is proportional to the number already found. Admittedly his isn't a GA approach, but it may be the case that appropriate encodings exist for GAs for such problems where this 'subspace' property can be exploited. Maybe not though.

Cheers,

Timkin

##### Share on other sites
Quote:
 Original post by Timkin: So, now take our original 3 candidate solutions as a triplet. The schema resulting from this tuple is '###1####0', which, as you can see, is the worst (most general) schema from the pairwise set. Candidate solutions generated by producing offspring from a 3-way crossover would be randomly selected from this set and thus would randomly select one of the possible fitness values spanned by this set. The probability of producing good quality offspring would be less than if pairwise crossover were performed.

Ok, let me just point out one tiny little detail :)

You're assuming that a # means there's a 50% chance of occurring a 1 (P(1)=50%) and a 50% chance of a 0 (P(0)=50%). This is how you arrived at your conclusion: information is lost. The thing is, if you're choosing from 3 parents, and for example one of them has a 0 and two of them have 1s, the probabilities are actually P(0)=33% and P(1)=67%. So it still preserves the information you said was lost. It's still arguable whether 3 parents are better or worse than 2.
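The frequency argument can be checked directly: the wildcard positions carry a bias from the parent set, not a uniform coin flip. A sketch (the parent strings are illustrative):

```python
from collections import Counter
from fractions import Fraction

def allele_frequencies(parents):
    """Per-position probability of each allele value across the parent set."""
    return [{v: Fraction(n, len(parents)) for v, n in Counter(vals).items()}
            for vals in zip(*parents)]

freqs = allele_frequencies(['1101', '1000', '0101'])
# Position 0 holds two 1s and one 0, so P(1) = 2/3 and P(0) = 1/3,
# even though a schema over these parents shows only a '#' there.
```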

##### Share on other sites
(I hate parallel discussions in the same thread but what the heck)

Quote:
 Original post by giveblood: Sorry if I'm missing something (I did skip over a lot of what I thought was irrelevant to what I'm about to ask), but if you MAKE the puzzle by starting with a completed one and taking away some values, what prevents you from creating a puzzle with more than one solution? Surely there has to be a certain way you have to make them blank, right? I haven't tried this yet so I don't know if it's true, it just seems like a possibility to me.

I don't think you have to do that. If there's another way to solve the puzzle and you found it, kudos to you :P It's a good thing in my opinion, and I don't think anyone will argue that a Sudoku puzzle generator will *have* to ensure there's only one solution. My proposed algorithm is simple and neat (and it's not mine :P ) and I'd be surprised if it's not the most common out there.

##### Share on other sites

All of this is of course in my humble opinion, but I figure I will give my two cents anyways.

I have quite a lot of experience with GAs going back into the late 80's. I guess the Darwinian romance has worn off.

One of my biggest pet peeves is the reliance on 'real world' visualizations of what GAs are supposedly implementing, and then blindly implementing them. A Genetic Algorithm is nothing more than a generalized directed stochastic search.

What defines a GA is how you move from one population to another.

Population(t+1) = GA(Population(t))

This GA() function must satisfy a few well-known properties, but none of them has the notion of a 'generation' in it. The DNA is not a description of an individual. It is a piece of information that can be used in a directed search.

Keep thinking 'directed search.'

Why exactly should we divide the creation and destruction of this information into large scale changes? After producing a new piece of information, don't you want to use it immediately?

Most people seem to be doing:

10 Produce a lot of new information from the set of retained information
20 Merge it all into the set of retained information
30 Goto 10

When they could instead be doing:

10 Produce a new piece of information from the set of retained information.
20 Merge it into the set of retained information
30 Goto 10

Or in GA speak:

10 Produce a new member of the population using crossover and/or mutation.
20 Merge Sort the member into the population, throwing out the worst member.
30 Goto 10
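A steady-state loop along those lines might look like this in Python (the toy fitness function and the operators are placeholders for illustration, not anything from the thread):

```python
import random

def steady_state_step(population, fitness, crossover, mutate, rng):
    """Breed one child, merge it into the ranked population, drop the worst member."""
    a, b = rng.sample(population, 2)
    population.append(mutate(crossover(a, b)))
    population.sort(key=fitness, reverse=True)   # best first
    population.pop()                             # throw out the worst member
    return population

# Toy problem: maximise the count of 1-bits in an 8-bit string.
rng = random.Random(1)
fitness = lambda s: s.count('1')
crossover = lambda a, b: a[:4] + b[4:]
mutate = lambda s: ''.join(rng.choice('01') if rng.random() < 0.1 else c for c in s)

pop = [''.join(rng.choice('01') for _ in range(8)) for _ in range(10)]
best_before = max(fitness(s) for s in pop)
for _ in range(200):
    steady_state_step(pop, fitness, crossover, mutate, rng)
```

Because only the worst member is ever discarded, the best fitness in the population never decreases, and every new piece of information is available for breeding immediately.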

The generation method with regard to algorithm implementation is a needless complication at best (can anyone justify it?) and detrimental to efficiency at worst.

As far as crossover goes.

Crossover can be completely useless or extremely valuable. There seems to be no middle ground here. This depends on the specific problem set. In general, when crossover isn't a valuable commodity, you should not be using a genetic algorithm.

What does crossover do?

Technically: Crossover takes two strings of information and attempts to form a union between them, in the off chance that a more correct string of information can be derived from such unions.

In those cases where crossover isn't doing its job (and there happens to be work to be done), it is because the breeding population isn't diverse enough. Population diversity is critical in all areas of GAs. Keep thinking 'directed search.'

Without diversity and crossover, all you have is a stochastic hill climber.

I have been measuring diversity using the mean squared difference between the DNA in the breeding population.

When the diversity approaches zero, I know that it's time to stop running the genetic algorithm.

I don't know if the best solution has been found or not, but I know it's now in a stochastic hill climb and I know *for certain* that there are algorithms that perform a lot better than a stochastic hill climb that I can seed with the information attained thus far.
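The diversity measure described can be sketched as follows (for real-valued chromosomes; the details are a guess at the idea, not the poster's actual code):

```python
from itertools import combinations

def diversity(population):
    """Mean squared difference between all pairs of chromosomes in the population."""
    pairs = list(combinations(population, 2))
    return sum(sum((a - b) ** 2 for a, b in zip(x, y))
               for x, y in pairs) / len(pairs)

converged = [[1.0, 2.0]] * 4                      # identical members
spread = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]     # distinct members
# diversity(converged) is 0.0 -- time to stop the GA and hand the best
# member to a better local optimiser; diversity(spread) is positive.
```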

- Rockoon (Joseph Koss)

##### Share on other sites
Quote:
 Original post by Jotaf: You're assuming that a # means there's a 50% chance of occurring a 1 (P(1)=50%) and 50% of a 0 (P(0)=50%).

No, I'm not assuming that. I'm simply saying that '#' is a wildcard, meaning that any gene depicted with that allele in the schema can take on any of its allele values in the chromosome. As to the probability of each allele value, that obviously depends on the frequency of specific allele values in the (sub)population that the schema represents (so I'm not assuming 2 or even 3 members... I'm not assuming any number at all).

Quote:
 This is how you arrived at your conclusion: information is lost.

No, it isn't, since you were wrong about my first statements. A schema doesn't represent 'information lost'.

Timkin

##### Share on other sites
Quote:
Original post by Timkin
Quote:
 Original post by Jotaf: You're assuming that a # means there's a 50% chance of occurring a 1 (P(1)=50%) and 50% of a 0 (P(0)=50%).

No, I'm not assuming that. I'm simply saying that '#' is a wildcard, meaning that any gene depicted with that allele in the schema can take on any of its allele values in the chromosome. As to the probability of each allele value, that obviously depends on the frequency of specific allele values in the (sub)population that the schema represents (so I'm not assuming 2 or even 3 members... I'm not assuming any number at all).

Ok. Let me rephrase.
Choose 2 parents in the population, P1 and P2.
For any given allele, if it has the same value in both parents, crossing over will always choose that value. If the values are different (e.g. P1=0 and P2=1), and assuming your random number generator is good, there's a 50% chance that one will be chosen over the other. The schema for P1=1101 and P2=1000 is 1#0#, and each # has a 50% chance of being replaced by a 0 or a 1.

Now choose 3 parents. There's a 1/3 probability of choosing a value from any one of them. Sure, there are a lot more #s in any genotype: P1=1101, P2=1000, P3=0101 -> ##0#.
It seems, as you suggested, that this is a worse representation since there are many more random elements present. However, hidden in there are the probabilities I mentioned:
1st allele: P1=1, P2=1, P3=0 -> P(0)=33%, P(1)=67%
2nd allele: P1=1, P2=0, P3=1 -> P(0)=33%, P(1)=67%
4th allele: P1=1, P2=0, P3=1 -> P(0)=33%, P(1)=67%

As you can see, the result is biased towards having more ones, because they appear more often in those positions in the parents. Actually, as you take into account more and more parents, the result is more likely to be the "weighted average" of the whole population. Interesting :)

It depends on the specific application whether this is good or bad. But more parents isn't generally "bad" in all cases like you're suggesting.

##### Share on other sites
To Rockoon, The Anonymous Poster (you should get an account BTW :)

Loved your post. Great insight on how GAs are really a directed search and using real evolution as an analogue is not always appropriate.

I think another reason why most people implement generations is because doing all the sorting and fitness evaluation just to find one bad individual is kind of a waste of resources :P A programmer thinks "why not do it to 99 more individuals for free?"! You already got them sorted out.

BTW, I hope I don't forget the remark about diversity the next time I code a GA. There's always the question of when to stop it, and you got it right on.

##### Share on other sites
Jotaf: obviously when taken with regard to a specific population, a schema is just a shorthand notation for that set of individuals, which, as you say, has a particular probability distribution over allele values associated with it. However, that wasn't the point I was trying to get at by describing schema... and you have touched on the heart of the matter, which I was trying to describe. The more general the schema, the larger the proportion of the population it represents and correspondingly, the (typically) larger the spread in 'fitness' values represented within the subpopulation the schema represents. Furthermore, if you asked yourself what was the expected fitness of a member generated according to that subpopulation, then yes, it's the fitness of the expected member selected according to the aforementioned probability distribution.

If I write '#0##110#' though, without linking it to a specific population, then I'm talking about the full set of chromosomes that match that schema and my discussion is with reference to that theoretical population. Yes, that population has uniform statistics, but they're actually not important. What is important is that under some very mild assumptions, it's fairly easy to show that '#0##110#' represents a more diverse population than say '100#110#', with a correspondingly broader density function over fitness values.

The key difference between talking about a general schema and one related to a specific population is actually the distribution of the schemata over subsets of the population represented by the most general schema. For two parents, there is only 1 schema that will be processed and so the GA can make a decision at that time, subject to evaluation of the offspring, as to the quality of that schema. If you start using more parents, that decision is far weaker and it will take more time for the population and algorithm to resolve the issue of whether or not the schemata represented by the parent set are of good quality (some might be good while others bad, causing some propagation of mixed good and bad information into the population).

I hope this makes my point clearer.

Cheers,

Timkin

##### Share on other sites
Reading through the posts, there seems to be one piece missing. Timkin alludes to it in his schemata explanation, but perhaps it is beneficial to state it explicitly.

Cross-over works in domains where partial solutions can exist and contribute to the overall solution independently of other partial solutions. This may seem obvious, but I believe it warrants mentioning. Going to the bimodal example, let's suppose that solution A and solution B are near but not at the top of their maxima. It is possible that they each represent exceptional but not optimal combinations of partial solutions. By using cross-over, you can recombine the contextually best portions of solutions A and B to give two solutions which are effectively optimal. Mutation can also achieve this, but not as efficiently.

Regarding small vs. large mutations, I've always found it beneficial to have a range of mutation operators that span minimally to maximally destructive. Minimally destructive mutation corresponds to the point mutations which lead to stochastic hill-climbing. Maximally destructive mutations, I've found, help to keep the population diverse and introduce new information. Granted, late in a run, a maximally destructive mutation operator will create a poor solution compared to the remainder of the nearly converged population.
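The two ends of that destructiveness range might be sketched like this (both operators are illustrative, on binary-string genomes):

```python
import random

def point_mutation(genome, rng):
    """Minimally destructive: flip a single randomly chosen bit."""
    i = rng.randrange(len(genome))
    return genome[:i] + ('1' if genome[i] == '0' else '0') + genome[i + 1:]

def scramble_mutation(genome, rng):
    """Maximally destructive: replace the genome with a fresh random string."""
    return ''.join(rng.choice('01') for _ in genome)

rng = random.Random(0)
child = point_mutation('00000000', rng)     # differs from its parent in one bit
rebel = scramble_mutation('00000000', rng)  # essentially unrelated to its parent
```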

Finally, there are indeed examples of crossover with more than 2 parents. I'll have to search a bit to find them...

Just my $0.02.

-Kirk

##### Share on other sites
Quote:
 Original post by Jotaf: To Rockoon, The Anonymous Poster (you should get an account BTW :)

Done.

Quote:
 Original post by Jotaf: I think another reason why most people implement generations is because doing all the sorting and fitness evaluation just to find one bad individual is kind of a waste of resources :P A programmer thinks "why not do it to 99 more individuals for free?"! You already got them sorted out.

Even in those cases where re-ranking is necessary, the arbitrary concept of 'generations' is probably not the best way to minimize the cost of that operation. :)

Also, many implementations (problem-specific) do not require a sorted population at all. They just require the knowledge of which member is worst.