Genetic Algorithm and Sudoku

AndyTang · 2005-12-07T12:59:00

I've recently been interested in Sudoku puzzles and I'm trying to figure out a way to generate the puzzles. At first I thought I could use a genetic algorithm to generate a puzzle through generation iterations as I heard a few people have already tried. Here lies the first stumbling block. I have a decent fitness method and I can choose two puzzles from the population (randomly generated grid filled with numbers) but I cannot figure out a way to cross breed these two grids. Since there doesn't seem to be a way that would help determine a possible solution except by randomly copying numbers from one or the other. Any suggestions? BTW, does any one knows what determines the difficulty of a sudoku problem? Is it just the number of missing spaces.

Artificial Intelligence Programming

Started by AndyTang November 04, 2005 11:22 AM

63 comments, last by Rockoon1 18 years, 5 months ago

Jotaf

280

December 06, 2005 05:57 PM

Quote:Original post by Timkin
Quote:Original post by Jotaf
You're assuming that a # means there's a 50% chance of occuring a 1 (P(1)=50%) and 50% of a 0 (P(0)=50%).

No, I'm not assuming that. I'm simply saying that '#' is a wildcard, meaning that any gene depicted with that allele in the schema can take on any of its allele values in the chromosome. As to the probability of each allele value, that obviously depends on the frequency of specific allele values in the (sub)population that the schema represents (so I'm not assuming 2 or even 3 members... I'm not assuming any number at all).

Ok. Let me rephrase.
Choose 2 parents in the population, P1 and P2.
For any given allele, if it has the same value in both barents, crossing-over will always chose that value. If the values are different (eg. P1=0 and P2=1), and assuming your random number generator is good, there's a 50% chance that one will be chosen over the other. The schema for P1=1101 and P2=1000 is 1#0#, and each # has a 50% chance of being replaced by a 0 or a 1.

Now choose 3 parents. 1/3 probability of choosing a value from either one. Sure, there's a lot more # in any genotype: P1=1101, P2=1000, P3=0101 -> ##0#.
It seems, as you suggested, that this is a worse representation since there are many more random elements present. However, hidden in there are the probabilities I mentioned:
1st allele: P1=1, P2=1, P3=0 -> P(0)=33%, P(1)=67%
2nd allele: P1=1, P2=0, P3=1 -> P(0)=33%, P(1)=67%
4th allele: P1=1, P2=0, P3=1 -> P(0)=33%, P(1)=67%

As you can see, the result is biased towards having more Ones, because they appear more often in those positions in the parents. Actually, as you take into account more and more parents, the result is more likely to be the "weighted average" of the whole population. Interesting :)

It depends on the specific application whether this is good or bad. But more parents isn't generally "bad" in all cases like you're sugesting.

Jotaf

280

December 06, 2005 06:09 PM

To Rockoon, The Anonymous Poster (you should get an account BTW :)

Loved your post. Great insight on how GAs are really a directed search and using real evolution as an analogue is not always appropriate.

I think another reason why most people implement generations is because doing all the sorting and fitness evaluation just to find one bad individual is a kind of a waste of resources :P A programmer thinks "why not do it to 99 more individuals for free?"! You already got them sorted out.

BTW, I hope I don't forget the remark about diversity the next time I code a GA. There's always the question of when to stop it, and you got it right on.

Timkin

864

December 06, 2005 07:06 PM

Jotaf: obviously when taken with regard to a specific population, a schema is just a short hand notation for that set of individuals, which, as you say, has a particular probability distribution over allele values associated with it. However, that wasn't the point I was trying to get at by describing schema... and you have touched on the heart of the matter, which I was trying to describe. The more general the schema, the larger the proportion of the population it represents and correspondingly, the (typically) larger the spread in 'fitness' values represented within the subpopulation the schema represents. Furthermore, if you asked yourself what was the expect fitness of a member generated according to that subpopulation, then yes, it's the fitness of the expected member selected according to the aforementioned probability distribution.

If I write '#0##110#' though, without linking it to a specific population, then I'm talking about the full set of chromosomes that match that schema and my discussion is with reference to that theoretical population. Yes, that population has uniform statistics, but they're actually not important. What is important is that under some very mild assumptions, it's fairly easy to show that '#0##11#' represents a more diverse population than say '100#11#', with correspondingly broader density function over fitness values.

The key difference between talking about a general schema and one related to a specific population is actually the distribution of the schemata over subsets of the population represented by the most general schema. For two parents, there is only 1 schema that will be processed and so the GA can make a decision at that time, subject to evaluation of the offspring, as to the quality of that schema. If you start using more parents, that decision is far weaker and it will take more time for the population and algorithm to resolve the issue of whether or not the schemata represented by the parent set are of good quality (some might be good while others bad, causing some propogation of mixed good and bad information into the population).

I hope this makes my point clearer.

Cheers,

Timkin

kirkd

505

December 07, 2005 11:43 AM

Reading through the posts, there seems to be one piece missing. Timkin eludes to it in his shcemata explanation, but perhaps it is beneficial to state it explicitly.

Cross-over works in domains where partial solutions can exist and contribute to the overall solution independently of other partial solutions. This may seem obvious, but I believe it warrants mentioning. Going to the bimodal example, let's suppose that solution A and solution B are near but not at the top of their maxima. It is possible that they each represent exceptional but not optimal combined partial solution. By using cross-over, you can accomplish recombining the contextually best portions of solution A and B to give two solutions which are effectively optimal. Mutation can also achieve this, but not as efficiently.

Regarding small vs. large mutations, I've always found it beneficial to have a range of mutation operators that span minimally to maximally destructive. Minimally destructive mutation corresponds to the point mutations which lead to stochastic hill-climbing. Maximally destructive mutations, I've found, help to keep the population diverse and introduce new information. Given, late in a run, a maximmaly destructive mutation operator will create a poor solution compared to the remainder of the nearly converged population.

Finally, there are indeed examples of crossover with more than 2 parents. I'll have to search a bit to find them...

Just my $0.02.

-Kirk

Rockoon1

104

December 07, 2005 12:59 PM

Quote:Original post by Jotaf
To Rockoon, The Anonymous Poster (you should get an account BTW :)

Done.

Quote:Original post by Jotaf
I think another reason why most people implement generations is because doing all the sorting and fitness evaluation just to find one bad individual is a kind of a waste of resources :P A programmer thinks "why not do it to 99 more individuals for free?"! You already got them sorted out.

Even in those cases where re-ranking is necessary, the arbitrary concept of 'generations' is probably not the best way to minimize the cost of that operation. :)

Also, many implimentations (problem-specific) do not require a sorted population at all. They just require the knowledge of which member is worst.

Genetic Algorithm and Sudoku

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Genetic Algorithm and Sudoku

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines