Simulated Annealing vs. Genetic Algorithms

9 comments, last by Rockoon1 17 years, 1 month ago
which is better?
Depends.
Neither of them works particularly well in my opinion.

If it's a non-linear regression problem, I strongly suggest:
-Levenberg-Marquardt's method
-Nelder-Mead's method
-Support Vector Machines (SVM) for regression.

LM is probably the most robust (and most used!) method, but it requires the computation of derivatives, whereas the other two don't. NM does not require derivatives; instead it relies on direct evaluation of the cost function. SVM for regression works on a set of examples, but will not compute a set of parameters the way the previous ones do.
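Just to make the difference concrete, here is a minimal sketch of fitting the same curve with LM and with Nelder-Mead. It assumes NumPy and SciPy are available, and the model and data are made up purely for illustration:

import numpy as np
from scipy.optimize import least_squares, minimize

# Made-up data: y = a*exp(b*x) plus noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 1.5 * np.exp(0.8 * x) + rng.normal(scale=0.1, size=x.size)

def residuals(p):
    a, b = p
    return a * np.exp(b * x) - y        # vector of residuals for LM

def sse(p):
    return np.sum(residuals(p) ** 2)    # scalar cost for Nelder-Mead

p0 = [1.0, 1.0]

# Levenberg-Marquardt: works on the residual vector; the Jacobian
# (the derivatives) is estimated numerically here.
lm_fit = least_squares(residuals, p0, method='lm')

# Nelder-Mead: derivative-free, only ever evaluates the scalar cost.
nm_fit = minimize(sse, p0, method='Nelder-Mead')

print(lm_fit.x, nm_fit.x)   # both should land near (1.5, 0.8)

Same problem, but notice how much less LM-style machinery Nelder-Mead demands from you.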

If it's a classification problem, I strongly suggest SVM. They're state-of-the-art right now. Also, you would be surprised at how well classification trees work in most cases. You can also look into Expectation-Maximization.

It really depends on what problem you are trying to solve.
Steadtler - how do you avoid local minima with any of those methods? They're all gradient based and don't get around the local-minimum problem unless extra measures are taken.

In comparing SA and GA, remember that GAs give you multiple start points whereas SA is limited to a single individual.

-Kirk
Quote:Original post by kirkd
Steadtler - how do you avoid local minima with any of those methods? They're all gradient based and don't get around the local-minimum problem unless extra measures are taken.

In comparing SA and GA, remember that GAs give you multiple start points whereas SA is limited to a single individual.

-Kirk


The only method that I am aware of that *can* avoid local minima is the SVM, which isn't gradient based, by the way. If you choose the right kernel, then the non-linear problem becomes linear in the higher-dimensional space. It does take some tries to find the best-suited kernel, and I admit the chances of finding a perfect kernel are better in classification problems than in regression ones.

LM is only half gradient based, and Nelder-Mead is, well, not exactly gradient based. You can try to avoid local minima with Nelder-Mead by restarting the algorithm around the last minimum found, as in the sketch below. I've had *excellent* results with it, although the computation cost can be high. GAs are basically a multiple-start-point gradient descent method; they can and will get stuck on local minima.
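For what it's worth, the restart trick is only a few lines. A sketch, again assuming SciPy; the cost function here is just a stand-in for whatever you are actually minimizing:

import numpy as np
from scipy.optimize import minimize

def cost(p):
    # stand-in multimodal objective, purely for illustration
    return np.sum(p ** 2) + 3.0 * np.sum(np.sin(5.0 * p))

best = None
x0 = np.array([2.0, -2.0])
for _ in range(5):                       # a handful of restarts
    res = minimize(cost, x0, method='Nelder-Mead')
    if best is None or res.fun < best.fun:
        best = res
    # restart from a perturbed copy of the last minimum found
    x0 = res.x + np.random.normal(scale=0.5, size=res.x.shape)

print(best.x, best.fun)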

Avoiding local minima is, of course, a problem in each and every numerical regression method that exists today.
Thanks for the details. I agree, local minima are a major issue. I didn't realize that Nelder-Mead wasn't gradient based - I'll have to look that one up.

As for SVM, I'm not convinced yet. I know that projection into higher dimensions can potentially identify a separating hyperplane, but I've seen much better results come from ensembles of decision trees. Actually, I've seen ensembles of decision trees (Random Forests, mostly) outperform just about every other method so far - NN, SVM, Bayes.

EDIT:

I forgot to mention that I would say Steadtler is right that the methods he cites are best for numerical optimization problems. GAs and SAs become more interesting when you have an obscure or poorly defined optimization problem, or a massive search space within which you may have a chance at finding a solution in less than geologic time.

For example, finding a best-fit line, a maximal-margin separating hyperplane, or optimal weights for a defined neural network topology would fall under numerical optimization, and Steadtler's suggestions are best suited there.

Finding a solution for the Rubik's Cube (often asked about on this forum) is NOT a good place for GAs. An informed search method is your best bet.

Constructing novel neural network topologies or finding unique solutions to electronic component problems might be a place to try Evolutionary Computation. Check out NEAT and John Koza's latest book.

Just my $0.03.
While a GA has multiple start points, this isn't really a feature that can be used to make a valid comparison.

SAs don't need multiple starting points because the method relies on it being improbable that it will settle on a bad hill. There is no guarantee of optimality with either methodology, and both have a reasonably good probability of settling on one of the best hills.


The nice thing about SA is its graceful reduction into a gradient descent as it cools. Most of the GAs I have implemented kick out the best members of each generation, and after population diversity crashes, those members are then passed through a more specific gradient descent algorithm. That second step isn't necessary with SA, since it already settles on a local max (or min) on its own, so SA has the advantage of simplicity.

Unfortunately, good cooling rates for SAs are hard to define without an intimate knowledge of the state-space topology. I always end up erring way over on the safe side and using a much slower cooling rate than necessary.
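For reference, the skeleton I keep reusing looks roughly like this. It's only a sketch with a geometric cooling schedule; the energy function, the neighbor move, t0 and the cooling rate are all placeholders you would have to tune to your own state space:

import math, random

def anneal(x, energy, neighbor, t0=1.0, cooling=0.995, steps=10000):
    e = energy(x)
    best, best_e = x, e
    t = t0
    for _ in range(steps):
        cand = neighbor(x)
        e_cand = energy(cand)
        # Always accept improvements; accept worse moves with probability
        # exp(-(e_cand - e) / t). As t cools toward zero this degenerates
        # into a plain greedy descent.
        if e_cand < e or random.random() < math.exp((e - e_cand) / t):
            x, e = cand, e_cand
            if e < best_e:
                best, best_e = x, e
        t *= cooling        # geometric cooling; slower is safer but costlier
    return best, best_e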
One has to realize that SA and GAs (especially GAs) were designed as generic tools for solving problems where you basically have no idea what the solution may look like. As kirkd says, poorly defined problems, mainly because we don't know how to define them any better. So, in general, algorithms like SA and GA are more like exploratory tools to find out what the solution looks like when we are handed a problem with too many unknowns; specifically, problems where we know what we're looking for, but are not sure how to get it. GAs, for example, are notorious for finding loopholes in badly designed or defined problems.

Overall, given that you know exactly what the problem is and what you are looking for, you're better off using a method specifically designed to tackle that specific problem. A specialized tool will always work better than a general tool. Though you could customize a general tool to solve something specific, most of the time the effort and time spent is just not worth it.

Also, asking which is better is like asking whether a wrench is better than a screwdriver. They were both designed with different types of problems in mind. So, it still boils down to what you want to do.
There is a lot of isomorphism between GA and SA. For example, the mutations in GA and the neighbors in SA can be seen as being isomorphic. The selection strategy in GA and the acceptance strategy in SA (including cooling schedule) are also pretty much isomorphic.

The only truly non-isomorphic thing is the combination operator in GA. This is quite unique in classical metaheuristics: instead of just letting a large number of solutions evolve individually to better solutions via mutations, we try to combine solutions into better ones. Personally, I think the GA community should concentrate more heavily on finding good combination strategies.
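To make the combination operator concrete, here is the textbook one-point crossover on bit-string genomes. It's only a sketch; any serious problem usually needs a combination operator tailored to its own representation:

import random

def one_point_crossover(parent_a, parent_b):
    # Cut both parents at the same random point and swap the tails.
    assert len(parent_a) == len(parent_b)
    cut = random.randrange(1, len(parent_a))
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2

# Example on two toy genomes
a = [1, 1, 1, 1, 1, 1]
b = [0, 0, 0, 0, 0, 0]
print(one_point_crossover(a, b))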

So in a way I would say that GA is "more powerful" since SA can be seen as a special case of GA.

- Mikko
I've used SA and GA a fair amount, particularly in high-dimensional spaces. In my experience, they both work about the same, jiggling away from high-up minima but often getting bogged down in lower-down local minima. Of the two, I rather prefer simulated annealing, which requires much less tweaking of parameters and inventing of operators and sprinkling of magical voodoo herbs to work decently.

But WeirdoFu is right: Both of these should be in the bottom of your toolbox, not the top. If you know stuff about the problem in question, GAs and SAs will willfully ignore that knowledge and do their own irascibly stochastic thing.
