Representing decision trees as 'genes'

Yeah, I get a similar response from my wife. Usually it's more of a general glazing of the eyes, and occasionally I have to wake her up after I've been talking for a while. Ah, well...

As for MML vs MDL: the MDL formulation uses a binomial distribution over the predictions, which I can reasonably estimate with combinatorics. That gives a handle on the probability of having m incorrect predictions out of n observations (n Choose m). The length of the model is very straightforward and can ultimately be condensed down to the number of non-leaf nodes in the tree. I can do this because every tree has access to the same data, so the selection of features cancels out between trees - every tree has the same selection opportunity.
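In rough Python, the scoring I'm describing boils down to something like this (the names and the one-bit-per-node constant are just placeholders, not my actual code):

from math import lgamma, log

def log2_choose(n, m):
    # log2 of (n choose m), computed via log-gamma to avoid huge integers
    return (lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)) / log(2.0)

def mdl_score(n_observations, n_errors, n_internal_nodes, bits_per_node=1.0):
    # Two-part description length: cost of the exceptions plus cost of the tree.
    # Data part: bits needed to say which m of the n predictions were wrong,
    # i.e. log2(n choose m). Model part: proportional to the number of non-leaf
    # nodes, since the feature-selection cost cancels out between trees built
    # on the same data.
    data_cost = log2_choose(n_observations, n_errors)
    model_cost = bits_per_node * n_internal_nodes
    return data_cost + model_cost  # smaller = better

# e.g. a tree with 7 internal nodes that gets 12 of 200 predictions wrong
print(mdl_score(200, 12, 7))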

The only formulation I've seen for MML is based on Bayesian inference and probability distributions of classes in the feature space, so I need an expected distribution for the data. The features I use in computational chemistry are never Gaussian, and many are binary (is this chemical substructure present or not?). I could easily get away from the Gaussian requirement by using a different expected distribution and keeping things non-parametric (naive Bayes, for example), but not all features have the same distribution, and worse yet, the distributions are quite often very ugly.

Computer science graduate labs? That's where I usually meet them :) You'd be surprised... and no, engineering doesn't work.
OT:

Quote:Original post by Steadtler
Computer science graduate labs? That's where I usually meet them :) You'd be surprised... and no, engineering doesn't work.


You mean you've tried engineering them yourself? I think I saw a movie about that once that had Kelly LeBrock in it! ;)

I must have been unlucky... the IT grad-chicks that I knew were nothing special and couldn't talk about anything beyond their thesis... now, the girls in atmospheric science and astrophysics, where I started out... now they were true geek-girls (and cute)!
Quote:Original post by Timkin
OT:

Quote:Original post by Steadtler
Computer science graduate labs? That's where I usually meet them :) You'd be surprised... and no, engineering doesn't work.


You mean you've tried engineering them yourself? I think I saw a movie about that once that had Kelly LeBrock in it! ;)

I must have been unlucky... the IT grad-chicks that I knew were nothing special and couldn't talk about anything beyond their thesis... now, the girls in atmospheric science and astrophysics, where I started out... now they were true geek-girls (and cute)!


Haha, no, I meant that most women in engineering faculties are bitter and resentful toward humanity in general, and men in particular. More than usual, I mean. And it's mostly the guys' fault...

Yes, physics girls are hot, but usually too weird for me.

Back on decision trees. Say I have a problem with a lot of items that have the same variable values but belong to different classes. Most DTs would treat that as noise. Now I want to consider my set as a distribution instead. I guess what I'm thinking about is a Bayesian decision tree, but I'm not sure it's the same thing. Initially, all variables are free. I want to build the tree by finding the variable at each node that, by fixing its value, has the greatest effect on the posterior distribution. The stopping criteria would be 1) negligible effect on the posterior distribution, or 2) only one class left in the posterior distribution.
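Something like this rough Python sketch is what I have in mind (purely illustrative - the names, the KL-divergence measure and the threshold are my own guesses at how to write it down, not from any library):

from collections import Counter
from math import log2

def class_posterior(items):
    # items are (feature_dict, class_label) pairs; return the empirical class distribution
    counts = Counter(label for _, label in items)
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}

def effect_of_fixing(items, var):
    # expected KL divergence between the child posteriors and the parent
    # posterior when we split on (fix) the value of var
    parent = class_posterior(items)
    effect = 0.0
    for v in {feats[var] for feats, _ in items}:
        subset = [(f, c) for f, c in items if f[var] == v]
        child = class_posterior(subset)
        weight = len(subset) / len(items)
        effect += weight * sum(p * log2(p / parent[c]) for c, p in child.items())
    return effect

def choose_split(items, free_vars, min_effect=1e-3):
    # pick the free variable whose fixing moves the posterior the most;
    # return None on a pure node or when nothing has a non-negligible effect
    if len(class_posterior(items)) == 1:
        return None
    best = max(free_vars, key=lambda v: effect_of_fixing(items, v), default=None)
    if best is None or effect_of_fixing(items, best) < min_effect:
        return None
    return best

Though now that I write it out, that "effect" is just the mutual information between the variable and the class, so maybe I'm reinventing information gain?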

Does that make sense? Is that what they call a Bayesian DT? Do you know of any work similar to this?

Thanks, guys.
I haven't done anything with Bayesian decision trees, but what you describe sounds like standard decision tree induction to me. At each node, the induction algorithm tries to find the "best" possible split. How you define "best" is up to you, so my guess is that you can easily create a split criterion that captures your desired distributional effect.
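For example, most induction loops have a spot like this where you can plug in whatever criterion you want (hypothetical Python, not from any particular package):

def best_split(items, candidate_vars, score_split):
    # Generic greedy search: score every candidate variable with the
    # user-supplied score_split(items, var) and keep the winner. Swap in
    # information gain, gain ratio, Gini, or a posterior-shift measure -
    # the loop doesn't care.
    best_var, best_score = None, float("-inf")
    for var in candidate_vars:
        score = score_split(items, var)
        if score > best_score:
            best_var, best_score = var, score
    return best_var, best_score

Drop your distributional criterion in as score_split and the rest of the induction code stays the same.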

-kirk
Steadtler, you'd probably find the answer you want if you looked into using MML for learning decision tree classifiers. Check out David Dowe's website at Monash Uni for links (iirc there should be a paper specifically on this topic there).

Cheers,

Timkin

