Clustering for Supervised Learning

I'm back on the tangent of good supervised topological evolution for function approximation. I've got a fairly optimized means of approximating linear and polynomial functions, and I've got a system set up for piecewise approximation using hyperplanes to break up my input space. So I take a bunch of training vectors, approximate the function with an n-degree polynomial, and then detect the largest error. Once I have the largest error, I place a plane at random near the point with the highest error, split the input space into two input spaces, and run more training cycles, repeating this until the approximation gets fairly accurate. This method is really, really inefficient, and my space ends up cut into an unnecessarily huge number of pieces and functions. I need some means of clustering my training vectors based on their function that would minimize the number of cuts I'd need to make in my space. I've read that SOMs are good for clustering, but I don't know how to apply that to my problem.

Consider a function A that maps a bouncing ball's position, velocity, and a constant to the ball's next position and velocity. There is exactly one plane at y=0: when the ball goes below it, its y is set to 0 and its velocity is multiplied by -1. It's fairly clear this could be modelled with a piecewise linear function, with one function for y>0 and another for y<=0. The problem is, given a large number of training vectors, how could I find (or at least approximate) the y=0 plane?
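For that ball example specifically, a brute-force search over candidate split positions on y would recover the plane, something like this rough sketch (it uses ordinary least squares for the per-piece fits just for brevity, which isn't my actual fitting method, and it clearly doesn't generalise to arbitrary hyperplanes in higher dimensions, which is why I'm after a clustering approach):

```python
import numpy as np

# Hypothetical 1-D ball: state (y, vy) -> next state, following the rule described above.
def step(y, vy, dt=0.05, g=-9.8):
    if y <= 0:                      # at or below the floor: clamp and reflect
        return 0.0, -vy
    return y + vy * dt, vy + g * dt

# Training vectors sampled around the boundary.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1.0, 5.0, 2000), rng.uniform(-10.0, 10.0, 2000)])
Y = np.array([step(y, vy) for y, vy in X])

def piece_error(Xs, Ys):
    """Summed squared error of a least-squares linear fit (with bias) on one piece."""
    if len(Xs) < 3:
        return 0.0
    A = np.column_stack([Xs, np.ones(len(Xs))])
    coef, *_ = np.linalg.lstsq(A, Ys, rcond=None)
    return float(np.sum((A @ coef - Ys) ** 2))

# Brute-force search over candidate split planes y = c.
candidates = np.linspace(-1.0, 5.0, 121)
errors = [piece_error(X[X[:, 0] <= c], Y[X[:, 0] <= c]) +
          piece_error(X[X[:, 0] > c], Y[X[:, 0] > c]) for c in candidates]
print("best split near y =", candidates[int(np.argmin(errors))])   # comes out close to 0
```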
Quote: Original post by Alrecenk
I've got a fairly optimized means of approximating linear and polynomial functions... This method is really, really inefficient, and my space ends up cut into an unnecessarily huge number of pieces and functions.


Perhaps thinking about your problem in a slightly different way will help you to see an answer. What you have, essentially, is an optimisation problem in the parameter/coefficient space of your polynomial, subject to an objective of error minimisation (presumably least squares) w.r.t. a set of discrete points. What you want to achieve is an optimal partitioning of the parameter space that minimises the error. Unfortunately, that partitioning equates to one spline per training point (so you end up with as many pieces in your polynomial as you have training data).

The way around this (and it's a fairly well known and well considered problem) is to penalise solutions that use too many partitions. You then perform your optimisation over this new performance measure. There are a variety of methods available to do this, but fundamentally the one you should pursue is Minimum Message Length.
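To give a rough feel for the trade-off, here's a toy sketch (it uses a BIC-style penalty on the parameter count purely as a stand-in; a real MML approach computes an actual two-part message length, but the shape of the idea is the same): the raw squared error only ever falls as you add pieces, whereas the penalised score bottoms out at a sensible number of pieces.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 300))
y = np.abs(x - 0.4) + rng.normal(0.0, 0.02, 300)   # a kinked target plus a little noise

def piecewise_sse(k):
    """Total squared error of k equal-width pieces, each fit by a least-squares line."""
    edges = np.linspace(0.0, 1.0, k + 1)
    sse = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (x >= lo) & (x <= hi)
        if m.sum() < 2:
            continue
        A = np.column_stack([x[m], np.ones(m.sum())])
        coef, *_ = np.linalg.lstsq(A, y[m], rcond=None)
        sse += float(np.sum((A @ coef - y[m]) ** 2))
    return sse

n = len(x)
for k in (1, 2, 3, 4, 8, 16, 32):
    sse = piecewise_sse(k)
    score = n * np.log(sse / n) + 2 * k * np.log(n)   # BIC-style stand-in for a real MML cost
    print(k, round(sse, 3), round(score, 1))          # raw error falls forever; score bottoms out
```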

Here are some papers you should read if you're interested in pursuing this:
MML Grouping of Ordered Data

Univariate Polynomial Inference

There are a variety of reasons for my recommending MML over other methods (such as BIC, AIC, MDL)... I won't go into the details, other than to say that MML represents a computational "Occam's Razor". There are several theoretical reasons to favour it... and at least one reason to avoid it (which is why it isn't used en masse)... but mostly it isn't used because it's not well understood!


Cheers,

Timkin
I'm not using least squares. I just adjust each coefficient individually, so no squares or square roots are involved. Also, order isn't important in my data. Aside from that, the papers seem to be pretty much what I'm looking for, but it's going to take me a while to digest them and know for sure...
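To clarify what I mean by adjusting coefficients individually, it's something in the spirit of this rough sketch (purely hypothetical and simplified; the specific nudge rule and the absolute-error measure here are just for illustration, not my actual code):

```python
import numpy as np

def fit_coefficients(X, y, steps=400, delta=0.01):
    """Hypothetical sketch: nudge one coefficient at a time up or down and keep the
    nudge only if it lowers the total absolute error (no squares or square roots)."""
    w = np.zeros(X.shape[1])
    def total_abs_error(w):
        return float(np.sum(np.abs(X @ w - y)))
    for _ in range(steps):
        for i in range(len(w)):
            for candidate in (w[i] + delta, w[i] - delta):
                trial = w.copy()
                trial[i] = candidate
                if total_abs_error(trial) < total_abs_error(w):
                    w = trial
    return w

# Tiny check: recover y = 2x + 1 from noiseless samples.
x = np.linspace(0.0, 1.0, 50)
print(fit_coefficients(np.column_stack([x, np.ones_like(x)]), 2.0 * x + 1.0))  # near [2, 1]
```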

