Learning system suitable for finding combinations of features?

Started by
5 comments, last by Emergent 12 years, 6 months ago
I have a problem where I have a set of inputs (maybe between 10 and 50), and I want to derive a single continuous output (eg. 0.0 to 1.0). I expect to train the system with a large batch of data where both the inputs and outputs are known, and then hope to run it to determine a score for fresh data later.

For the most part I expect to be able to train this based on simply working out weights for the inputs, but I also expect some of the end result to come from inputs that are correlated, maybe positively, maybe negatively. (I guess this is the classic 'XOR' problem.)

We know that neural networks can attempt to handle this kind of problem, but we also know that they are generally not very effective. What would be an effective tool for a problem like this, given that it's not a typical classifier as such but something designed to produce a single output?
I would try to divide the problem into smaller sections and evolve each on its own before combining them for fine-tuning. If you see that one weight is not converging to anything, remove it. If two weights have a fixed relation, calculate one from the other.
That won't really work for me, because the inputs need judging as a whole. If I were to split the system up in any way, even just one split, it would take twice as long to train and would risk missing correlations between the two halves.
I'm currently taking a machine learning course (i.e. not a pro). Please take what I say (if I do say anything) with a grain of salt :D

1. How many data points do you have?
2. What is the estimated complexity of the target function (the one you're trying to learn)? (e.g. linear, etc.)
3. How much noise do you have?
What are the inputs and outputs? Knowing something about the problem often allows one to come up with more reasonable solutions than simply throwing a generic machine-learning algorithm at it.

That said, I would look into support vector machines.
I first posted this from my iPad and it didn't include any line spaces. Sorry. :) Edited to fix, and added a few more comments


Support vector machines will not work, as he wants a continuous output. He could try support vector regression. Basically, what you want to do is regression. NOTE: Avlaro, if you meant SVR, my apologies.

Google "regression analysis" and you will find enough to occupy you for a lifetime. The best choice depends on your data.

Your problem is nonlinear, I am assuming; otherwise simple weighted multivariate linear regression would work.
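For reference, the linear case is straightforward with numpy's least-squares solver. This is a minimal sketch, not the poster's own code; the appended column of ones (to learn an intercept) is my assumption about how you'd set it up:

```python
import numpy as np

# X: (N, n) matrix of training inputs, y: (N,) vector of known outputs
def fit_linear(X, y):
    # Append a constant column so the model can learn an intercept/bias
    A = np.hstack([X, np.ones((len(X), 1))])
    # Ordinary least squares: minimize ||A w - y||^2
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X_new):
    A = np.hstack([X_new, np.ones((len(X_new), 1))])
    return A @ w
```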

- There is nothing wrong with neural networks, provided you convert your inputs into a suitable range. Input normalization is where most people fail with neural networks (there's a quick normalization sketch after this list).

- Random forests will do what you want, but depending on the data, a neural network could be better.

- Boosted regression trees will also work, but you will want to be familiar with regression trees and boosting before you try them.

- Gaussian mixture regression reportedly works well on noisy data (data that obviously is missing at least one dimension of information), but the implementation isn't exactly straightforward.

- MARS (multivariate adaptive regression splines) is designed specifically for regression problems. I've never used it, but from what I've read it seems like a good solution.

- Support vector regression is another excellent tool for regression problems. Like neural networks, it has many training parameters to tweak.

- You could also try evolutionary programming. Slow to train, but flexible.
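Since input normalization came up above, here is a minimal z-score normalization sketch in Python/numpy. Fitting the statistics on the training set only and reusing them on new data is the usual practice; the small epsilon guard is my own assumption to avoid dividing by zero on constant features:

```python
import numpy as np

def fit_normalizer(X_train, eps=1e-8):
    # Per-feature mean and standard deviation, computed from training data only
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + eps  # eps avoids division by zero
    return mu, sigma

def normalize(X, mu, sigma):
    # Map each feature to roughly zero mean and unit variance
    return (X - mu) / sigma
```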



All that said, and without seeing your data, I would recommend:

- a backpropagation neural network, or
- a regression tree


I've attached some screenshots showing the differences between ANNs, CART, and boosted regression on the same noisy sine wave. The mean squared error is shown at the top of each graph.

[Screenshot: ANN regression on the noisy sine wave]

The ANN produces a smooth, continuous function but doesn't handle outliers very well.

[Screenshot: regression tree (CART) on the noisy sine wave]

Regression trees are able to handle data that isn't smooth and continuous, but they produce 'plateaus' where you see identical outputs regardless of the input.

[Screenshot: boosted regression trees on the noisy sine wave]

Boosted regression gives you the best of both worlds: smooth outputs, while handling outliers well. The above model is actually quite weak because I only boosted a few trees.
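If you want to reproduce that kind of comparison yourself, here is a rough scikit-learn sketch; the specific models and settings are my own assumptions, not the ones behind the screenshots:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 300).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=len(X))  # noisy sine wave

models = {
    "ANN": MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0),
    "Regression tree": DecisionTreeRegressor(max_depth=4),
    "Boosted trees": GradientBoostingRegressor(n_estimators=50, max_depth=3),
}

for name, model in models.items():
    model.fit(X, y)
    print(name, "MSE:", mean_squared_error(y, model.predict(X)))
```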

I have a problem where I have a set of inputs (maybe between 10 and 50), and I want to derive a single continuous output (eg. 0.0 to 1.0). I expect to train the system with a large batch of data where both the inputs and outputs are known, and then hope to run it to determine a score for fresh data later.

For the most part I expect to be able to train this based on simply working out weights for the inputs, but I also expect some of the end result to come from inputs that are correlated, maybe positively, maybe negatively. (I guess this is the classic 'XOR' problem.)


Let me get this straight. You've got a bunch of training instances (x_1, y_1), ..., (x_N, y_N), with each x_i in R^n (where n is some number between 10 and 50) and each y_i in R, and you want to find a function f : R^n -> R such that f(x_i) ~= y_i for all i. Fair enough?

I'd throw kernel-least-squares at it. It's easy, requiring nothing more complicated than inverting a big matrix. The problem boils down to choosing a kernel. For quadratic regression, which sounds like it'll do the trick for you (it can learn XOR), the kernel would be,

K(x_i, x_j) = (<x_i, x_j> + 1)^2

where <.,.> is just the standard Euclidean inner product in R^n. I've walked through least squares in a couple of earlier posts; check out this post on basic least squares, and replace every inner product <.,.> by an evaluation of the kernel function K(.,.). The reason this works is that the kernel function can be written as,

K(x_i, x_j) = <F(x_i), F(x_j)>

where F maps your inputs to some high-dimensional space. So, basically, what you're doing is preprocessing your inputs with a bunch of nonlinearity, and then doing least squares on the processed points. E.g., for a quadratic regressor, F would return a vector containing all the pairwise products of its inputs as well as the inputs themselves, like F((u,v)) = (u, v, u^2, uv, v^2, 1). The kernel I gave above corresponds almost exactly to this F, except that it multiplies the various products by different weights.

Hope that's clear enough. Even if you don't use exactly this method and opt for something fancier, I think understanding least squares is a pretty good starting point for most other things.
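To make that concrete, here is a rough numpy sketch of kernel least squares with the quadratic kernel above. The small ridge term is my own addition to keep the matrix solve well behaved; it isn't part of the plain least-squares formulation described in the post:

```python
import numpy as np

def quad_kernel(A, B):
    # K(x_i, x_j) = (<x_i, x_j> + 1)^2, evaluated for every pair of rows
    return (A @ B.T + 1.0) ** 2

def fit(X, y, ridge=1e-8):
    # Solve (K + ridge*I) alpha = y; the learned function is
    # f(x) = sum_i alpha_i K(x_i, x)
    K = quad_kernel(X, X)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)

def predict(X_train, alpha, X_new):
    return quad_kernel(X_new, X_train) @ alpha

# Tiny check: the quadratic kernel can represent XOR-like interactions
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])
alpha = fit(X, y)
print(predict(X, alpha, X))  # approximately [0, 1, 1, 0]
```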


Support vector machines will not work, as he wants a continuous output. He could try support vector regression. Basically, what you want to do is regression.

(I hear people lump support vector classification and support vector regression together under the heading of "support vector machines.")

