Can I predict future data using a data history?

15 comments, last by wodinoneeye 10 years, 8 months ago

Oh, Alistair's post reminds me of something I read in the book "Thinking, Fast and Slow", where Kahneman proposes predicting something by adding up simple terms that are easy to measure. The examples from the book sound exactly like what Alistair described with rugby matches.

If you have a lot of data, you can try to estimate a weight for each term. If you have a metric shit ton of data, you can get much fancier with non-linear schemes (e.g., neural networks). There is a whole field, Machine Learning, about how to make predictions when you have an abundance of data. But Kahneman says that something as simple as the sum often works just fine, and I think he might be right.
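Here is a minimal sketch of the "just add up simple terms" idea. It assumes every term is numeric and that a larger value is better for each one. Standardizing each term (z-scores) before summing is my own assumption, so that terms in different units are comparable. The optional `weights` argument is where estimated weights would go if you have enough data to fit them; left out, it is the plain sum. The team names and numbers are made up.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale one term's measurements to zero mean and unit spread (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A term that never varies carries no information; it contributes nothing.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def sum_score(rows, weights=None):
    """Score each row (one team/candidate) as a weighted sum of standardized terms.

    rows: list of equal-length lists, one measurement per term.
    weights: optional per-term weights; defaults to 1.0 each (the plain sum).
    """
    n_terms = len(rows[0])
    if weights is None:
        weights = [1.0] * n_terms
    # Standardize column by column so no term dominates just because of its units.
    columns = [standardize([row[j] for row in rows]) for j in range(n_terms)]
    return [sum(w * col[i] for w, col in zip(weights, columns))
            for i in range(len(rows))]

# Hypothetical terms per team: average points scored, fraction of recent games won.
teams = {"A": [24.0, 0.7], "B": [18.0, 0.4], "C": [21.0, 0.5]}
scores = sum_score(list(teams.values()))
ranking = sorted(teams, key=lambda t: scores[list(teams).index(t)], reverse=True)
print(ranking)  # ['A', 'C', 'B']
```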


I am genuinely proud of myself for starting as "stupid" as possible. I started with a single variable per team and worked my way up.
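For illustration, one possible "single variable per team" is each team's average points margin over past matches, with a match prediction being the difference between the two teams' margins. This is a hypothetical sketch, not the exact model described above; the team names and scores are invented.

```python
from collections import defaultdict

def team_margins(results):
    """One number per team: its average points margin across past matches.

    results: list of (home, away, home_points, away_points) tuples.
    """
    total = defaultdict(float)
    played = defaultdict(int)
    for home, away, hp, ap in results:
        total[home] += hp - ap
        total[away] += ap - hp
        played[home] += 1
        played[away] += 1
    return {team: total[team] / played[team] for team in total}

def predict_margin(ratings, team_a, team_b):
    """Predicted margin of team_a over team_b; positive favours team_a."""
    return ratings[team_a] - ratings[team_b]

history = [("Lions", "Sharks", 20, 10),
           ("Sharks", "Bulls", 15, 15),
           ("Bulls", "Lions", 12, 18)]
ratings = team_margins(history)
print(predict_margin(ratings, "Lions", "Bulls"))  # 11.0
```

Starting this "stupid" gives a baseline that any extra variable has to beat.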


Good for you; people frequently mistake quantity of variables for resolution, usually with terrible results.

--"I'm not at home right now, but" = lights on, but no one's home

That said, there are methods in statistics to detect correlation between variables, so you can remove redundant ones or make one dependent on the other.

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

Although there are methods in statistics to detect correlation between variables so you can remove them...

Can you give more details on this? It's a great idea.

I've seen a similar concept in Boolean algebra, used to simplify an arbitrary collection of IF/AND/OR logic and distill it down to the bare minimum that produces the same input/output states.
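The reduction itself is usually done with Karnaugh maps or the Quine–McCluskey algorithm; the sketch below only does the checking half. It brute-forces the full truth table to confirm that a hand-simplified expression gives the same outputs as the original for every input. The example expressions are hypothetical.

```python
from itertools import product

def equivalent(f, g, n_inputs):
    """True if two Boolean functions agree on every combination of inputs.

    Enumerates the whole truth table (2**n_inputs rows), so only for small n.
    """
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=n_inputs))

# Redundant logic: (a AND b) OR (a AND NOT b) OR (a AND c)
original = lambda a, b, c: (a and b) or (a and not b) or (a and c)
# Simplified by hand: b and c drop out, leaving just a.
simplified = lambda a, b, c: a

print(equivalent(original, simplified, 3))  # True
```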


Sure, have a look here

http://en.wikipedia.org/wiki/Correlation_and_dependence

Calculating the covariance between 2 variables is probably a good start. Covariance is similar to the dot product of 2 vectors, but computed over statistical observations, after subtracting each variable's mean.
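For reference, the sample covariance of $n$ paired observations, and the Pearson correlation obtained by dividing it by the two sample standard deviations $s_x$ and $s_y$ (which puts it in $[-1, 1]$ and makes it easier to threshold):

$$
\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}),
\qquad
r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x\, s_y}
$$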

http://en.wikipedia.org/wiki/Covariance

(See the "calculating the sample covariance" section for how to calculate it.)
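A minimal sketch of using this to prune variables: compute the sample covariance, turn it into a correlation, and keep a variable only if it is not strongly correlated with one already kept. The 0.9 cutoff and the rugby-style data are hypothetical, and the code assumes every variable actually varies (a constant one would divide by zero).

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sum of products of deviations from the mean, divided by n - 1."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to [-1, 1].
    Assumes neither variable is constant."""
    return sample_covariance(xs, ys) / sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys))

def drop_redundant(variables, threshold=0.9):
    """Keep a variable only if |r| with every already-kept one is below threshold.

    variables: dict of name -> list of observations (all the same length).
    threshold: hypothetical cutoff; tune it for your own data.
    """
    kept = {}
    for name, values in variables.items():
        if all(abs(correlation(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

data = {
    "tries_scored": [3, 5, 2, 4, 6],
    "points_scored": [15, 25, 10, 20, 30],  # exactly 5 * tries here, so r = 1
    "penalties_conceded": [8, 6, 9, 5, 7],
}
print(list(drop_redundant(data)))  # ['tries_scored', 'penalties_conceded']
```

Dropping is the bluntest option; the other one mentioned above is to model the redundant variable as dependent on the one you keep.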


You can predict (or rather, the program can), but whether it will do so reliably and accurately is the question.

You will have to identify particular indicators in the data (factoring the data), which is itself a difficult problem. Patterns of these factors will then need to be assembled.

Assume that there is a sequence of cause and effect: clues or patterns that lead to some subsequent occurrence (which you are trying to predict so you can be ready to counter or otherwise handle it).
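One very simple way to learn "this clue tends to lead to that occurrence" from a history is to count which event immediately follows each event (a first-order, Markov-style transition table) and predict the most frequent follower. This is a hypothetical sketch with invented event names; real clues usually span more than one previous event.

```python
from collections import Counter, defaultdict

def learn_transitions(history):
    """Count how often each event is immediately followed by each other event."""
    follows = defaultdict(Counter)
    for current, nxt in zip(history, history[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, current):
    """Most frequent follower of `current` and its observed frequency,
    or (None, 0.0) if `current` was never seen followed by anything."""
    counts = follows.get(current)
    if not counts:
        return None, 0.0
    event, count = counts.most_common(1)[0]
    return event, count / sum(counts.values())

history = ["scout", "build", "attack", "scout", "build",
           "defend", "scout", "build", "attack"]
transitions = learn_transitions(history)
print(predict_next(transitions, "build"))    # ('attack', 0.6666666666666666)
print(predict_next(transitions, "retreat"))  # (None, 0.0)
```

The frequency is a rough confidence: "build" was followed by "attack" in 2 of 3 cases, and an event never seen before gets no prediction at all.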

Building up this predictive knowledge about which patterns to recognize later is a problem in itself. Training data is needed (bracketing the problem space you are trying to solve for). Guidance by a human is usually needed to tell the system what is relevant: even a self-training system first has to be told what the good and bad results are, or which cases to look for.

Any pattern not seen before, or one that conflicts with known patterns, will produce questionable predictions. Thus an extensive set of cases often has to be built up, even for simple systems.
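To make the last two points concrete, here is a hypothetical nearest-neighbour sketch over human-labeled cases. It answers with the label of the closest known case, but flags a query that is far from every case (an unseen pattern) or whose nearby cases disagree (conflicting patterns). The `max_distance` cutoff, the features, and the labels are all assumptions for illustration.

```python
from math import dist

def classify(cases, query, max_distance):
    """Predict a label for `query` from human-labeled cases.

    cases: list of (feature_tuple, label) pairs.
    max_distance: hypothetical cutoff for "close enough to count as seen before".
    Returns (label or None, status).
    """
    nearby = [(dist(features, query), label)
              for features, label in cases
              if dist(features, query) <= max_distance]
    if not nearby:
        return None, "unseen pattern"
    nearest_label = min(nearby)[1]
    if len({label for _, label in nearby}) > 1:
        # Known cases near the query disagree: answer, but flag it.
        return nearest_label, "conflicting cases"
    return nearest_label, "ok"

cases = [((1.0, 1.0), "attack"), ((1.2, 0.9), "attack"),
         ((5.0, 5.0), "defend"), ((5.1, 4.8), "attack")]
print(classify(cases, (9.0, 9.0), 0.5))  # (None, 'unseen pattern')
```

Every flagged query is a case a human would need to label, which is why the case base keeps growing.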


Ratings are Opinion, not Fact

