Can I predict future data using a data history?

15 comments, last by wodinoneeye 10 years, 8 months ago

An interesting program would take a data stream and store it in a way that lets me compare it against new, future data and tell me how predictable that data was. In other words, it finds patterns in the data: it's a pattern detector that works on bits.

This could be interesting because the data could be anything. Would the exact same prediction technique work on any kind of data, no matter what it is?


To answer the question in the title: you can't really predict the future based on past patterns and history, but you can build probabilities of events happening if similar patterns emerge.
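As a rough illustration of "building probabilities from past patterns" for a stream of bits, here is a minimal sketch. It assumes a simple order-k context model (the name `predictability` and the `order` parameter are made up for this example, not something from the thread): before each bit arrives, it guesses whichever value has most often followed the previous `order` bits, then reports the fraction of correct guesses.

```python
import random
from collections import defaultdict


def predictability(bits, order=3):
    """Fraction of bits guessed correctly by an online order-`order` context model.

    Before seeing each bit, the model looks at the previous `order` bits and
    guesses whichever value followed that context more often so far.
    """
    # context -> [times a 0 followed, times a 1 followed]
    counts = defaultdict(lambda: [0, 0])
    correct = 0
    total = 0
    for i in range(order, len(bits)):
        context = tuple(bits[i - order:i])
        zeros, ones = counts[context]
        guess = 1 if ones > zeros else 0  # ties and unseen contexts guess 0
        if guess == bits[i]:
            correct += 1
        total += 1
        counts[context][bits[i]] += 1  # learn only after guessing
    return correct / total if total else 0.0


# Hypothetical test streams: one strongly patterned, one pseudo-random.
periodic = [0, 1, 1] * 200
rng = random.Random(0)
noise = [rng.getrandbits(1) for _ in range(5000)]

print(predictability(periodic))  # close to 1.0 after a short warm-up
print(predictability(noise))     # should hover around 0.5 (coin-flip level)
```

A score near 1.0 means the history explained the new bits well; a score near 0.5 means this particular model found nothing useful, which is not the same as the data being unpredictable by any model.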

Basically, many stock predictors work on this model, some with surprising accuracy. However, there is no good way to account for unforeseen events, so you can only really build probabilities of what may occur. Of course, the more event data you have, the more accurate a model you can build.

Now, as for actually creating a program for this: you'll want to look into a number of AI techniques, probably starting with neural network theory.

Any specific ideas in terms of exactly what sort of data and/or events you are interested in? Or is it more or less just a curiosity?

Google for "time series analysis" and "data compression": Both things involve finding patterns in a stream of data.

Of course different kinds of data call for different techniques. Like in most other realms, there is no silver bullet.

It's not really AI either... it's statistics.

The accuracy of the predictions depends on how well the data fits the underlying assumptions (which will usually be a statistical model based on probability density functions).

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

It's not really AI either... it's statistics.


Not to diminish the importance of statistics, but predicting what will happen next is most definitely within the realm of AI.

My own view of AI is that it consists of two things:
(1) Building a representation of the world, including how things are, how they behave and how they are likely to progress under different hypothetical scenarios.
(2) Selecting an action that maximizes the expected value of a utility function, based on the predictions set forth by (1).

I tend to think action selection is the core of intelligence, but other people (e.g. Jeff Hawkins) think prediction is the more important part.

Well, the answer to your title's question is simple: no.
If you have no guarantees about how the data is collected and what phenomenon you are observing, there is absolutely no reason why past data should describe a future event or a model.
Since you do not have a model, you cannot use any method to infer future events.
You might notice a pattern in which, whenever a bird sings, a dolphin in the ocean jumps. But since these two actions are not actually related (the match is a coincidence), if you use it you'll get a wrong (or rather, unreliable) prediction.
If you know the domain of the problem, on the other hand, you can start to build some rules. (There are already some tools to do this. Search for "data mining"; some programs do exactly what you want: they find patterns in data.)

TL;DR
Without knowledge of the domain you are observing, you can find patterns, but they do not mean anything and you cannot safely use them to infer future events.
(Note: it may be that the data you are looking at actually belongs to a specific domain with some patterns, even if you don't know it. In that case the pattern could predict future events, but since you do not know that the data belongs to a specific domain, you should not rely on it.)

@Alvaro: I'm not really informed about data compression, but I think the goal is to find the most common patterns and "replace" them with shorter identifiers. At least in the old days. It is not about predicting data that is not present. Am I right?


@Alvaro: I'm not really informed about data compression, but I think the goal is to find the most common patterns and "replace" them with shorter identifiers. At least in the old days. It is not about predicting data that is not present. Am I right?

One way to think about compression is: given the sequence up to a certain point, what is the probability distribution for the next symbol? If you can answer that question (probabilistic prediction of the next symbol) well, you can use arithmetic coding to convert the whole string into a string whose length is close to the entropy of the original string, which is about the best you can do in compression.
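To make the prediction/compression link concrete, here is a small sketch that adds up -log2 p(next symbol | recent history) over a sequence. It assumes a very simple adaptive model (counts of what followed the previous `order` symbols, with add-one smoothing); the function name and parameters are invented for the example. An arithmetic coder driven by the same probabilities would produce output of roughly this many bits.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(symbols, order=2, alphabet_size=256):
    """Sum of -log2 p(next symbol | previous `order` symbols) over the sequence.

    p comes from adaptive counts with add-one (Laplace) smoothing, so every
    symbol in the alphabet always has a non-zero probability.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, sym in enumerate(symbols):
        context = tuple(symbols[max(0, i - order):i])
        p = (counts[context][sym] + 1) / (totals[context] + alphabet_size)
        bits += -math.log2(p)
        # Update the model only after "coding" the symbol, as a decoder would.
        counts[context][sym] += 1
        totals[context] += 1
    return bits


data = b"abc" * 300  # hypothetical, highly repetitive input
print(ideal_code_length_bits(data) / 8, "bytes under the model, versus", len(data), "raw")
```

The better the model predicts the next symbol, the smaller the total. Data the model cannot predict costs about 8 bits per byte, i.e. no compression at all.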

Some people view the link between compression and intelligence as being very close: http://prize.hutter1.net/hfaq.htm#compai

I think a neural network is your best bet for something general. A neural network could easily be trained for many different types of data.

Training is the key, however. You couldn't just show it a totally new type of data and expect its answer to be reliable without first training it on that type of data.

storing [the data stream] in a way [to] tell me how predictable it was.

Principal Component Analysis is the process of examining a set of data and determining its most salient features. It's an algorithm: an automatic procedure that rotates the coordinate system to line up with the directions along which the data varies the most.

The axes of the resulting coordinate system represent measured, as-yet-unnamed features of that data.
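For a feel of what "rotating the coordinate system" means, here is a minimal sketch restricted to two-dimensional data, where the principal axes have a closed form. The function name `principal_axis_2d` is made up for this example; for real work with more dimensions you would normally reach for a linear algebra library instead.

```python
import math


def principal_axis_2d(points):
    """Return (angle, var_major, var_minor) for a list of (x, y) points.

    `angle` is the rotation, in radians, of the axis along which the data
    varies most; the two variances are measured along the rotated axes.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form eigen-decomposition of that symmetric 2x2 matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    return angle, half_trace + spread, half_trace - spread


# Hypothetical data lying exactly on the line y = 2x.
points = [(t, 2 * t) for t in range(-5, 6)]
angle, var_major, var_minor = principal_axis_2d(points)
print(math.degrees(angle), var_major, var_minor)
```

When `var_minor` is tiny compared with `var_major`, one rotated axis carries almost all of the information, which is one way of saying the data has a strong pattern.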

--"I'm not at home right now, but" = lights on, but no ones home

Google for "time series analysis" and "data compression": Both things involve finding patterns in a stream of data.

Of course different kinds of data call for different techniques. Like in most other realms, there is no silver bullet.

And don't try to second-guess how complex a solution you will need: start simple and work your way up as needed. I did some work predicting results of rugby matches, and a linear sum of 6 variables (3 per team) was enough to get results comparable with the bookmakers.
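A "start simple" baseline along those lines could look like the sketch below: fit the weights of a linear sum by least squares and use them to predict. Everything here is an assumption for illustration. The feature values and targets are invented numbers (the thread does not say which variables were used), and plain gradient descent is just one easy way to do the fit without any libraries.

```python
def fit_linear(rows, targets, steps=2000, lr=0.1):
    """Least-squares fit of target ~ sum(w[j] * row[j]) by batch gradient descent."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for row, target in zip(rows, targets):
            error = sum(wj * xj for wj, xj in zip(w, row)) - target
            for j, xj in enumerate(row):
                grad[j] += error * xj
        # Step against the average gradient of the squared error.
        w = [wj - lr * gj / len(rows) for wj, gj in zip(w, grad)]
    return w


def predict(w, row):
    """Prediction is just the weighted sum of the input variables."""
    return sum(wj * xj for wj, xj in zip(w, row))


# Made-up history: two variables per observation and an observed outcome.
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (3, 1)]
targets = [2, -1, 1, 3, 0, 5]

weights = fit_linear(rows, targets)
print(weights)
print(predict(weights, (2, 2)))
```

If a model this small already gets close to the accuracy you need, there is little reason to move to anything heavier; if not, the errors it makes tell you where to add complexity.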

This topic is closed to new replies.
