Finding maximum increases

Started by
22 comments, last by grhodes_at_work 19 years, 5 months ago
Hey, I am working on a problem where the brute force method isnt very feasible. The problem is this. I have a large amount of data. a few hundred sets of 30,000+ data points. For each point in the time series, I would like to know the average of the top 10 highest slopes between other points. In otherwords, define si j to be the slope between the current point i and the point j points after it (the points are evenly spread out along the x axis). I would like to find, for each i (point in the set), the largest values of si j. Then, taken the largest 10 slopes, and find their average, preferably just an arbitrary function not restricted by the algorithm to be an average. A brute force search between 30,000 data points would be rediculous, even after some basic optomizations such as only testing the local maximums for slopes. It still is a very large problem. My data is also not going to be this small forever. Does anyone have any ideas on how I can solve this problem efficiently? I was thinking that maybe starting the search from the far right might alow some interesting optimizations. Thanks Dwiel
Advertisement
"A brute force search between 30,000 data points would be rediculous"

What's so rediculus about that?

Your processor is capable of searching thru millions of data points like that in a matter of seconds. 30000 shouldn't be a problem.

Edited:

What is the nature of your data anyway?
You could start from the back, and keep track of the ten highest points seen so far and build a new set of data points consisting of the average of those ten for each point i.

Makes si j an O(1) operation, after the O(N) initial calculation.

edit: that IS the problem you're trying to solve... right? your description seems kinda vague.
how smooth is your data?
you might be able to take advantage of smoothness to find that... within some region the total slope is greater than this other region, and thus disregard a bunch of points all at once

if your data is totally jagged like a fractal tho
i dont see how actually checking each point is avoidable
its only one pass tho, and you only have to save the 10 largest values as you go thru
so it doenst sound too bad....
are you saying that a single pass through the data is too much? or that sorting the data, then picking off the top ten is too much?

You can do this in a single pass. Just keep two arrays sized 10, index and value. keep track of the lowest value in the ten. Any time your current (sweep) value is higher than lowest_of_ten value, replace the lowest value with the new value and index, and search through your list of 10 again to find the new lowest. go through your whole data set once, and you have the top ten contenders. Then you can do this incrementally as new points are added.

[edit]
spelling and last sentence
[/edit]

[Edited by - lonesock on November 9, 2004 2:15:37 AM]
Can point j be any point after i, or is it just simply the next point after i.. is s_ii+1? In latter case you can find the largest slope in one pass. If you mean the first case, then I understand your comment about 30000 data points being ridiculous.
What type of data is this? Keep in mind, this forum is for math & physics as related to GAME development, not data processing in general.
Graham Rhodes Moderator, Math & Physics forum @ gamedev.net
Sorry, I guess I wasnt very clear. The data very fractal like. I'm not trying to find the next 10 highest points, but rather the next 10 points whose slope with respect to the current one, is the largest. Example:
y| |4|3|          *2|  * *   *1|*     * +-----------    pt #  1 2 3 4 5 6

First, find the largest slopes with respect to point #1. The BEST slope is between point 1 and point 2. The slope between those two points is 1. The next best is between 1 and 3, the slope between those is 1/2. After that, 1 and 5 form a slope of 1/4 and finally point 6 forms a slope with point 1 of 1/5. What I want is, for each point in the set, the average value of the 10 best slopes between other points which come after it. Just because point 6 has the highest value, it is sufficiently far away from point 1, that it's slope is not as high as some other ones.

The reason why a few hundred sets, each of 30,000 data points is large, is because the brute force method, I would need to perform 450,015,000+ multiplies and sorted inserts per set. And then do that a few hundred times. Also, this data might change, in which I would need to do it all over again.

I was thinking that maybe I could find the maximum value in the set and then as soon as I find that no other points could possibly have a higher slope, break and go on to the next one. If you have more ideas on how to speed the algorithm up, I would greatly appreciate it.

In the future, my sets of 30,000 could easily become sets of well over a million, which I am sure would slow down a brute force algo pretty quickly (7*1011 mults and sorted inserts per set (probably closer to 2000 instead of 300))

Thanks

Dwiel
You don't need to do sorted inserts constantly if you only need the top 10 slopes per point. Just keep a running list and which value is the lowest, and only insert into the list if the currently examined slope is higher than the lowest already in the list. That alone should reduce the time spent by an order of magnitude or two, since for most points you'll only be calculating and comparing the slope which only involves a very few operations.

You should try implementing this obvius algorithm and see how slow it is since nobody seems to have any alternatives. It might end up being fast enough anyway, or you might find an easy way to optimize it after usnig a profiler.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
I think you can do it in O(n). I'm asumming that the interval is 1 over the x axis just so I can explain this easier. I haven't checked the math so this is just an idea.

If you think about how slopes over shorter intervals combine into slopes of higher intervals, I'm pretty certain that the greatest slope has be over an interval of 1 (greater intervals has to contain either all peices of equal slope or at least one piece of greater slope). So you go through once finding the 10 greatest slopes of interval 1. Next check the slopes of interval 2 that include one of the 10 pieces of interval 1. If you find no new numbers then your done. else check interval 3 with the new pieces of interval 2 and so on, it should stop at interval 10 at most.

-Cocalus

This topic is closed to new replies.

Advertisement