Jump to content
  • Advertisement
Sign in to follow this  
Dwiel

Finding maximum increases

This topic is 5151 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hey, I am working on a problem where the brute force method isnt very feasible. The problem is this. I have a large amount of data. a few hundred sets of 30,000+ data points. For each point in the time series, I would like to know the average of the top 10 highest slopes between other points. In otherwords, define si j to be the slope between the current point i and the point j points after it (the points are evenly spread out along the x axis). I would like to find, for each i (point in the set), the largest values of si j. Then, taken the largest 10 slopes, and find their average, preferably just an arbitrary function not restricted by the algorithm to be an average. A brute force search between 30,000 data points would be rediculous, even after some basic optomizations such as only testing the local maximums for slopes. It still is a very large problem. My data is also not going to be this small forever. Does anyone have any ideas on how I can solve this problem efficiently? I was thinking that maybe starting the search from the far right might alow some interesting optimizations. Thanks Dwiel

Share this post


Link to post
Share on other sites
Advertisement
"A brute force search between 30,000 data points would be rediculous"

What's so rediculus about that?

Your processor is capable of searching thru millions of data points like that in a matter of seconds. 30000 shouldn't be a problem.

Edited:

What is the nature of your data anyway?

Share this post


Link to post
Share on other sites
You could start from the back, and keep track of the ten highest points seen so far and build a new set of data points consisting of the average of those ten for each point i.

Makes si j an O(1) operation, after the O(N) initial calculation.

edit: that IS the problem you're trying to solve... right? your description seems kinda vague.

Share this post


Link to post
Share on other sites
how smooth is your data?
you might be able to take advantage of smoothness to find that... within some region the total slope is greater than this other region, and thus disregard a bunch of points all at once

if your data is totally jagged like a fractal tho
i dont see how actually checking each point is avoidable
its only one pass tho, and you only have to save the 10 largest values as you go thru
so it doenst sound too bad....

Share this post


Link to post
Share on other sites
are you saying that a single pass through the data is too much? or that sorting the data, then picking off the top ten is too much?

You can do this in a single pass. Just keep two arrays sized 10, index and value. keep track of the lowest value in the ten. Any time your current (sweep) value is higher than lowest_of_ten value, replace the lowest value with the new value and index, and search through your list of 10 again to find the new lowest. go through your whole data set once, and you have the top ten contenders. Then you can do this incrementally as new points are added.

[edit]
spelling and last sentence
[/edit]

[Edited by - lonesock on November 9, 2004 2:15:37 AM]

Share this post


Link to post
Share on other sites
Can point j be any point after i, or is it just simply the next point after i.. is s_ii+1? In latter case you can find the largest slope in one pass. If you mean the first case, then I understand your comment about 30000 data points being ridiculous.

Share this post


Link to post
Share on other sites
Sorry, I guess I wasnt very clear. The data very fractal like. I'm not trying to find the next 10 highest points, but rather the next 10 points whose slope with respect to the current one, is the largest. Example:

y|
|
4|
3| *
2| * * *
1|* *
+-----------
pt #
1 2 3 4 5 6

First, find the largest slopes with respect to point #1. The BEST slope is between point 1 and point 2. The slope between those two points is 1. The next best is between 1 and 3, the slope between those is 1/2. After that, 1 and 5 form a slope of 1/4 and finally point 6 forms a slope with point 1 of 1/5. What I want is, for each point in the set, the average value of the 10 best slopes between other points which come after it. Just because point 6 has the highest value, it is sufficiently far away from point 1, that it's slope is not as high as some other ones.

The reason why a few hundred sets, each of 30,000 data points is large, is because the brute force method, I would need to perform 450,015,000+ multiplies and sorted inserts per set. And then do that a few hundred times. Also, this data might change, in which I would need to do it all over again.

I was thinking that maybe I could find the maximum value in the set and then as soon as I find that no other points could possibly have a higher slope, break and go on to the next one. If you have more ideas on how to speed the algorithm up, I would greatly appreciate it.

In the future, my sets of 30,000 could easily become sets of well over a million, which I am sure would slow down a brute force algo pretty quickly (7*1011 mults and sorted inserts per set (probably closer to 2000 instead of 300))

Thanks

Dwiel

Share this post


Link to post
Share on other sites
You don't need to do sorted inserts constantly if you only need the top 10 slopes per point. Just keep a running list and which value is the lowest, and only insert into the list if the currently examined slope is higher than the lowest already in the list. That alone should reduce the time spent by an order of magnitude or two, since for most points you'll only be calculating and comparing the slope which only involves a very few operations.

You should try implementing this obvius algorithm and see how slow it is since nobody seems to have any alternatives. It might end up being fast enough anyway, or you might find an easy way to optimize it after usnig a profiler.

Share this post


Link to post
Share on other sites
I think you can do it in O(n). I'm asumming that the interval is 1 over the x axis just so I can explain this easier. I haven't checked the math so this is just an idea.

If you think about how slopes over shorter intervals combine into slopes of higher intervals, I'm pretty certain that the greatest slope has be over an interval of 1 (greater intervals has to contain either all peices of equal slope or at least one piece of greater slope). So you go through once finding the 10 greatest slopes of interval 1. Next check the slopes of interval 2 that include one of the 10 pieces of interval 1. If you find no new numbers then your done. else check interval 3 with the new pieces of interval 2 and so on, it should stop at interval 10 at most.

-Cocalus

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!