• Advertisement
Sign in to follow this  

Finding Best Fit Lines Algorithm

This topic is 3497 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

So I have a basic system set up where I have N (x,y) points in an array. My goal is to find all linear regressions with r^2 greater than 0.95 of these points. For a naive implementation, I did something like this:
#points is our array of points.  n = points.size
1.upto(2**n-1) { |i|
     s = i.to_s(2)
     s = '0'*(n-s.size)+s
			
     included_pairs = []
     (s.length-1).downto(0) { |j|
          included_pairs << points[j] unless s[j] != 1
     }
    
     #perform linear regression on included pairs
}
Maybe not the best way of doing it. If you can't tell what I am doing, I am basically being cute and turning a number into a binary string, then using that string to determine which members of the array I should test on. This allows me to tells all 2**n-1 combinations of points for linear regression. Now, this works great. Except for the whole 2^n thing. After thinking for a bit, I realized that I could be culling a whole lot with a little logic (and what seems like a lot of work). The thought is this: if I start with [1,1,0,...,0] for my inclusion array, and my line doesn't work, then I can just jump to testing [1,0,1,0,...,0]. If it does work, than I can try [1,1,1,0,...,0]. If that doesn't work, then I can stop there and go on to [1,1,0,1,0,...,0]. This allows me to cull lots and lots of costly matrix multiplications (ordinary least squares, whoo!), and should drastically cut down on my total loop time (nobody likes an exponential loop!). Except I am having a brain far and can't code up this logic. In my head it makes sense, but actually getting it to work seems to be rather tricky. Cookie for the first working implementation (or shove in the right direction!) My first thought is defining a 'major' and 'minor' bit mask for each loop. When the minor doesn't work, you shift right and retest, adding a 0 to the major bit mask. If it does work, you add it to the major bit mask and shift the minor right and start testing again... but the logic gets really complicated really quickly trying to remember where you were. perhaps some sort of stack with pushing and popping is in order? Corey

Share this post


Link to post
Share on other sites
Advertisement

def test_regression(major, minor)
if minor == 0
return []
end

s = (major | minor).to_s(2)
s = '0'*(n-s.size)+s

included_pairs = []
(s.length-1).downto(0) { |j|
included_pairs << points[j] unless s[j] != 1
}

### do linear regression here

if worked
### here we fork -- continuing on two paths
return [r_squared, etc] + test_regression(major | minor, minor>>2) + test_regression(major, minor>>1)
else
return test_regression(major, minor>>1)
end
end

working_regressions = []
n.downto(1) { |i|
j = 2 ** (i-1)

major = j
minor = j/2

working_regressions = working_regressions + test_regression(j, j/2)
}


maybe? does this look right (besides the scoping issues with n)? testing this is fucking impossible.

[Edited by - visage on July 30, 2008 10:02:50 AM]

Share this post


Link to post
Share on other sites
Figured it out!


n = points.size
valid_trends = []

def test_regression(points, major, minor, n, type, date)
delta = 0.025

if minor == 0
return []
end

s = (major | minor).to_s(2)
s = '0'*(n-s.size)+s

included_pairs = []
0.upto(s.length-1) { |j|
included_pairs << points[j] if s[j].to_i == 1
}

#### test linear regression here

if !invalid
### here we fork -- continuing on two paths
#puts "\t\tValid: #{est[0]}x + #{est[1]} @ #{start}"
return [[r, est, start, included_pairs.map { |e| e[0] }]] +
test_regression(points, major | minor, minor>>1, n, type, date) +
test_regression(points, major, minor>>1, n, type, date)
else
return test_regression(points, major, minor>>1, n, type, date)
end
end
end

if n > 1
j = 2**(n-1)
while j > 0
valid_trends = valid_trends + test_regression(points, j, j/2, n, type, date)
j = j >> 1
end
end


I think that is the solution...it seems to work in empirical tests of the data...

Basically, I was trying to derive an equation whose expected runtime was less than 2^n (though worst case was still 2^n) by using logic to cull out possibilities that couldn't occur...

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement