# How to get equation given many (x,y) points?

## Recommended Posts

Let's say you have a thousand (x,y) points. You want to get an equation, such that: f(x) = y How to get f(x)? (Assuming value of x is always unique)

##### Share on other sites
What you are looking for is probably Curve Fitting. Hope this helps.

##### Share on other sites

Hi Omid, thanks for the link!

I looked into it, if i use that, then if I have a thousand (x,y) points, my final equation would have a degree of 999 right? (a long equation, but it'll work)

Is there another method, where it does the same thing, but gives a shorter equation?

or is there a way to make an equation of degree 999 into something shorter that people can write?

thanks! :)

##### Share on other sites
Quote:
 Original post by helloworld123Is there another method, where it does the same thing, but gives a shorter equation? or is there a way to make an equation of degree 999 into something shorter that people can write?
Are your points random, or do you have some information/educated intuition about the type of curve they form?

##### Share on other sites
Quote:
Original post by swiftcoder
Quote:
 Original post by helloworld123Is there another method, where it does the same thing, but gives a shorter equation? or is there a way to make an equation of degree 999 into something shorter that people can write?
Are your points random, or do you have some information/educated intuition about the type of curve they form?

Hi,

yes the points are random, and all x-values in all (x,y) are non-repeating (unique).

I'm not really interested with the curve, or the interpolation/extrapolation data that could be derived, i'm only interested in getting y, given x, using an equation.

is there a way to simplify an equation of degree 999?

##### Share on other sites
If I understood you correctly and your points are randomly distributed then you can't find a function f(x)=y. But of course you could use a lookup-table. But maybe you could give some more information about your points.

##### Share on other sites
Quote:
 Original post by __MosteM__If I understood you correctly and your points are randomly distributed then you can't find a function f(x)=y. But of course you could use a lookup-table. But maybe you could give some more information about your points.

actually you can, the points are random and are unique (let's say x-values are guaranteed to be unique).

i looked into lagrange interpolation, which does exactly this, it takes many control points (unique points), and give you an equation too.

but problem is, what about for 1000 control points, it'll give a polynomial of degree 999.

i'm just wondering if there is an alternative that gives another equation or some sort, something shorter than an equation of degree 999.

(or is there a way to represent an equation of degree 999 into something shorter? by shorter i mean something you could even write down on a small piece of paper)

thanks!

##### Share on other sites
Fundamentally that is not a useful thing to be doing.

Whilst it is true that for an arbitrary number of points you can find an equation which will satisfy all points it isn't going to be useful for anything.

The point of an equation is that you can put in an input and get an output.

Making up an arbitrary number of random points and finding an equation to fit them is only going to give you more random points.

Summary:
random input -> random output

##### Share on other sites
Subdivide the entire domain in N equal intervals. Build an approximation curve for each of the intervals. If your 1000 points are uniformly distributed, each of the curves will have the degree (1000 / N - 1).

##### Share on other sites
The way of fitting many points to a much smaller equation is known as least-squares regression.

However, I think there may be confusion about your statement "They are random points."
If the x values are "random" then we are talking, and you may be able to fit an equation...
BUT if the y values are ALSO "random" then you are saying there is NO implicit structure to the point distribution, and thus no equation can fit that structure.

It would be very helpful if you could post an example "plot"of some of the points you want to fit.

##### Share on other sites
Do you need the exact values at the initial points? If not, you could also try it with an approximation. If you just knew the very general structure of data, things become much easier. If the input for example in some way resembles something sinusy (or a combination of several frequencies), or a coarse function of some degree.
If not, then there still is the possibility to fit a spline to the data. Nth order B-Splines are one way, piecewise cubic Bezier splines another, and there are some more.

The bottom line however is: If you need exact values of a completely arbitrary set of data, the smallest representation you can have without impossibly (read: almost infinitely) much effort (and not even granted results then) is the data itself, maybe entropy compressed.
Since you seem not to be interested in inter- and extrapolation, I assume shrinking the data must be what you aim for, which is why I would suggest a form of compression instead.

##### Share on other sites
@davetyler - sorry to be blunt, but whether it is useful or not, is another matter. your summary shows that you didn't understand my question, it seems.

@steve132 - yup all x is random (and unique, there is never two or more x with the same value). all y is random also, but aren't unique.

ie, i can have control points like: (5, 10) (2, 10) (1, 42), and so on...as for a sample "plot", imagine getting a pen and paper and randomly stabbing the paper lightly on different locations 1000 times.

assuming you didn't stab on the same location twice, you'd have 1000 unique points.

lagrange interpolation allows me to get the equation i need, but is of degree 999 (since there are 1000 points).

would least-square regression help in this problem? i mean i know how to get the equation for these 1000 (x,y) points, i just need to have the equation shorter.

@medium9 - yeah i kinda need exactly values, so i can't use any approximation :(

i looked into splines. correct me if i'm wrong, but when i have 1000 unique control points, and used it in spline, i get an equation of degree 999 still right?

##### Share on other sites
If you want a parameric family of curves on which you can impose 1000 conditions, it must have at least 1000 parameters. If it's a polynomial, it will have degree 999, but any other solution will also have 1000 "knobs" to adjust to match your data.

##### Share on other sites
Quote:
 Original post by helloworld123lagrange interpolation allows me to get the equation i need, but is of degree 999 (since there are 1000 points)....i looked into splines. correct me if i'm wrong, but when i have 1000 unique control points, and used it in spline, i get an equation of degree 999 still right?
This is fundamentally unavoidable. There is no way you are going to derive a simple equation for a curve through 1,000 random points, except for very specific cases where they happen to fall in a line/whatever.

It might help if you explain the larger problem you are trying to solve. It is possible there are simpler ways to do this...

##### Share on other sites
Least Squares Regression will allow you to fit an arbitrary number of points to a smaller equation. Unfortunately, however, it does so by assuming that the points have some inherent structure that can be exploited, and fits an equation that will fit all the given points not exactly, but within some margin of error.

If there is NO structure to your points, then what you are asking is COMPLETELY IMPOSSIBLE. It violates the Shannon coding theorem to compress arbitary random points with 0 information loss.

Your points must either A) have non-random patterns you can exploit, such as fitting some equation, or B) throw away information about the points, or C) a little of both. Least squares finds the combination of both which is optimium for the least amount of error.

##### Share on other sites

i see, that's disappointing to know.

i wonder how people back then (before computers and calculators) solved very very high degree polynomials without resulting to manually solving them?

##### Share on other sites
You say these numbers are random... where are they coming from?

if they are coming from rand(), keep in mind that rand() itself is a mathematical function, so it is giving you what you want already! (kind of... it can be made to work for you, depending on what you need exactly)

Can you give some more info about what you are trying to do and why you need to come up with an equation to match the numbers?

I bet there's a better way to get what you want.

##### Share on other sites
Quote:
 Original post by Atrix256You say these numbers are random... where are they coming from?if they are coming from rand(), keep in mind that rand() itself is a mathematical function, so it is giving you what you want already! (kind of... it can be made to work for you, depending on what you need exactly)Can you give some more info about what you are trying to do and why you need to come up with an equation to match the numbers?I bet there's a better way to get what you want.

well, i was just curious at how to have f(x)=y, where you have an equation that could be solved manually.

i mean given the original problem, if i have 1000 points, i'd get an equation of degree 999.

if this were the 1800s, no calculators and computers (maybe abacus, yes), and you have an equation with very very very high degree polynomial, would you solve it manually? or is there someone who's found a solution to make a very high degree equation into something bearable with pen and paper?

##### Share on other sites
Quote:
Original post by helloworld123
Quote:
 Original post by Atrix256You say these numbers are random... where are they coming from?if they are coming from rand(), keep in mind that rand() itself is a mathematical function, so it is giving you what you want already! (kind of... it can be made to work for you, depending on what you need exactly)Can you give some more info about what you are trying to do and why you need to come up with an equation to match the numbers?I bet there's a better way to get what you want.

well, i was just curious at how to have f(x)=y, where you have an equation that could be solved manually.

i mean given the original problem, if i have 1000 points, i'd get an equation of degree 999.

if this were the 1800s, no calculators and computers (maybe abacus, yes), and you have an equation with very very very high degree polynomial, would you solve it manually? or is there someone who's found a solution to make a very high degree equation into something bearable with pen and paper?

I would guess that usually you'd have collected data from observing some real-world phenomenon, and you'd be hoping to find some mathematical expression. If the data is not completely random you will be able to find a reasonable expression; position of free falling object for example. If the data is completely random you won't find a mathematical expression worth using, not in the 1800s and not today. You can't magically decrease the degree of a polynomial without information loss. You either need to add additional information about the shape of the curve, which you have somehow acquired (observation etc). Or you'll have to settle for a less accurate formula by approximating.

As for evaluating a high degree polynomial, there are several tricks which can be used to find the solution for extreme values -- if your input value is small (<1) for example, you approximate large powers terms to be roughly zero; if your input is very large you can assume that large power terms will dominate the result, and thus smaller power terms can be neglected. But actually solving accurately for any point requires performing all calculations.

##### Share on other sites
To complete things: Yes, before mechanical and/or electric calculation devices were present (some crude forms like the abakus have been around for quite some time, but didn't help too much with tasks of that complexity, so just pretend they don't count), a mathematician's work consisted widely of solving such large things by hand. Doing so for just one formula could easily take days, but they didn't have much of a choice.
Also, a lot of theories and conventions in notation are comparably young, and a lot of problems and techniques weren't even known at times, and thus no need to solve them in the first place. Math was mostly much simpler, and it was the technological revolutions over time that led to more and more discoveries that demanded more and more complex systems to describe and solve. It's likely, around 1800, no one ever faced the problem of having to solve a function of degree 999, and if, it might just as well have been considered unsolvable or at least impractical and substituted by a simpler approximation.

The whole problem itself is however interesting, and touches the question: What is compression?
If you take a simple 32 bit RNG, it can yield 4GB of apparently unorganized data. Compressing this with one of the typical entropy based techniques like LZW, Huffman or arithmetic will usually not make much of a difference, and the maximum compression dictated by entropy will also still be large. Yet, a single line of code could suffice to recreate the whole output - it just has to be found. If you assumed a RNG, and also knew which type, you could at least brute-force it's parameters and be done with it, but then you'd be back at square one: You implied a specific structure of your data, and your arbitrarity is gone. It would be just another specialized compressor, nothing suitable for any other data than such coming out of that one specific RNG.

The whole thing also raises the question of what compressed data is, and simplifying as you seem to want it is basically compression. It could be understood as a series of commands that a machine can interpret, and as a result of that spit out the data you wanted.
At this point, things become rather abstract. You could see WinZip for example as a virtual machine, and a zip file as the code it runs. Output of that is the restored files of the archive. But the VM here could be anything, and of very varying power. It could just as well be a C compiler + native runtime environment, and the compressed data is reverse engineered source code. Or a solver for polynoms! The big question is then: Is there a practical way to get from the uncompressed data to a form that interpreted by a VM leads back to it, and is that any smaller? An important part is the capabilities of the VM, and the "language's" vocabulary, and their fit to the data. And it's mostly that last condition that makes this quite an impossible endeavour. For arbitrary data, you'd need an arbitrary machine - such thing doesn't exist.
There are some cases where knowledge of the general structure of data can be used to make a specialized compressor that works well for that particular kind of data, but there probably is no optimal or even valid generic approach for the time being.

[Edited by - Medium9 on March 8, 2010 11:50:42 AM]

## Create an account

Register a new account

• ### Forum Statistics

• Total Topics
628333
• Total Posts
2982130

• 24
• 9
• 9
• 13
• 11