How to get equation given many (x,y) points?

Started by
18 comments, last by Medium9 14 years, 1 month ago
Do you need the exact values at the initial points? If not, you could also try it with an approximation. If you just knew the very general structure of data, things become much easier. If the input for example in some way resembles something sinusy (or a combination of several frequencies), or a coarse function of some degree.
If not, then there still is the possibility to fit a spline to the data. Nth order B-Splines are one way, piecewise cubic Bezier splines another, and there are some more.

The bottom line however is: If you need exact values of a completely arbitrary set of data, the smallest representation you can have without impossibly (read: almost infinitely) much effort (and not even granted results then) is the data itself, maybe entropy compressed.
Since you seem not to be interested in inter- and extrapolation, I assume shrinking the data must be what you aim for, which is why I would suggest a form of compression instead.
Advertisement
@davetyler - sorry to be blunt, but whether it is useful or not, is another matter. your summary shows that you didn't understand my question, it seems.



@steve132 - yup all x is random (and unique, there is never two or more x with the same value). all y is random also, but aren't unique.

ie, i can have control points like: (5, 10) (2, 10) (1, 42), and so on...as for a sample "plot", imagine getting a pen and paper and randomly stabbing the paper lightly on different locations 1000 times.

assuming you didn't stab on the same location twice, you'd have 1000 unique points.

lagrange interpolation allows me to get the equation i need, but is of degree 999 (since there are 1000 points).

would least-square regression help in this problem? i mean i know how to get the equation for these 1000 (x,y) points, i just need to have the equation shorter.



@medium9 - yeah i kinda need exactly values, so i can't use any approximation :(

i looked into splines. correct me if i'm wrong, but when i have 1000 unique control points, and used it in spline, i get an equation of degree 999 still right?
If you want a parameric family of curves on which you can impose 1000 conditions, it must have at least 1000 parameters. If it's a polynomial, it will have degree 999, but any other solution will also have 1000 "knobs" to adjust to match your data.

I don't quite know if that answers your question.
Quote:Original post by helloworld123
lagrange interpolation allows me to get the equation i need, but is of degree 999 (since there are 1000 points).

...

i looked into splines. correct me if i'm wrong, but when i have 1000 unique control points, and used it in spline, i get an equation of degree 999 still right?
This is fundamentally unavoidable. There is no way you are going to derive a simple equation for a curve through 1,000 random points, except for very specific cases where they happen to fall in a line/whatever.

It might help if you explain the larger problem you are trying to solve. It is possible there are simpler ways to do this...

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Least Squares Regression will allow you to fit an arbitrary number of points to a smaller equation. Unfortunately, however, it does so by assuming that the points have some inherent structure that can be exploited, and fits an equation that will fit all the given points not exactly, but within some margin of error.

If there is NO structure to your points, then what you are asking is COMPLETELY IMPOSSIBLE. It violates the Shannon coding theorem to compress arbitary random points with 0 information loss.

Your points must either A) have non-random patterns you can exploit, such as fitting some equation, or B) throw away information about the points, or C) a little of both. Least squares finds the combination of both which is optimium for the least amount of error.


i see, that's disappointing to know.

i wonder how people back then (before computers and calculators) solved very very high degree polynomials without resulting to manually solving them?

You say these numbers are random... where are they coming from?

if they are coming from rand(), keep in mind that rand() itself is a mathematical function, so it is giving you what you want already! (kind of... it can be made to work for you, depending on what you need exactly)

Can you give some more info about what you are trying to do and why you need to come up with an equation to match the numbers?

I bet there's a better way to get what you want.
Quote:Original post by Atrix256
You say these numbers are random... where are they coming from?

if they are coming from rand(), keep in mind that rand() itself is a mathematical function, so it is giving you what you want already! (kind of... it can be made to work for you, depending on what you need exactly)

Can you give some more info about what you are trying to do and why you need to come up with an equation to match the numbers?

I bet there's a better way to get what you want.



well, i was just curious at how to have f(x)=y, where you have an equation that could be solved manually.

i mean given the original problem, if i have 1000 points, i'd get an equation of degree 999.

if this were the 1800s, no calculators and computers (maybe abacus, yes), and you have an equation with very very very high degree polynomial, would you solve it manually? or is there someone who's found a solution to make a very high degree equation into something bearable with pen and paper?
Quote:Original post by helloworld123
Quote:Original post by Atrix256
You say these numbers are random... where are they coming from?

if they are coming from rand(), keep in mind that rand() itself is a mathematical function, so it is giving you what you want already! (kind of... it can be made to work for you, depending on what you need exactly)

Can you give some more info about what you are trying to do and why you need to come up with an equation to match the numbers?

I bet there's a better way to get what you want.



well, i was just curious at how to have f(x)=y, where you have an equation that could be solved manually.

i mean given the original problem, if i have 1000 points, i'd get an equation of degree 999.

if this were the 1800s, no calculators and computers (maybe abacus, yes), and you have an equation with very very very high degree polynomial, would you solve it manually? or is there someone who's found a solution to make a very high degree equation into something bearable with pen and paper?


I would guess that usually you'd have collected data from observing some real-world phenomenon, and you'd be hoping to find some mathematical expression. If the data is not completely random you will be able to find a reasonable expression; position of free falling object for example. If the data is completely random you won't find a mathematical expression worth using, not in the 1800s and not today. You can't magically decrease the degree of a polynomial without information loss. You either need to add additional information about the shape of the curve, which you have somehow acquired (observation etc). Or you'll have to settle for a less accurate formula by approximating.

As for evaluating a high degree polynomial, there are several tricks which can be used to find the solution for extreme values -- if your input value is small (<1) for example, you approximate large powers terms to be roughly zero; if your input is very large you can assume that large power terms will dominate the result, and thus smaller power terms can be neglected. But actually solving accurately for any point requires performing all calculations.
Best regards, Omid
To complete things: Yes, before mechanical and/or electric calculation devices were present (some crude forms like the abakus have been around for quite some time, but didn't help too much with tasks of that complexity, so just pretend they don't count), a mathematician's work consisted widely of solving such large things by hand. Doing so for just one formula could easily take days, but they didn't have much of a choice.
Also, a lot of theories and conventions in notation are comparably young, and a lot of problems and techniques weren't even known at times, and thus no need to solve them in the first place. Math was mostly much simpler, and it was the technological revolutions over time that led to more and more discoveries that demanded more and more complex systems to describe and solve. It's likely, around 1800, no one ever faced the problem of having to solve a function of degree 999, and if, it might just as well have been considered unsolvable or at least impractical and substituted by a simpler approximation.

The whole problem itself is however interesting, and touches the question: What is compression?
If you take a simple 32 bit RNG, it can yield 4GB of apparently unorganized data. Compressing this with one of the typical entropy based techniques like LZW, Huffman or arithmetic will usually not make much of a difference, and the maximum compression dictated by entropy will also still be large. Yet, a single line of code could suffice to recreate the whole output - it just has to be found. If you assumed a RNG, and also knew which type, you could at least brute-force it's parameters and be done with it, but then you'd be back at square one: You implied a specific structure of your data, and your arbitrarity is gone. It would be just another specialized compressor, nothing suitable for any other data than such coming out of that one specific RNG.

The whole thing also raises the question of what compressed data is, and simplifying as you seem to want it is basically compression. It could be understood as a series of commands that a machine can interpret, and as a result of that spit out the data you wanted.
At this point, things become rather abstract. You could see WinZip for example as a virtual machine, and a zip file as the code it runs. Output of that is the restored files of the archive. But the VM here could be anything, and of very varying power. It could just as well be a C compiler + native runtime environment, and the compressed data is reverse engineered source code. Or a solver for polynoms! The big question is then: Is there a practical way to get from the uncompressed data to a form that interpreted by a VM leads back to it, and is that any smaller? An important part is the capabilities of the VM, and the "language's" vocabulary, and their fit to the data. And it's mostly that last condition that makes this quite an impossible endeavour. For arbitrary data, you'd need an arbitrary machine - such thing doesn't exist.
There are some cases where knowledge of the general structure of data can be used to make a specialized compressor that works well for that particular kind of data, but there probably is no optimal or even valid generic approach for the time being.

[Edited by - Medium9 on March 8, 2010 11:50:42 AM]

This topic is closed to new replies.

Advertisement