Polynomial Interpolation

Started by
6 comments, last by Dave Eberly 13 years, 6 months ago
hi,

im kinda new to advance math, and have a problem that i need to solve using c++. so i had a 2 dimensional data (x, y). it contains let say 90000 data. now i want to find the best curve (interpolate) that represent the data using some kind of polynomial. now those 90000 data needed to be compressed into let say 300 data that will form a polynomial

a(0) + a(1)x^1 + a(2)x^2 + a(3)x^3 + ..... a(300)x^300

and represent the value that presented in the original data. can anyone give a hint how can i do that? what should i try to figure out first?

thanks in advance
Advertisement
One conceptually simple mechanism is least squares polynomial fitting. If you aren't interested in the theory skip down to equation 13.

That said, polynomial curve fitting isn't a particularly good tool when you're dealing with high degree polynomials. Unless your data really is polynomial in nature you tend to get really strange data when fitting high degree polynomials. At degree 300 I wouldn't be surprised if you ended up getting results where the polynomial coefficients or intermediary calculations aren't representable by finite precision floating point numbers. Depending on the nature of your data you might be better off investigating Fourier transforms or wavelets.
What SiCrane said. I want to add a suggestion to look into cubic splines. They are intuitive, relatively easy to work with and they don't blow up like high-degree polynomials do.
thx,

yeah i also notice this gonna be precision error fest 2010 but im using double is that increase my chance to get the 'right' number? but ouch thats seems pretty crazy math. is there some kind of library that i can use or someone kind enough to share the code around? sorry for being lazy-alike, but i dont think my time is forgiving enough for this kind of equation. :)

Let me put it this way: in order to do least squares with a polynomial of degree 300 you need to calculate x600. The maximum value of an IEEE double is about 1.79769e+308. This means that the maximum x value that your data can contain is only 3.26406. Anything bigger will overflow a double, just for that one calculation. And that number gets multiplied and added to other numbers.
Do you have to use a polynomial? Can you use any kind of curve fit? Depending on your data I might fit to some kind of basis function like a sine or cosine wave. You might also try some kind of orthonormal polynomial to avoid least squares on a very large data set.

If you do stick with the standard 300 d polynomial, you can avoid some (not all) of the numerical stability issues that were mentioned if you evaluate the polynomial using horners rule.

You asked about a library...although I don't know of any curve-fitting general purpose libraries out there, its pretty easy to implement on top of any matrix math library you want. Eigen is my favorite.
Here's least-squares in a nutshell. This describes Fourier transforms, wavelet transforms, b-splines, and polynomial interpolation. They're all just special cases of one simple, geometric idea.

Your data is an n-dimensional vector 'b.'

You have a bunch of n-dimensional vectors

v1, v2, ..., vm .

These can be the Fourier basis vectors, a polynomial basis, b-spline "bumps," whatever. They're a bunch of vectors that you're going to add up to approximate your data.

A geometric interpretation of least squares is that you want to project b onto the linear subspace (e.g., a plane passing through the origin) spanned by the vectors v1,...,vm. Any point 'p' in this subspace can be written

p = x1 v1 + ... + xm vm

where x1,...,xm are scalar coefficients. For p to be the projection of b onto this basis means, among other things, that the component of this p along each of the basis vectors is the same as the component of b along that vector. I.e.,

<v1, p> = <v1, b>
<v2, p> = <v2, b>
...
<vm, p> = <vm, b>.

Expanding p (according to the sum given previously), these are equivalently,

<v1, x1 v1 + ... + xm vm> = <v1, b>
<v2, x1 v1 + ... + xm vm> = <v2, b>
...
<vm, x1 v1 + ... + xm vm> = <vm, b>.

or, breaking apart the dot products and pulling out the scalars x1,...,xm,

<v1, v1> x1 + ... <v1, vm> xm = <v1, b>
...
<vm, v1> x1 + ... <vm, vm> xm = <v1, b> .

This can be written in matrix form as,

[ <v1, v1>   <v1, v2> ... <v1, vm> ] [ x1 ]   [ <v1, b> ][ <v2, v1>   <v2, v2> ... <v2, vm> ] [ x2 ]   [ <v2, b> ][   .       .                .     ] [ .  ] = [    .    ][   .              .         .     ] [ .  ]   [    .    ][   .                    .   .     ] [ .  ]   [    .    ][ <vm, v1>    .   .   .   <vm, vm> ] [ xm ]   [ <vm, b> ]


or just,

G x = B.

The symmetric matrix G of inner products is called the Gramian of the vectors v1,...,vm.

In any event, the matrix G will be invertible so long as the vectors v1,...,vm are linearly independent, and you can find x as

x = G^(-1) B.

That's basically all there is to it.

One last thing: Certain sets of basis functions have special properties. The Fourier basis is orthogonal, so in that case G is just a multiple of the identity matrix and you really only have to compute some inner products to get B; there aren't even any equations to solve, really. Same with wavelets.

Go pick your favorite set of basis vectors and go for it.
SiCrane's response is the most important one. With a polynomial of degree 300, double-precision floating-point is still not enough to avoid serious numerical issues. Although Horner's rule for polynomial evaluation minimizes the operation count, it still has the same numerical issues with overflow. Even for smaller degree polynomials, you have to worry about oscillatory behavior of the fitting polynomial.

I suggest fitting with a low-degree B-spline curve. See B-Spline Curve Fitting. This is for 3D curves, but the same ideas apply to 2D curves and to graphs of functions f(x). The general code and documentation is Curves, Surfaces, Volumes.

For least-squares polynomial fitting, see Approximations (not that this will help when you have degree 300 requirements).

This topic is closed to new replies.

Advertisement