HellRaiZer

Non-linear regression (data fitting)


Hi... I want to fit a non-linear function of the form:

f(x, k1, k2, k3) = k1^2 * g(x, k2) + k3

to some data. I tried the Levenberg-Marquardt method (as it appears in Numerical Recipes), but the results aren't good enough. I suspect the problem is the initial guess values for k1, k2, k3: when I generate "experimental" data from the above function and then try to fit to them, the results are very different depending on the initial values. By "experimental" data I mean that I evaluate the function at some points (2048 of them) and add some small random noise. If I choose initial values near the correct ones, the algorithm finds the exact values; otherwise it returns completely incorrect results.

Is there any method that can converge even from completely wrong initial values? I know this depends on the function (how many local minima/maxima it has, where they are, etc.). What method do you prefer/suggest for robust results? Note that the curve can't be linearized (it contains cosines; too many of them, and too complex, to expand as series), so it has to be a nonlinear method.

Thanks in advance.

HellRaiZer
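
(For concreteness, a minimal sketch of the synthetic test described above. cos(k2*x) is only a stand-in for the real g, and the noise level is illustrative.)

#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Stand-in g; the real g(x, k2) contains the complicated cosine terms.
double g(double x, double k2) { return std::cos(k2 * x); }

double model(double x, double k1, double k2, double k3) {
    return k1 * k1 * g(x, k2) + k3;   // f(x, k1, k2, k3)
}

int main() {
    const int N = 2048;
    std::vector<double> xs(N), ys(N);
    for (int i = 0; i < N; ++i) {
        xs[i] = i / double(N);
        double noise = 0.01 * (std::rand() / double(RAND_MAX) - 0.5);
        ys[i] = model(xs[i], 1.5, 3.0, 0.2) + noise;   // "experimental" data
    }
    // Sum of squared errors at an initial guess; a local method such as
    // Levenberg-Marquardt starts from here and can stall in a local minimum.
    double k1 = 1.0, k2 = 2.0, k3 = 0.0, sse = 0.0;
    for (int i = 0; i < N; ++i) {
        double e = model(xs[i], k1, k2, k3) - ys[i];
        sse += e * e;
    }
    std::printf("SSE at initial guess: %g\n", sse);
    return 0;
}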

You probably have several local minima. If you can model your data as f(x,k1,k2,k3) + n, where n is a random variable representing noise, _and_ if you know the variance of the noise approximately, then you can deduce whether or not you are close to the global minimum from the mean squared error (MSE).

If the minimum is found and the MSE is much larger than you estimated, then you choose new values for k1, k2 and k3 and start over. (Perhaps not the most sophisticated method, but it beats total enumeration :))

For more info, refer to this MathWorld page.
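
(A sketch of that restart loop. A crude random-walk descent stands in for the real local optimizer, e.g. Levenberg-Marquardt, and synthetic data with known noise variance stands in for the measurements.)

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

static std::mt19937 rng(42);

// Placeholder model; the real f contains the complicated cosine terms.
double model(double x, double k1, double k2, double k3) {
    return k1 * k1 * std::cos(k2 * x) + k3;
}

double mse(const std::vector<double>& xs, const std::vector<double>& ys,
           double k1, double k2, double k3) {
    double s = 0.0;
    for (size_t i = 0; i < xs.size(); ++i) {
        double e = model(xs[i], k1, k2, k3) - ys[i];
        s += e * e;
    }
    return s / xs.size();
}

int main() {
    const double sigma = 0.05;                       // assumed noise level
    std::normal_distribution<double> noise(0.0, sigma);
    std::vector<double> xs(2048), ys(2048);
    for (size_t i = 0; i < xs.size(); ++i) {
        xs[i] = i / 2048.0;
        ys[i] = model(xs[i], 1.5, 3.0, 0.2) + noise(rng);
    }

    std::uniform_real_distribution<double> guess(-5.0, 5.0);
    std::normal_distribution<double> step(0.0, 0.1);
    double best[3] = {0, 0, 0}, best_mse = 1e300;
    for (int restart = 0; restart < 50; ++restart) {
        double k[3] = {guess(rng), guess(rng), guess(rng)};
        double cur = mse(xs, ys, k[0], k[1], k[2]);
        // Crude local descent: accept random steps that lower the MSE.
        for (int it = 0; it < 2000; ++it) {
            double t[3] = {k[0] + step(rng), k[1] + step(rng), k[2] + step(rng)};
            double m = mse(xs, ys, t[0], t[1], t[2]);
            if (m < cur) { cur = m; k[0] = t[0]; k[1] = t[1]; k[2] = t[2]; }
        }
        if (cur < best_mse) {
            best_mse = cur;
            for (int i = 0; i < 3; ++i) best[i] = k[i];
        }
        // The test described above: an MSE near the noise variance suggests
        // we found the global minimum, so stop restarting.
        if (best_mse < 2.0 * sigma * sigma) break;
    }
    std::printf("k1=%g k2=%g k3=%g (MSE=%g, noise variance=%g)\n",
                best[0], best[1], best[2], best_mse, sigma * sigma);
}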

Thanks for the quick reply.

Quote:

You probably have several local minima. If you can model your data as f(x,k1,k2,k3) + n, where n is a random variable representing noise, _and_ if you know the variance of the noise approximately, then you can deduce whether or not you are close to the global minimum from the mean squared error (MSE).


I can't do that, for one reason. In order to calculate the variance of the noise you mention, I'd need more than one sample at the exact same time under the exact same conditions, and I don't have that. The procedure that generates the data runs in real time, meaning that I get a complete list of 2048 samples every 0.5 secs and the total time is about 15 secs. So I can't have two or more samples at the same time, not even approximately.

Btw, something I forgot to mention because I thought it was obvious: the variable in the above function is x; k1, k2 and k3 are the parameters I have to estimate in order to fit the curve to the data.

Quote:

If the minimum is found and the MSE is much larger than you estimated, then you choose new values for k1, k2 and k3 and start over. (Perhaps not the most sophisticated method, but it beats total enumeration :))

What you describe is more like a brute-force approach, which I have already considered (but not implemented) in the hope of finding something more sophisticated and fast. The selection of the new k1, k2, k3 could be done using a steepest-descent-like method (local optimization), which avoids complete enumeration, but it would be too slow and there's a chance of getting stuck in a local minimum.
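
(For illustration, such a local step might look like this. A sketch: the gradient is estimated with finite differences, and a toy quadratic stands in for the real sum of squared errors.)

#include <cmath>
#include <cstdio>

// Toy stand-in objective; the real one is Sum((f(x_i;k) - y_i)^2).
double sse(const double k[3]) {
    return std::pow(k[0] - 1.5, 2) + std::pow(k[1] - 3.0, 2) + std::pow(k[2] - 0.2, 2);
}

int main() {
    double k[3] = {0.0, 0.0, 0.0};
    const double h = 1e-6, step = 0.1;
    for (int it = 0; it < 100; ++it) {
        double grad[3];
        for (int i = 0; i < 3; ++i) {              // central-difference gradient
            double kp[3] = {k[0], k[1], k[2]}, km[3] = {k[0], k[1], k[2]};
            kp[i] += h; km[i] -= h;
            grad[i] = (sse(kp) - sse(km)) / (2 * h);
        }
        for (int i = 0; i < 3; ++i) k[i] -= step * grad[i];  // descend
    }
    std::printf("k = (%g, %g, %g), sse = %g\n", k[0], k[1], k[2], sse(k));
}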

Because the whole thing runs at the rate mentioned above (2048 samples per 500 msec), I was thinking about an algorithm that can run in parallel, i.e. in real time. It doesn't have to be that way (I can fit the data after the procedure finishes), but it would be better if it could.

Thanks for the link, and please correct me if I haven't understood your point correctly.

HellRaiZer

Would you mind telling us what g(x,y) is and what data you are dealing with? It might be easier to recommend a method if we know a little more about the situation.

Also, if you could put one of those sets of 2048 samples somewhere for download, maybe we can even test a little. I would do it just for fun.

I'll probably try simulated annealing, or something inspired by genetic algorithms that I've been thinking about lately.
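
(For reference, a minimal simulated-annealing loop over three parameters. The objective is a toy stand-in for the real misfit, and the cooling schedule and step size are illustrative, not tuned.)

#include <cmath>
#include <cstdio>
#include <random>

// Toy multimodal objective standing in for Sum((f(x_i;k) - y_i)^2).
double energy(const double k[3]) {
    return std::pow(k[0] - 1.5, 2) + std::pow(k[1] - 3.0, 2) + std::pow(k[2] - 0.2, 2)
         + 0.5 * std::cos(5.0 * k[1]);             // adds local minima in k[1]
}

int main() {
    std::mt19937 rng(1);
    std::normal_distribution<double> step(0.0, 0.2);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double k[3] = {0, 0, 0}, e = energy(k);
    for (double T = 1.0; T > 1e-4; T *= 0.999) {   // geometric cooling
        double t[3] = {k[0] + step(rng), k[1] + step(rng), k[2] + step(rng)};
        double et = energy(t);
        // Metropolis rule: always accept improvements, sometimes accept worse
        // moves, with probability that shrinks as the temperature T drops.
        if (et < e || u(rng) < std::exp((e - et) / T)) {
            e = et; k[0] = t[0]; k[1] = t[1]; k[2] = t[2];
        }
    }
    std::printf("k = (%g, %g, %g), energy = %g\n", k[0], k[1], k[2], e);
}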

Quote:
Original post by HellRaiZer
Thanks for the quick reply.

Quote:

You probably have several local minima. If you can model your data as f(x,k1,k2,k3) + n, where n is a random variable representing noise, _and_ if you know the variance of the noise approximately, then you can deduce whether or not you are close to the global minimum from the mean squared error (MSE).


I can't do that, for one reason. In order to calculate the variance of the noise you mention, I'd need more than one sample at the exact same time under the exact same conditions, and I don't have that. The procedure that generates the data runs in real time, meaning that I get a complete list of 2048 samples every 0.5 secs and the total time is about 15 secs. So I can't have two or more samples at the same time, not even approximately.


I don't follow. You say you have 2048 samples, but you still can't use more than one sample at a time? Do you mean that you're streaming the data with strict latency requirements?

So in fact you need an online algorithm?

But still, if you can assume that the fit is correct for past samples, you can always estimate the error variance (MSE) with a running variance, assuming the variance is stationary or at least changes slowly.
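
(A sketch of such a running estimate, using Welford's online variance update on the fit residuals as they arrive.)

#include <cstdio>

struct RunningVariance {
    long long n = 0;
    double mean = 0.0, m2 = 0.0;
    void push(double x) {               // Welford's online update
        ++n;
        double d = x - mean;
        mean += d / n;
        m2 += d * (x - mean);           // accumulates sum of squared deviations
    }
    double variance() const { return n > 1 ? m2 / (n - 1) : 0.0; }
};

int main() {
    RunningVariance rv;
    // Stand-in residuals; in practice these are f(x_i;k) - y_i as data arrives.
    const double residuals[] = {0.03, -0.01, 0.02, -0.04, 0.00};
    for (double r : residuals) rv.push(r);
    std::printf("running variance: %g\n", rv.variance());
}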

Quote:

Btw, something I forgot to mention because I thought it was obvious: the variable in the above function is x; k1, k2 and k3 are the parameters I have to estimate in order to fit the curve to the data.

Yes.
Quote:

Quote:

If the minimum is found and the MSE is much larger than you estimated, then you choose new values for k1, k2 and k3 and start over. (Perhaps not the most sophisticated method, but it beats total enumeration :))

What you describe is more like a brute-force approach, which I have already considered (but not implemented) in the hope of finding something more sophisticated and fast. The selection of the new k1, k2, k3 could be done using a steepest-descent-like method (local optimization), which avoids complete enumeration, but it would be too slow and there's a chance of getting stuck in a local minimum.


I wouldn't go so far as to say it's a brute-force algorithm. The selection of the new guess for k1, k2 and k3 can be made by looking at the trajectory of the solution (k1,k2,k3) as it hit the local minimum. So if the selection of (k1,k2,k3) is done "optimally", in the worst case you have to run the optimizer as many times as there are local minima.

Quote:

Because the whole thing runs at the rate mentioned above (2048 samples per 500 msec), I was thinking about an algorithm that can run in parallel, i.e. in real time. It doesn't have to be that way (I can fit the data after the procedure finishes), but it would be better if it could.

HellRaiZer


OK.

Hi...

alvaro:
Quote:

Would you mind telling us what g(x,y) is and what data you are dealing with? It might be easier to recommend a method if we know a little more about the situation.
Also, if you could put one of those sets of 2048 samples somewhere for download, maybe we can even test a little. I would do it just for fun.
I'll probably try simulated annealing, or something inspired by genetic algorithms that I've been thinking about lately.


Thanks for the interest. First, a little description of the whole thing.
The process that generates the data is a physical one, a dissolution. It measures a physical quantity of a sample (its depth) based on the amount of light reflected off its surface. The sample's depth changes continually, and the machine that measures the reflected light collects data all the time. But in order to get a complete view of the process, we have to analyze all the wavelengths involved (from about 300 nm to 1100 nm). The machine gathers data for approximately 15 msec per wavelength, so acquiring a complete set takes about 500 msec. The rest of the 2048 wavelengths are calculated using linear interpolation (I think, at least; this is done internally by the machine). That's why I said I can't have two or more samples at the same time under the same conditions.

Here are the complete equation and a sample data file.
If you can make something of it, please post it here. The equation is in a Word document (obviously) using Math Equation 3.0.

Winograd:
Quote:

I don't follow. You say you have 2048 samples, but you still can't use more than one sample at a time? Do you mean that you're streaming the data with strict latency requirements?

Sorry, but my English is really bad. As I said above, I have 2048 data points (x, y), but each data point isn't measured at exactly the same time as the rest.

Quote:

So in fact you need an online algorithm?

I thought it would be better that way, but if I can't find an online one, I'll have to fit the data off-line after the whole procedure finishes.

Quote:

But still, if you can assume that the fit is correct for past samples, you can always estimate the error variance (MSE) with a running variance, assuming the variance is stationary or at least changes slowly.

The key phrase here is "if you can assume that the fit is correct for past samples". If that holds, then you are correct that I can use the previous values as initial guesses for the next fit. But the problem is how to get those fitting parameters the first time. If I had them, I think any fitting algorithm would perform fine. The problem is the first fit.

I still haven't read the MathWorld page. I'll check it now and come back with any questions.

Thank you for helping. I hope it will be solved eventually!

HellRaiZer

Do you know in what range d1 (a.k.a. k2) is likely to be? Is a value of -800 reasonable, for instance?

Can you check if this code computes the right thing? If not, can you fix it and post it?
#include <math.h>  // atan, cos

double Q[4]={1.003,1.4800,1.4537,3.0166};
double W[4]={0.000,0.003577,0.00342,0.4281};
double S[4]={0.000,0.000170,0.000,0.000};

double d2=1044;

double Four_Pi=16*atan(1.0);  // atan(1) = pi/4, so this is 4*pi

double n[4];     // n[i]: refractive index of layer i
double r[4][4];  // r[i][j]: reflection coefficient between layers i and j

// n[i] = Q[i] + W[i]/lambda^2 + S[i]/lambda^4
void update_n(double inv_lambda){
    double inv_l2=inv_lambda*inv_lambda;
    double inv_l4=inv_l2*inv_l2;

    for(int i=0;i<4;++i)
        n[i]=Q[i]+W[i]*inv_l2+S[i]*inv_l4;
}

void update_r(double inv_lambda){
    update_n(inv_lambda);

    for(int i=0;i<4;++i)
        for(int j=0;j<4;++j)
            r[i][j]=(n[i]-n[j])/(n[i]+n[j]);
}

double f_over_g(double lambda, double d1){
    double inv_lambda=1.0/lambda;
    double FPIL=Four_Pi*inv_lambda;

    update_r(inv_lambda);
    double r012=r[0][1]*r[0][1];
    double r122=r[1][2]*r[1][2];
    double r232=r[2][3]*r[2][3];

    double c1=cos(FPIL*n[1]*d1);
    double c2=cos(FPIL*n[2]*d2);
    double c1p2=cos(FPIL*(n[1]*d1+n[2]*d2));
    double c1m2=cos(FPIL*(n[1]*d1-n[2]*d2));

    double g =
        1.0 + r012*r122 + r012*r232 + r122*r232 +
        2*r[1][2]*r[2][3]*c2*(1+r012) +
        2*r[0][1]*r[2][3]*c1p2 +
        2*r[0][1]*r[1][2]*c1*(1+r232) +
        2*r[0][1]*r122*r[2][3]*c1m2;

    double f =
        r012+r122+r232+2*r[0][1]*r122*r[2][3]+r012*r122*r232+
        2*r[0][1]*r[2][3]*c1p2+
        2*r[1][2]*r[2][3]*c2*(1+r[0][1]*r[2][3])+
        2*r[0][1]*r[1][2]*c1*(1+r[0][1]*r[2][3]);

    return f/g;
}



Hi again...

Quote:

Do you know in what range d1 (a.k.a. k2) is likely to be? Is a value of -800 reasonable, for instance?

No. d1 is the depth of the dissolved surface, so it must be greater than or equal to 0. For the data you have (which correspond to the beginning of the process) it must be greater than 0, but I don't know an exact upper bound. It should be under 1000.
Also, 'A' must be greater than zero. If you prefer (and if you don't want to put more constraints on the system), you can replace it with, e.g., B = A^2, and B can take any value.

I made some corrections to the source. Q[0], S[0] and W[0] have changed (I found out they were wrong after I uploaded the .doc). One more change is in the calculation of n[i]; see the source (the two additional terms are multiplied by 10e6 and 10e12). Sorry for the mistakes :)


#include <math.h>  // atan, cos

double Q[4]={1.4930,1.4800,1.4537,3.0166};
double W[4]={0.004864,0.003577,0.00342,0.4281};
double S[4]={0.000071,0.000170,0.000,0.000};

double d2=1044;

double Four_Pi=16*atan(1.0);  // atan(1) = pi/4, so this is 4*pi

double n[4];     // n[i]: refractive index of layer i
double r[4][4];  // r[i][j]: reflection coefficient between layers i and j

// n[i] = Q[i] + W[i]*10e6/lambda^2 + S[i]*10e12/lambda^4
void update_n(double inv_lambda){
    double inv_l2=inv_lambda*inv_lambda;
    double inv_l4=inv_l2*inv_l2;

    for(int i=0;i<4;++i)
        n[i]=Q[i]+W[i]*inv_l2*10e6+S[i]*inv_l4*10e12;
}

void update_r(double inv_lambda){
    update_n(inv_lambda);

    for(int i=0;i<4;++i)
        for(int j=0;j<4;++j)
            r[i][j]=(n[i]-n[j])/(n[i]+n[j]);
}

double f_over_g(double lambda, double d1){
    double inv_lambda=1.0/lambda;
    double FPIL=Four_Pi*inv_lambda;

    update_r(inv_lambda);
    double r012=r[0][1]*r[0][1];
    double r122=r[1][2]*r[1][2];
    double r232=r[2][3]*r[2][3];

    double c1=cos(FPIL*n[1]*d1);
    double c2=cos(FPIL*n[2]*d2);
    double c1p2=cos(FPIL*(n[1]*d1+n[2]*d2));
    double c1m2=cos(FPIL*(n[1]*d1-n[2]*d2));

    double g =
        1.0 + r012*r122 + r012*r232 + r122*r232 +
        2*r[1][2]*r[2][3]*c2*(1+r012) +
        2*r[0][1]*r[2][3]*c1p2 +
        2*r[0][1]*r[1][2]*c1*(1+r232) +
        2*r[0][1]*r122*r[2][3]*c1m2;

    double f =
        r012+r122+r232+2*r[0][1]*r122*r[2][3]+r012*r122*r232+
        2*r[0][1]*r[2][3]*c1p2+
        2*r[1][2]*r[2][3]*c2*(1+r[0][1]*r[2][3])+
        2*r[0][1]*r[1][2]*c1*(1+r[0][1]*r[2][3]);

    return f/g;
}




Because I couldn't find a good solution to the whole problem (I tried a commercial program too, GraphPad Prism, but with no luck), I asked the guy who performed the experiments, to see if there is something wrong with the data. I'll have an answer tomorrow (it's 21:00 here now, so I'll have to wait a bit). I'll post his answer here, as well as new data if the old ones were wrong. I'll also check the equation to confirm it's the correct one, to eliminate all potential problems.

Once again, thanks a lot for the help and interest. I'll keep you informed.

HellRaiZer

I tried with the corrected code, and it looks like the data doesn't match something of the form A*f/g+C at all. I suspect there are still errors in the formula... :(

You need something a little more powerful than regressing to a curve. Your problem sounds like an ideal one for particle filtering. It's ideal for determining the hidden parameters of a system given a noisy nonlinear model of the system and noisy indirect measurements over time.
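
(A minimal bootstrap particle filter sketch of that idea, assuming a single slowly drifting hidden parameter — d1, the depth — and a stand-in measurement model in place of the real A*f/g + C; the noise levels are illustrative. Note that a periodic measurement model makes the posterior multimodal, so the mean below is only a crude summary; this is a skeleton of the technique, not a solution.)

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Stand-in measurement model; the real one would be A*f/g + C from above.
double h(double d1) { return std::cos(0.02 * d1); }

int main() {
    std::mt19937 rng(7);
    const int N = 1000;                       // number of particles
    const double proc_sigma = 2.0;            // assumed drift of d1 per step
    const double meas_sigma = 0.05;           // assumed measurement noise
    std::normal_distribution<double> proc(0.0, proc_sigma);
    std::normal_distribution<double> meas(0.0, meas_sigma);
    std::uniform_real_distribution<double> init(0.0, 1000.0);

    std::vector<double> p(N), w(N);
    for (double& x : p) x = init(rng);        // prior: d1 uniform in [0,1000]

    double true_d1 = 400.0;                   // simulated "truth"
    for (int t = 0; t < 30; ++t) {
        true_d1 += 1.0;                       // the surface dissolves slowly
        double z = h(true_d1) + meas(rng);    // noisy measurement

        for (int i = 0; i < N; ++i) {
            p[i] += proc(rng);                // predict: random-walk dynamics
            double e = z - h(p[i]);           // weight: Gaussian likelihood
            w[i] = 1e-12 + std::exp(-0.5 * e * e / (meas_sigma * meas_sigma));
        }
        // Resample particles in proportion to their weights.
        std::discrete_distribution<int> pick(w.begin(), w.end());
        std::vector<double> q(N);
        for (int i = 0; i < N; ++i) q[i] = p[pick(rng)];
        p.swap(q);

        double est = 0.0;
        for (double x : p) est += x;
        std::printf("t=%2d  true d1=%6.1f  estimate=%6.1f\n",
                    t, true_d1, est / N);
    }
}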

Thanks for the link, Sneftel. I've read it, and I need to find more info on the subject. I'll check it tomorrow.

alvaro:
Quote:

I tried with the corrected code, and it looks like the data doesn't match something of the form A*f/g+C at all. I suspect there are still errors in the formula... :(

Or the data! It's possible that the data aren't correct, and as I said, I'm waiting for an answer. Anyway, thanks for trying.

HellRaiZer

Hi...

Fortunately, the mistake was in the data. The data points I posted correspond to a second set measured by the machine. The real data look very similar to the function, with one difference.

Instead of having:

E(l) = A * f(l) / g(l) + C

we have:

E(l) = A * I(l) * f(l) / g(l) + C

where I(l) is the data I posted. I don't have a correct data set at the moment (still waiting), but I think it will work OK.

alvaro:
If you are interested in the correct data set, say so and I'll upload a file when I have it. I mention this because you said you are implementing GAs, so I thought you might want something to test.

Thanks a lot for the help, once more, and if there are any problems I'll post again.

HellRaiZer

Please post the correct data.

Also, after your redefinition of E, I am not sure what the problem is anymore. I thought you were trying to estimate parameters such that E would be a good approximation to the data. But if the definition of E depends on the data, this doesn't make sense.

Here are the new data.

Quote:

Also, after your redefinition of E, I am not sure what the problem is anymore. I thought you were trying to estimate parameters such that E would be a good approximation to the data. But if the definition of E depends on the data, this doesn't make sense.

As you can see, there are two files in the .rar: one that describes I(lambda) (there is no analytical form for it) and one with the data to be fitted (e_lamda.txt). Does this make sense? There is no dependence between the data and the definition of E: E depends on I, which in turn is described by another set of data.

I don't think there will be a problem with these, because the e_lamda.txt data points look a lot like the analytical form of E(lambda), with one small difference.

The old E(lambda) is a sinusoid-like function with decreasing amplitude and increasing period. With the introduction of I(lambda) into the equation (it looks like a Gaussian), I think it will match the experimental data. But I haven't had time to test it. I'll do that now.

HellRaiZer

PS. One thing to keep in mind is that the values in i_lamda.txt are at different wavelengths (the 1st column) than those in e_lamda.txt. This is how the machine measured them. You can assume they are at the same wavelengths (they differ by only about 0.05 nm), or you can interpolate/extrapolate linearly to find the values of I(lambda) at the exact wavelengths for which we have data for E(lambda).
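
(A sketch of that interpolation — evaluating I at the e_lamda.txt wavelengths by interpolating linearly over the i_lamda.txt samples, extrapolating from the nearest segment at the ends. The numbers below are stand-ins for the actual file contents; the wavelength column is assumed sorted ascending.)

#include <cstdio>
#include <vector>

double interp(const std::vector<double>& lam, const std::vector<double>& val,
              double x) {
    size_t n = lam.size(), i = 1;
    while (i < n - 1 && lam[i] < x) ++i;            // first segment with lam[i] >= x
    double t = (x - lam[i-1]) / (lam[i] - lam[i-1]);
    return val[i-1] + t * (val[i] - val[i-1]);      // t<0 or t>1 extrapolates
}

int main() {
    // Tiny stand-in for i_lamda.txt: (wavelength, intensity) pairs.
    std::vector<double> lam = {300.0, 300.1, 300.2};
    std::vector<double> val = {10.0, 10.5, 10.4};
    std::printf("I(300.05) ~ %g\n", interp(lam, val, 300.05));
}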

They must be kidding me!!! ("they" == the people who made me solve this thing)

After trying the whole thing with all the changes, I couldn't find a good solution to the problem. The error is still too big. I suspect the whole thing is wrong.

alvaro, did you have any luck fitting the function to the data?

I tried to brute-force it, giving every parameter a wide range and a small step, and after approx. 1:30 hours the algorithm finished. The result is wrong and the error is too big. The answer for the data I posted was:

d1 = 443.2118
A = 0.7346
C = 226.27

with Sum((f(x_i) - y_i)^2) at about 290 million!!!!! This is completely off, so there must be a serious error in the whole thing.
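
(One observation that can make such a scan much cheaper, if the model really has the form E = A*u + C: for any fixed d1 the model is linear in A and C, so they have a closed-form least-squares solution from the 2x2 normal equations, and only d1 needs to be scanned. A sketch, with u() standing in for I*f/g and toy data generated from illustrative parameter values.)

#include <cmath>
#include <cstdio>
#include <vector>

double u(double x, double d1) { return std::cos(0.02 * d1 * x); }  // stand-in

int main() {
    // Toy data generated from A=0.7, d1=450, C=200 (illustrative values only).
    std::vector<double> xs(256), ys(256);
    for (size_t i = 0; i < xs.size(); ++i) {
        xs[i] = 1.0 + i * 0.01;
        ys[i] = 0.7 * u(xs[i], 450.0) + 200.0;
    }

    double best_d1 = 0, best_A = 0, best_C = 0, best_sse = 1e300;
    for (double d1 = 0.0; d1 <= 1000.0; d1 += 0.5) {    // scan d1 only
        // Closed-form least squares for A and C at this d1:
        //   A*Sum(u^2) + C*Sum(u) = Sum(u*y)
        //   A*Sum(u)   + C*n      = Sum(y)
        double Suu = 0, Su = 0, Suy = 0, Sy = 0;
        size_t n = xs.size();
        for (size_t i = 0; i < n; ++i) {
            double ui = u(xs[i], d1);
            Suu += ui * ui; Su += ui; Suy += ui * ys[i]; Sy += ys[i];
        }
        double det = Suu * n - Su * Su;
        if (std::fabs(det) < 1e-12) continue;
        double A = (Suy * n - Su * Sy) / det;
        double C = (Suu * Sy - Su * Suy) / det;
        double sse = 0;
        for (size_t i = 0; i < n; ++i) {
            double e = A * u(xs[i], d1) + C - ys[i];
            sse += e * e;
        }
        if (sse < best_sse) { best_sse = sse; best_d1 = d1; best_A = A; best_C = C; }
    }
    std::printf("d1=%g A=%g C=%g sse=%g\n", best_d1, best_A, best_C, best_sse);
}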

One question: how can somebody tell whether the model being used is correct for the data at hand? From my results, can I say with confidence that the model is wrong?

HellRaiZer

