Math.Sin replaced with lookup table

Started by
8 comments, last by alvaro 12 years, 8 months ago
I finally made my first lookup table because profiling my code revealed my Math.Sin to be the cause of a slowdown. This makes sense, as I am using a sin wave to fill an audio buffer. I created a lookup table with all the sin values from 0 to 360 degrees by every tenth of a degree. I had another program spit out all the values with commas, so I copied and pasted it in and made a static array of floats. I then looked it up with the code:

public static float fastSin(float angle)
{
int a = (int)(angle * 10);

a %= 3600;

return sinTable[a];
}


When I tried it again, I noticed absolutely no performance improvement. I changed the array to use values every 1 degree, and this did not help at all. Why is this?
Advertisement
Only a profiler can really answer this, but my guess is because memory is *slow* to access.

What is the behavior of the processor caches? Does your code's working set fit within L1 cache? L2? How much time are you spending fetching stuff from RAM? How often does your lookup table get evicted from L1?


Also, you'll have to be more specific with this statement: "profiling my code revealed my Math.Sin to be the cause of a slowdown"

How is Math.Sin implemented? Does it also have a lookup table? Why are you confident that Math.Sin was the bottleneck? What about it is a bottleneck? Function call overhead? Fetching stuff from memory? what?
Also where exactly does the input "angle" come from? Depending on your platform you might have a CPU that executes instructions out-of-order to try and hide memory latency. Profilers will often detect large amounts of activity on instructions, not because of their specific cost, but more often because the input to the instruction came from memory, and the processor is spending a long time waiting for that data to become available. I suspect this might be why you originally thought sin() was slow, and could point at the algorithm calling sin() as being the true culprit.
I am working currently on a core 2 duo in my macbook, but I am targeting the xbox 360 with XNA. I suspect that I am frequently accessing RAM because I am filling 9 buffers 48000 floats long. I am not too sure what the profiler is telling me because I am very inexperienced with it. It is the one that comes with visual studio 2010 professional.

I finally made my first lookup table because profiling my code revealed my Math.Sin to be the cause of a slowdown. This makes sense, as I am using a sin wave to fill an audio buffer. I created a lookup table with all the sin values from 0 to 360 degrees by every tenth of a degree. I had another program spit out all the values with commas, so I copied and pasted it in and made a static array of floats. I then looked it up with the code:

public static float fastSin(float angle)
{
int a = (int)(angle * 10);

a %= 3600;

return sinTable[a];
}


When I tried it again, I noticed absolutely no performance improvement. I changed the array to use values every 1 degree, and this did not help at all. Why is this?


The sin function really isn't that slow these days but memory is (relative to the cpu) so its hard to write a faster one using lookup tables.

It might be better to try to reduce the number of sin calls entierly, why not post the function that you call sin from instead ? you could possibly also change that to access your table directly to cut out the int a= .... a%=3600 part. (Depending on what you're doing it might even be possible to precalculate far more than just the sin values aswell which would make a lookup table far more effective)
[size="1"]I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

I am working currently on a core 2 duo in my macbook, but I am targeting the xbox 360 with XNA. I suspect that I am frequently accessing RAM because I am filling 9 buffers 48000 floats long. I am not too sure what the profiler is telling me because I am very inexperienced with it. It is the one that comes with visual studio 2010 professional.


I assume you are using intrinsics to get the SIMD versions of the XNA math functions? The XNA sine function uses a Taylor series with 12 terms. Using a different algorithm (Chebyshev Equioscillation Theorem applied to sin(x)/x), you can double the speed (with a polynomial of 6 terms) at half the maximum error. From my math libraries. It is easy to convert this to SIMD, whether Intel SSE or Xbox360.


template <typename Real>
Real Math<Real>::FastSin1 (Real angle)
{
Real angleSqr = angle*angle;
Real result = -(Real)2.39e-08;
result *= angleSqr;
result += (Real)2.7526e-06;
result *= angleSqr;
result -= (Real)1.98409e-04;
result *= angleSqr;
result += (Real)8.3333315e-03;
result *= angleSqr;
result -= (Real)1.666666664e-01;
result *= angleSqr;
result += (Real)1.0;
result *= angle;
return result;
}
Since you're evaluating sin not at a whole bunch of arbitrary angles, but instead for angles with a fixed increment, there's another, very fast way to do this.

Store a unit vector (x,y) whose initial value is (1,0). Then, each timestep, multiply this vector by a (precomputed) rotation matrix [cos(dTheta), -sin(dTheta); sin(dTheta), cos(dTheta)], where dTheta is your angle increment. If you plot 'y' vs. time, you'll get a sine wave. (And 'x' vs time is a cosine.)

This requires just 4 muls and 2 adds at each increment and requires very little state.

The only issue with this approach is that you'll need to make sure that rounding error doesn't cause x^2 + y^2 != 1, so, occasionally, you'll need to renormalize (x,y). Even so, this is reasonably cheap (2 muls, an add, a sqrt, and two divides), and moreover you don't need to do it every increment.

(There are also some other ways to stabilize (x,y) to the unit circle, which might be cheaper...)

When I tried it again, I noticed absolutely no performance improvement. I changed the array to use values every 1 degree, and this did not help at all. Why is this?


During last 10 years, the speed of CPUs increased a 1000 times.
Access rates of RAM increased about 5 times.
The speed of light remained the same (latency of memory access).

In short, accessing memory has become increasingly expensive. Or, if memory access took 1 cycle 10 years ago, it now takes around 200. The lookup tables have last been competitive on 386 before pipelining and memory caches became the norm.

There are still cases where precomputed values can help, but usually not when used in such manner.


You could try Taylor series approximation.

I assume you are using intrinsics to get the SIMD versions of the XNA math functions? The XNA sine function uses a Taylor series with 12 terms. Using a different algorithm (Chebyshev Equioscillation Theorem applied to sin(x)/x), you can double the speed (with a polynomial of 6 terms) at half the maximum error. From my math libraries. It is easy to convert this to SIMD, whether Intel SSE or Xbox360.


template <typename Real>
Real Math<Real>::FastSin1 (Real angle)
{
Real angleSqr = angle*angle;
Real result = -(Real)2.39e-08;
result *= angleSqr;
result += (Real)2.7526e-06;
result *= angleSqr;
result -= (Real)1.98409e-04;
result *= angleSqr;
result += (Real)8.3333315e-03;
result *= angleSqr;
result -= (Real)1.666666664e-01;
result *= angleSqr;
result += (Real)1.0;
result *= angle;
return result;
}



AHH its slower. Thanks anyways. The sine function I am using is not XNA's, it is Microsoft's included with the standard c# library. I also had to look up intrinsics and SIMD and I do not fully understand them, but I am not sure what I can do with them through XNA. I know I can not use any unmanaged code if that means anything...

I will try the Taylor series and Emergent's idea later on and I will let you know how they go.
[font="arial, verdana, tahoma, sans-serif"][font="arial, verdana, tahoma, sans-serif"]Dave Eberly's code would be fast if you were using a compiled programming language. Emergent's solution is probably your best bet. If you have a type that represents complex numbers, it's really easy to do: Start with some arbitrary number and at each step, multiply by exp(2*pi*i/3600). The real part will follow a sine wave.[/font][/font][font="arial, verdana, tahoma, sans-serif"] [/font]
[font="arial, verdana, tahoma, sans-serif"][font="arial, verdana, tahoma, sans-serif"] [/font][/font]
[font="arial, verdana, tahoma, sans-serif"][font="arial, verdana, tahoma, sans-serif"][/font][/font]#include <iostream>
#include <complex>


typedef std::complex<double> C;


int main() {
C z(0.0,1.0);
double PI = std::atan(1.0)*4.0;
C rotation = std::exp(C(0.0, PI/1800));

for (int i=0; i<3600; ++i) {
z *= rotation;
std::cout << real(z) << '\n';
}
}


This topic is closed to new replies.

Advertisement