
Trigonometric Lookup Table: Float vs. Double


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
14 replies to this topic

#1 epicpunnum   Members   -  Reputation: 458


Posted 19 February 2013 - 12:21 PM

For a game that I am designing, I have been building a barebones 2D game framework that includes physics, bounding boxes, cameras, game states, etc. Naturally, because my game is largely physics-based, I felt the necessity to design a lookup table for my trig functions.

Currently, this Trig class uses only integer values for angles, and three arrays for sine, cosine, and tangent (each entry a Double wrapper object so that values can be passed by reference).

In an attempt to make my framework less memory-intensive, I have been going through my code and using appropriately sized variables instead of just ints and doubles. It occurred to me that instead of storing hundreds of doubles (at 8 bytes each), I could cut that in half by using floats (at 4 bytes each).

My question comes in multiple parts: In your experience, would this help much in terms of memory use, especially on lower-end computers/netbooks? Would the loss of precision from double to float be noticeable to the player? Could this potentially increase the number of trigonometry-based objects I can have alive at any given time?

Thank you for your time :)




#2 frob   Moderators   -  Reputation: 22714


Posted 19 February 2013 - 12:30 PM

Naturally, because my game is largely physics based, I felt the necessity to design a lookup table for my trig functions.

Why did you feel "the necessity" ?

Twenty or so years ago, when floating point was slow and memory access was not a bottleneck, it made sense. On hardware with software floating point, such as handhelds and certain older consoles, it can also make sense. But in modern development those are the exception, not the rule.

On a modern desktop CPU --- any x86 processor since about 1997 or so --- the CPU operations are much faster and more accurate than a memory lookup.

Edited by frob, 19 February 2013 - 12:31 PM.

Check out my book, Game Development with Unity, aimed at beginners who want to build fun games fast.

Also check out my personal website at bryanwagstaff.com, where I write about assorted stuff.


#3 TheUnnamable   Members   -  Reputation: 803


Posted 19 February 2013 - 12:53 PM

Also, your trigonometric lookup tables are currently taking up 360*3*8 bytes, that is 8640 bytes. Cutting this in half would free up a little more than 4 kilobytes.



#4 SimonForsman   Crossbones+   -  Reputation: 6295


Posted 19 February 2013 - 01:19 PM


frob, on 19 February 2013 - 12:30 PM, said:
On a modern desktop CPU --- any x86 processor since about 1997 or so --- the CPU operations are much faster and more accurate than a memory lookup.


On any x86 processor the CPU's trig instructions fail to meet Java's accuracy requirements (although new Intel CPUs do have more accurate trig functions available --- not sure if it's just IA-64 though, nor whether the JVM can use them yet), so Java will perform a software argument reduction for you (possibly using fixed-point arithmetic) before calling the native trig functions. For large arguments this can get significantly more expensive than a table lookup.

Although with a lookup table you have to perform argument reduction in software anyway, so unless your lookup table fits in cache it should pretty much always be faster to do a quick and dirty argument reduction yourself, and the cost to precision will be smaller.

If you don't need high accuracy, the fastest method in Java (on x86 --- normal trig is really fast on ARM, SPARC, etc.) is most likely an approximation function.
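A minimal sketch of that "quick and dirty" reduction, folding any angle into [0, 2*PI) with a single remainder operation instead of the JDK's exact reduction (the class and method names here are illustrative, not from any library):

```java
// Quick-and-dirty argument reduction: fold an angle into [0, 2*PI) before
// any table lookup so the index math stays in range. Cheaper than exact
// reduction, at some precision cost for very large arguments.
public class RangeReduce {
    static final double TWO_PI = 2.0 * Math.PI;

    static double reduce(double angle) {
        double r = angle % TWO_PI;      // Java % keeps the sign of the dividend
        return r < 0 ? r + TWO_PI : r;  // fold negatives into [0, 2*PI)
    }

    public static void main(String[] args) {
        System.out.println(reduce(7.0));          // ~0.7168 (7 - 2*PI)
        System.out.println(reduce(-Math.PI / 2)); // ~4.7124 (3*PI/2)
    }
}
```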

Edited by SimonForsman, 19 February 2013 - 02:08 PM.

I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

#5 BGB   Crossbones+   -  Reputation: 1554


Posted 19 February 2013 - 01:40 PM

 

SimonForsman, on 19 February 2013 - 01:19 PM, said:
if you don't need high accuracy the fastest method in java is most likely to use an approximation function.

pretty much.

 

but, if the number of table entries is reasonable (for example, 256 or 1024, or similar), fitting in cache shouldn't be too much of an issue.

 

for example (for a 4096-entry sin table):

f = sinTable[((int)(angle * 651.8986469 + 0.5)) & 4095];

where the magic constant is 4096/(2*PI), slightly rounded.
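A self-contained version of that table, for context (names are illustrative; the & mask wraps the index, which doubles as cheap argument reduction for non-negative angles --- truncation makes the rounding off for negative ones):

```java
// 4096-entry sine lookup table. SIZE is a power of two so the index can be
// wrapped with a mask instead of a modulo.
public class SinTable {
    static final int SIZE = 4096;
    static final double SCALE = SIZE / (2.0 * Math.PI); // ~651.8986469
    static final float[] TABLE = new float[SIZE];

    static {
        for (int i = 0; i < SIZE; i++)
            TABLE[i] = (float) Math.sin(i / SCALE);     // fill once at startup
    }

    // Assumes angle >= 0; nearest-entry rounding via the +0.5 before truncation.
    static float fastSin(double angle) {
        return TABLE[((int) (angle * SCALE + 0.5)) & (SIZE - 1)];
    }

    public static void main(String[] args) {
        System.out.println(fastSin(Math.PI / 2)); // ~1.0
    }
}
```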

 

 

sometimes something similar also makes sense when working with fixed-point, since conversions between floating-point and integer/fixed-point values are expensive. fixed-point mostly makes sense for signal-processing tasks (audio/video codecs), where both the input and output are usually small integer values, and doing lots of these type conversions can eat a lot of clock cycles.



#6 EWClay   Members   -  Reputation: 659


Posted 19 February 2013 - 02:03 PM

Regarding precision, you will probably find that you can use float without a problem. I can switch my entire physics code from double to float and see no significant difference. Certainly for trig functions, whose values don't have much range, you will get away with it.

 

It doesn't make any difference to calculation speed, because the CPU will be using double-precision registers either way. The raw memory saving of float over double is not significant in itself, but halving the data size can mean less to read and therefore better cache performance.

 

Whether you should be using lookup tables at all is another matter.



#7 epicpunnum   Members   -  Reputation: 458


Posted 19 February 2013 - 03:08 PM

To clarify, I felt that a lookup table would be a necessity because:

  • More than half of my ingame entities would be using some form of rotation
  • Rotation is a key aspect of the gameplay
  • Elastic and inelastic collisions are frequent

If I understand correctly, the Math versions of the trig functions run an algorithm to accurately compute the value on each call. A lookup table only has to run those calculations once per angle and function (fewer if you exploit the symmetry of the unit circle), so you really only need about 270 (90*3) calls to the accurate algorithms to fill it. Because of the high frequency of trig functions in my game, it wouldn't surprise me if at any given time I were running AT LEAST 50 of these functions on any given frame.
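The quarter-circle symmetry described here can be sketched like this, storing only 0..90 degrees and deriving the other quadrants (integer-degree angles as in the original design; names are illustrative):

```java
// Sine/cosine over integer degrees from a 91-entry first-quadrant table.
public class QuadrantSin {
    static final double[] Q = new double[91];
    static {
        for (int d = 0; d <= 90; d++)
            Q[d] = Math.sin(Math.toRadians(d)); // only 91 accurate calls
    }

    static double sinDeg(int deg) {
        int a = ((deg % 360) + 360) % 360;   // wrap into [0, 360)
        if (a <= 90)  return  Q[a];          // first quadrant: direct
        if (a <= 180) return  Q[180 - a];    // second: mirror
        if (a <= 270) return -Q[a - 180];    // third: mirror and negate
        return -Q[360 - a];                  // fourth: negate
    }

    static double cosDeg(int deg) {          // cos(x) = sin(x + 90)
        return sinDeg(deg + 90);
    }

    public static void main(String[] args) {
        System.out.println(sinDeg(225) + " " + cosDeg(0)); // ~-0.7071, 1.0
    }
}
```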

For the record, I honestly don't mind using only integer angles or sacrificing some accuracy. I do most of my away-from-home programming on an outdated netbook (32-bit processor, 2 GB RAM, 1.6 GHz CPU, over 4 years old), where even Flash can lag. Anything that can optimize calculations is useful for me.

 

EDIT: Thank you EWClay, that at least addresses the issue of precision. As long as it comes close and there are no discernible visual differences, it's fine for me.


Edited by epicpunnum, 19 February 2013 - 03:17 PM.


#8 frob   Moderators   -  Reputation: 22714


Posted 19 February 2013 - 03:20 PM

it wouldn't suprise me that at any given time I could be running AT LEAST 50 of these functions on any given frame.

That is a trivial workload for a modern processor.

Let's assume you are running on a single 2GHz processor.
Let's also assume 60FPS. That's over 33.3 MILLION cycles per frame.
Let's assume a bad-case scenario of 50 cycles per operation. (In practice they are around 10-20 cycles).

So 50 cycles per * "at least 50" of these functions = 2500 cycles.
2500 cycles = 1.25 microseconds.

That is less than 0.01% of your processor time per frame.

I'm curious what you are doing with the remaining 33,330,833 cycles per frame.
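The arithmetic above, spelled out as a runnable check (the 2 GHz, 60 FPS, 50-cycle, and 50-call figures are the assumptions stated above):

```java
// Back-of-the-envelope frame budget for the assumed trig workload.
public class FrameBudget {
    static final long HZ = 2_000_000_000L;             // assumed 2 GHz core
    static final long CYCLES_PER_FRAME = HZ / 60;      // 33,333,333 at 60 FPS
    static final long TRIG_CYCLES = 50L * 50L;         // 50 calls * 50 cycles
    static final double MICROS = TRIG_CYCLES / (HZ / 1e6);              // 1.25 us
    static final double SHARE = 100.0 * TRIG_CYCLES / CYCLES_PER_FRAME; // ~0.0075 %

    public static void main(String[] args) {
        System.out.println(CYCLES_PER_FRAME + " cycles/frame, "
                + MICROS + " us of trig, " + SHARE + " % of the frame");
    }
}
```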



#9 Geometrian   Crossbones+   -  Reputation: 1601


Posted 19 February 2013 - 03:22 PM

frob, on 19 February 2013 - 12:30 PM, said:
On a modern desktop CPU --- any x86 processor since about 1997 or so --- the CPU operations are much faster and more accurate than a memory lookup.

Just to clarify: while it is true that some floating-point operations can now often be done in time comparable to some integer operations, do not mistake trigonometric functions for a next-to-free operation. Even on systems where a trig operation is a single machine instruction, it won't be one clock cycle; I'd guess at least ten. These problems are exacerbated in a VM (although Java's JIT, which emits native code into an executable segment, can in general get close to native C).

That said, as you mentioned, CPU operations are usually better than incurring a memory lookup. Hence:

if the number of table entries is reasonable (for example, 256 or 1024, or similar), fitting in cache shouldn't be too much of an issue.

This. Accessing L1 cache generally costs about the same as a register access (or one to three clock cycles more). Consequently, you will gain a lot by using a table that fits into it.

The smaller your trig table is, the more likely it is to be in cache. For most applications, if you're careful, you need shockingly little precision, too. I would probably store values of sine and cosine in a single signed byte, indexed by another signed byte. That table is tiny and should fit entirely into L1 cache. The greatest cost would be casting the result to a float, which shouldn't hurt much at all, especially if all you want to do is check the sign. Tangent can be calculated as sin/cos, which still isn't too bad (but how often do you need to calculate it anyway?).
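A sketch of that signed-byte idea: sine scaled to [-127, 127], indexed by an 8-bit angle (256 steps per turn), so the whole table is 256 bytes (names are illustrative):

```java
// 256-byte sine table in "binary degrees": 256 steps per full turn.
public class ByteSin {
    static final byte[] TABLE = new byte[256];
    static {
        for (int i = 0; i < 256; i++)
            TABLE[i] = (byte) Math.round(127.0 * Math.sin(i * 2.0 * Math.PI / 256.0));
    }

    // byteAngle wraps naturally via the & 0xFF mask; cast back to float on use.
    static float sin(int byteAngle) {
        return TABLE[byteAngle & 0xFF] / 127.0f;
    }

    public static void main(String[] args) {
        System.out.println(sin(64)); // 64 binary degrees = 90 degrees, ~1.0
    }
}
```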

You mentioned rotations--an application that requires higher precision. The best choice is to apply these as offsets to a base matrix each frame, so you don't accumulate error. If constantly updating matrices is a problem, you'd best use quaternions so that your rotation matrix doesn't accumulate skew (you should do that anyway, since all floating point operations accrue error in general, just more slowly).


And a Unix user said rm -rf *.* and all was null and void...|There's no place like 127.0.0.1|The Application "Programmer" has unexpectedly quit. An error of type A.M. has occurred.

#10 SimonForsman   Crossbones+   -  Reputation: 6295


Posted 19 February 2013 - 03:30 PM

epicpunnum, on 19 February 2013 - 03:08 PM, said:
Because of the high frequency of trig functions in my game, it wouldn't surprise me that at any given time I could be running AT LEAST 50 of these functions on any given frame.


50 per frame is not a problem. While frob understates the cost, you shouldn't go far above 200 cycles per call unless your arguments go really far out of range, so a few thousand trig calls per frame is still perfectly fine.

#11 Schrompf   Prime Members   -  Reputation: 954


Posted 20 February 2013 - 03:48 AM

I still wonder why you'd even bother to optimize this. 50 calls per frame? Incredible! Make that 500k calls and you'd need to start worrying. But even then, the first thing you should do is MEASURE. Stop guessing that something might be a problem; start measuring what actually consumes the most processing time. I'd wager that sin()/cos() won't even show up on the list.

 

Also: packing a double into a Java object just to be able to pass it around by reference? Your math will create thousands of small objects every frame, most (but not all) of which go out of scope at the end of the frame. The garbage collector will love you.
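The allocation concern, illustrated: a boxed Double means one heap object per value, while a reusable primitive array achieves the same "pass by reference" with zero garbage per frame (names are illustrative):

```java
// Boxed vs. garbage-free ways to return a double "by reference".
public class NoBoxing {
    // Garbage-heavy: one Double object allocated per call.
    static Double sinBoxed(double a) {
        return Double.valueOf(Math.sin(a));
    }

    // Garbage-free: caller supplies a reusable out-array.
    static void sinInto(double a, double[] out) {
        out[0] = Math.sin(a);
    }

    public static void main(String[] args) {
        double[] out = new double[1];   // allocated once, reused every frame
        sinInto(Math.PI / 2, out);
        System.out.println(out[0]);     // 1.0, no per-call allocation
    }
}
```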


Edited by Schrompf, 20 February 2013 - 03:55 AM.

----------
Gonna try that "Indie" stuff I keep hearing about. Let's start with Splatter.

#12 TheChubu   Crossbones+   -  Reputation: 4756


Posted 20 February 2013 - 06:46 AM

^This. In Java it is very, very easy to allocate thousands of new objects without noticing (a common example is creating tons of String objects). Allocation time by itself won't be noticeable (unless you're allocating many complex objects; in those cases it is recommended to reuse them where it makes sense), but allocation time plus garbage-collection time can build up and eventually become a problem.

 

Though I'm not sure if he meant a wrapper he coded himself or the boxed Double.

 

By looking around, I'm inclined to say that Java has earned its reputation for being "slow" because it is very easy and convenient to allocate new objects at every single line of code: inside loops (worst), at the return of every method (not so bad), etc.


Edited by TheChubu, 20 February 2013 - 06:49 AM.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#13 BGB   Crossbones+   -  Reputation: 1554


Posted 20 February 2013 - 02:01 PM

TheChubu, on 20 February 2013 - 06:46 AM, said:
Allocation time by itself won't be noticeable, but allocation time plus garbage-collection time can build up and eventually become a problem.

 

Java has a lot of problems here...

 

basically, it is fairly easy to write code which starts spewing garbage at high speed. internally, the libraries make considerable use of operations that solve problems by creating new objects only to discard them again, and this is coupled with a lack of compound value types, limited ability to perform lifetime inference, no reference arguments, ... leaving "new" as the primary solution to pretty much everything.

 

writing code which is more conservative with memory in Java is possible; it just requires care and usually isn't all that pretty.

 

usually the problem is not obvious, as the JVM also has a pretty fast GC (so the GC may run once every few seconds and largely escape notice).

 

(though I still much prefer other, more sanely designed languages...)

 

 

now as for sin/cos performance:

usually this becomes more relevant when these are being used many millions of times per second or similar.



#14 Álvaro   Crossbones+   -  Reputation: 13901


Posted 20 February 2013 - 02:19 PM

now as for sin/cos performance:

usually this becomes more relevant when these are being used many millions of times per second or similar.

 

At which point one should step back and think hard about why so many sin/cos evaluations are needed. Geometric computations made using the language of vectors and matrices can very often be performed without any evaluations of trigonometric functions at all.
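One way to read that advice, sketched: call sin/cos once to build a fixed rotation step, then rotate a direction vector by that step every frame with plain multiplies and adds, no further trig (a minimal sketch with illustrative names):

```java
// Spin a facing vector at 1 degree per frame without per-frame trig.
public class TrigFree {
    static double[] rotate90() {
        double step = Math.toRadians(1.0);
        double c = Math.cos(step), s = Math.sin(step); // the only trig calls
        double x = 1.0, y = 0.0;                       // current facing
        for (int frame = 0; frame < 90; frame++) {
            double nx = x * c - y * s;                 // 2D rotation step
            y = x * s + y * c;
            x = nx;
        }
        return new double[] { x, y };                  // ~ (0, 1) after 90 deg
    }

    public static void main(String[] args) {
        double[] v = rotate90();
        System.out.println(v[0] + " " + v[1]);
    }
}
```

Error does accumulate slowly this way, which is why one periodically renormalizes the vector (or rebuilds it from a base orientation).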



#15 BGB   Crossbones+   -  Reputation: 1554


Posted 20 February 2013 - 03:30 PM

Álvaro, on 20 February 2013 - 02:19 PM, said:
At which point one should step back and think hard about why so many sin/cos evaluations are needed. Geometric computations made using the language of vectors and matrices can very often be performed without any evaluations of trigonometric functions at all.

 

pretty much...

 

it could also be taken as evidence that micro-optimizing sin/cos is probably premature optimization.

 

I have more often run across this being an issue with sqrt, generally because sqrt is needed for things like vector length and renormalization, which are typically much more common IME.





