• Advertisement
Sign in to follow this  

Finding the optimal domain for piecewise function

This topic is 468 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

So the motivation behind my question comes from my attempt at creating a fairly small 2d lookup table for the square-root function (with a small domain - from 0 to ~1.5), assumption being that by optimizing my sub-domain for my piecewise function, I can maximize my accuracy.

Having covered combinatorics in one of my classes years ago, i got to thinking on how a 2d table can essentially functions as a numeral system, and that if you have a function f(x), how can one extract a numeral system to serve that function better (somewhat akin to how factoradics are adapted to numbering permutations)?

Also, how do I determine if a LUT to big, that memory lookup costs exceed fast interative approximations. Would a "float sqrt_tab[4096]" terribly inefficient? Not looking for anything more than a heuristic because I understand every system is different and programs are definitely not compiled equally.

Edit: more importantly, is it possible to get more accuracy out of something like a smart "float sqrt_tab_pw[8][8]" than the naive, fixed interval approach like "float sqrt_tab[1000]".

Share this post


Link to post
Share on other sites
Advertisement

It doesn't matter how big your LUT is, if the cache line you're looking up isn't hot in CPU cache, you will wait longer for the memory lookup than it takes to just calculate the sqrt on the fly. LUTs for expensive operations (sin,cos,tan) used to be common, but nowadays it's usually a pessimization. Even back in the day, sqrt was not often tabled. Usually it was approximated with cheaper operations since they were "good enough" for game purposes. You didn't mention your platform btw, if this is for something like a raspberry pi with limited CPU horsepower, maybe this could be a win.

Share this post


Link to post
Share on other sites

It doesn't matter how big your LUT is, if the cache line you're looking up isn't hot in CPU cache, you will wait longer for the memory lookup than it takes to just calculate the sqrt on the fly. LUTs for expensive operations (sin,cos,tan) used to be common, but nowadays it's usually a pessimization. Even back in the day, sqrt was not often tabled. Usually it was approximated with cheaper operations since they were "good enough" for game purposes. You didn't mention your platform btw, if this is for something like a raspberry pi with limited CPU horsepower, maybe this could be a win.

 

In the event that I'm calculating tens of thousands of square roots per frame, could the initial hit of pulling that LUT into memory be offseted by the possible gain? If the size of the table is smaller, then do the chances of it staying in cache rise since it is vying for fewer resources? 

 

Thanks for taking the time to respond. 

 

Edit: my platform isn't limited (my game engine currently targets an i3 level processor).

Edited by fais

Share this post


Link to post
Share on other sites

An i3 is still a modern processor and cach misses supposedly cost much more than calculations.

Why dont you first do the simplest possible thing that works, then look if its really too slow, and in that case profile and optimize the real bottlenecks?

Share this post


Link to post
Share on other sites

An i3 is still a modern processor and cach misses supposedly cost much more than calculations.
Why dont you first do the simplest possible thing that works, then look if its really too slow, and in that case profile and optimize the real bottlenecks?


It's like this site has an irrational fear of LUTs.

Maybe several thousand square roots being executed ~60 frames a second isn't the ideal candidate for LUTs.

I implemented an LUT table for exponential decay in a real atmospheric scattering demo I wrote years ago (early gen i7) to illuminate the exp() call being invoked several thousand times a frame, and my frame rate jumped drastically. I will pull that code out on my year old i3 laptop and retest, but I highly suspect the results will be the same.

Also, if my CPU is frequently paging out a table that is being addressed several 100 thousands of times a second , then I would think my CPU has bigger problems to deal with and this whole point is moot.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement