Finding the optimal domain for a piecewise function

3 comments, last by fais 7 years, 3 months ago
The motivation behind my question comes from my attempt at creating a fairly small 2D lookup table for the square-root function over a small domain (0 to ~1.5). My assumption is that by optimizing the sub-domains of my piecewise function, I can maximize accuracy.

Having covered combinatorics in one of my classes years ago, I got to thinking about how a 2D table essentially functions as a numeral system, and, given a function f(x), how one might derive a numeral system that serves that function better (somewhat akin to how factoradics are adapted to numbering permutations).

Also, how do I determine when a LUT is so big that memory-lookup costs exceed fast iterative approximations? Would a "float sqrt_tab[4096]" be terribly inefficient? I'm not looking for anything more than a heuristic, because I understand every system is different and programs are definitely not compiled equally.

Edit: more importantly, is it possible to get more accuracy out of something like a smart "float sqrt_tab_pw[8][8]" than out of a naive, fixed-interval approach like "float sqrt_tab[1000]"?
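
To make the question concrete, here's a sketch of what I mean by a "smart" piecewise table (the names and the boundary choice are just my illustration). Since sqrt changes fastest near zero, the idea is to make the segments narrower there, e.g. by placing boundaries at t^2 * 1.5:

#include <cmath>

// Sketch only: a piecewise sqrt table over [0, 1.5] with non-uniform
// segments. Boundaries at (k/SEGS)^2 * X_MAX cluster segments near 0,
// where sqrt(x) is steepest.
constexpr int   SEGS  = 8;     // outer dimension, as in sqrt_tab_pw[8][8]
constexpr int   SUBS  = 8;     // inner dimension
constexpr float X_MAX = 1.5f;

float seg_start[SEGS + 1];     // non-uniform segment boundaries
float tab[SEGS][SUBS + 1];     // +1 entry so we can lerp to the segment end

void build() {
    for (int k = 0; k <= SEGS; ++k) {
        float t = (float)k / SEGS;
        seg_start[k] = t * t * X_MAX;
    }
    for (int k = 0; k < SEGS; ++k) {
        float w = seg_start[k + 1] - seg_start[k];
        for (int j = 0; j <= SUBS; ++j)
            tab[k][j] = std::sqrt(seg_start[k] + w * j / SUBS);
    }
}

float sqrt_pw(float x) {       // assumes 0 <= x <= X_MAX
    // Linear search is fine for 8 segments; a real version would want
    // something branchless (or a boundary formula that's cheap to invert).
    int k = 0;
    while (k + 1 < SEGS && x >= seg_start[k + 1]) ++k;
    float w = seg_start[k + 1] - seg_start[k];
    float f = (x - seg_start[k]) / w * SUBS;   // fractional sub-index
    int j = (int)f;
    if (j >= SUBS) j = SUBS - 1;               // clamp at the top edge
    float t = f - j;
    return tab[k][j] + t * (tab[k][j + 1] - tab[k][j]);  // linear interp
}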

It doesn't matter how big your LUT is: if the cache line you're looking up isn't hot in the CPU cache, you will wait longer for the memory access than it would take to just calculate the sqrt on the fly. LUTs for expensive operations (sin, cos, tan) used to be common, but nowadays they're usually a pessimization. Even back in the day, sqrt was not often tabled; it was usually approximated with cheaper operations, since those were "good enough" for game purposes. You didn't mention your platform, by the way. If this is for something like a Raspberry Pi with limited CPU horsepower, maybe this could be a win.
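
For example, on x86 a classic "cheaper operations" route is the hardware reciprocal-sqrt estimate plus one Newton-Raphson step. Rough sketch (assumes SSE; on a modern CPU a plain sqrtf is already only a handful of cycles, so even this may not win):

#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Approximate sqrt(x) as x * (1/sqrt(x)): take the ~12-bit hardware
// rsqrt estimate and refine it with one Newton-Raphson step, giving
// roughly 22 bits of accuracy.
float fast_sqrt(float x) {
    if (x <= 0.0f) return 0.0f;                // rsqrt(0) would blow up
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    y = y * (1.5f - 0.5f * x * y * y);         // Newton-Raphson refinement
    return x * y;
}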

In the event that I'm calculating tens of thousands of square roots per frame, could the initial hit of pulling that LUT into memory be offset by the possible gain? If the table is smaller, do its chances of staying in cache rise, since it's vying for fewer resources?

Thanks for taking the time to respond.

Edit: my platform isn't limited (my game engine currently targets an i3-level processor).

An i3 is still a modern processor, and cache misses supposedly cost much more than calculations.

Why don't you first do the simplest possible thing that works, then check whether it's really too slow, and in that case profile and optimize the real bottlenecks?
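
For what it's worth, even a crude timing loop beats guessing. Something like this sketch would do (with the big caveat that a tight loop keeps the table artificially cache-hot, so it flatters the LUT compared to a real workload):

#include <chrono>
#include <cmath>
#include <cstdio>

float sqrt_tab[4096];  // uniform table over [0, 1.5], as in the question

int main() {
    const int N = 1 << 20;
    for (int i = 0; i < 4096; ++i)
        sqrt_tab[i] = std::sqrt(1.5f * i / 4095.0f);

    volatile float sink = 0.0f;  // keeps the compiler from deleting the loops

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        sink = sink + std::sqrt(1.5f * (i & 4095) / 4095.0f);
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        sink = sink + sqrt_tab[i & 4095];
    auto t2 = std::chrono::steady_clock::now();

    auto us = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
    };
    std::printf("sqrtf: %lld us, LUT: %lld us\n",
                (long long)us(t0, t1), (long long)us(t1, t2));
}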

It's like this site has an irrational fear of LUTs.

Maybe several thousand square roots executed at ~60 frames a second isn't the ideal candidate for LUTs.

I implemented a LUT for exponential decay in a real-time atmospheric scattering demo I wrote years ago (early-gen i7) to eliminate the exp() call being invoked several thousand times a frame, and my frame rate jumped drastically. I will pull that code out on my year-old i3 laptop and retest, but I strongly suspect the results will be the same.
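
From memory, the table was roughly this shape (simplified, not the original code):

#include <cmath>

// Tabulate exp(-x) over the range the scattering integral actually uses,
// then do a lerped lookup instead of calling expf() per sample.
constexpr int   EXP_N   = 1024;
constexpr float EXP_MAX = 10.0f;   // exp(-10) ~ 0, close enough here

float exp_tab[EXP_N + 1];

void init_exp_tab() {
    for (int i = 0; i <= EXP_N; ++i)
        exp_tab[i] = std::exp(-EXP_MAX * i / EXP_N);
}

float fast_exp_neg(float x) {       // approximates exp(-x) for x >= 0
    if (x >= EXP_MAX) return 0.0f;
    float f = x * (EXP_N / EXP_MAX);
    int   i = (int)f;
    float t = f - i;
    return exp_tab[i] + t * (exp_tab[i + 1] - exp_tab[i]);
}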

Also, if my CPU is frequently evicting from cache a table that's being addressed several hundred thousand times a second, then I would think my CPU has bigger problems to deal with and this whole point is moot.

This topic is closed to new replies.
