Finding the optimal domain for a piecewise function

3 comments, last by fais 7 years, 3 months ago
The motivation behind my question comes from my attempt at creating a fairly small 2D lookup table for the square-root function over a small domain (0 to ~1.5). My assumption is that by optimizing the sub-domains of my piecewise function, I can maximize accuracy.

Having covered combinatorics in one of my classes years ago, I got to thinking about how a 2D table essentially functions as a numeral system, and, given a function f(x), how one might derive a numeral system that serves that function better (somewhat akin to how factoradics are adapted to numbering permutations).

Also, how do I determine when a LUT is so big that memory-lookup costs exceed fast iterative approximations? Would a "float sqrt_tab[4096]" be terribly inefficient? I'm not looking for anything more than a heuristic, because I understand every system is different and programs are definitely not compiled equally.

Edit: more importantly, is it possible to get more accuracy out of something like a smart "float sqrt_tab_pw[8][8]" than out of a naive, fixed-interval approach like "float sqrt_tab[1000]"?
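
To make the question concrete, here's a sketch of what I mean by a "smart" piecewise table (the names and the boundary choice are just my illustration). Since sqrt changes fastest near zero, the idea is to make the segments narrower there, e.g. by placing boundaries at t^2 * 1.5:

#include <cmath>

// Sketch only: a piecewise sqrt table over [0, 1.5] with non-uniform
// segments. Boundaries at (k/SEGS)^2 * X_MAX cluster segments near 0,
// where sqrt(x) is steepest.
constexpr int   SEGS  = 8;     // outer dimension, as in sqrt_tab_pw[8][8]
constexpr int   SUBS  = 8;     // inner dimension
constexpr float X_MAX = 1.5f;

float seg_start[SEGS + 1];     // non-uniform segment boundaries
float tab[SEGS][SUBS + 1];     // +1 entry so we can lerp to the segment end

void build() {
    for (int k = 0; k <= SEGS; ++k) {
        float t = (float)k / SEGS;
        seg_start[k] = t * t * X_MAX;
    }
    for (int k = 0; k < SEGS; ++k) {
        float w = seg_start[k + 1] - seg_start[k];
        for (int j = 0; j <= SUBS; ++j)
            tab[k][j] = std::sqrt(seg_start[k] + w * j / SUBS);
    }
}

float sqrt_pw(float x) {       // assumes 0 <= x <= X_MAX
    // Linear search is fine for 8 segments; a real version would want
    // something branchless (or a boundary formula that's cheap to invert).
    int k = 0;
    while (k + 1 < SEGS && x >= seg_start[k + 1]) ++k;
    float w = seg_start[k + 1] - seg_start[k];
    float f = (x - seg_start[k]) / w * SUBS;   // fractional sub-index
    int j = (int)f;
    if (j >= SUBS) j = SUBS - 1;               // clamp at the top edge
    float t = f - j;
    return tab[k][j] + t * (tab[k][j + 1] - tab[k][j]);  // linear interp
}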

It doesn't matter how big your LUT is: if the cache line you're looking up isn't hot in the CPU cache, you will wait longer for the memory access than it would take to just calculate the sqrt on the fly. LUTs for expensive operations (sin, cos, tan) used to be common, but nowadays they're usually a pessimization. Even back in the day, sqrt was not often tabled; it was usually approximated with cheaper operations, since those were "good enough" for game purposes. You didn't mention your platform, by the way. If this is for something like a Raspberry Pi with limited CPU horsepower, maybe this could be a win.
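
For example, on x86 a classic "cheaper operations" route is the hardware reciprocal-sqrt estimate plus one Newton-Raphson step. Rough sketch (assumes SSE; on a modern CPU a plain sqrtf is already only a handful of cycles, so even this may not win):

#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Approximate sqrt(x) as x * (1/sqrt(x)): take the ~12-bit hardware
// rsqrt estimate and refine it with one Newton-Raphson step, giving
// roughly 22 bits of accuracy.
float fast_sqrt(float x) {
    if (x <= 0.0f) return 0.0f;                // rsqrt(0) would blow up
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    y = y * (1.5f - 0.5f * x * y * y);         // Newton-Raphson refinement
    return x * y;
}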

In the event that I'm calculating tens of thousands of square roots per frame, could the initial hit of pulling that LUT into memory be offset by the possible gain? If the table is smaller, do its chances of staying in cache rise, since it's vying for fewer resources?

Thanks for taking the time to respond.

Edit: my platform isn't limited (my game engine currently targets an i3-level processor).

An i3 is still a modern processor, and cache misses supposedly cost much more than calculations.

Why don't you first do the simplest possible thing that works, then check whether it's really too slow, and in that case profile and optimize the real bottlenecks?
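
For what it's worth, even a crude timing loop beats guessing. Something like this sketch would do (with the big caveat that a tight loop keeps the table artificially cache-hot, so it flatters the LUT compared to a real workload):

#include <chrono>
#include <cmath>
#include <cstdio>

float sqrt_tab[4096];  // uniform table over [0, 1.5], as in the question

int main() {
    const int N = 1 << 20;
    for (int i = 0; i < 4096; ++i)
        sqrt_tab[i] = std::sqrt(1.5f * i / 4095.0f);

    volatile float sink = 0.0f;  // keeps the compiler from deleting the loops

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        sink = sink + std::sqrt(1.5f * (i & 4095) / 4095.0f);
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        sink = sink + sqrt_tab[i & 4095];
    auto t2 = std::chrono::steady_clock::now();

    auto us = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
    };
    std::printf("sqrtf: %lld us, LUT: %lld us\n",
                (long long)us(t0, t1), (long long)us(t1, t2));
}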

It's like this site has an irrational fear of LUTs.

Maybe several thousand square roots executed at ~60 frames a second isn't the ideal candidate for LUTs.

I implemented a LUT for exponential decay in a real-time atmospheric scattering demo I wrote years ago (early-gen i7) to eliminate the exp() call being invoked several thousand times a frame, and my frame rate jumped drastically. I will pull that code out on my year-old i3 laptop and retest, but I strongly suspect the results will be the same.
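
From memory, the table was roughly this shape (simplified, not the original code):

#include <cmath>

// Tabulate exp(-x) over the range the scattering integral actually uses,
// then do a lerped lookup instead of calling expf() per sample.
constexpr int   EXP_N   = 1024;
constexpr float EXP_MAX = 10.0f;   // exp(-10) ~ 0, close enough here

float exp_tab[EXP_N + 1];

void init_exp_tab() {
    for (int i = 0; i <= EXP_N; ++i)
        exp_tab[i] = std::exp(-EXP_MAX * i / EXP_N);
}

float fast_exp_neg(float x) {       // approximates exp(-x) for x >= 0
    if (x >= EXP_MAX) return 0.0f;
    float f = x * (EXP_N / EXP_MAX);
    int   i = (int)f;
    float t = f - i;
    return exp_tab[i] + t * (exp_tab[i + 1] - exp_tab[i]);
}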

Also, if my CPU is frequently evicting from cache a table that's being addressed several hundred thousand times a second, then I would think my CPU has bigger problems to deal with and this whole point is moot.

This topic is closed to new replies.
