Constant array declaration

Original post by Necrolyte


I need two float arrays: sine[360] and cose[360].
The values of these arrays must be sine[x] = sin(x degrees).
But I would like to save a small bit of performance by declaring them at the start (something like sine[360] = {0.123, 0.123, ...}), but that would take too much time to type in all the values manually. Is there any way of doing this with the preprocessor or something?

What I have right now is a loop at the start of my code which calculates sin and cos for every value and sets it.
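
Roughly this (a simplified sketch, not my exact code):

#include <cmath>

const double PI = 3.14159265358979323846;

float sine[360];
float cose[360];

void init_tables()
{
    for (int x = 0; x < 360; ++x)
    {
        double radians = x * PI / 180.0;        // degrees to radians
        sine[x] = (float)std::sin(radians);
        cose[x] = (float)std::cos(radians);
    }
}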

First, it sounds like you're trying to optimize a problem you don't have. Usually, optimization comes when you need it and you know where the bottlenecks are.

However, if you don't want to enter the values manually (easily understood), generate a text file from a console app in the format you want and #include it somewhere, or just copy and paste it. It would take you maybe an hour.

You could get into recursive #defines, but troubleshooting those isn't worth the time.
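
Something along these lines would do it (just a sketch to show the idea -- the file name, precision and output format are whatever suits you):

#include <cmath>
#include <cstdio>

int main()
{
    const double PI = 3.14159265358979323846;
    FILE* out = std::fopen("trig_tables.h", "w");
    if (!out) return 1;

    std::fprintf(out, "static const float sine[360] = {\n");
    for (int x = 0; x < 360; ++x)
        std::fprintf(out, "    %.9ef,\n", std::sin(x * PI / 180.0));
    std::fprintf(out, "};\n\n");

    std::fprintf(out, "static const float cose[360] = {\n");
    for (int x = 0; x < 360; ++x)
        std::fprintf(out, "    %.9ef,\n", std::cos(x * PI / 180.0));
    std::fprintf(out, "};\n");

    std::fclose(out);
    return 0;
}

Run it once, then #include "trig_tables.h" wherever you need the arrays.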

Quote:
Original post by Necrolyte
It takes a very short amount of time but *any optimization is always good*, and many small optimizations can add up and become significant.


Emphasis mine. I can't even begin to tell you how wrong that statement is. Perhaps the Wikipedia page on program optimization can provide you with some wisdom.

Actually, this is the kind of "optimization" that can slow your code down. If you embed the data manually, that information needs to be loaded from disk at runtime. Disk access speeds on modern hardware are several orders of magnitude slower than processor speeds. For that matter, since processor speeds are significantly faster than memory speeds, using a lookup table in the first place may actually be slower than just doing the calculations.

Quote:
Original post by alvaro
Quote:
Original post by Necrolyte
It takes a very short amount of time but *any optimization is always good*, and many small optimizations can add up and become significant.


Emphasis mine. I can't even begin to tell you how wrong that statement is. Perhaps the Wikipedia page on program optimization can provide you with some wisdom.


I read that part about readability. For my current problem, doing what I want actually increases readability, as it removes a loop from my code.



Also, I did what buckeye suggested, and my issue is solved.

Quote:
Original post by SiCrane
Actually, this is the kind of "optimization" that can slow your code down. If you embed the data manually, that information needs to be loaded from disk at runtime. Disk access speeds on modern hardware are several orders of magnitude slower than processor speeds. For that matter, since processor speeds are significantly faster than memory speeds, using a lookup table in the first place may actually be slower than just doing the calculations.


What is faster? Loading 2880 bytes (float = 4 bytes, 4 * 720 = 2880) from the hard drive, or:
loading the part of the code which sets the variables,
and, 720 times, converting degrees to radians (one multiplication) and calculating sin/cos?



I'd tend to believe that loading 2880 bytes is faster.

Quote:
Original post by Necrolyte
What is faster? Loading 2880 bytes (float = 4 bytes, 4 * 720 = 2880) from the hard drive, or:
loading the part of the code which sets the variables,
and, 720 times, converting degrees to radians (one multiplication) and calculating sin/cos?



I'd tend to believe that loading 2880 bytes is faster.


It really doesn't matter. If you had a bug in the program that generated the included file (say, you used 3.14 for pi or you printed only a few significant digits, and that turns out to not be good enough), you'll have a really hard time finding and fixing the problem. You have made your program more complex without gaining anything measurable in return.

A loop is a lot more readable than 360 mysterious values whose origin is not documented anywhere, and its correctness is much more easily verified. You are just making your code worse.

This is the type of optimization that should only be done when your program turns out to be too slow and a profiler tells you that this is a place where some performance can be gained. Even then, you need to make sure you are actually gaining some performance that justifies the loss of clarity.

After some testing, I found I was saving 0.001 seconds by loading the variables rather than calculating them at startup! Timed with Code::Blocks.

I guess this isn't worth having an executable file 2 KB larger.

Quote:

I'd tend to believe that loading 2880 bytes is faster.

Irrelevant; programming is not about faith. Precalculating a trig lookup table on modern machines is nowhere near the clear-cut advantage it was fifteen years ago. If you aren't profiling and basing your results on hard data and logical cost/benefit analysis, you're wrong.

Quote:

After some testing I found I was saving 0.001 seconds by loading the variables rather than calculating them at startup! Timed with Code::Blocks.

How did you time it? That's just as important as timing it at all. If you don't supply the detailed background behind the benchmark -- build tools, build configuration, the code for the benchmark itself, et cetera -- you could be doing something completely incorrect and not know it.
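
To give an idea of what I mean, here's a sketch of a more defensible way to time it (my assumptions: a release build, std::clock for timing, and enough repetitions that the work being measured dwarfs the timer's resolution):

#include <cmath>
#include <cstdio>
#include <ctime>

int main()
{
    const double PI = 3.14159265358979323846;
    const int ITERATIONS = 10000;                  // repeat so the total time is measurable
    static volatile float sine[360], cose[360];    // volatile so the work isn't optimized away

    std::clock_t start = std::clock();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int x = 0; x < 360; ++x)
        {
            double r = x * PI / 180.0;
            sine[x] = (float)std::sin(r);
            cose[x] = (float)std::cos(r);
        }
    }
    std::clock_t end = std::clock();

    double total = double(end - start) / CLOCKS_PER_SEC;
    std::printf("average per table init: %g microseconds\n", total / ITERATIONS * 1e6);
    return 0;
}

Even then, report the compiler, the optimization settings and the numbers themselves, not just "0.001 seconds".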

My computer benchmarks 63 million sin operations per second (~2.7 GHz clock). At 4 bytes per result, that's 252,000,000 bytes of data calculated per second, which is more than the throughput of pretty much any hard drive on the market.

The first rule of optimisation is to profile first, then profile *again* to make sure you didn't make things worse. There's no way you're saving 0.001s on your operation when you only compute 720 values. Your timing code is wrong.
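
For reference, my measurement was something along these lines (not the exact code I ran, and your numbers will vary with compiler, settings and hardware):

#include <cmath>
#include <cstdio>
#include <ctime>

int main()
{
    const int N = 50000000;
    volatile double sink = 0.0;   // volatile so the calls can't be optimized away

    std::clock_t start = std::clock();
    for (int i = 0; i < N; ++i)
        sink = std::sin(i * 0.0001);
    std::clock_t end = std::clock();

    double seconds = double(end - start) / CLOCKS_PER_SEC;
    std::printf("%.1f million sin calls per second\n", N / (seconds * 1e6));
    return 0;
}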

Quote:
Original post by Necrolyte
but any optimization is always good
It's already been said, but this is extremely misguided. I doubt even hard-core, performance-obsessed 'low-level' coders would agree with you on this, and in any case it's completely contrary to what most developers would consider to be good practice.
Quote:
I read that part about readability. For my current problem doing what I want actually increases readability as it removes a loop from my code.
But it's less clear and readable than just using the sine and cosine functions that your language, language's standard library, or API of choice provides.
Quote:
After some testing I found I was saving 0.001 seconds by loading the variables rather than calculating them at startup!
In order for this result to have any meaning, you'd need to provide the info that jpetrie mentioned (timing method, build configuration, etc.).

Quote:
After some testing I found I was saving 0.001 seconds by loading the variables rather than calculating them at startup!


All that for a millisecond? Now do you see why pointless optimizations are pointless?

Also, how did you measure that? I can't get 1 ms accurate timing on my machine, so I assume that your "perceived" decrease in time is within the margin of error.

A memory lookup table for sin/cos is not a performance improvement. You don't seem to think so, but it takes less time to calculate sin than it takes to retrieve the values from RAM, maybe even from cache.

15 or so years ago, when my software renderer project was newish, I used fixed-point values with lookup tables for sin and cos. I even cleverly overlapped them so that it took only 0.25 times extra space to use a table for cos as well as sin. I also used a unit of angle where a power of two made a full circle, e.g. 1024 = full circle instead of 360, since in theory it was much faster to do (angle & 1023) than (angle % 1024).
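
From memory, the layout was something like this (reconstructed for illustration, not my original code, and using float where the real thing was fixed-point):

#include <cmath>

const int ANGLE_360  = 1024;              // 1024 angle units = full circle
const int ANGLE_90   = ANGLE_360 / 4;     // 256 angle units  = quarter circle
const int ANGLE_MASK = ANGLE_360 - 1;     // (angle & 1023) instead of (angle % 1024)

// One table covers both functions: cos(a) = sin(a + 90 degrees),
// so the cos lookup just reads 256 entries further along.
static float trig_table[ANGLE_360 + ANGLE_90];

void init_trig_table()
{
    const double PI = 3.14159265358979323846;
    for (int a = 0; a < ANGLE_360 + ANGLE_90; ++a)
        trig_table[a] = (float)std::sin(a * 2.0 * PI / ANGLE_360);
}

inline float fast_sin(int angle) { return trig_table[angle & ANGLE_MASK]; }
inline float fast_cos(int angle) { return trig_table[(angle & ANGLE_MASK) + ANGLE_90]; }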

Then several years later after all that "cleverness", reality hit me.
It wasn't actually faster than calling the built-in sin and cos functions! Not only that but the slight lack of accuracy turned out to occasionally be noticeable.

Who knows, maybe way back when I first did it, it may have actually been a tad faster. But I never profiled it back then, so I guess I'll never know. One thing's for sure though: it's pretty much a waste of time doing it that way nowadays.
Learn from my mistake and don't waste your time going down that road.

In the olden days, processors were slow, and memory was (comparatively) fast. Nowadays, memory is the bottleneck, so lookup tables simply are not a good idea anymore.

In fact, memory is so hideously slow these days that we need several layers of caches. If you buy a $100 processor, $99 is spent on the clever cache parts and only $1 on the actual computing part.

Herb Sutter's video "Machine Architecture: Things Your Programming Language Never Told You" has all the details.

Quote:
Original post by DevFred
In the olden days, processors were slow, and memory was (comparatively) fast. Nowadays, memory is the bottleneck, so lookup tables simply are not a good idea anymore.

In fact, memory is so hideously slow these days that we need several layers of caches. If you buy a $100 processor, $99 is spent on the clever cache parts and only $1 on the actual computing part.

Herb Sutter's video "Machine Architecture: Things Your Programming Language Never Told You" has all the details.


Sometimes LUTs still outperform live calculation, i.e. it can't be generalised whether they are worthwhile or not.

I remember getting performance gains from sine tables of size 256 in an SSE-heavy ray tracer, but the larger the table got, the more the advantage shrank, and for large enough tables the effect was the opposite. Basically, one has to consider the resolution of the table and the computation time put into calculating its members.

An example that people usually don't think about is lightmaps. One could calculate every sample live, but the computation time is large enough to justify the use of a LUT, i.e. the lightmap. (Simple) dot-product diffuse lighting is not complex enough, which is why most applications don't use a LUT for it.

