TerrorFLOP

Optimization Puzzle


Recommended Posts

Dmytry... Thx for submitting a response which, although terse, is infinitely more useful to me... Especially on the subject of N-body theory (which I already implemented a while ago... but will look into shortly).

However, I just want to sort a few things out before we proceed, 'cos the welcome new peeps get in these forums (and I sure ain't a newbie programmer) leaves a LOT to be desired... Hence the moderator's appeal for rationality.

I deal with your points in sequential order:

Mate (sorry, English / Australian expression there); what makes you think I DON'T test my code??? I've been testing my code for a couple of days now... Look, the thing is, I've got code timing routines I wrote years ago. I'm ALWAYS testing code 'cos I like to optimize. Having got the time-critical section of my code finished, I was concerned about the 'slowness' of the 'sqrt' function mentioned in several books I've read. I was curious to test this hypothesis and proceeded to time the 'sqrt' function... But to no avail... As it appears the compiler... as usual... worked its magic behind the scenes so that I couldn't get an accurate test. Obviously I need to test 'sqrt' against other types of code... Hence my SIMD function.

And something sure doesn't add up if I get results claiming 'sqrt' is the SAME speed as a null function... Perhaps it is... But I just wanted to check its actual speed... No harm in experimentation and testing... I generally do that most times I code... It's MY habit and I don't expect you to understand... But help, if you can, in answering some of my queries...

We're all individuals here mate... You don't like my style... or my double lines…


TOUGH!!! LOL


Next point... I signed up to this site some years ago... But this is the first time I've used it in a LONG while... And my my, how things have changed (for the worse it appears)... Not very friendly for new peeps looking for help... Go to the correct forum you say? Well, I have a compiler query, but it's tied in with other queries about optimization in general... And since I'm doing a physics project... Ah well... You do the math... Perhaps I should have split my problem up into nice neat little bits and distributed them ALL over the site... Taking care to put each query into its rightful cubby-hole of a forum... I didn't realise I'd broken the MYSTIC oath. Oops... Sorry about that!


Mmmm... Isn't simplicity better when asking a question??? I could bamboozle you with something you're not familiar with... Who knows... Perhaps I could give you all my source code and you could take over the project for me or something (over my dead body LOL)... No... I asked a specific question, based on optimization in general... But I wanted to eliminate some strange results I got in some code timings (blah blah blah... as outlined above... don't want to go over that ancient ground ever again) and I thought perhaps the compiler was to blame as it re-wrote my code. I'm a straight talker and I like straight answers... Like... 'Sure Terror, in order to disable the loop-invariant code optimizations the compiler performs, which are skewing your results... You could do A, B or...' Catch my drift?

And please don't assume ALL new guys are newbie programmers... Using squared radii and distances in collision detection is old news to me... I just AIN'T doing collision detection... At least not yet...

KK... Now on to the HELPFUL part of your post... One step ahead of you there mate... But it's good to see your math is sharp... Here are the core math formulae which update the positions and velocities of N objects:

[Equation image originally hosted by ImageShack.us — no longer available.]
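(In case the image doesn't come through: in standard Euler form, the update is roughly the following. G is the gravitational constant, the m_j are the masses and Δt the timestep; my exact notation may differ.)

\[
\mathbf{a}_i = G \sum_{j \neq i} \frac{m_j \,(\mathbf{r}_j - \mathbf{r}_i)}{\lVert \mathbf{r}_j - \mathbf{r}_i \rVert^{3}},
\qquad
\mathbf{v}_i \leftarrow \mathbf{v}_i + \mathbf{a}_i \,\Delta t,
\qquad
\mathbf{r}_i \leftarrow \mathbf{r}_i + \mathbf{v}_i \,\Delta t
\]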

Apart from the sigma terms... And the fact you calculated the forces instead, we're pretty much in agreement...

As I previously mentioned, I've excluded self-interactions, and I did consider caching the results of each calculation due to the symmetry of forces between any pair of particles.

That's one idea I might look at again... I didn't implement it before 'cos I wanted to get the main body of the code done... But I'm glad you mentioned it... There's definitely room for improvement there.
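Just to pin the pair-symmetry idea down, something roughly like this (a sketch only; Body, eps2 and the rest are illustrative names, not my actual code):

#include <cmath>
#include <cstddef>
#include <vector>

struct Body { double x, y, z, m, ax, ay, az; };

// Accumulate gravitational accelerations, visiting each pair once and
// applying equal-and-opposite contributions (Newton's third law).
void accumulateAccelerations(std::vector<Body>& b, double G, double eps2)
{
    for (std::size_t i = 0; i < b.size(); ++i)
        b[i].ax = b[i].ay = b[i].az = 0.0;

    for (std::size_t i = 0; i < b.size(); ++i) {
        for (std::size_t j = i + 1; j < b.size(); ++j) {   // j > i: each pair once
            double dx = b[j].x - b[i].x;
            double dy = b[j].y - b[i].y;
            double dz = b[j].z - b[i].z;
            double r2 = dx*dx + dy*dy + dz*dz + eps2;      // eps2 softens close encounters
            double invR3 = 1.0 / (r2 * std::sqrt(r2));
            double si = G * b[j].m * invR3;                // pull on i towards j
            double sj = G * b[i].m * invR3;                // pull on j towards i
            b[i].ax += si * dx;  b[i].ay += si * dy;  b[i].az += si * dz;
            b[j].ax -= sj * dx;  b[j].ay -= sj * dy;  b[j].az -= sj * dz;
        }
    }
}

It halves the number of distance / sqrt evaluations at the cost of a slightly less regular memory access pattern.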

As for your last idea... Sure... If this were some kind of 2D game or approximate simulation... But I want it to be both fast AND accurate... Tall order I know... But accuracy wins out here... It's meant to be a 'realistic' simulator... Of sorts.

KK Dmytry... Thx for your helpful input this time around...

Next time, don't give new peeps a hard time KK?

You were NEW yourself once.

Thx Again!!!

Np Sneftel...

My mum always said stand up to and face down aggressive peeps...

So that's what I do LOL!!!

Can't break the habit of a lifetime... But sure I'll watch my tone...

Would be better if SOME peeps give new guys a break...

After reading the moderators' guidelines, which I TOTALLY support 100%, I thought things had changed for the better...

Jury's out on that one...

Hugs You Back!

Quote:
Original post by TerrorFLOP
Eelco...

As I said I'm doing a simulation based on star clusters... I.e. galactic formations.

The numbers DO get big believe me... Especially in terms of MASS and DISTANCE.

I did write my code using floats, but anomalies crept in due to the float's 'limited' range:

1.2e-38 to 3.4e38

Doubles, on the other hand, were designed for high-precision maths, especially for use in scientific applications. Here I can let my simulation run for LONG periods of time, as the range and accuracy of the calculations are vastly improved:

Range of doubles (in addition to the extra precision):

2.2e-308 to 1.8e308

I know all that. But assuming you take a unit to be a metre (which would be a suboptimal choice, since 1e-38 metres is of no interest to this simulation), you could still cover a range of 1e38 metres. That's about 1e22 light years. Quite a few orders of magnitude more than the visible universe stretches.

If you really cared about precision, you'd use fixed-point math.

But it's not that important really.

Quote:

The application runs fine using doubles, able to simulate the gravitational (or electrical if one so wishes) interactions of up to 10,000 objects simultaneously.

One CAN go beyond 10,000... But even on a release build, the render rate slows down drastically... Not surprising really, since 10,000 objects equates to around 99,990,000 calculations per render (i.e. 10,000 × 9,999, once self-interactions are excluded).

The simulation runs fine... I'm using point pixels to represent the objects for now (and for speed tests) and, depending on the objects' initial conditions, it can simulate spiral galactic formations, star-system formations in which lighter objects orbit a heavier one... And, if you really wanna crank up the mass on a particular object, you can even simulate crude accretion disks around black holes.

Don't wanna drag on... But I plan to add further features such as bounding the area the objects move in... Thus simulating bounded particles in a box (assuming the objects' masses are set to the atomic scale and the force constant is altered accordingly).

One can even change the polarity of the force...

Anyway... getting ahead of myself here...

It’s all done in DirectX for speed considerations, but I'm a greedy programmer who’s always seeking more speed if possible (speed is good).

So, in order to test speeds, I need to time code. In order to time code, I need my compiler to behave during the timing process, so accurate readings can be recorded.

OK, sounds nice and all; test however much you want, but I still don't think you can beat the standard sqrt by a significant amount. It might be slow, but not intentionally so.

Quote:

I've timed plenty of code using timing algorithms I wrote years ago, but in trying to test just how slow 'sqrt' is, I get erroneous results, even though plenty of books... Perhaps you've read some of them... Perhaps not... say that 'sqrt' IS a slow function...

It's NOT an emergency... The code works (despite the fact that I don't have a supercomputer)... I'm just at the stage of tweaking the math so that it runs optimally.

In order to TEST optimizations... I need to suppress certain features of my compiler.

I have a few books on algorithms, which is a topic unto itself… But as yet I'm unsure how to turn an O(N^2) algorithm into the much-desired O(N*log(N)) one. All objects MUST interact with each other in order to realistically simulate the effects of a force field…

Well... That’s my physicist brain talking... But if you have any ideas… Plz let me know.

There is no magic recipe for turning an N^2 algorithm into an N*log(N) one. However, there has been quite some research into turning the N-body problem into an N*log(N) algorithm. Basically, it involves building a subdivision tree of your universe, and finding out where it is appropriate to lump an entire branch of said tree into one mass. Don't dismiss this as inaccurate: you can balance speed and accuracy however you want, and if you pick the threshold conservatively, the error will be negligible compared to that of your integration and calculation round-off, whilst still speeding up your simulation like crazy.
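This is essentially the Barnes-Hut approach. The core 'lump or recurse' test looks roughly like this (a 2D sketch with made-up names; building the quadtree itself is omitted):

#include <cmath>
#include <cstddef>
#include <vector>

// One node of the subdivision tree: total mass and centre of mass of
// everything inside it, plus the side length of the region it covers.
struct Node {
    double mass, comX, comY;     // total mass and centre of mass
    double size;                 // side length of this node's region
    std::vector<Node> children;  // empty for a leaf
};

// Accumulate the acceleration on a particle at (px, py) from the tree.
// If a node looks small from the particle's point of view
// (size / distance < theta), its whole branch is lumped into one point
// mass; otherwise we open the node and recurse into its children.
void addAcceleration(const Node& n, double px, double py,
                     double G, double theta, double eps2,
                     double& ax, double& ay)
{
    double dx = n.comX - px, dy = n.comY - py;
    double r2 = dx * dx + dy * dy + eps2;  // eps2 softens very close encounters
    double r  = std::sqrt(r2);

    if (n.children.empty() || n.size / r < theta) {
        double s = G * n.mass / (r2 * r);  // treat the whole branch as one mass
        ax += s * dx;
        ay += s * dy;
    } else {
        for (std::size_t i = 0; i < n.children.size(); ++i)
            addAcceleration(n.children[i], px, py, G, theta, eps2, ax, ay);
    }
}

Smaller theta means more accuracy and less speed-up; values somewhere around 0.5 to 1 are typical.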

You will find people suggesting squared distances on any forum anywhere, unless you have already stated that you know about such things and that they won't help you in this case. The least you can do is politely explain that it will not help you, which we couldn't have known.

I think the problem is that you don't know how to do benchmarking properly. All you need to do is actually use the results of the calculations, in order to make sure that those calculations are actually performed.
For example, on each iteration of the loop generate random doubles in your vector, then call 'Norm', and add the result to some variable, e.g. 'total', declared outside of the loop. Last of all, to make sure it can't eliminate the whole thing, cout 'total'.
The compiler should be set to a release build with all optimisations on.

Way Walker... Thx for your post.

You've put forward an interesting idea in letting my simulator run for days...

I've let it run for hours... To come back and see stable solar systems had formed out of a jumble of chaotic activity...

But for DAYS... Sounds like a damn good idea... And I'll only need to cache the pixel plots... say in a long data type (4 bytes each).

However, I've just done a rough calculation: a render speed of 10,000 stars per second (which is about what I'm getting on my 3GHz CPU with a Radeon 9800 Pro) equates to a colossal ~3.22 Gigs of HD per DAY...
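Spelling the rough calculation out (assuming 4 bytes per plot and 86,400 seconds in a day):

\[
10{,}000 \ \tfrac{\text{plots}}{\text{s}} \times 4 \ \text{bytes} \times 86{,}400 \ \tfrac{\text{s}}{\text{day}} = 3.456 \times 10^{9} \ \text{bytes} \approx 3.22 \ \text{GiB per day}
\]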

Which, I suppose, is NOT bad... Not bad at all... Thx for the idea LOL!!!

Yeah, I'm sticking with 'sqrt' for now... The books which claimed 'sqrt' was way too slow are a couple of years old... Which I suppose is OLD for books on coding... Think it's best I check out some new material... Perhaps MS have implemented 'sqrt' to its optimum... Still can't quite believe it's as fast as a null though LOL.

Thx to Dmytry, I got some ideas on optimizing the N-body problem. If you know of any books that might be of some use... Then plz let me know.

I have one... But it's a whole damn course on algorithms... In 6 parts spread over two LARGE books. Guess I'm looking for a quick fix LOL...

Thx!

Eelco...

Slight oversight... I was thinking more about MASS than distance...

And I guess I wanted the flexibility of being able to use the obscene masses said to be found in galactic cores as massive black holes (known to have masses ranging into the millions of solar masses).

As I said... I DID go with floats and I had a run of overflows...

Sure, I could have toned down the masses perhaps... But I think I was getting precision anomalies from that dreaded 'sqrt' function. It works in doubles, and I know converting doubles to floats can be not only inefficient, but it loses precision too.
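Roughly the kind of thing that was biting me (an illustrative sketch only, not my actual code or units):

#include <cstdio>

int main()
{
    // A galactic-core black hole can be millions of solar masses
    // (one solar mass is roughly 2e30 kg).
    float  mf = 2.0e36f;   // ~a million solar masses, stored as a float
    double md = 2.0e36;    // the same mass, stored as a double

    // The m1*m2 product inside a naive F = G*m1*m2/r^2 step:
    float  pf = mf * mf;   // 4e72 overflows float (max ~3.4e38) -> inf
    double pd = md * md;   // fine in double (max ~1.8e308)

    std::printf("float product:  %g\n", pf);  // prints inf
    std::printf("double product: %g\n", pd);  // prints 4e+72
    return 0;
}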

No... It’s primarily a scientific endeavour and doubles are the preferred data type.

Well... Going to get some new material on optimization... Nothing wrong in trying to optimize code. It's a challenge... But given today's CPUs, a damn hard one LOL.

Besides... optimization isn't simply ALL about asm... C / C++ have a LOT of dos and don'ts... I know quite a few... But that's the thing about these languages... You NEVER stop learning them... Even after many years.

Looking into N-body situations now... There IS room for one or two more optimizations.

iMalc

Did that trick, mate... The compiler doesn't stand for it... It still optimizes the actual test code by moving it OUTSIDE the main test loop.

And as I said earlier... A specific answer to a specific problem if you please.

Of course you have all optimizations on in your final build... I did not dispute that... I'm testing code... There is a difference.

TerrorFLOP: I think you'd get an enormous speedup if you used the compiler in the free Microsoft Visual C++ Toolkit 2003 or even gcc instead of VS6. VS6 is just plain outdated, and newer compilers are much better than older ones at optimization.


Also, I think you misunderstood Dmytry. All he was saying is that timing made-for-benchmarking sections of code isn't going to tell you what you should change to improve the speed of your program.

To time the difference sqrt makes, for example, you can't just call it a bunch of times in a loop. That might tell you if one version takes fewer clock cycles, but it won't tell you if it will significantly speed up your program.

You need to profile your program. If the profiler says sqrt is taking up more time than anything else, THEN you know it might be worth trying different approximations of sqrt. Make your own sqrt, put it in your program, and profile again. Is something else the bottleneck now, or did you at least make sqrt a significantly smaller portion of your program's execution time? If not, you didn't make much of a difference. Repeat as needed to get rid of all the major bottlenecks.
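If you don't have a full profiler to hand, even a crude scoped timer will give you a rough picture of where the time goes (a sketch only; the names are made up):

#include <cstdio>
#include <ctime>

// Crude scoped timer: prints how long a named block took when it goes
// out of scope. A real profiler gives proper per-function breakdowns,
// but this is enough to see what fraction of a frame a suspect section
// (e.g. the sqrt-heavy force loop) actually costs.
struct ScopedTimer {
    const char*  name;
    std::clock_t start;
    explicit ScopedTimer(const char* n) : name(n), start(std::clock()) {}
    ~ScopedTimer() {
        double ms = 1000.0 * (std::clock() - start) / CLOCKS_PER_SEC;
        std::printf("%s: %.3f ms\n", name, ms);
    }
};

// Usage:
// {
//     ScopedTimer t("force accumulation");
//     updateForces();   // whatever section you suspect is the bottleneck
// }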

Here is a starting point if you feel like improving your algorithm:

http://www.cs.hut.fi/~ctl/NBody.pdf

Quote:
Original post by TerrorFLOP
iMalc

Did that trick, mate... The compiler doesn't stand for it... It still optimizes the actual test code by moving it OUTSIDE the main test loop.

And as I said earlier... A specific answer to a specific problem if you please.

Of course you have all optimizations on in your final build... I did not dispute that... I'm testing code... There is a difference.
So sorry to be disagreeing with you, but you didn't. I mean you can't have - it's not possible. I just performed the exact benchmark I described, here in VS6, and it works. I do know exactly how to benchmark properly...

#include <cmath>    // sqrt
#include <cstdio>   // printf
#include <cstdlib>  // rand, srand
#include <ctime>    // time

void sqrtTest()
{
    double total = 0;
    srand((unsigned)time(0));  // seed with the current time
    for (int i = 0; i < 100000; ++i)
    {
        double x = rand() / 32767.0, y = rand() / 32767.0; // *** (32767 == RAND_MAX on VS6)
        total += sqrt(x * x + y * y);                      // ***
    }
    printf("%f\n", total);     // using 'total' stops the compiler discarding the loop
}
Granted I don't have your member function call inside the loop here, but that works in your favour, not mine.

If you work backwards from what I described, the resulting value of total that is output must be calculated by summing together the results of all the calls to Norm as described, thus the compiler cannot remove the calls to Norm. It can inline them as it pleases, but the code in them must run in order to produce the correct result. Also, because the random number generator is seeded using the current time (OK, I didn't mention that, but it's obvious), the result that is output cannot be predetermined. So as you see, the value of total that is output depends upon all of the code as written, and on the time the code is run. It isn't logically possible for the compiler to optimise out the square roots, and it doesn't.

Time taken when removing lines marked *** -> 0.203ms
Time taken as is -> 2.199ms, or about 10 times longer.
Just taking the sqrt out of this gives me 1.719ms, so you're looking at around half of the time taken here inside the sqrts.

As a side note, the VS6 compiler is crap. It has crap support for the STL and templates in general, is obsolete, and as of about right now is no longer supported by Microsoft. I also know that its sqrt is indeed quite slow, as I use it regularly.

I don't mean to sound harsh saying any of that, but let's not patronise each other either, eh [wink].
