Speed of Euler, Verlet and RK4 with loads of objects

Started by
23 comments, last by Raghar 16 years, 9 months ago
Wow guys, thx for the answers!

taby: I heard about RK2 (midpoint) but haven't tested it yet.

As for the accuracy of the simulation: since I'm going to be simulating something more complicated than an FPS shooter, that is, a more advanced version of Gish, I do care about it, but since the levels are going to be huge with lots of objects, I'm also very careful about speed. So far it looks like RK4 is out because of speed, and Euler is out because of accuracy. This leaves me with some variation of Verlet...

Numerical imprecision: I won't be doing astronomical calculations :-) first speed, then accuracy. Also, I can't do much about the imprecision of floats (which is quite good anyway, AFAIK from GPG #4).



raigan: I already did some research and the page you mentioned was one of the first I found :-) good stuff, I'll consider it. Do you have any other links to pages that show how to practically use NSV in games?

And I'm also worried about why the fps is so low; that's why I started this thread :-)



Mike2343: you're right, I'm using SDL which gives at best 1ms resolution. Some code:

// _frame_dt is populated by SDL_GetTicks
void Update(float _frame_dt)
{
    if (paused)
        return;

    ++frames;
    ++total_frames;

    last_frame_index = this_frame_index;
    this_frame_index += _frame_dt * scale;
    delta_time = (this_frame_index - last_frame_index) / 1000.0f;

    /*
       Start frames at 0, and store the time. Each frame, put the frame counter up,
       and check if the current time is the stored time + 1000.
       When it is, set the frames count as the current FPS, then reset it to 0 and
       put the current time in the stored time.
       FPS will update every second, showing the average FPS over the last one second.
    */
    if (this_frame_index >= (marker + 1000))
    {
        fps = frames;
        frames = 0;
        marker = static_cast<int>(this_frame_index);
    }
}


Are you sure that 1ms is not enough? You see, I don't time the loop by taking timestamps before and after it and subtracting, which would obviously require nanosecond resolution; I'm only counting frames, and for that task 1ms should be sufficient.



My_Mind_Is_Going: yup, I know that RK4 is much more stable than Euler, but I want to be able to throw at least a few dozen objects into a level (not to mention CPU particles), and at that count RK4 already performs at the amazing speed of seconds per frame...



Raghar:
Quote:BTW what's your timestep, how much copying are you doing, and where is that O(n^2) algorithm? 70 - 100 : 30 - 5 is an increase in the number of objects by half followed by a decrease in speed by a factor of 6. It looks more like an O(n^3) algorithm, even if I account for increased memory allocation.


You hit the nail right on the head! I really don't know what's causing it. I scanned through all the code and can't find anything that looks like O(n^2), let alone O(n^3). No copying is done. Objects are allocated on the heap and virtual methods are used for Update and Render (but that can't be that slow!)
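(Working out the numbers from that quote: going from 70 to 100 objects is roughly a 1.43x increase in n. Pure O(n^2) work would predict about a 2x slowdown and O(n^3) about 2.9x, while the reported drop from 30 fps to 5 fps is a 6x slowdown, so whatever is scaling badly is even worse than quadratic in the object count.)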

Here's a zip with all the source code (4 short files, 2 of which contain the actual physics code) and a compiled application. Maybe somebody could confirm whether I'm right or wrong? (++karma :-)) You can also see for yourself whether the numbers I gave are accurate.

Download (1.7 MB zip, win32 exe)

Timestep = 0.01



jjd: I would like to know what a person with experience in the physics field thinks about this: is what is happening in my case normal...?

lonesock: thx for the tip, but jjd already corrected you :-) as for the ++ in the preview window, they don't show up, and that's normal(?) here on gamedev.
Quote:Original post by My_Mind_Is_Going
Quote:Original post by Raghar
The relative speed of SGDI (or NSV) and a sixth order symplectic integrator is about this: the sixth order integrator is seven times slower than SGDI (NSV, symplectic Euler, or whatever you are calling that simple algorithm).

So you might as well throw RK4 out of the window, it's crap anyway, and use a sixth order integrator.

BTW what's your timestep, how much copying are you doing, and where is that O(n^2) algorithm? 70 - 100 : 30 - 5 is an increase in the number of objects by half followed by a decrease in speed by a factor of 6. It looks more like an O(n^3) algorithm, even if I account for increased memory allocation.


I'm just curious, but why do you say RK4 is crap? I'm not really familiar with numerical methods as they apply specifically to games, but I've never heard of using a 6th order method for, well, anything really.


Because when I tried a symplectic sixth order integrator in an n-body problem, it blew to pieces incredibly quickly. (Basically it deviated from the optimal solution at exponential speed.) An eighth order integrator will not fare much better, because the precision required for it to have its full effect (in the long term) is higher than the dynamic range of 64-bit floating point numbers; a sixth order integrator is still able to work with 64-bit floats nicely.

Now if you look at more common applications, these are either solved well by SGDI (NSV), or a sixth order symplectic integrator creates smoother results than RK4 at the cost of a small increase in complexity. Basically, if you are willing to accept the disadvantages of a fourth order integrator, you might just as well accept a symplectic sixth order integrator; they are quite close in speed.
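(For readers unfamiliar with the names: here is a minimal sketch of what is usually meant by NSV / SGDI / symplectic (semi-implicit) Euler. The particle layout and the constant-gravity acceleration are illustrative assumptions, not taken from anyone's code in this thread.)

struct Particle
{
    double x, y;    // position
    double vx, vy;  // velocity
};

// One symplectic (semi-implicit) Euler step under constant gravity.
// Velocity is updated first, then position uses the *new* velocity;
// that ordering is the only difference from plain explicit Euler,
// and it is what makes the scheme symplectic.
void SymplecticEulerStep(Particle& p, double dt)
{
    const double ax = 0.0;
    const double ay = -9.81;   // assumed constant gravity; spring/drag forces would be added here

    p.vx += ax * dt;
    p.vy += ay * dt;
    p.x  += p.vx * dt;         // uses the already-updated velocity
    p.y  += p.vy * dt;
}

The cost per object is essentially the same as explicit Euler, which is why it is the usual baseline to compare higher order integrators against.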
I hope whatever it is you are doing is working well with Verlet. I have heard from other people as well that Verlet is useful for realtime integration of non-linear functions (ex: Newtonian gravitation, tire rotations per second).

I will try it out. Thank you for the code.

RK2 looks something like this:
// Where gravitation == acceleration.
inline void proceed_rk2(const double &dt)
{
    vector_3 k1_velocity     = mercury.velocity;
    vector_3 k1_acceleration = gravitation(mercury.position, k1_velocity);

    vector_3 k2_velocity     = mercury.velocity + k1_acceleration*dt*0.5;
    vector_3 k2_acceleration = gravitation(mercury.position + k1_velocity*dt*0.5, k2_velocity);

    mercury.velocity += k2_acceleration*dt;
    mercury.position += k2_velocity*dt;
}


[Edited by - taby on June 22, 2007 6:19:06 PM]
I see now what Gish is.

I think that the author perhaps modelled three spheres of increasing blurriness (ex: metaball), and then formed a mesh around it by using some isosurfacing algorithm like Marching Cubes. They then shaded it with an algorithm that accounts for specularity (ex: 2-tone toon shader).
Quote:Original post by Raghar
Quote:Original post by My_Mind_Is_Going
Quote:Original post by Raghar
The relative speed of SGDI (or NSV) and a sixth order symplectic integrator is about this: the sixth order integrator is seven times slower than SGDI (NSV, symplectic Euler, or whatever you are calling that simple algorithm).

So you might as well throw RK4 out of the window, it's crap anyway, and use a sixth order integrator.


I'm just curious, but why do you say RK4 is crap? I'm not really familiar with numerical methods as they apply specifically to games, but I've never heard of using a 6th order method for, well, anything really.


Because when I tried a symplectic sixth order integrator in an n-body problem, it blew to pieces incredibly quickly. (Basically it deviated from the optimal solution at exponential speed.) An eighth order integrator will not fare much better, because the precision required for it to have its full effect (in the long term) is higher than the dynamic range of 64-bit floating point numbers; a sixth order integrator is still able to work with 64-bit floats nicely.


It really, really isn't as simple as saying the higher order integrator is the thing that made your simulation stable, or that higher order is always going to improve stability or quality of results. Accuracy is not always associated with improved stability. It's great that this worked for you, and that you are pleased with the results. But that probably wasn't the only solution. A low-order fully implicit integrator might also have been more stable, or an explicit upwind integrator. Every integrator has certain stability characteristics that determine the growth or decay of error based on the inputs, and those stability characteristics change depending on the underlying governing equations being integrated. The goal, of course, is to choose an integrator that, for your particular governing equations, does not result in unbounded growth of error.
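(A concrete, standard textbook illustration of that last point, not specific to this thread: for the linear test equation dy/dt = lambda*y with lambda < 0, explicit Euler gives y_{n+1} = (1 + lambda*dt) * y_n, which only decays when |1 + lambda*dt| <= 1, i.e. dt <= 2/|lambda|. The first-order implicit Euler step, y_{n+1} = y_n / (1 - lambda*dt), decays for every positive dt. So a lower-order method can have a far larger stability region than a higher-order explicit one; order of accuracy and stability are separate properties.)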
Graham Rhodes Moderator, Math & Physics forum @ gamedev.net
Quite an interesting discussion so far; however, no one has directly addressed my question yet. Some folks who are quite advanced in physics have written here but gave no hints about what they think of the subject. This gives me the feeling that, for them, it's so obvious that integrating this number of objects is slow that they find it pointless to state it directly.


So, let me rephrase it a little.
If you think that moving a few hundred objects on the screen, without any collision tests whatsoever and with a minimal amount of other computation, using the simplest integration schemes, should bring today's computers to their knees, there's no need to write here. Otherwise, please tell me that I'm doing something wrong, since I feel a little stupid right now ("all those people know something that I don't"). You don't even have to tell me WHAT I'm doing wrong; a simple "You should be able to move 10x as many objects at 10x the FPS" will do, so I'll know that I have to dig deeper for the truth.



Quote:Original post by taby
I see now what Gish is.

I think that the author perhaps modelled three spheres of increasing blurriness (ex: metaball), and then formed a mesh around it by using some isosurfacing algorithm like Marching Cubes. They then shaded it with an algorithm that accounts for specularity (ex: 2-tone toon shader).


I recall reading that the jelly body was done using particles connected by springs, integrated using Verlet etc., just as presented here: http://www.gdmag.com/src/jun06.zip

Your approach may also work (heck, it may work even better), but IMHO it sounds like overkill :-)
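(For reference, a minimal sketch of the position-Verlet step typically used for such spring/particle bodies; the particle layout and force handling below are illustrative assumptions, not taken from Gish or the linked source.)

// Position (Stoermer) Verlet: velocity is implicit in the difference
// between the current and previous positions, which is what makes the
// scheme cheap and well behaved for spring networks.
struct VerletParticle
{
    double x, y;          // current position
    double old_x, old_y;  // position from the previous step
};

// fx, fy is the net force on the particle for this step (gravity plus
// spring forces to its neighbours), accumulated elsewhere.
void VerletStep(VerletParticle& p, double fx, double fy, double mass, double dt)
{
    const double ax = fx / mass;
    const double ay = fy / mass;

    const double new_x = 2.0 * p.x - p.old_x + ax * dt * dt;
    const double new_y = 2.0 * p.y - p.old_y + ay * dt * dt;

    p.old_x = p.x;
    p.old_y = p.y;
    p.x = new_x;
    p.y = new_y;
}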

If you aren't doing any collision handling, and it is only simple time step integration of independent bodies that is affecting your frame rate (as appears to be the case from your description and sample code), then I would guess you are doing something wrong. You should be able to have a larger number of objects with no noticeable effect on frame rate. I would estimate you should be able to run some thousands of objects, as long as all you are doing is time integration with perhaps a constant gravitational force or some other independent and cheaply computed force per object.

So...I have a suspicion that part of the problem may not be physics at all. It may in fact be a bottleneck on the rendering side. My assumption here is that you're using a traditional graphics API (OpenGL or Direct3D), perhaps within a game engine that you either wrote yourself or that is open source. Performance of these traditional APIs is heavily a function of geometry batching. How many draw calls do you have? If you essentially have one or more independent meshes *per* object, e.g., the # of graphics objects with an independent mesh == the # of physics objects, then that absolutely can cause your rendering to bottleneck at some point. And the number of objects doesn't have to be very large (a few hundred) before batching issues slow things down. It can be quite difficult to do batch management for modern graphics cards. But I may be telling you something you already know, or not providing enough detail. At least perhaps it is a clue.

The batching issue is real, but doesn't necessarily explain why RK4 is 4x slower than Euler in terms of measured fps. The graphics rendering bits shouldn't be different between RK4 and Euler. So, let's see if anything else might obviously be wrong. Looking at your code...you have a lot of lines like "Derivative2d a = Evaluate...". Here, you are creating an object on the stack, calling its constructor, then calling an assignment operator with the assigned value coming from another function call. So, three function calls and a stack allocation per Derivative2d object just to construct the temporary objects. And upon exiting the function, the destructor for the object is called and the item is popped off the stack. So, in reality, for each of those objects, you have a minimum of 4 function calls (constructor, destructor, assignment, evaluate function), plus a stack push and pop, just dealing with temporary objects. For RK4 you do all of this 4 times. So, 16 function calls just for the derivatives + 8 stack alloc/push and free/pop operations. Additionally, you allocate your derivative values dxdt, etc. on the stack for each time step. All of these object constructions/destructions, stack operations, and function calls take time, and they alone can kill performance. Clean it up: keep some container class that is allocated once with all the derivative objects, temp values, etc., and reuse the preallocated objects or member variables for every time step until you are ready to stop the simulation altogether. You may very well find that this is one of the root problems with the simple integration/no collision case.
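(A rough sketch of the kind of reuse being suggested; the Derivative2d layout and the Evaluate call below are guesses for illustration only, they are not copied from the posted source.)

// Hypothetical derivative record, mirroring what an RK4 state derivative
// typically holds in 2D: velocity and acceleration.
struct Derivative2d
{
    float dx, dy;   // velocity
    float dvx, dvy; // acceleration
};

// Scratch buffers allocated once and reused every step, instead of
// constructing four temporary Derivative2d objects per object per frame.
struct RK4Scratch
{
    Derivative2d a, b, c, d;
};

// The integrator owns one scratch struct for the lifetime of the
// simulation and passes it by reference each step:
//
//   RK4Scratch scratch;                  // created once, outside the loop
//   for each frame:
//       for each object:
//           Evaluate(object, 0.0f,    nullptr,    scratch.a);
//           Evaluate(object, dt*0.5f, &scratch.a, scratch.b);
//           Evaluate(object, dt*0.5f, &scratch.b, scratch.c);
//           Evaluate(object, dt,      &scratch.c, scratch.d);
//           // ...combine a, b, c, d into the state update...
//
// where Evaluate() writes its result into the output parameter rather
// than returning a fresh Derivative2d by value.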

To test whether the integration stuff is the real cause versus batching, try running the sim with nothing being rendered for the physics objects. Then, once you've solved the pure physics stuff, go back and work on the rendering side.
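(One simple way to do that measurement, assuming SDL_GetTicks is already available as in the timer code earlier in the thread; the container name and Update(dt) call are illustrative assumptions, not the thread's actual loop.)

#include <SDL.h>
#include <cstdio>
#include <vector>

// 'objects' stands in for whatever container the simulation keeps its
// bodies in; Update(dt) is assumed to advance one body's physics only.
template <typename Object>
void TimePhysicsOnly(std::vector<Object*>& objects)
{
    const int kSteps = 1000;
    Uint32 start = SDL_GetTicks();

    for (int i = 0; i < kSteps; ++i)
        for (size_t j = 0; j < objects.size(); ++j)
            objects[j]->Update(0.01f);   // physics only, no Render() calls

    // SDL_GetTicks has ~1ms resolution, so average over many steps.
    Uint32 elapsed = SDL_GetTicks() - start;
    std::printf("average physics step: %.3f ms\n",
                static_cast<float>(elapsed) / kSteps);
}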

Please let me know if any of this actually does produce a beneficial gain in performance. It's a first step, and ultimately there can be memory cache hit/miss issues that creep in... I'd really like to hear how this works out!
Graham Rhodes Moderator, Math & Physics forum @ gamedev.net
By the way, let me be the first to apologize for going off on a tangent. When talking about something of interest, it's easy to go with the flow. But it didn't help you.
Graham Rhodes Moderator, Math & Physics forum @ gamedev.net
Yeah, you should be able to have ~10000 objects at once even with RK4. On my MacBook Pro 2.16 GHz I got about 5 fps for something like that. I disagree with grhodes_at_work's suggestions of function call/stack allocation optimizations. Those are just micro-optimizations that aren't going to cause the huge decrease in performance that you're seeing. Look for O(n^2) operations, like another person said.

Another thing, are you compiling in release mode? That would definitely speed things up.
Please post the whole source. Point out areas that are suspect.
