
Planetary engine, Part VIII and Motion blur

Ysaneya


Performance optimizations

I spent some time adding classes to the code to handle multi-threaded tasks. This allows performance-hungry code to scale to any number of cores / cpus. Of course, this optimization can only apply to code that is easily parallelizable, for example heavy processing on sequential or independent data.

One area that benefits a lot from multi-threading is the terrain patch generation. A terrain patch is made of NxN vertices, with N = 2^i + 1 ( i is an integer ). The default is 33x33 vertices.

When the quadtree gets subdivided ( you zoom in on the terrain ), 4 child terrain patches are created: that's a total of 33x33x4 = 4356 vertices to generate. Each vertex requires various processing, listed here in decreasing order of cost:

- generating the height value by evaluating many procedural noise functions ( by far the slowest )
- generating two displacement values for the cliff / overhangs effect
- generating a normal at this vertex
- generating the vertex attributes such as position, tangent space and texture coordinates
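To make the idea concrete, here is a minimal sketch of how patch generation can be split across worker threads ( not the engine's actual classes; all names are made up ): each thread fills an independent band of rows, so no locking is needed.

```cpp
#include <cmath>
#include <thread>
#include <vector>

// Stand-in for the expensive procedural height evaluation described above.
static float generateHeight(int x, int y)
{
    return std::sin(x * 0.1f) * std::cos(y * 0.1f); // placeholder
}

// Generate an n x n patch of heights, distributing rows across threads.
std::vector<float> generatePatch(int n, unsigned numThreads)
{
    std::vector<float> heights(n * n);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t)
    {
        // Each worker handles every numThreads-th row: disjoint writes, no locks.
        workers.emplace_back([&, t] {
            for (int y = static_cast<int>(t); y < n; y += numThreads)
                for (int x = 0; x < n; ++x)
                    heights[y * n + x] = generateHeight(x, y);
        });
    }
    for (auto& w : workers)
        w.join();
    return heights;
}
```

In practice the engine keeps a pool of worker threads alive rather than spawning them per patch ( see the thread-pool discussion in the comments ), but the row-banding idea is the same.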

The first two tasks are the important ones ( as far as performance goes ). The height generation requires, for each vertex, evaluating 3 or 4 procedural functions ( typically a ridged fBm, a few fractals, a terrace effect and maths operations ). Because you need enough detail when you descend to ground level, I'm forced to use a high number of octaves in those procedurals ( the main one, the ridged fBm, uses 16 octaves ). In total, I'm probably calling the 3D Perlin noise basis 30-40 times per vertex, and that's still excluding all the intermediary maths operations, which add serious overhead.
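For reference, the core of a ridged fBm looks something like this ( a simplified sketch with a placeholder 1D noise basis; the engine evaluates a real 3D Perlin basis, which is where the cost goes ):

```cpp
#include <cmath>

// Placeholder noise basis returning values in [-0.5, 0.5].
// The real engine uses 3D Perlin noise here.
static float basisNoise(float x)
{
    return std::sin(x * 12.9898f) * 0.5f;
}

// Ridged fBm: sum octaves of 1 - |noise|, doubling frequency and
// halving amplitude each octave. The absolute value + inversion
// is what produces the sharp mountain ridges.
float ridgedFbm(float x, int octaves)
{
    float sum = 0.0f, freq = 1.0f, amp = 1.0f;
    for (int i = 0; i < octaves; ++i)
    {
        float n = 1.0f - std::fabs(basisNoise(x * freq)); // ridge transform
        sum  += n * amp;
        freq *= 2.0f;   // lacunarity of 2
        amp  *= 0.5f;   // persistence of 0.5
    }
    return sum;
}
```

With 16 octaves, that loop body ( and the underlying noise basis ) runs 16 times per function call, per vertex, which is why the height evaluation dominates.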

The displacement for cliffs / overhangs requires evaluating two fractals of 7-8 octaves each, so it's also pretty performance-hungry.

The net result is that on complex planets, when you are moving around and a node gets subdivided into its 4 children, you see a small slowdown. It's very short ( around 20 milliseconds ), but long enough to make the camera motion feel unsmooth. This problem cannot be fixed entirely, due to the nature of procedurals ( otherwise I'd go back to diamond-square and have boring planets but okay performance ), but it can be lessened with various tricks, and one of them is multi-threading.

I have a quad-core Q6600 at home; it's good to know that it'll eventually be used for something :)

Another idea to improve performance is to write SSE-optimized noise generation code, but that's a pretty low-level optimization, so I'd rather keep it as a "last resort" solution.
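As an illustration of what that kind of SSE optimization looks like ( a sketch, not the engine's code ), here is Perlin's fade curve t*t*t*(t*(t*6-15)+10) evaluated for four inputs at once with SSE intrinsics; a full SSE noise function would vectorize the lattice lookups and gradient dot products the same way:

```cpp
#include <xmmintrin.h>

// Evaluate Perlin's quintic fade curve t*t*t*(t*(t*6-15)+10)
// for 4 floats simultaneously using SSE.
__m128 fade4(__m128 t)
{
    const __m128 c6  = _mm_set1_ps(6.0f);
    const __m128 c15 = _mm_set1_ps(15.0f);
    const __m128 c10 = _mm_set1_ps(10.0f);
    // inner = t * (t * 6 - 15) + 10
    __m128 inner = _mm_add_ps(
        _mm_mul_ps(t, _mm_sub_ps(_mm_mul_ps(t, c6), c15)), c10);
    // result = t^3 * inner
    __m128 t3 = _mm_mul_ps(_mm_mul_ps(t, t), t);
    return _mm_mul_ps(t3, inner);
}
```

The fade curve is only a small part of a noise evaluation, but it shows the pattern: four vertices' worth of scalar work collapses into one stream of vector instructions.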

About procedural details

The good thing about using procedural generation is that you are more future-proof than other engines / games.

What I mean is this: just by changing a detail level in the config dialogs, you will be able to increase the quality and details to extreme levels. No computer that exists today has enough memory, or video cards and cpus good enough, to run it at the highest detail levels, simply because there's no hard limit on how far those details can go.

Case in point: at the moment the terrain patches are 33x33 vertices; that's the default for medium to high-end gaming computers, for example an AMD FX / Intel Core 2 Duo with an NV7800, NV8800 or ATI X1800.

If your computer is good enough, you can try increasing the detail by a factor of 4: terrain patches of 65x65 vertices. I tried on my quad-core machine ( Q6600 with an 8800 GTX ), and as expected it slowed down quite a lot, but it was still real-time ( 40-50 fps ). Compare the screenshots below, and notice how the silhouettes of the mountains and the details in lighting / shadows are better in the bottom version.

The 65x65 version almost brings my Q6600 + 8800 to its knees. But it doesn't stop here. Although I haven't been able to see it myself, I'm guessing the best computers out there, with 4 gigs of ram, extreme quad-core cpus and an 8800 Ultra, would be able to run a 129x129 version at low framerates. Of course that wouldn't be good enough to play comfortably, but..

Imagine in a few years. The graphical details will scale to your new computers/video cards.

33x33 terrain patches:




65x65 terrain patches:




Planetary surface types

I've cleaned up the code a bit to support multiple planetary types ( I was previously experimenting with everything in the same class, commenting / uncommenting pieces of code ). Now I have factories for different planet types with different procedural noise functions.

The first one, which I've already presented in depth in the previous updates, is what I described as a "Gaian" planet.

I added a "Desert" planet type, and a "Selenian" planet type. The desert one is based on ridged perlin noise to generate sand dunes. It's extremely boring, so I need to work on it a bit before presenting it ( plus texturing will be important ).

The "Selenian" type is basically a planetary surface based on craters. Lots of them. To implement that, I created a new type of noise which I called "spherical", based on the idea of Voronoi diagrams. A set of spheres is pre-generated, randomly placed in a cube; when I evaluate the noise at run-time, I calculate the distance between the position and each sphere and, based on that distance, generate a value that looks like a crater.

It works pretty well. It's quite slow, but that's no big deal, as there is lots of room to optimize this "spherical" noise.

Why spherical, by the way ? Craters are 2D, but the noise basis must be 3D so that you don't get strange seams on the planet ( since all the noise functions are based on the position of the vertex on the planet's spherical surface ).
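A minimal sketch of this "spherical" crater noise ( with made-up names and an arbitrary crater profile; the engine's actual profile is certainly different ):

```cpp
#include <cmath>
#include <vector>

// A pre-generated sphere: center in the unit cube, plus a radius of influence.
struct Sphere { float x, y, z, radius; };

// Evaluate the crater noise at a 3D position: for every sphere whose
// influence covers the point, add a crater-like profile based on the
// normalized distance to the sphere's center.
float craterNoise(float px, float py, float pz, const std::vector<Sphere>& spheres)
{
    float value = 0.0f;
    for (const Sphere& s : spheres)
    {
        float dx = px - s.x, dy = py - s.y, dz = pz - s.z;
        float d = std::sqrt(dx * dx + dy * dy + dz * dz) / s.radius;
        if (d < 1.0f) // inside the crater's influence
        {
            // Illustrative profile: a raised rim near d = 1,
            // falling off into a depressed bowl near d = 0.
            float rim  = std::exp(-8.0f * (1.0f - d) * (1.0f - d));
            float bowl = -std::exp(-8.0f * d * d);
            value += rim + bowl;
        }
    }
    return value;
}
```

The obvious optimization, and why there is so much room to speed this up, is a spatial grid over the cube so that each evaluation only tests the handful of nearby spheres instead of all of them.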

While playing with craters, I found an interesting place that I didn't know could even exist. It's basically some small craters that got displaced by some extreme cliffs / overhangs. It's hard to see in still pictures, but when you're actually exploring it in 3D, it looks surreal. Like alien rock formations.

I love getting surprised by my own creations :)





Motion blur

Last but not least, I spent some time adding a render pipe for motion blur. The basic motion blur technique is to render the scene to a texture and alpha-blend it with the previous frame, but it's tricky and doesn't look very good ( though it's extremely fast, so I might fall back on that method for computers with performance problems ).

What I implemented is "true" motion blur: each object gets transformed both by the current frame's transformation matrices and by the previous frame's, and a velocity is computed in eye space. This velocity is stored in a velocity texture. The scene is also rendered into a texture. In post-processing, each pixel samples the velocity texture and blurs the scene texture N times ( in my current shader, N = 32 ) in the direction of the velocity vector.
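The post-processing pass can be sketched on the CPU like this ( illustrative names only; the real version is a pixel shader sampling the scene and velocity textures ):

```cpp
#include <functional>

struct Color { float r, g, b; };

// For one pixel at (u, v) with per-pixel velocity (velU, velV):
// average numSamples taps of the scene, stepping backwards along
// the velocity vector. sampleScene stands in for a texture fetch.
Color motionBlurPixel(float u, float v, float velU, float velV, int numSamples,
                      const std::function<Color(float, float)>& sampleScene)
{
    Color sum = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < numSamples; ++i)
    {
        float t = static_cast<float>(i) / numSamples;
        Color c = sampleScene(u - velU * t, v - velV * t);
        sum.r += c.r; sum.g += c.g; sum.b += c.b;
    }
    sum.r /= numSamples; sum.g /= numSamples; sum.b /= numSamples;
    return sum;
}
```

With N = 32 that's 32 texture fetches per pixel, which is why the pass is only worth it when the velocity buffer is already paid for.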

The results are pretty good in motion, but show lots of artifacts on screenshots. One of the problems is that no blur happens outside the silhouette of a moving object, the so-called "occlusion" problem.

In its "OpenGL Shader Tricks" presentation from GDC 2003, Nvidia describes a "solution" to that problem: compute in the vertex shader the dot product between the normal and the motion vector, and use either the current frame's vertex ( dot product > 0 ) or the previous frame's ( dot product < 0 ). The idea is to extrude the object in the direction opposite to its motion.
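The vertex-selection logic from that presentation boils down to something like this ( a plain C++ stand-in for the vertex shader code ):

```cpp
struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Pick the current-frame position when the vertex faces along the
// motion vector, and the previous-frame position when it faces away,
// stretching the mesh's back side opposite to its motion.
Vec3 selectBlurVertex(const Vec3& current, const Vec3& previous, const Vec3& normal)
{
    Vec3 motion = { current.x - previous.x,
                    current.y - previous.y,
                    current.z - previous.z };
    return dot(normal, motion) >= 0.0f ? current : previous;
}
```

The per-vertex switch is what produces the extra artifacts: triangles that straddle the front / back boundary get stretched in unpredictable ways.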

I have experimented with that idea, but unfortunately it creates more artifacts than it solves.

Project Offset apparently uses a similar technique but has found a way to reduce / remove all those artifacts. If anybody has an idea of what they did to solve it, I'm interested.

Because the motion blur requires rendering the scene to a separate velocity texture ( a second pass ), and because there are already 200-300K triangles per frame, there's a serious slowdown ( a 50% performance hit ) when enabling true motion blur.







12 Comments


Recommended Comments

Quote:
Original post by Ysaneya

Another idea to improve performance is to write SSE-optimized noise generation code, but it's a pretty low-level optimization, so I'd rather do that as a "last resort" solution.



I would recommend doing this rather than leaving it as a last resort. I wrote a fractal SSE simplex noise function. In the worst case (1 iteration) it was slower by a couple of percent, but in every other case (more than 1 iteration) it was faster. Over 4 iterations it had easily doubled in speed.

Quote:
Imagine in a few years. The graphical details will scale to your new computers/video cards.
Very interesting.

I'd be very interested in any thoughts you might have on which component will allow the most improvements. For example, is it moving from 2->4->8->16-core CPUs, or is it the amount of [V]RAM that'll have the most impact?

Myself, I'm wondering if there'll be a point in the near future where memory bandwidth will be the real problem. Having many powerful cores churning through huge amounts of data with potentially very different access patterns could easily saturate a weak memory interface.

Keep up the good work!
Jack

This might come off as a bit naive, but the occlusion problem might be mitigated by blurring the velocity texture a little. Some velocity would 'bleed' to the area outside the silhouette of a moving object. It makes sense, but I can't be sure if the effect would be noticeable without experimenting.

Impressive, as always. That is the best motion blur I've seen.

Have you checked this post?

Somebody there explains the technologies used by Capcom's engine, and there is a part where they talk about motion blur. I think it's interesting because in the screenshots the motion blur looks really shitty, but in the game you never notice it, so don't worry about how it looks in a screenshot.

Quote:
Original post by jollyjeffers
I'd be very interested in any thoughts you might have on which component will allow the most improvements. For example, is it moving from 2->4->8->16-core CPUs, or is it the amount of [V]RAM that'll have the most impact?

Myself, I'm wondering if there'll be a point in the near future where memory bandwidth will be the real problem. Having many powerful cores churning through huge amounts of data with potentially very different access patterns could easily saturate a weak memory interface.


The amount of VRAM, as well as the on-board bandwidth, is continuously increasing. The PCI Express bandwidth could become a bottleneck, but I doubt it; the tendency is to move as much as possible to video ram ( render-to-vertex-buffer, render-to-texture, procedurally generating textures in shaders, etc. ), which means that the bus bandwidth requirements aren't likely to grow too quickly.

I think the bottleneck in a few years will be similar to what we find today: fillrate ( shading ) and, mostly, the CPU and system ram. So using more cores is definitely a good thing, especially when you see the roadmaps of Intel / AMD.

Quote:
Original post by Jotaf
This might come off as a bit naive, but the occlusion problem might be mitigated by blurring the velocity texture a little. Some velocity would 'bleed' to the area outside the silhouette of a moving object. It makes sense, but I can't be sure if the effect would be noticeable without experimenting.


I was thinking about that yesterday, and I think it could indeed work ( or at least help a bit ). But to be honest, the main thing that worries me about the motion blur is not its quality ( which is not perfect but, as has been said, the artifacts are not really visible in motion; it's only a problem on screenshots ) but its performance. Rendering a second pass into the velocity buffer is a huge performance hit; it stresses the setup and vertex units, as well as the cpu, a lot.

The next idea would be to do something based on multiple render targets, but since the scene texture and the velocity texture have different pixel formats, and the vertices are transformed by different matrices, I don't really see how you'd avoid rendering in 2 passes..

Quote:
Original post by aiforge
do you use thread pools for the parallel stuff? (thread creation is very time-consuming, more than a pool :-])


Yes. I made many performance measurements, and the overhead of starting a task in the main thread, delegating the work to 4 worker threads and waiting for the results to come back is incredibly small: around 0.02 *milliseconds*. So any task that can be parallelized and that takes more than 0.02 ms can benefit from parallel calculations.

Quote:
Original post by tamat
Impressive, as always. That is the best motion blur I've seen.

Have you check this post?

There somebody explains the technologies used by Capcom's engine, and there is a place where they talk about Motion Blur. I think is interesting because in the screenshots the Motion Blur looks really shitty but in the game you never realize, so dont worry about how it looks in a screenshot.


Thanks for the link. They are referring to the same presentation as I am, the OpenGL shader tricks one from GDC 2003. I did however notice an interesting part: they only render the close objects into the velocity buffer, because even when moving fast, far-away objects will not get any motion blur of their own, only the blur due to the camera movements.. and that can be done by rendering a "sky box" into the velocity buffer.

It could help increase performance quite a lot. Unfortunately, there's no real solution for the artifacts; even their pictures show them.

Quote:
Original post by Ysaneya
I did however notice an interesting part, how they only rendered the close objects to the velocity buffer, because even when moving fast, far away objects will not get any motion blur on their own, but only the one due to the camera movements.. and that can be done by rendering a "sky box" in the velocity buffer.


Good catch, that's a great idea.

Quote:
The net result is that on complex planets, when you are moving around and a node gets subdivided in its 4 children, you see a small slowdown. It's very short ( like 20 milliseconds ), but long enough to make the camera motion feel unsmooth.


If you have a way of predicting where the camera is going (since your camera has inertia, it should be possible), why couldn't you start subdividing early and do it over several frames?

This type of motion blur is something I use in my engine as well, and although the performance hit is high, and the effect subtle, it just adds such a nice touch. It looks great in your game -- really makes you feel like you're whizzing by those asteroids!

(sorry, meant to post this comment in your 8/25/2007 5:25:30 PM entry)

