 Shadowing part IV |
Posted - 10/22/2007 5:18:47 AM | Shadowing, part IV
In the previous shadows update I was explaining how I experimented LiSPSM and trapezoidal shadow maps. Both were a bit disapointing due to the quality change based on where the camera is looking at ( dual frustum case ).
I decided to do a step back and to go for parallel-split shadow maps (PSSM), which are very similar to my first implementation, cascaded shadow maps. In PSSM, the camera frustum is split in N depth sections, and each section gets it own shadow map. Here's a small diagram to explain it:

It offers twice the resolution compared to cascaded shadow maps; the reason is that the shadow frustums are tightly placed on the camera frustum, while in cascaded shadow maps, the frustums were *centered* on the camera origin. Performance is similar.
My version of PSSM is a bit simplified compared to the original algorithm: the distances at which to split the shadow frustums are constant and not determined dynamically. So far I haven't felt the need for it.
The good news is.. with this implementation, I'm finally getting close to the point where I'll call myself "satisfied". In particular, I can now get shadows on small ships, plus shadows on terrain at a long distance. I reached my target of 64 Km for the maximum distance at which shadows can be seen ( using 4 splits ). Pics of the shadowed terrain will come in a future update.
Of course, to get shadows at 64 Km, you'll need a monster PC. I currently "only" get 40 fps on my 8800 GTX, although there's still room for optimizations. Apparently I'm still heavily CPU limited. The bottleneck is, to do atmospheric scattering, shadowing etc.. I need to set for each object tons of shader constants ( plus for precision issues, rendering in camera space requires to compute many matrices per object ), and with the terrain LOD there can be up to a thousand objects in view. Since for shadow mapping the scene has to be rendered for each of the 4 splits into a depth texture, it's rendering many thousand objects per frame. Serious overhead.
Sovereign
WhiteSpade01 sent an update of the Sovereign ( a 650m long ship ), textured with UV tiling and SpAce's texture pack. With the new shadows, I couldn't resist taking a few screenshots. The shader is still basic and only features diffuse, bump, specular, emission and shadows. It lacks the custom/dirt maps as well as the anisotropic lighting ( giving the metallic look of ASEToBin ). No HDRI/bloom yet.






| |
| Saturday, October 6, 2007 |
 Collisions streaming and optimizations |
Posted - 10/6/2007 5:53:22 PM | No eye-candy today, sorry folks!
But still an interesting week. I've been extremely busy on the physics and collisions streaming I quickly mentionned in the previous updates.
Collision data streaming
But first, a bit of background.
I-Novae extensively uses the concept of abstract interfaces, which are then implemented in one or many DLLs.
The physics / collisions system follows the same principle. A CPhysics interface that provides classes and functions for physics objects ( box, spheres, triangle meshes, joints, etc.. ). An implementation of this interface is currently done by using ODE, the open-source physics library.
The first implementation of CPhysics, called CODEPhysics, was a pretty basic one. Not much to say really, it was just calling the corresponding ODE functions for all the physics objects and associated functionalities.
Triangle meshes are an interesting beast. To perform fast collision tests, ODE creates on the fly a data structure ( typically a Kd-tree or a bounding-volumes tree ) that stores references to the triangles of the mesh. When a collision test is done, the nodes of the tree are tested first, and a whole branch of the tree can be discarded quickly if the collision is negative on the node.
The important point here is "on the fly". As any programmer knows, building trees is not particularly fast ( all things being relative ), and usually all algorithms that use trees try to pre-process as much as possible. But there's no mechanism in ODE to per-process a triangle mesh for collisions, and even worse, if it had a mechanism for that I couldn't use it for the planetary terrain chunks, since they're also built on the fly ( procedurally ) !
When I frst implemented collision detection for the planets, like in the Motion blur & physics video, performance was quite abysmal. A single terrain chunk is around 512 triangles. Generating the collision data for it could take 10 to 20 milliseconds !
10 ms doesn't sound bad, but it really is. The first reason is an indirect one: when a terrain node gets split, it's split into 4 children at the same time. Since each chunk needs its collision data, that means you have a slowdown of 10 * 4 = 40 ms at once. The second reason is that when a chunk gets split, the chunk data needs to be procedurally generated, and that's already a lot of work. Tens of ms too. So in the end, you get a short slowdown leading to very unstable / uncomfortable framerates.
The slowdown incurred by the generation of terrain data cannot be avoided. But there's a solution for collision data, and that's "streaming".
I use the word "streaming" pretty loosely here. By that I mean that the collision data is delayed to when it's really necessary, and not immediate. Think about it: the camera is moving, then suddenly a terrain chunk that is 10 Km away gets split in 4. There's no need to generate collision data for it immediately, the camera is 10 Km away after all.
So I introduced a new concept in my physics interface: a streaming mesh body. It's like a mesh body, it has an associated mesh, but it also contains a boolean indicating if the collision data has already been built or not. It also contains a bounding box. A dynamic octree is generated to store the bounding boxes of all streaming mesh bodies, and when a collision test has to be done, the physics library does overlap tests between the bounding boxes to determine which pairs of objects potentially collide. Then, once a pair is found, the collision data is generated.
Now, when my node that is 10 Km away gets split in 4, I don't loose any time at all to generate collision data for it. Only when the camera ( or any colliding object, such as ships ) comes near it, will generate the collision data.
Performance optimizations
While working on streaming mesh bodies, I realized something very interesting. Start trembling, that's a revelation:
The ODE function I was using for collision tests is o(n^2).
At this point, any non-programmer or non-mathematician will be lost. So a few words about that: we evaluate the complexity of an algorithm in o(expression). The "expression" part is a simple formula that estimates the number of instructions ( hence performance / complexity ) it takes to complete the algorithm. "n" is linear, "n^2" is squared, "n^3" is cubic, etc.. So for a given "n" ( inputs of the algorithm ), the formula says how long the algorithm ( overall ) takes to complete.
For example, if the algorithm has 100 inputs, if it's o(n), it'll take 100 instructions to complete; if it's o(n^2) it'll take 10000 instructions; and if it's o(n^3), it'll take 1 million instructions.
At this point it should be clear that o(n) algorithms are linear with the number of inputs, and they're usually the best algorithms you can make ( excepting algorithms that run in a constant time, which are o(1) ).
I'll stop the complexity theory here, but you should also know that algorithms can have many inputs ( for ex n and m, which could give complexities of o(n*m) or o(n*m^2), ... ), or even things such as o(n*log(n)) ).
Now back to ODE: I realized that the collision function was o(n^2). What that means is that it's a very basic, non-optimized implementation that slows down quickly and does not scale very well with a high amount of objects to test collisions with.
The dynamic octree I implemented for streaming mesh bodies give a complexity level of o(n*log(n)). It's much better. Now it scales pretty well with a high number of objects.
I created a small collisions prototype to test and validate this. I took SPECTRE's battleship, made of around 150 objects ( only ten for the hull, the rest is for the hangar/garages ). In the ICP, the Kraken battleship had up to 100 objects, to give a comparison order.
I instanced 25 of those ships in a cube of 10 km x 10 km x 10 km.
Then I spawned laser shots which moved in random directions in that cube. Once a shot hits a ship or goes outside the cube, it re-spawns somewhere else.
I tested 20, 50, 250, 1000 and 5000 shots. 5000 is quite a lot, but after all it's a stress test. In the ICP at one given time, there was probably a few hundred shots. For it might not be impossible to reach higher counts on a whole server, or in big fleet battles.
The three algorithms I tested are called "Fixed" ( ODE's o(n^2) algorithm ), "Stream2" and "Stream3". The difference between "Stream2" and "Stream3" is whether the collision test is done with a ray, or with a bounding box. The ray is more accurate but costs more performance.
Numbers are timings in milliseconds, lower is better. Test machine is a quad-core Q6600, but only one core is used.
Fixed:
20 shots -> 1.3 ms
50 shots -> 2.8 ms
250 shots -> 12.7 ms
1000 shots -> 51 ms
5000 shots -> 255 ms
Stream2:
20 shots -> 0.1 ms
50 shots -> 0.2 ms
250 shots -> 0.67 ms
1000 shots -> 2.4 ms
5000 shots -> 12 ms
Stream3:
20 shots -> 0.08 ms
50 shots -> 0.16 ms
250 shots -> 0.55 ms
1000 shots -> 1.9 ms
5000 shots -> 9 ms
Now, in the case of 5000 shots, you can see the streaming version runs 28 times faster ! Although that's a bit of an extreme case. In average, speed ups range from 15 to 25.
Oh, and if you didn't realize yet, the ICP was severely CPU limited. The collision tests for shots were always one of the main bottlenecks. And yes, it was using the o(n^2) "fixed" algorithm I mentionned above :)
Future work
I also fixed a few precision issues when ray-casting at long distances. I found some more that I need to investigate.
Ray-casting is a bit tricky, that it kind of defeats the purpose of the streaming algorithm. Ray-casts are necessary to determine the distance to the scene in a direction, for things such as depth-of-field or occlusion tests ( lens-flares in thrusters of ships, or suns ). They're not coherent, and anyway the camera orientation can change quickly. I'm thinking of creating my own Kd-tree just to handle collision tests for the terrain. The Kd-tree will be less deep and less optimized than ODE's one, but it's no big deal, since here the idea is to build a Kd-tree quickly ( in less than a millisecond ), even if collision tests are less optimal.
| |
|
| S | M | T | W | T | F | S | | 1 | 2 | 3 | 4 | 5 | | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | | | |
OPTIONS
Track this Journal
ARCHIVES
October, 2009
August, 2009
July, 2009
May, 2009
April, 2009
March, 2009
February, 2009
January, 2009
November, 2008
October, 2008
July, 2008
June, 2008
May, 2008
April, 2008
March, 2008
January, 2008
December, 2007
November, 2007
October, 2007
September, 2007
August, 2007
July, 2007
June, 2007
May, 2007
April, 2007
March, 2007
February, 2007
January, 2007
December, 2006
November, 2006
October, 2006
September, 2006
August, 2006
July, 2006
June, 2006
May, 2006
April, 2006
March, 2006
February, 2006
January, 2006
December, 2005
November, 2005
October, 2005
September, 2005
August, 2005
July, 2005
June, 2005
May, 2005
April, 2005
March, 2005
February, 2005
January, 2005
December, 2004
October, 2004
September, 2004
August, 2004
|