About this blog

A journal about what I am currently working on, mainly about the development of the Bow-Shock game.

Entries in this blog

Silviu Andrei
It's been a long time since my last post and therefore I will dedicate my first paragraph to coming up with lame excuses for it.
My only real excuse is that I have had very little spare time since I moved to the US with my wife, and I used what I had for the development of my engine.

I focused mainly on porting everything from XNA to C++ / DirectX 11 and on optimizations. There are not many new features in the new version, except for the chromatic effect of shallow water caused by light scattering inside the water body. The main gain is speed: the previous XNA version ran at about 21-23 fps in the most GPU-intensive scenes, whereas the new DX11 version easily runs at 50-55 fps in the same scenes. It is now mostly CPU bound (the GPU is about 40% idle), which means there is a lot more processing room on the GPU for other things in the future. BTW, my dev machine is a laptop with a GeForce GTX 460M, so it's not exactly a top-of-the-line GPU.
In the remainder of this post I will describe some of the major changes that I've made and some of the challenges that I encountered during the porting process.

Terrain generation


In the new version of my engine I moved a lot of calculations that were previously done using full-screen quads to compute shaders. One of these functionalities is the procedural terrain generation. I also thought I could take advantage of the integer math operations that are new in DX11 to compute the pseudo-random numbers directly in the compute shader instead of sampling a texture of precalculated values. I did that but I didn't notice any big improvement. I did not run a thorough test on this yet in order to give a final verdict but I suspect that calculating a random number is not much faster than sampling a texture with filtering disabled.
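To illustrate the integer-math idea, here is a minimal C++ sketch of hashing a lattice coordinate into a pseudo-random float. It is written in C++ rather than HLSL so it can be run on its own; the same integer operations are available in DX11 compute shaders. The choice of the Wang hash and the names `wang_hash` and `random01` are assumptions for illustration, not necessarily what the engine uses.

```cpp
#include <cstdint>

// Illustrative integer hash (Wang hash). Assumption: any well-mixing
// 32-bit integer hash would serve the same purpose.
inline std::uint32_t wang_hash(std::uint32_t seed) {
    seed = (seed ^ 61u) ^ (seed >> 16);
    seed *= 9u;
    seed ^= seed >> 4;
    seed *= 0x27d4eb2du;
    seed ^= seed >> 15;
    return seed;
}

// Hashes a 2D lattice coordinate into a float in [0, 1).
inline float random01(std::uint32_t x, std::uint32_t y) {
    const std::uint32_t h = wang_hash(x + wang_hash(y));
    // Keep the top 24 bits so the result is exactly representable as a float.
    return static_cast<float>(h >> 8) * (1.0f / 16777216.0f);
}
```

The alternative described above replaces the body of `random01` with a single point-sampled texture fetch, which is why the two approaches can end up with similar cost.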

The Ocean


While porting the ocean code I also noticed that the FFT could be done in a compute shader, which should be a lot faster than the pixel shader approach that Bruneton used in his code and which I also used in the XNA version of the engine. By googling around, I stumbled upon the NVIDIA code provided in their FFT ocean demo from the NVIDIA SDK 11, which is a 2D radix-8 FFT algorithm. That means it can only transform 2D maps whose width and height are both powers of 8, for example 64x64, 512x512, 4096x4096 etc. The problem was that I was using a 256x256 wave spectrum, which could not be transformed with the NVIDIA code. So I had the option to either move to a 512x512 spectrum or use a radix-4 or radix-2 FFT. I searched the web for a compute shader implementation of a radix-2 or radix-4 transform but couldn't find anything. In conclusion, if I wanted to stick to the 256x256 spectrum, I had to write my own FFT code, and I was in no mood to do that. I tried that once and it gave me many days of headaches, in which I managed to write a 1D radix-2 FFT, but it was not easy. Going from one dimension to two adds even more work, so I decided to move to a 512x512 map and use the NVIDIA code. I figured that if it proved to be too slow, I would move to a 256x256 map later.

There was actually also another option. I noticed there is a new interface in the DX11 SDK called ID3DX11FFT. However, it seems that it can only transform one spectrum at a time and I have 6 of them. This means I would need to issue 6 transform commands, whereas the NVIDIA FFT code can be modified easily to transform all 6 of them in one step. The NVIDIA FFT also has the advantage of using a radix-8 algorithm, which means it only needs to issue 6 512x512 Dispatch calls for a 512x512 spectrum (3 passes per dimension), whereas a radix-2 FFT (like the one Bruneton used and which, I suspect, the ID3DX11FFT interface also uses) would require 18 Dispatch calls of the same size for a 512x512 spectrum (9 passes per dimension). I could also be wrong and the DX11 interface could be smarter than that and use a different radix for different spectrum sizes, but I couldn't find anything on the web that describes how it works internally. It also appears that no one ever used it, and that's just weird.
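The size restriction and the pass counts can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a pure radix-r FFT that runs one pass (one Dispatch) per factor of r along each dimension and handles a 2D map as rows followed by columns; the function names are hypothetical.

```cpp
#include <cstdint>

// True if n is an exact power of radix (n = radix^k with k >= 1).
inline bool is_power_of(std::uint32_t n, std::uint32_t radix) {
    if (radix < 2 || n < radix) return false;
    while (n % radix == 0) n /= radix;
    return n == 1;
}

// Passes a pure radix-r FFT needs along one dimension of length n,
// or -1 if n cannot be handled by that radix alone.
inline int passes_per_dimension(std::uint32_t n, std::uint32_t radix) {
    if (!is_power_of(n, radix)) return -1;
    int passes = 0;
    while (n > 1) {
        n /= radix;
        ++passes;
    }
    return passes;
}

// A 2D transform of an n x n map runs the 1D passes over rows, then columns.
inline int passes_2d(std::uint32_t n, std::uint32_t radix) {
    const int p = passes_per_dimension(n, radix);
    return p < 0 ? -1 : 2 * p;
}
```

Under these assumptions `passes_2d(512, 8)` is 6 and `passes_2d(512, 2)` is 18, while `passes_2d(256, 8)` is -1 because 256 is not a power of 8. A radix-4 transform of a 256x256 map would need `passes_2d(256, 4)`, which is 8.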

Bottom line is, my new version uses a 512x512 spectrum transformed with a radix-8 FFT compute shader instead of a 256x256 spectrum transformed with radix-2 pixel shader code, and the new one is a lot faster. For the future, it would be interesting to experiment a bit with the DX11 FFT interface to see if it computes a 256x256 FFT faster than the NVIDIA code computes a 512x512 one. I don't really need a 512x512 map, since the gain in visual quality is negligible, so I would prefer a 256x256 transform even if it's only 10% faster. I would also like to write my own radix-4 FFT code one day, just for the sake of it and to prove to myself that I can do it :). On the other hand, I fear I might waste too much valuable time doing it.

Deferred rendering (shading)


There is not much to say about this except for the fact that I'm using it now. If you don't know what deferred rendering is, read this article to get the basic idea. I initially implemented it because I wanted to avoid running all the expensive atmospheric scattering and water shading computations for pixels that eventually get occluded anyway. Later, I came to realize that this problem is already mostly taken care of by early-z rejection and the front-to-back sorting of the objects before rendering. However, I am giving deferred rendering another chance because it might prove useful later, when I will need to render scenes with multiple small lights, like indoor scenes. For the moment, I only have outdoor scenes with a single big light source.

Occlusion culling


For the ones who do not know what occlusion culling is, it's exactly what the name says: culling (not rendering) objects that are occluded by other objects in the scene.
I always wanted to give this a try and I finally did. It took me a lot of work but I am really pleased with the results. In some scenes, the frame rate almost doubled. Basically I use hardware occlusion queries on OBBs (oriented bounding boxes) which are calculated for each terrain node. I ran into some interesting problems during the implementation of this feature, which I will describe in more detail in my next post (which will be soon, I promise :) ).

In the meantime, here is a video of my latest version:

Silviu Andrei
This is my first journal entry so I'll start with a few words about myself. My name is Silviu Andrei, I'm a software developer from Romania, currently living in the United States. I've been working as a software developer for about 8 years now. As I am also a passionate gamer and Sci-Fi fan I always dreamed of an open world space sim where you would be able to explore an entire galaxy and seamlessly land on planets, asteroids, comets etc. So, about one year ago I decided to start developing a planet renderer that allows you to transit from space to ground in a seamless way and about one month ago I decided to try to make a game out of my planet renderer. Enough about me and my boring history, let's get down to business.

I will briefly present here all major aspects of what I have done so far.

Technology



As my natural tendencies drove me, I started to develop the planet renderer in C# using XNA. Because I am frustrated that XNA is limited to DirectX 9 at the moment, I decided to stop development on the current version and port all the code to a different platform. I have still not decided whether I should stick to C# and use SlimDX or whether I should just go with C++ and DX11. C# has one big advantage: the development speed is much higher IMHO. I am also a bit rusty on the C++ side, so it might take me a while to get warmed up with it. On the other hand, C++ is faster; not a lot faster, but faster. I wrote an app last year to test this out. It consisted of a Perlin noise generator and some single and double precision floating point calculations, written in both C++ and C#. It turned out that C++ was overall about 1.2x faster than C# (I'll try to find that benchmarking app and publish the source code in my next entry if anyone is interested). This was not enough to convince me at that time to turn to C++. However, I would like to brush up my C++ skills a bit, so I'm leaning towards C++ right now. I just hope it's not going to stall my progress on Bow-Shock.

Terrain LOD



I spent a lot of time on the terrain LOD last year, experimenting with a lot of techniques. I started out with ROAM (which is not efficient on current hardware anymore), then moved to SOAR (not a big change there) and finally ended up using a version of geomipmaps. I basically have a quadtree whose base shape is a cube with 6 nodes (one for each face of the cube). Each node has about 29x29 vertices and is split based on a split priority (I keep a sorted split-priority queue for all the nodes). The split priority is calculated based on the node's sea-level radius and its distance from the camera. The sea-level radius is the distance from the center vertex to the farthest vertex (all at sea level, without any height displacement). Initially I used the LOD of the node instead of the sea-level radius, but I soon noticed that nodes closer to the main cube's edges tend to have a smaller radius and nodes closer to the center of the cube's faces a larger one, due to the cube-to-sphere transformation. As a result, nodes closer to the cube's edges got split sooner than nodes closer to the center of the cube's faces. The distance from the camera is calculated using the center vertex of the node (displaced by the heightfield).
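The effect of the cube-to-sphere transformation on node size can be sketched in a few lines of C++. Several things here are assumptions rather than the engine's actual code: the mapping is a plain normalization of the cube point, the farthest vertex is approximated by the four node corners, the priority is simply radius divided by distance, and the camera distance uses the sea-level center instead of the height-displaced one. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <queue>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };

inline double dist(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Assumption: the cube point (u, v, 1) on the +Z face of a cube spanning
// [-1, 1] is mapped to the unit sphere by normalization.
inline Vec3 cube_to_sphere(double u, double v) {
    const double len = std::sqrt(u * u + v * v + 1.0);
    return {u / len, v / len, 1.0 / len};
}

// Hypothetical node description: center (u, v) and half-size on the face.
struct Node { double u, v, half; };

// Distance from the node center to its farthest corner, all at sea level.
inline double sea_level_radius(const Node& n) {
    const Vec3 c = cube_to_sphere(n.u, n.v);
    double r = 0.0;
    for (int i = 0; i < 4; ++i) {
        const double du = (i & 1) ? n.half : -n.half;
        const double dv = (i & 2) ? n.half : -n.half;
        r = std::max(r, dist(c, cube_to_sphere(n.u + du, n.v + dv)));
    }
    return r;
}

// Hypothetical priority: larger and closer nodes are split first.
inline double split_priority(const Node& n, const Vec3& camera) {
    const double d = dist(camera, cube_to_sphere(n.u, n.v));
    return sea_level_radius(n) / std::max(d, 1e-9);
}

// Index of the node that should be split next, or -1 if there are none.
inline int next_node_to_split(const std::vector<Node>& nodes, const Vec3& camera) {
    std::priority_queue<std::pair<double, int>> queue;
    for (int i = 0; i < static_cast<int>(nodes.size()); ++i)
        queue.push({split_priority(nodes[i], camera), i});
    return queue.empty() ? -1 : queue.top().second;
}
```

With this mapping, a node at the center of a face (`Node{0, 0, 0.125}`) has a noticeably larger sea-level radius than a node of the same LOD in the corner of the face (`Node{0.875, 0.875, 0.125}`), which is why the LOD alone is a poor measure of a node's size.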

Terrain Heightmap



For a planet of this scale it's not feasible to generate the terrain offline and store it on disk: it would take too much space and it wouldn't be possible to land on a planet without "loading" stops. Therefore it must be done procedurally in real time, each time a node is split. The terrain, along with atmospheric scattering (which I will describe below), is the part that required most of my time. The problem with heightmap generation is that the ratio of variable tweaking to actual coding is very high. I think I spent most of the time tweaking variables, recompiling, waiting for the result and going back to tweaking. Maybe it would have saved a lot of time if I had built a terrain editor of some sort where I could tweak parameters in real time. I didn't do it because I always felt like the tweaking was almost done and that I would waste too much time writing one. Also, the compile time of HLSL effects with deeply nested loops in XNA is horrible. I had a shader that took up to 20 minutes to compile; after tweaking I got it down to 30 seconds, but that is still a lot of time when you are constantly tweaking the shader code. For the terrain generation I use Perlin noise, multifractal noise and cell noise. I use 3 main levels of noise: the first level defines the shape of the continents, the second one defines the mountain map (the areas that contain mountains) and the third one defines the mountains themselves.

The mountain map is basically 2 octaves of F1 Voronoi noise displaced by some FBM and modulated by the continent map so that mountains are most likely to exist on the continents but to allow for a small chance for them to extend outside the continent shore.

The mountain generation noise required most of the tweaking. It contains 2 octaves of F2-F1 Voronoi noise that define the basic shape of the mountains, followed by a multifractal of 10 octaves of F1 Voronoi noise for the mountain details and an overall terraced effect modulated by 3 octaves of Perlin noise (FBM), so that only some of the mountains become terraced.
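As a rough illustration of how octaves and noise levels are combined, here is a minimal C++ sketch of FBM and a three-level height function. It is not the engine's shader: for brevity it uses hash-based value noise in place of Perlin and Voronoi noise, and every constant, scale and function name below is a made-up assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative integer hash of a lattice point.
inline std::uint32_t hash2(std::int32_t x, std::int32_t y) {
    std::uint32_t h = (static_cast<std::uint32_t>(x) * 0x85ebca6bu) ^
                      (static_cast<std::uint32_t>(y) * 0xc2b2ae35u);
    h ^= h >> 16; h *= 0x7feb352du;
    h ^= h >> 15; h *= 0x846ca68bu;
    h ^= h >> 16;
    return h;
}

// Pseudo-random value in [-1, 1) attached to a lattice point.
inline float lattice(int x, int y) {
    return static_cast<float>(hash2(x, y) >> 8) * (2.0f / 16777216.0f) - 1.0f;
}

// Value noise: smooth interpolation between the four surrounding lattice values.
inline float value_noise(float x, float y) {
    const float fx = std::floor(x), fy = std::floor(y);
    const int ix = static_cast<int>(fx), iy = static_cast<int>(fy);
    const float tx = x - fx, ty = y - fy;
    const float sx = tx * tx * (3.0f - 2.0f * tx);
    const float sy = ty * ty * (3.0f - 2.0f * ty);
    const float a = lattice(ix, iy), b = lattice(ix + 1, iy);
    const float c = lattice(ix, iy + 1), d = lattice(ix + 1, iy + 1);
    const float top = a + (b - a) * sx;
    const float bottom = c + (d - c) * sx;
    return top + (bottom - top) * sy;
}

// FBM: sum of octaves with doubling frequency and halving amplitude,
// normalized back to roughly [-1, 1].
inline float fbm(float x, float y, int octaves,
                 float lacunarity = 2.0f, float gain = 0.5f) {
    float sum = 0.0f, norm = 0.0f, amp = 1.0f, freq = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum += amp * value_noise(x * freq, y * freq);
        norm += amp;
        amp *= gain;
        freq *= lacunarity;
    }
    return norm > 0.0f ? sum / norm : 0.0f;
}

// Hypothetical three-level combination: continents, a mountain mask that is
// tied to the continents, and the mountains themselves.
inline float layered_height(float x, float y) {
    const float continent = fbm(x * 0.5f, y * 0.5f, 6);
    // The mask stays non-zero slightly below sea level, so mountains can
    // occasionally extend past the shore.
    const float land = std::min(std::max(continent * 4.0f + 0.5f, 0.0f), 1.0f);
    const float mask = land * std::max(fbm(x * 2.0f + 37.0f, y * 2.0f + 11.0f, 2), 0.0f);
    const float mountains = std::fabs(fbm(x * 8.0f, y * 8.0f, 10));
    return continent + mask * mountains;
}
```

The structure is the point here, not the numbers: each level is its own noise function, and the lower-frequency levels decide where the higher-frequency ones are allowed to contribute.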

I use a heightmap of 160x160 for each node and I compute a finite difference object-space normalmap in a second pass. I also render a 29x29 heightmap to get the height for the node vertices on the CPU side. I do this because I will need the height of the vertices for collision detection on the CPU side.
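A finite-difference normal pass boils down to comparing neighbouring height samples. The following C++ sketch shows the idea on the CPU for a flat, z-up patch; the engine does this in a shader and produces object-space normals on the sphere, so the coordinate frame, the border handling and the names used here are simplifying assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Normal { float x, y, z; };

// heights: row-major size x size samples; spacing: distance between samples.
// Uses central differences in the interior and one-sided differences at the
// borders. Assumes a flat patch with z pointing up.
inline std::vector<Normal> normals_from_heights(const std::vector<float>& heights,
                                                int size, float spacing) {
    std::vector<Normal> normals(heights.size(), Normal{0.0f, 0.0f, 1.0f});
    if (size < 2) return normals;  // not enough samples for a difference
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            const int x0 = std::max(x - 1, 0), x1 = std::min(x + 1, size - 1);
            const int y0 = std::max(y - 1, 0), y1 = std::min(y + 1, size - 1);
            const float dhdx = (heights[y * size + x1] - heights[y * size + x0]) /
                               (static_cast<float>(x1 - x0) * spacing);
            const float dhdy = (heights[y1 * size + x] - heights[y0 * size + x]) /
                               (static_cast<float>(y1 - y0) * spacing);
            // The normal of the surface z = h(x, y) is (-dh/dx, -dh/dy, 1), normalized.
            const float len = std::sqrt(dhdx * dhdx + dhdy * dhdy + 1.0f);
            normals[y * size + x] = Normal{-dhdx / len, -dhdy / len, 1.0f / len};
        }
    }
    return normals;
}
```

A flat heightmap yields (0, 0, 1) everywhere, and a ramp that rises by one unit per sample along x yields normals tilted 45 degrees against the slope.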


Atmospheric scattering



I get chills up my spine whenever I think of atmospheric scattering. I spent so much time tweaking, retweaking, dropping everything and starting from scratch that I have had enough of it. First I implemented Nishita's method from his original paper, then moved to O'Neil's implementation, and finally I dropped everything and settled on Bruneton's method for multiple scattering, which also took some code porting, tweaking and bugfixing. I am pleased with this final method and I hope I will never have to touch that part of the code again. There is a lot of math involved here which I will not go into. It's all explained in the paper "Eric Bruneton and Fabrice Neyret - Precomputed Atmospheric Scattering", which is freely available on the internet if you are interested.

Ocean



For the ocean I also experimented with a few methods. First I started with Gerstner waves computed directly in the vertex shader and pixel shader (for normal mapping). After hitting some aliasing and performance problems I implemented Bruneton's ocean model with a projected grid, which uses a screen-space mesh instead of a fixed object-space mesh for the ocean. Basically you have a grid of NxM equally spaced vertices on the screen which you project, in the vertex shader, onto the surface of the ocean, displace by the wave function and then project back to the screen. This way you have the same number of vertices no matter how close you are to the surface of the ocean. The performance increased a lot, but the aliasing problems only became less obvious; they are still present. It also looks bad at the shore, because the grid vertices sample the ocean at different locations along the shore and waves seem to pop out of the ground as you move the camera. Finally I implemented an FFT ocean model inspired by the Tessendorf paper and the Bruneton ocean code. Basically I generate a 256x256 wave heightmap and normalmap using a Fast Fourier Transform implemented in the pixel shader, which I then tile over the entire ocean surface.
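For reference, the displacement of a single Gerstner wave (the first approach mentioned above) can be sketched as follows. This follows the common textbook formulation with a steepness factor `q` and the deep-water dispersion relation; the struct layout, the parameter names and the choice of z as the up axis are assumptions, not the shader code used in the engine.

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Hypothetical parameter set for one wave.
struct GerstnerWave {
    Vec2 direction;    // unit vector in the horizontal plane
    float amplitude;   // crest height above the rest level
    float wavelength;  // distance between two crests
    float q;           // steepness: 0 is a plain sine wave; values above
                       // 1 / (k * amplitude) make the surface loop over itself
};

// Displaces the rest position p (horizontal plane, z up) at time t.
inline Vec3 gerstner_displace(const GerstnerWave& w, Vec2 p, float t) {
    const float pi = 3.14159265358979f;
    const float k = 2.0f * pi / w.wavelength;   // wave number
    const float omega = std::sqrt(9.81f * k);   // deep-water dispersion
    const float theta = k * (w.direction.x * p.x + w.direction.y * p.y) - omega * t;
    // Horizontal motion pulls vertices towards the crests, which sharpens them.
    const float horizontal = w.q * w.amplitude * std::cos(theta);
    return Vec3{p.x + w.direction.x * horizontal,
                p.y + w.direction.y * horizontal,
                w.amplitude * std::sin(theta)};
}
```

A full ocean sums many such waves per vertex. That cost grows with the number of waves, which is one reason a tiled FFT heightmap is attractive.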


Ship controls



I spent the last couple of weeks working on the ship: finding a free model on TurboSquid that also looks good, importing it into the engine, adding a thruster effect and adding navigation controls. The ship had to have some sort of wings because I wanted it to behave like an airplane in the atmosphere. The ship is also able to fly out of the atmosphere into space, where it uses the same main thruster for gaining linear speed but uses nozzles for rotating, since the wings do not generate any lift in space. For atmospheric flight, I implemented the drag and lift force equations based on speed, air density, wing surface, angle of attack etc. When the airspeed (linear speed^2 times the air density) is below a certain threshold I enable the nozzles to control the ship. I also simulated an autopilot that tries to stabilize the ship if it starts to rotate. In space the autopilot calculates which nozzles to activate in order to stop the ship from rotating, and inside the atmosphere it adjusts the ailerons of the ship in order to control the roll. The roll is the only rotation that can go out of control at high speed, since the pitch is implicitly stabilized by the tail and the yaw by the rudder. The autopilot also tries to stop the ship from nose diving as it loses speed, by adjusting the main wing's flaps. I implemented all the autopilot adjustments by constantly solving the aileron/nozzle equations for aileron angle of attack / nozzle intensity.

This is my ShipWing class that does all the aerodynamics calculations.
    using System;
    using Microsoft.Xna.Framework;

    // Note: MathUtils is a helper class from the engine that is not shown here.
    public class ShipWing
    {
        private float _shipOneOverMomentOfInertia;
        private float _shipMass;

        public ShipWing(float shipShipOneOverMomentOfInertia, float shipMass)
        {
            _shipOneOverMomentOfInertia = shipShipOneOverMomentOfInertia;
            _shipMass = shipMass;
        }

        public Vector3 CenterOfGravity { get; set; }
        public Vector2 MinMaxDrag { get; set; }
        public Vector2 MinMaxArea { get; set; }
        public Vector3 Normal { get; set; }
        public Vector3 BiNormal { get; set; }
        public float Flaps { get; set; }

        /// <summary>
        /// Returns the lift coefficient for the specified angle of attack and flaps angle
        /// </summary>
        /// <param name="angleOfAttack">Angle of attack in radians</param>
        /// <param name="flaps">Flaps angle ranging from -1 to 1 representing about +- 28 degrees</param>
        public float GetLiftCoefficient(float angleOfAttack, float flaps)
        {
            angleOfAttack += flaps * 0.5f;
            angleOfAttack /= (float)(Math.PI / 2d);
            if (angleOfAttack < -1) angleOfAttack += 2;
            if (angleOfAttack > 1) angleOfAttack -= 2;
            return (float)Math.Sin(angleOfAttack * (1 + (1 - Math.Abs(angleOfAttack))) * Math.PI) * 1.5f
                   + (flaps * 0.9f);
        }

        /// <summary>
        /// Solves for flaps angle using a binary search
        /// </summary>
        /// <param name="angleOfAttack">The angle of attack of the wing</param>
        /// <param name="liftCoeff">The desired lift coefficient</param>
        /// <returns></returns>
        public float GetFlaps(float angleOfAttack, float liftCoeff)
        {
            float delta = 0.001f;
            float maxLc = 2.4f - delta * 2;
            float minFlaps = -1;
            float maxFlaps = 1;
            if (Math.Abs(liftCoeff) < delta) return 0;
            if (liftCoeff > maxLc) return maxFlaps;
            if (liftCoeff < -maxLc) return minFlaps;
            var flaps = 0f;
            var currLC = 0f;
            do
            {
                currLC = GetLiftCoefficient(angleOfAttack, flaps);
                if (Math.Abs(currLC - liftCoeff) > delta)
                {
                    if (currLC > liftCoeff)
                    {
                        // Too much lift: search the lower half of the interval.
                        maxFlaps = flaps;
                        flaps = flaps + (minFlaps - flaps) * 0.5f;
                    }
                    else
                    {
                        // Too little lift: search the upper half of the interval.
                        minFlaps = flaps;
                        flaps = flaps + (maxFlaps - flaps) * 0.5f;
                    }
                }
            } while (Math.Abs(currLC - liftCoeff) > delta
                     && Math.Abs(1 - liftCoeff) > delta
                     && Math.Abs(-1 - liftCoeff) > delta
                     && minFlaps < maxFlaps);
            return flaps;
        }

        /// <summary>
        /// Solves the wing for drag and lift
        /// </summary>
        /// <param name="shipSpeed"></param>
        /// <param name="shipWorld"></param>
        /// <param name="A">0.5f * AirDensity * speedSquared</param>
        /// <param name="drag"></param>
        /// <param name="lift"></param>
        public void SolveWing(Vector3 shipSpeed, Matrix shipWorld, float A, out Vector3 drag, out Vector3 lift)
        {
            lift = Vector3.Zero;
            var speedDir = Vector3.Normalize(shipSpeed);
            var worldNormal = Vector3.TransformNormal(Normal, shipWorld);
            var worldBiNormal = Vector3.TransformNormal(BiNormal, shipWorld);
            var angleOfAttack = MathUtils.GetUnitVectorsAngle(worldNormal, speedDir) - MathHelper.PiOver2;
            var angleOfAttackScaled = angleOfAttack / MathHelper.PiOver2;
            var angleOfAttackABS = Math.Abs(angleOfAttackScaled);
            var dragCoeff = MinMaxDrag.X + angleOfAttackABS * (MinMaxDrag.Y - MinMaxDrag.X);
            var dragArea = MinMaxArea.X + angleOfAttackABS * (MinMaxArea.Y - MinMaxArea.X);
            drag = -A * dragCoeff * dragArea * speedDir;
            if (float.IsNaN(shipSpeed.Length())) return;
            if (BiNormal.Length() > 0.5f)
            {
                var liftCoeff = GetLiftCoefficient(-angleOfAttack, Flaps);
                var liftDir = Vector3.Normalize(Vector3.Cross(speedDir, worldBiNormal));
                if (Math.Abs(1f - liftDir.Length()) < 0.1f)
                    lift = A * liftCoeff * MinMaxArea.Y * liftDir;
                if (float.IsNaN(shipSpeed.Length())) return;
            }
        }

        /// <summary>
        /// Solves for flaps angle in order to get the desired angular acceleration
        /// </summary>
        /// <param name="shipSpeed">linear speed</param>
        /// <param name="shipWorld">ship matrix</param>
        /// <param name="A">0.5f * AirDensity * speedSquared</param>
        /// <param name="desiredAcceleration">desired angular acceleration</param>
        /// <returns></returns>
        public float SolveFlapsForAngularAccel(Vector3 shipSpeed, Matrix shipWorld, float A, float desiredAcceleration)
        {
            var torque = desiredAcceleration / _shipOneOverMomentOfInertia;
            var force = (torque / CenterOfGravity.Length());
            var liftCoeff = force / (A * MinMaxArea.Y);
            var speedDir = Vector3.Normalize(shipSpeed);
            var worldNormal = Vector3.TransformNormal(Normal, shipWorld);
            var angleOfAttack = -MathUtils.GetUnitVectorsAngle(worldNormal, speedDir) - MathUtils.PI2;
            return GetFlaps(angleOfAttack, liftCoeff);
        }

        /// <summary>
        /// Solves for flaps angle in order to get the desired linear acceleration
        /// </summary>
        /// <param name="shipSpeed">linear speed</param>
        /// <param name="shipWorld">ship matrix</param>
        /// <param name="A">0.5f * AirDensity * speedSquared</param>
        /// <param name="desiredAcceleration">desired linear acceleration</param>
        /// <returns></returns>
        public float SolveFlapsForLinearAccel(Vector3 shipSpeed, Matrix shipWorld, float A, float desiredAcceleration)
        {
            var force = desiredAcceleration * _shipMass;
            var liftCoeff = force / (A * MinMaxArea.Y);
            var speedDir = Vector3.Normalize(shipSpeed);
            var worldNormal = Vector3.TransformNormal(Normal, shipWorld);
            var angleOfAttack = -MathUtils.GetUnitVectorsAngle(worldNormal, speedDir) - MathUtils.PI2;
            return GetFlaps(angleOfAttack, liftCoeff);
        }
    }

And here are 2 videos:

[media]
[/media]

[media]
[/media]