DirectX 11 porting & other changes

posted in Silviu Andrei's Journal

Published May 06, 2012

It's been a long time since my last post and therefore I will dedicate my first paragraph to coming up with lame excuses for it.
My only excuse, actually, is that I have had very little spare time since I moved to the US with my wife, and I tried to use it for the development of my engine.

I focused mainly on porting everything from XNA to C++ / DirectX 11 and on optimizations. There are not many new features in the new version, except for the chromatic effect of shallow water caused by light scattering inside the water body. The main change is that the new version is much faster thanks to a lot of optimizations. The previous XNA version ran at about 21-23 fps in the most GPU-intensive scenes, whereas the new DX11 version easily reaches 50-55 fps in the same scenes. It is now mostly CPU bound (the GPU is about 40% idle), which means there is a lot more headroom on the GPU for other things in the future. BTW, my dev machine is a laptop with a GeForce GTX 460M, so it's not exactly a top-of-the-line GPU.
In the remainder of this post I will describe some of the major changes that I've made and some of the challenges that I encountered during the porting process.

Terrain generation

In the new version of my engine I moved a lot of calculations that were previously done using full-screen quads to compute shaders. One of these functionalities is the procedural terrain generation. I also thought I could take advantage of the integer math operations that are new in DX11 to compute the pseudo-random numbers directly in the compute shader instead of sampling a texture of precalculated values. I did that but I didn't notice any big improvement. I did not run a thorough test on this yet in order to give a final verdict but I suspect that calculating a random number is not much faster than sampling a texture with filtering disabled.

The Ocean

While porting the ocean code I also noticed that the FFT transforms could be done in a compute shader, which should be a lot faster than the pixel shader approach that Bruneton used in his code and which I also used in the XNA version of the engine. By googling around, I stumbled upon the NVIDIA code provided in their FFT ocean demo from the NVIDIA SDK 11, which is a 2D radix-8 FFT algorithm. That means it can only transform 2D maps whose width and height are both powers of 8, for example 64x64, 512x512, 4096x4096 etc. The problem was that I was using a 256x256 wave spectrum, which could not be transformed with the NVIDIA code. So I had the option to either move to a 512x512 spectrum or use a radix-4 or radix-2 FFT. I searched the web for a compute shader implementation of a radix-2 or radix-4 transform but couldn't find anything. In conclusion, if I wanted to stick to the 256x256 spectrum, I had to write my own FFT code and I was in no mood to do that. I tried that once and it gave me many days of headaches, in which I managed to write a 1D radix-2 FFT, but it was not easy. Going from one dimension to two looked like a lot more work on top of that, so I decided to move to a 512x512 map and use the NVIDIA code. I figured that if it proved to be too slow, I would move to a 256x256 map later.
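For readers who have not seen one, this is a minimal CPU-side C++ sketch of a 1D radix-2 FFT (iterative Cooley-Tukey, decimation in time). It is a textbook formulation written for this example, not the code from the engine or from the NVIDIA demo, and the name `fftRadix2` is made up here. A 2D transform is commonly built by running such a 1D transform over every row and then over every column.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// In-place forward FFT. The input length must be a power of two.
void fftRadix2(std::vector<Complex>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0 && "length must be a power of two");

    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) {
            j ^= bit;
        }
        j ^= bit;
        if (i < j) {
            std::swap(data[i], data[j]);
        }
    }

    // log2(n) butterfly passes; each pass doubles the sub-transform length.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = -2.0 * pi / static_cast<double>(len);
        const Complex wLen(std::cos(angle), std::sin(angle));
        for (std::size_t start = 0; start < n; start += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const Complex u = data[start + k];
                const Complex v = data[start + k + len / 2] * w;
                data[start + k] = u + v;
                data[start + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
}
```

Each iteration of the outer `len` loop corresponds to one pass over the whole array, which is the unit of work that a GPU implementation would typically issue as one draw or dispatch.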

There was actually another option as well. I noticed there is a new interface in the DX11 SDK called ID3DX11FFT. However, it seems that it can only transform one spectrum at a time and I have 6 of them. This means I would need to issue 6 transform commands, whereas the NVIDIA FFT code can easily be modified to transform all 6 of them in one step. The NVIDIA FFT also has the advantage of using a radix-8 algorithm, which means it only needs 6 Dispatch calls for a 512x512 spectrum (3 passes per dimension, since 512 = 8^3), whereas a radix-2 FFT (like the one Bruneton used and which, I suspect, the ID3DX11FFT interface also uses) would need 9 passes per dimension, 18 in total, for the same spectrum. I could also be wrong: the DX11 interface could be smarter than that and use a different radix for different spectrum sizes, but I couldn't find anything on the web that describes how it works internally. It also appears that no one has ever used it, and that's just weird.
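The pass counts can be checked with a small C++ sketch. It assumes one pass per radix digit and one Dispatch (or draw) per pass, with rows and columns transformed separately. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Number of passes a radix-`radix` FFT needs for a 1D transform of length
// `size`, or -1 if `size` is not an exact power of `radix`.
int fftPassesPerDimension(std::uint32_t size, std::uint32_t radix) {
    assert(radix >= 2);
    if (size == 0) {
        return -1;
    }
    int passes = 0;
    while (size > 1) {
        if (size % radix != 0) {
            return -1;  // e.g. 256 with radix 8: 256 -> 32 -> 4, and 4 % 8 != 0
        }
        size /= radix;
        ++passes;
    }
    return passes;
}

// Passes for a square 2D transform done as rows first, then columns.
int fftPasses2D(std::uint32_t size, std::uint32_t radix) {
    const int perDimension = fftPassesPerDimension(size, radix);
    return perDimension < 0 ? -1 : 2 * perDimension;
}
```

Under these assumptions `fftPasses2D(512, 8)` gives 6, `fftPasses2D(512, 2)` gives 18, `fftPasses2D(256, 2)` gives 16, and `fftPasses2D(256, 8)` gives -1 because 256 is not a power of 8.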

Bottom line: my new version uses a 512x512 spectrum transformed with a radix-8 FFT compute shader instead of a 256x256 spectrum transformed with radix-2 pixel shader code, and the new one is a lot faster. For the future, it would be interesting to experiment a bit with the DX11 FFT interface to see if it computes a 256x256 FFT faster than the NVIDIA code computes a 512x512 one. I don't really need a 512x512 map; the gain in visual quality is negligible, so I would prefer a 256x256 transform even if it's only 10% faster. I would also like to write my own radix-4 FFT code one day, just for the sake of it and to prove to myself that I can do it. On the other hand, I fear I might waste too much valuable time doing it.

Deferred rendering (shading)

There is not much to say about this except for the fact that I'm using it now. If you don't know what deferred rendering is, read this article to get the basic idea. I initially implemented it because I wanted to avoid running all the expensive atmospheric scattering and water shading computations for pixels that eventually get occluded anyway. Later, I came to realize that this problem is already mostly taken care of by early-z rejection and the front-to-back sorting of objects before rendering. However, I am giving deferred rendering another chance because it might prove useful later, when I need to render scenes with many small lights, such as indoor scenes. For the moment, I only have outdoor scenes with a single big light source.

Occlusion culling

For those who do not know what occlusion culling is, it's exactly what the name says: culling (not rendering) objects that are occluded by other objects in the scene.
I always wanted to give this a try and I finally did. It took me a lot of work but I am really pleased with the results. In some scenes, the frame rate almost doubled. Basically, I use hardware occlusion queries on OBBs (oriented bounding boxes), which are calculated for each terrain node. I ran into some interesting problems during the implementation of this feature, which I will describe in more detail in my next post (which will be soon, I promise).

In the meantime, here is a video of my latest version:


Comments

Hyunkel

This is really interesting!
I'm also working on DX11 procedural planet generation on the GPU (my master's thesis topic) and our approaches seem to be very similar.
However I'm wondering why you are CPU bound if you compute your terrain on the GPU?

The only terrain related work I'm doing on the CPU is quadtree splitting, sorting and culling, which isn't very expensive and can be done in a separate thread.
Everything else I do on the GPU, mostly in compute shaders.
I get ~99% GPU usage because my CPU isn't really doing much besides uploading data to the GPU.

I spend about 1 ms in compute shaders each frame (at high LODs), generating vertex positions and normals and doing stitching for approximately 340,000 vertices on a GTX 580. (A total of 32 octaves of 3D Perlin multifractals.)
Because this is so fast I haven't bothered optimizing yet and just regenerate the entire planet every frame.

I have got to say though, your terrain looks so much better than mine, especially your mountains, and that is some very good looking water.
I've seen in your other journal entry that you are using voronin/cell noise to generate your mountains which I found very interesting.
Would you maybe be willing to say a few words about how you displace your voronin noise input to get such good looking results, or how you generate your terrace effect?

Cheers,
Hyu

May 09, 2012 04:42 PM

Moe

That is some very impressive water indeed!

Other than the switch in the FFT, I'd be curious to know what caused the speed difference between the XNA version and the C++/DirectX 11 version.

May 10, 2012 05:11 PM

Silviu Andrei

[quote name='Hyunkel' timestamp='1336581737']
This is really interesting!
I'm also working on DX11 procedural planet generation on the GPU (my master's thesis topic) and our approaches seem to be very similar.
However I'm wondering why you are CPU bound if you compute your terrain on the GPU?
[/quote]

Thanks. My engine became mostly CPU bound once I implemented the occlusion culling algorithm. Each Draw call takes a small amount of time on the CPU, and that adds up. I have lots of Draw calls on OBBs for the occlusion queries, which are very fast on the GPU but still take that small amount of time on the CPU. I guess I could do multi-core rendering on deferred contexts to speed things up, but this is not a big problem right now. Also, I forgot to mention that if I increase the resolution, the CPU "boundness" drops a lot; at 1080p it is actually absent ... on my hardware.

[quote name='Hyunkel' timestamp='1336581737']
Would you maybe be willing to say a few words about how you displace your voronin noise input to get such good looking results, or how you generate your terrace effect?
[/quote]

First of all, I just noticed my big spelling mistake. In my previous post I called it "voronin noise", which is incorrect; it is called "Worley noise", "Voronoi diagrams" or "Cell noise". They all refer to basically the same thing. I corrected it, sorry about that. For the inputs to the Voronoi noise I did something very similar to [url="http://www.gamedev.net/blog/73/entry-1836604-craters-and-normal-maps/"]this post[/url] from ysaneya's journal. It was a long time ago, I don't remember all the details and my code looks awful. Basically, you have to experiment with the parameters a lot. Here is some of my code. I tried to explain a bit in the comments; I hope you can make sense of it:

[CODE]

//These are the basic function definitions that I have:

//Returns multifractal of cell noise octaves
//noiseType: 0 = F1 noise; 1 = F2 noise; 3 = F2-F1 noise
float getCellMultiFractal(int nrOct, in float3 t, float gain, inout float weight, int noiseType);

//Returns one noise octave
//functionType: 0 = F1 cell, 1 = F2 cell ...... 4 = Perlin Billowy, 5 = Perlin Ridged, .... 8 = Perlin Sharp billowy, 9 = Perlin Sharp ridged ..... etc
float getNoise(float3 t, int functionType);

//Returns n octaves of FBM noise
float getFBM(int n, float3 t);

//Applies a terraced effect to "val" (in "n" steps if val is in the range (0, 1)).
//"power" defines the steepness of the terraces. The higher the value of "power", the steeper the terraces will be.
float getTerraced(float val, float n, float power)
{
    float dVal = val * n;   //scale so that each terrace spans one unit
    float f = frac(dVal);   //position inside the current terrace
    float i = floor(dVal);  //index of the current terrace

    return (i + pow(f, power)) / n;
}
[/CODE]
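To see what the `power` parameter of `getTerraced` does, here is a small C++ port of the function, written for this example so it can be run on the CPU. HLSL's `frac(x)` is replaced by `x - floor(x)`. The precondition check on `n` is an addition that the shader version does not have.

```cpp
#include <cassert>
#include <cmath>

// CPU port of the HLSL getTerraced shown above.
float getTerraced(float val, float n, float power) {
    assert(n > 0.0f);
    const float scaled = val * n;        // each terrace spans one unit
    const float i = std::floor(scaled);  // index of the current terrace
    const float f = scaled - i;          // position inside the terrace, in [0, 1)
    return (i + std::pow(f, power)) / n;
}
```

With `val = 0.5` and `n = 15`, a `power` of 1 returns the input unchanged (0.5), a `power` of 2 returns about 0.4833, and a `power` of 4 returns about 0.4708. Higher powers keep the value close to the bottom of the current step for longer and then rise sharply near its top, which is what produces the terraces.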

Then you just start playing with these functions and use one function to distort the inputs of another. This is an example from my mountain generation code:

[CODE]
float t0 = getFBM(4, t*f*10);
float d0 = t0 * 0.025;

float v0 = getNoise((t*f * 2 + d0*0) * (1 + d0 * 0.0625), 2);
v0 = v0 * v0;
total += v0 * a;
a*=0.5;

float v1 = getNoise((t*f * 2 + d0*0) * (2 + d0 * 0.0625), 2);
v1 = v1 * v1;
total += v1 * a;
a*=0.5;

float weight = smoothstep(-0.1, 0.1, total);

float mf = getCellMultiFractal(10, (t*f*5 + d0 * 0.5) * (1 + d0 * 0), 1.8, weight, 0);
total += mf * 0.35;

float tDef = getNoise(t*300, 3);
tDef = smoothstep(0.5, 0.7, tDef);
total = getTerraced(total, 15, 1 + tDef*1.5);
[/CODE]

As I said before, this takes a lot of frustrating tweaking to get right and you will never be pleased with the results; you always feel like you could do a little better :)

May 11, 2012 03:29 PM

Silviu Andrei

[quote name='Moe' timestamp='1336669888']
That is some very impressive water indeed!

Other than the switch in the FFT, I'd be curious to know what caused the speed difference between the XNA version and the C++/DirectX 11 version.
[/quote]

Thanks. Besides the FFT transform which helped a lot, most of the speed improvement I gained from the occlusion culling and another visibility culling method which I plan to explain soon in a future post. Basically I cull geometry that is not visible either because it's too deep underwater or it has no sea-level vertices in case of the ocean and so on.

May 11, 2012 03:45 PM

Hyunkel

Thank you so much for your reply.
It really helps a lot. I was getting so frustrated because I wasn't making any progress with my terrain shaping lately.
Now I have new things to try out that will hopefully provide me with better results.

It seems like we're using a different method after all though.
I'm basically generating a variable amount of 33x33 terrain patches in a compute shader (position + normal) and store them in a buffer.
During my geometry pass when I want to render the planet, I use a NULL vertex buffer, an index buffer for a 33x33 patch, and a hardware instancing buffer that only provides patch id's.
Using that and SV_VertexID I can sample the correct vertex positions and normals from my generated data.
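A hedged C++ sketch of the two pieces this setup needs: the shared index buffer for one 33x33 patch, and the address computation a vertex shader could perform from the instanced patch id and SV_VertexID. The row-major vertex layout, the triangle winding and all names are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

constexpr std::uint32_t kPatchSize = 33;  // vertices per patch side
constexpr std::uint32_t kVertsPerPatch = kPatchSize * kPatchSize;

// Triangle-list indices for one patch: two triangles per grid cell.
// Every index is below 33 * 33 = 1089, so 16-bit indices are enough.
std::vector<std::uint16_t> buildPatchIndices() {
    std::vector<std::uint16_t> indices;
    indices.reserve((kPatchSize - 1) * (kPatchSize - 1) * 6);
    for (std::uint32_t y = 0; y + 1 < kPatchSize; ++y) {
        for (std::uint32_t x = 0; x + 1 < kPatchSize; ++x) {
            const std::uint32_t topLeft = y * kPatchSize + x;
            const std::uint32_t bottomLeft = topLeft + kPatchSize;
            for (std::uint32_t idx : {topLeft, bottomLeft, topLeft + 1,
                                      topLeft + 1, bottomLeft, bottomLeft + 1}) {
                indices.push_back(static_cast<std::uint16_t>(idx));
            }
        }
    }
    return indices;
}

// Location of a vertex inside the buffer of generated patches, mirroring what
// the vertex shader would compute from the patch id and SV_VertexID.
std::uint32_t generatedVertexIndex(std::uint32_t patchId, std::uint32_t vertexId) {
    assert(vertexId < kVertsPerPatch);
    return patchId * kVertsPerPatch + vertexId;
}
```

One index buffer of 32 * 32 * 6 = 6144 entries is shared by every patch. Only the per-instance patch id changes between patches.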

May 12, 2012 12:53 AM

Hyunkel

On a related note because the XNA -> DX11 topic was brought up:
I started out prototyping what I wanted to do in XNA as well.
I made the switch to C++/DX11 rather early, and the biggest difference really is the ability to use compute shaders.

For example I did normal vector generation with geometry shaders in XNA.
I do it in compute shaders now, making use of group shared memory which speeds things up a lot.

Structured buffers are also really handy.

May 12, 2012 03:11 PM

Skara

Thanks for sharing your knowledge, progress and observations in these posts. Now that you have moved to C++/DX11, are you planning to open source any parts of the DX9/XNA work as a starting point for anyone interested in following in your footsteps, or to post segments as tutorials?

May 27, 2012 03:35 PM

Carlos Bomtempo

WOW!!!

Can you share at least the binary? I really want to run that on my machine and see it myself.

Great job!

June 06, 2012 04:25 AM

Jason Reskin

Technology has sure changed since I wrote the terrain engine for Rise: TVP (http://rise.unistellar.com). I am currently building a flight simulator (http://www.unistellar.com) and started considering the visuals recently. Any chance of collaborating?

June 13, 2012 09:49 PM

wdj294

You should port this when Unity3d 4.0 comes out...concerning directx 11

[url="http://unity3d.com/?unity4"]http://unity3d.com/?unity4[/url]

June 20, 2012 01:14 AM

holocronweaver

I am working on a very similar planet generation project, except my code is C++/OpenGL with an eye towards cross platform support. Once the code is stable enough, I plan to go open source and release the terrain generator as a separate project from my game engine.

Should you choose not to release your source code, I would be very interested in comparing code bases in the near future. Let me know if you are willing.

Also, what is the other 'visibility method culling' you were referring to that's used in addition to occlusion culling?

September 16, 2012 11:06 PM

Silviu Andrei

[quote name='holocronweaver' timestamp='1347836776']
I am working on a very similar planet generation project, except my code is C++/OpenGL with an eye towards cross platform support. Once the code is stable enough, I plan to go open source and release the terrain generator as a separate project from my game engine.

Should you choose not to release your source code, I would be very interested in comparing code bases in the near future. Let me know if you are willing.
[/quote]

I'm not going to release my source code in the near future, but send me a private message and we can talk about that.

[quote name='holocronweaver' timestamp='1347836776']
Also, what is the other 'visibility method culling' you were referring to that's used in addition to occlusion culling?
[/quote]

For the other visibility culling method, I used a geometry shader to decide based on the triangle's altitude if it should be rendered using the terrain material, the ocean material or both. This way, I can use a single draw call for each quad node and let the GPU decide what to do with it. Also, for the refraction-map pass, I use a geometry shader to cull triangles that are below the max visibility depth of the water or above the max wave height.
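The per-triangle decision could look roughly like this, written as a CPU-side C++ sketch rather than as the actual geometry shader, which is not shown here. Altitudes are taken relative to sea level. The two thresholds, the exact comparisons and all names are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>

struct WaterParams {
    float maxWaveHeight;       // highest point a wave can reach above sea level
    float maxVisibilityDepth;  // depth below sea level past which nothing is visible
};

enum MaterialMask : unsigned { kNone = 0u, kTerrain = 1u, kOcean = 2u };

// Decides which materials a terrain triangle should be emitted with,
// given the altitudes of its three vertices.
unsigned classifyTriangle(float a0, float a1, float a2, const WaterParams& w) {
    assert(w.maxWaveHeight >= 0.0f && w.maxVisibilityDepth >= 0.0f);
    const float lowest = std::min({a0, a1, a2});
    const float highest = std::max({a0, a1, a2});
    unsigned mask = kNone;
    if (highest >= -w.maxVisibilityDepth) {
        mask |= kTerrain;  // at least partly above the visibility limit
    }
    if (lowest <= w.maxWaveHeight) {
        mask |= kOcean;    // water can cover at least part of the triangle
    }
    return mask;
}

// Refraction-map pass: keep only triangles that can be seen through the water.
bool keepForRefractionPass(float a0, float a1, float a2, const WaterParams& w) {
    const float lowest = std::min({a0, a1, a2});
    const float highest = std::max({a0, a1, a2});
    return highest >= -w.maxVisibilityDepth && lowest <= w.maxWaveHeight;
}
```

With a 2 m maximum wave height and a 50 m visibility depth, a mountain triangle gets only the terrain material, a deep seabed triangle gets only the ocean material, and a shoreline triangle gets both.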

September 17, 2012 01:37 PM

aioria

great job!!

This is very impressive!

November 23, 2013 06:56 AM

DTrudeau

Very interesting work! Are you still active on this project? And if so would you consider talking more about the game you hope to create with this?

July 05, 2014 03:50 AM

shinkamui

wow, it's really amazing!!

is there any plan to open source?

June 11, 2015 01:35 PM

L15

I just wonder what happened to this project and to its author. It's sad that he never posted anymore on that! If you are still around, give us news about how that went on! Cheers

January 27, 2016 05:06 AM
