You can also find a simple orbit/pan camera in my demos: http://dl.dropboxusercontent.com/u/45638513/sdx/Tut08.zip
DwarvesH, Member Since 12 Jun 2013
- Website URL http://dwarvesh.blogspot.com
Posted by DwarvesH on 08 September 2014 - 04:28 AM
Posted by DwarvesH on 09 April 2014 - 01:23 AM
With 16-bit indices you can't have such a large index count.
There are 16-bit and 32-bit indices. The XNA Reach profile only supports 16-bit. HiDef supports 32-bit.
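As a rough illustration of the limit: a 16-bit index can only hold values from 0 to 65535, so it can address at most 65536 vertices. The helper below is a hypothetical sketch (the function name is made up) that picks the narrowest index width for a mesh. It only models the addressing limit, not any per-draw-call limits a profile may impose.

```python
MAX_VERTICES_16BIT = 2 ** 16  # 16-bit index values run from 0 to 65535


def smallest_index_width(vertex_count):
    """Return 16 or 32: the narrowest index width that can address every vertex."""
    if vertex_count < 0:
        raise ValueError("vertex_count must be non-negative")
    return 16 if vertex_count <= MAX_VERTICES_16BIT else 32
```

If this returns 32 and you are stuck on Reach, the mesh would have to be split into several smaller vertex/index buffer pairs.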
Posted by DwarvesH on 15 March 2014 - 01:58 PM
I'm not a game developer, so take my advice for what it's worth. You have already made the case for why you need to use permutations in the beginning of your post. Max performance and support for a number of different lighting techniques pretty much mandate that you have a bunch of different permutations of your shaders. I would also suggest that if you can auto-generate your shaders, then absolutely you should, with one caveat. If you want max performance, then you should allow special customization for cases where you find that the generated code isn't all that efficient.
Other than that, you should be pre-compiling your shaders anyways, so really that doesn't affect your end product. It seems to me that you have already made a strong, logical case for going the way that you have, so I agree with your approach!
OK, thanks for the feedback! One good thing I noticed is that my prototyping time is way down. I can write a new sub-shader, run my generation and try it out with it affecting all permutations.
The generated code for an individual shader is not that short but very readable. So I can track down bugs, make changes in the generated version and, if needed, feed them back to the generator.
So, what are you then? You have a huge rep...
And I also want them to perform at maximum speed, so run-time shader dynamic branching is out of the question.
Ironically, on modern hardware, you are likely going to shoot yourself in the foot with this mindset. Switching shaders, from what I recall, can be almost as expensive as dynamic branching, if not more so. Especially when, as in permutations, all branches will take the same path for all pixels of the same mesh.
Regarding your actual question, it appears to me that you are putting too much weight on those "shader profiles". From the number of techniques you stated, am I correct to assume all those settings are part of one uber-shader? In my own implementation, effects like SSAO are all different shaders applied in different stages. I am using a deferred renderer, so my implementation might vary, but especially HDR (and maybe SSAO too) can be done as a post-process effect and therefore be its own shader. This not only reduces the number of permutations, but also removes some: there is no need for an "off" permutation, since off simply means not rendering the pass.
No uber shaders, only relevant settings are accessed in one permutation/render profile. I had uber shaders in the dynamic branching version. They greatly vary in size, with the flat ambient shader being one line of code and the mixed metal dielectric lerp environment mapped lighter being quite long. The former only depends on the material ambient color set as a pixel shader constant, while the latter has a ton of material properties and sampler dependencies.
Every permutation can be easily copied and pasted into any FX file as long as the 200 lines of code of common structure and variables are included.
I am using forward rendering, so I pretty much need to change either the shader technique or the lighting constants for every single object, but the total pool of techniques is low. One disadvantage is that you can't use instancing, but you can't really use instancing with forward rendering anyway, except with simple lighting schemes.
I wrote the whole thing today, so it is not perfect yet. Some things shouldn't be there.
HDR shouldn't result in a permutation, and once I add it to the post processing pipeline I will reduce my permutations by half. But today is the first time I wrote an HDR shader, so I went with the simplest solution.
SSAO will again be added to the post processor, so no further permutations. I first need to figure out how the hell to do SSAO in a forward renderer. I tried before but it is very grainy.
SSAA is a weird beast. I'm using fixed buffer size SSAA, so you must run it in every single pixel shader. To keep the permutations down I only included three settings out of the 6 I have implemented. The final version will have off, almost medium and high since it is so expensive, leaving out things like the super duper ultra high version (11x SSAA). That is only there as a reference when I need to compare to maximum achievable quality.
And if I decide to drop SSAA I will reduce permutations by 66%, but I do hope to keep it. I don't care about surface aliasing, but SSAA makes the render rock solid when objects are moving around. I use it for its temporal aliasing properties and its reduction of high-frequency shimmering.
And I plan to keep it around for a secret screenshot mode, with even more hardcore options. Maybe 30x SSAA...
Posted by DwarvesH on 15 March 2014 - 11:35 AM
So I decided to support most reasonable lighting setups. And I also want them to perform at maximum speed, so run-time shader dynamic branching is out of the question.
So I came up with the concept of render profiles. Each combination of render profiles results in a unique pixel shader. All permutations are automatically generated.
Render profiles support the following settings for now:
- Ambient mode. Can be off, flat color modulation, spherical harmonics or environment map ambient lighting. For metallic or mixed objects off ambient lighting is the same as flat, because of reasons and PBR.
- Baked ambient AO. On or off. Baked AO only gets displayed in the ambient component because you shouldn't have AO in strong direct light.
- SSAA mode: three settings. Off, medium and super duper ultra for now.
- HDR mode. Currently only off and a single tone-mapping operator is supported. I'll add things like filmic later.
- Material nature: metallic or dielectric. Or mixed, where you can lerp between metallic and dielectric.
This is pretty comprehensive. I found that a handful of render profiles are enough to render scenes.
The only problem is the number of permutations. With this limited setup there are 120 different techniques. I can easily see this getting over 1000. They are autogenerated so not a big problem, but I was wondering how others do this.
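To make the count concrete, here is a small sketch that enumerates the render profiles listed above. The setting names are made up for illustration, and the deduplication rule is an assumption: it treats "off" ambient as identical to "flat" for metallic and mixed materials, which is one reading of the list that happens to give exactly 120 techniques.

```python
from itertools import product

# Hypothetical names for the settings described in the post.
AMBIENT = ["off", "flat", "spherical_harmonics", "environment_map"]
BAKED_AO = [False, True]
SSAA = ["off", "medium", "ultra"]
HDR = ["off", "tonemap"]
MATERIAL = ["metallic", "dielectric", "mixed"]


def canonical(ambient, baked_ao, ssaa, hdr, material):
    """Collapse combinations that would produce the same pixel shader."""
    # Assumption: for metallic or mixed materials, "off" ambient equals "flat".
    if material != "dielectric" and ambient == "off":
        ambient = "flat"
    return (ambient, baked_ao, ssaa, hdr, material)


def enumerate_profiles(ssaa_modes=SSAA, hdr_modes=HDR):
    """Return the sorted list of unique render profiles."""
    combos = product(AMBIENT, BAKED_AO, ssaa_modes, hdr_modes, MATERIAL)
    return sorted({canonical(*combo) for combo in combos})


if __name__ == "__main__":
    print(len(enumerate_profiles()))                    # all settings
    print(len(enumerate_profiles(hdr_modes=["off"])))   # HDR moved to post-processing
    print(len(enumerate_profiles(ssaa_modes=["off"])))  # SSAA dropped
```

Under that assumption, removing a setting with two values halves the count and removing one with three values cuts it to a third, which is why every new per-shader setting is so expensive.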
Manual shader source management is out of the question. Even a custom-tailored solution that only works with exactly what I want to render, with shaders compiled for my setup only, will have dozens of permutations, so generation seems to win. The 120 techniques do occupy 100 KiB of source code and take 10 seconds to compile under FXC, but the precompiled version loads very fast.
So my question for more experienced folk: is this a good approach? Half-Life 2 uses almost 2000 permutations, so I'm not the only one doing this. And pretty much everyone uses permutations to handle different light setups and numbers. Unless they write those fancy shaders with for loops.
Posted by DwarvesH on 06 March 2014 - 11:38 PM
XP->Vista introduced a new driver model. D3D10/11 still could've been implemented for XP if they cared about supporting it though... e.g. GL4 exposes all the D11 functionality on XP...
I'm sure it could have been, but there's a lot more than marketing going on there. Agree on 11.1 restrictions being a bit silly, even if it's not just marketing either (new WDDM and DXGI versions).
Marketing isn't the right word for it. There are multiple. "Agenda" is one for it.
It is perfectly fine to introduce new models and break compatibility once in a while. The driver model change in Vista was sorely needed. When something really needs improvement, it should be improved.
But their current actions betray an agenda, not them seeing a need for improvement. They are practically strong-arming you into upgrading. Sure, it is always easier to not back-port changes, but I'm sure the current management would even prevent those back-ports if possible. They need to sell Windows 9. They need to reach numbers like in the glory days again, even though with current PC sales trends that is not possible even if Windows 9 is the best Windows yet. I have Windows 8 and it is a piece of crap, especially for gaming and engine development. Whatever support for misbehaving fullscreen applications was in the OS previously, "Metro" gutted it. I often need to log out because the desktop app can't handle a full-screen app doing stupid stuff. Well, at least I could never crash the GPU driver like I could under 7.
But being tied down to a version of a software sucks. I'm not die hard by principle, but I'll probably end up the last DirectX 9 user in the world. Every time I try to port stuff and do dual maintenance, something does not work.
Anyway, screw stubborn principles. I'm upgrading right now to Windows 8.1. The reason I did not update until now was that I mistakenly believed you needed an account for it. I was just about ready to create my account, when I happily noticed it would let me download the update without it.
I have over 40 accounts that are all important. Does anybody else have problems with modern social media and services presence and the number of accounts you need?
Posted by DwarvesH on 06 March 2014 - 11:02 AM
Maybe it's taking ideas from AMD's Mantle (or developed alongside...) to give better direct access to the hardware more akin to consoles.
Edit: and possibly a Windows 9 exclusive?
Yeah, that's my guess too. Looking at the history of DirectX from 9 to 11 that would be a likely direction for them. This applies to performance, control and exclusivity alike.
But if by some miracle they make the new DirectX available under Windows 7 and up, I will officially bury the hatchet, forget my recent animosity gained towards Microsoft and they will also acquire a huge buffer of good will, enough for one Microsoft employee to shit weekly on my doorstep for at least a year.
Posted by DwarvesH on 03 March 2014 - 05:55 AM
This is why I don't like releasing stuff until it is 100% done. I found some CSAA 32x and multithreading bugs. I fixed them and back-ported the fixes to each version.
I also split up the tutorials, each in its own archive to make it easier to edit one. I edited the first post with links.
DirectX 10 is not progressing at all. I'm getting super strange bugs and abysmal fullscreen MSAA performance, so I'll leave it for later.
The next tutorial is going to be on GUI. Unless someone can point me to a good lightweight but powerful GUI system for SharpDX. I couldn't find one. If you know of one please let me know.
Oh, and I should stop calling these tutorials. They are far too complex for that title. It is more like a DIY step-by-step iterative toy engine and shaders for each common rendering task.
Posted by DwarvesH on 27 February 2014 - 07:47 AM
I wrote the 10th one: http://dl.dropboxusercontent.com/u/45638513/sdx/Tut10.zip
It should have the sources and a working exe. If you want to compile it you'll need this too: http://dl.dropboxusercontent.com/u/45638513/sdx/Dependencies.zip
To get my runaway backlog under control, I'll write up the text of the tutorial before I continue!
Now that I remember, I forgot to give the controls for the demo:
- Mouse buttons: camera control
- P: Toggle cube rotation
- Alt-Enter: toggle fullscreen
- F2: Toggle VSync
- F3: Cycle MSAA/CSAA
- F4: Cycle FXAA/SMAA
- F11: Toggle master post processing override. This way you can force to off the entire post processing framework. Good for debugging.
- F12: Toggle master AA override. This way you can force to off the entire MSAA/CSAA framework. Good for debugging.
- T: Toggle debug text.
- +/-: Adjust bloom threshold
- O: Toggle bloom debug overlays
- B: Cycle through bloom modes. Mode 0 is no bloom. Mode 1 is default bloom. The rest of the modes are not that useful for realistic rendering.
- E: Cycle through render modes:
  - High: Default
  - Debug: Ambient spherical harmonics lighting
  - Debug: Diffuse lighting
  - Debug: Specular lighting
Posted by DwarvesH on 24 February 2014 - 08:12 AM
I could not find any proper SharpDX tutorials so I started writing a few. I know that SharpDX is basically DirectX, but porting your knowledge from one to the other is not that straightforward and I did have to spend quite a while searching for things in the documentation. I finished tutorial 9 back in December and wanted to write another one, but then I got into physically based BRDF and that is a huge topic so I couldn't finish the 10th one. Sorry that I did not post them before but I needed to clean them up badly and I've been super busy in 2014.
So a few words about the tutorials: they are built on the idea that you take a very simple sample and keep adding small features to it until you have a basic but rich renderer. The code is written for people who like DirectX (i.e. having full control, manually setting device states, sampler states, shader variables, etc.) but at the same time would like to have a full set of features pretty much out of the box. So you will manually fill in buffers and set variables to achieve simple things like rendering a cube, but that cube will be rendered using normal mapping, bloom, AA, etc.
The code is for DirectX 9.
I have the scripts written for the first 10 tutorials, but I did not have time to write the articles for them yet.
The tutorials do a lot of seemingly random shit, but all of them are there to fix some strange behaviors or instabilities. The written articles would contain explanations for those sections.
What is implemented:
- master control for post processing and MSAA
- MSAA and CSAA
- FXAA and SMAA
- directional light normal mapping with optional spherical harmonics hack ambient lighting
- good window control and input (screen capture friendly)
What I would like to add in the future:
- DirectX 10 mode
- physically based BRDF
- HDR rendering
- custom resolve for HDR with MSAA (only DirectX 10)
- flexible light composition scheme (almost done)
- depth of field
I hope I'll have time to write the articles and the rest of the tutorials, but until then I'm dropping a link here for anybody who might be interested:
Dependencies (needed only for compiling): http://dl.dropboxusercontent.com/u/45638513/sdx/Dependencies.zip
Media (needed for Tut11+): http://dl.dropboxusercontent.com/u/45638513/sdx/Media.zip
Important note: Tut11 through Tut13 need the contents of Media.zip. All zip archives should be extracted. If you extract into foo, you should have foo/Media and foo/TutXX. If you also want to compile, you should have foo/Dependencies.
A few notes:
- you need .Net 4.0 or compatible
- you can find ready to run binaries for each tutorial in the TutXX/TutXX/bin folders (this is why the download is so big)
- the code should be ready to be compiled and run, but running will only work in release mode. For debug mode you need to copy the contents from bin/Release to bin/Debug. I did not copy it for you in order to keep the download size smaller.
- there may be bugs. Some may be coding bugs, others may result from my incomplete understanding of the topic. In particular bloom may seem kind of blocky in some samples.
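If you would rather script the debug-mode copy step than do it by hand, something like the following should work. It is only a sketch: the function name is made up, and it assumes the tutorial folder contains bin/Release as described in the notes above.

```python
import shutil
from pathlib import Path


def copy_release_to_debug(project_dir):
    """Copy everything from <project_dir>/bin/Release into <project_dir>/bin/Debug."""
    bin_dir = Path(project_dir) / "bin"
    release = bin_dir / "Release"
    debug = bin_dir / "Debug"
    if not release.is_dir():
        raise FileNotFoundError(f"no Release folder at {release}")
    # dirs_exist_ok needs Python 3.8+; existing files in Debug are overwritten.
    shutil.copytree(release, debug, dirs_exist_ok=True)
    return debug
```

For example, `copy_release_to_debug("foo/TutXX/TutXX")` would match the layout described in the notes, with TutXX replaced by the tutorial you want to debug.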
If you find any bugs or have any feedback please let me know.
I'm learning DirectX 10 now for future tutorials. I did not manage to make any meaningful progress here because DXGI is just not cooperating and even without it rendering fullscreen with MSAA is 3-4 times slower than with DirectX 9.
Posted by DwarvesH on 15 December 2013 - 01:59 PM
It seems that this is due to the fact that DirectX uses a left-handed coordinate system and XNA a right-handed one.
For now I have multiplied the Z axis by -1 to fix this, but I might have to change the order of vertices in triangles. And I need to update my binary mesh importers to do the same.
Do I need to do this to the translation matrices of bones too?
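For reference, here is a rough sketch of the conversion described above: negate Z on the positions (normals need the same treatment) and swap two indices of each triangle to restore the winding order. The matrix helper is an assumption about how bones might be stored (plain 4x4 nested lists). Mirroring a transform along Z means computing S * M * S with S = diag(1, 1, -1, 1), which negates every element that lies in row 2 or column 2 but not both, so for a pure translation matrix only the Z translation changes sign.

```python
def flip_handedness(positions, indices):
    """Mirror a triangle-list mesh along Z and fix the triangle winding."""
    if len(indices) % 3 != 0:
        raise ValueError("indices must describe whole triangles")
    flipped_positions = [(x, y, -z) for x, y, z in positions]
    flipped_indices = []
    for i in range(0, len(indices), 3):
        a, b, c = indices[i:i + 3]
        flipped_indices.extend((a, c, b))  # swapping two indices reverses winding
    return flipped_positions, flipped_indices


def mirror_matrix_z(m):
    """Return S * M * S for a 4x4 matrix given as nested lists, S = diag(1, 1, -1, 1)."""
    return [
        [-m[r][c] if (r == 2) != (c == 2) else m[r][c] for c in range(4)]
        for r in range(4)
    ]
```

The matrix rule is the same whether translations live in the last row or the last column, since S is diagonal.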
Posted by DwarvesH on 12 December 2013 - 10:48 AM
I have not encountered multithreading issues as of yet. Here is how I do it:
- The engine knows whether it is running normally, streaming data in (in which case it is still expected to run normally), or in full loading mode.
- Rendering, event handling and the GUI all run on the normal main thread, so I don't create a new thread for that.
- If in full loading mode I use short sleeps after presenting a frame to artificially limit the framerate.
- A second thread is running and loading resources.
- If some resources can only be created in the main thread, the background thread still takes care of the tasks that can run there, like loading data from disk and computing. Once it is finished, I create the structures in the main thread based on the data prepared by the background thread.
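The scheme above can be sketched in a language-neutral way like this. It is a simplified illustration rather than the engine's actual code: all function names are hypothetical, the "disk load" and "resource creation" steps are stand-ins, and a queue hands prepared data from the loader thread to the main thread.

```python
import queue
import threading
import time


def load_from_disk(name):
    """Stand-in for disk reads and CPU-side preparation; safe on any thread."""
    return {"name": name, "payload": name.upper()}


def create_on_main_thread(prepared):
    """Stand-in for creating an object that must be created on the main thread."""
    return {"name": prepared["name"], "created_by": threading.get_ident()}


def background_loader(names, ready):
    """Runs on the second thread: prepare data and hand it over."""
    for name in names:
        ready.put(load_from_disk(name))
    ready.put(None)  # sentinel: nothing left to load


def run_full_loading_mode(names, frame_sleep=0.001):
    """Main-thread loop: present frames, finalize prepared data, sleep briefly."""
    ready = queue.Queue()
    worker = threading.Thread(target=background_loader, args=(names, ready))
    worker.start()
    resources = []
    loading = True
    while loading:
        # A real engine would render and present the loading screen here.
        while True:
            try:
                prepared = ready.get_nowait()
            except queue.Empty:
                break
            if prepared is None:
                loading = False
                break
            resources.append(create_on_main_thread(prepared))
        time.sleep(frame_sleep)  # short sleep to artificially limit the framerate
    worker.join()
    return resources
```

The main thread never blocks on the loader; it only drains whatever is ready each frame, so the loading screen stays responsive.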
Posted by DwarvesH on 12 December 2013 - 04:09 AM
I figured this one out on my own. You need to load the compiled shader using ShaderBytecode:
ShaderBytecode s = ShaderBytecode.FromFile("shaders/SMAA.fxo");
smaaEffect = Effect.FromStream(Device, s.Data, ShaderFlags.None);
The most expensive shader now loads 63 times faster!
Posted by DwarvesH on 20 September 2013 - 06:40 AM
Thank you very much for the info!
There's a great breakdown of how Just Cause 2 created such huge view distances here - http://www.humus.name/index.php?page=Articles. Just scroll down to Creating Vast Game Worlds.
If I remember right, JC2 has a view distance of around 50km. That article covers some of the details of how they achieved it.
Regarding depth precision, it can depend on the API you're using. I know with D3D11 you can use a 32 bit float depth buffer with reverse-Z, and that will give you a depth precision of about 0.01 out to just under 100,000 units (a.k.a 1cm precision at a depth of 99.9km). It's basically limited by the precision of a 32 bit float, which I believe is about 7.2 digits. If you're using OpenGL I've read there are other options such as logarithmic depth, which like reverse-Z float in D3D11 give a roughly linear depth precision similar to above, although it may disable early-Z culling optimisations. There's a really good writeup someone did about it a while back actually:
I looked over the links. The Just Cause presentation is quite readable and I'll continue to study it. Some great ideas over there. Yet, with all their tricks, optimization and experience they still have a lot of Z fighting? The second OpenGL link is more difficult to understand.
At first I'll focus on getting things to work as expected and look good without trying to fix the Z buffer. Even at a range of 2 km I started having horrible Z precision near the far plane, so I increased the near plane for now.
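To see why pushing the near plane out helps, here is a back-of-the-envelope sketch. It assumes a standard D3D-style perspective depth mapping, d = far / (far - near) * (1 - near / z), stored in a 24-bit fixed-point buffer, and estimates the smallest view-space distance two surfaces must be apart at distance z to land on different depth values. The near plane values in the example are made up for illustration.

```python
def depth_resolution(z, near, far, bits=24):
    """Approximate view-space size of one depth-buffer step at distance z."""
    if not (0 < near < far and near <= z <= far):
        raise ValueError("expected 0 < near <= z <= far")
    step = 1.0 / (2 ** bits - 1)                 # smallest representable depth difference
    slope = far * near / ((far - near) * z * z)  # derivative of d with respect to z
    return step / slope


if __name__ == "__main__":
    # Hypothetical setups: 2 km far plane, looking at something 1.9 km away.
    print(depth_resolution(1900.0, near=0.1, far=2000.0))  # about 2 units
    print(depth_resolution(1900.0, near=1.0, far=2000.0))  # about 0.2 units
```

The resolution scales almost linearly with the near plane distance, so moving the near plane from 0.1 to 1.0 buys roughly ten times the precision at the far end, at the cost of clipping very close geometry.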
It seems that I have both under- and overestimated the complexity of the problem and the factors involved. My old view range was just 500 meters by default, with more than 100 meters displaying fog. The spherical fog left you with even less perceived distance. Now I'm trying out a 2 km view range, still with the spherical fog. I'm still having most of the problems I described earlier, but the increased view range makes them all less apparent. I'm still forced to use alpha fog to reduce most of the artifacts.
The problem of making it look good, natural and distant got even more complicated because my maps will be 4x4 or 8x8 kilometers. On the 4x4 map with a 2 km view distance you can see almost 1/4 the way. On top of that, there is the added problem that the 8x8 maps while pretty big, did not feel that big under certain circumstances. Now with the bigger view distance they feel even smaller. Making the map 16x16 will eat up 2 GiB of disk space. My streamer can handle it, but still, pretty big.
It works pretty well at low height when surrounded by higher altitudes. In the cross-hair you can see a distant peak at around 1.9 km distance from the camera:
The look down from the top of the peak is less impressive:
So is the walk to the bottom. The illusion of scale is pretty much broken:
I'll try to add a further level to the geo-mipmapping and one to the quad tree, and try to add another kilometer to the view distance to see how it looks. Maybe I can even render the entire map in the distance, but at 1/16 resolution.
It matters not which API you're using. You can interpret and store the position or its part in any way you wish in either GL or DX shaders.
I am using XNA and the max it supports is DepthFormat.Depth24, which I'm already using.
Posted by DwarvesH on 07 September 2013 - 02:55 AM
I did some smoothing using a box filter, but I haven't yet found a method to create perfectly smooth terrain after the deformation, except to smooth the whole thing, which is undesirable. I think I need some slope analysis, with stronger smoothing the bigger the slope is.
You can see a little bit of the smoothing in my latest video:
The smoothing is not that good but good enough for now.
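The slope-dependent idea could look something like this. It is an untested-in-engine sketch, not what the video shows: the heightmap is a plain list of rows, the slope estimate is simply the largest height difference to a direct neighbour, and `max_slope` is a made-up tuning parameter that decides when the full 3x3 box filter kicks in.

```python
def box_average(height, x, y):
    """Average of the 3x3 neighbourhood around (x, y), clipped at the borders."""
    rows, cols = len(height), len(height[0])
    total, count = 0.0, 0
    for j in range(max(0, y - 1), min(rows, y + 2)):
        for i in range(max(0, x - 1), min(cols, x + 2)):
            total += height[j][i]
            count += 1
    return total / count


def local_slope(height, x, y, cell_size=1.0):
    """Largest absolute height difference to a direct neighbour, per unit length."""
    rows, cols = len(height), len(height[0])
    steepest = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        i, j = x + dx, y + dy
        if 0 <= i < cols and 0 <= j < rows:
            steepest = max(steepest, abs(height[j][i] - height[y][x]))
    return steepest / cell_size


def smooth_by_slope(height, max_slope=1.0, cell_size=1.0):
    """Blend each sample towards its box average, more strongly on steep slopes."""
    result = [row[:] for row in height]
    for y in range(len(height)):
        for x in range(len(height[0])):
            t = min(local_slope(height, x, y, cell_size) / max_slope, 1.0)
            result[y][x] = (1.0 - t) * height[y][x] + t * box_average(height, x, y)
    return result
```

Flat areas are left untouched because their slope is zero, while sharp edges left by the deformation get the full box filter.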
But the main question is: how does the physics look?