To follow on MJP's great advice, have you tried using the performance tools in the latest versions of Visual Studio? They can show you a pretty good representation of the parallelism between the CPU and GPU, and will likely give you some insight into what is costing the time that stacks up in your overall frame time.
That is more or less correct. I have a structure that holds the references to the states (RenderEffect), and a material can reference a number of different RenderEffects for different situations. The higher level rendering pass is actually controlled in a separate object called a SceneRenderTask. This object is the one that sets up the pipeline outputs (i.e. render and depth targets) and provides whatever special logic is needed for that particular rendering pass. In your example of a mirror, the stencil rendering would be done in one pass and the reflected scene would be a second SceneRenderTask.
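To make those relationships concrete, here is a minimal sketch of the arrangement described above. The class names RenderEffect, Material, and SceneRenderTask come from the post, but all of the internals are hypothetical stand-ins rather than Hieroglyph 3's actual API:

```cpp
#include <map>
#include <memory>
#include <string>

// Illustrative sketch only: a material maps rendering situations to
// RenderEffect state bundles, and a SceneRenderTask owns the per-pass logic.
struct RenderEffect {
    std::string stateDescription; // stand-in for the referenced pipeline states
};

struct Material {
    // One RenderEffect per rendering situation (e.g. "gbuffer", "shadow").
    std::map<std::string, std::shared_ptr<RenderEffect>> effects;
};

struct SceneRenderTask {
    std::string passName; // e.g. "stencil" or "reflection" for the mirror example

    // A real task would first bind its render/depth targets, then draw each
    // object with the effect that matches this pass; here we just look it up.
    const RenderEffect* Execute(const Material& mat) const {
        auto it = mat.effects.find(passName);
        return it != mat.effects.end() ? it->second.get() : nullptr;
    }
};
```

In this arrangement an object with no effect for a given pass is simply skipped by that pass, which is how a material can opt out of, say, the shadow pass.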
If you are interested in seeing how it works more closely, the whole engine is available as open source: Hieroglyph 3
I use an object that represents the rendering state for the drawing to be done. This is contained within a material object, and each individual object in the scene can reference a material (via a smart pointer). This lets you share a material when it makes sense, or you can just as easily duplicate a material for a special object that wants to mutate its material state.
In the methods you show above, you are limiting yourself to a fixed number of states - if you make them their own objects then you can have an unlimited number of states to use.
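A minimal sketch of the smart-pointer sharing described above - the Material and SceneObject types here are hypothetical, but they show how two objects can share one material, and how one of them can duplicate it before mutating:

```cpp
#include <memory>
#include <string>

// Hypothetical types for illustration only.
struct Material {
    std::string blendMode = "opaque";
};

struct SceneObject {
    std::shared_ptr<Material> material; // shared by default

    // Copy-on-write style mutation: duplicate the material so other
    // objects still referencing the original are unaffected.
    void MakeUniqueMaterial() {
        material = std::make_shared<Material>(*material);
    }
};
```

With this scheme the common case pays only for a pointer per object, and only objects that actually diverge pay for a full material copy.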
Which version of Visual Studio were you using, and on which operating system? Also, was there a particular type of variable that had the issue (i.e. local, a certain type, input/output registers, etc...)? If you post a shader, we could try it out to see if it occurs the same on our end.
If you think about it, this is similar to a problem you already face with a traditional depth buffer. You clear it to a particular value, which usually represents the depth at the far clipping plane. If you don't render any geometry over a portion of the scene, then that portion keeps your default value even though no geometry is actually present in that location. Your G-Buffer is more complex, but retains the same idea: you are trying to give any area that isn't updated a default appearance.
My suggestion would be to either ensure that the entire render target is rendered to every frame (i.e. a skybox to fill in the gaps) or just to default to an appropriate value to give the basic appearance that you are looking for. In your test scene shown above, you could for example just create a huge inside-out cube that your demo runs inside of. This would let you control the appearance of the entire render target, and you could use a default value for your render targets that indicates some error. Something like clearing the normal buffer to all zeros or very large values. This will give you a visual clue that some area of the scene didn't get rendered to and let you quickly figure out why!
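The sentinel-clear idea can be sketched in plain C++, with a CPU-side vector standing in for a GPU render target: clear the normal buffer to all zeros - an impossible value, since real normals have unit length - and any pixel still holding the sentinel after the geometry pass was never rendered to:

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Normal = std::array<float, 3>;

// Clear every pixel of the (CPU-modeled) normal buffer to the sentinel
// value (0,0,0), which no real surface normal can take.
std::vector<Normal> ClearNormalBuffer(std::size_t pixelCount) {
    return std::vector<Normal>(pixelCount, Normal{0.0f, 0.0f, 0.0f});
}

// A pixel was rendered to if its normal has non-zero length;
// the sentinel does not.
bool WasRendered(const Normal& n) {
    float lengthSquared = n[0]*n[0] + n[1]*n[1] + n[2]*n[2];
    return lengthSquared > 0.0f;
}
```

On the GPU side the clear itself would be done with the usual render-target clear call, and the lighting pass can branch on the same test to output your chosen default (or debug) color.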
Tried to debug my engine with the VS graphics debugger and saw that the debugger's frame time is the same as in my engine.
Just so I am sure that I understand: do you still see the stuttering in this frame time?
Tried with nvidia GT 530 with vsync on. Stuttering has gone, but only with vsync = on. So, I think this is likely related to some hardware problem.
This would not indicate to me that there is a hardware problem. Vsync essentially just forces your application to present at a fixed time interval, which is *precisely* what Buckeye suggested above. So there is likely something in your application that is causing a variable frame time, and you have to track it down. Start by isolating part of your application. Make a list of the things that are done every frame, and start commenting them out while taking measurements from the VS graphics debugger to see when it becomes smooth.
Once you have identified the source of the stuttering, then you can move on to determine if this is something that can be fixed, or if it is inherently something that must be present. In the worst case, you can simply enable Vsync and forget about the issue, but this is a great opportunity to learn more about how your application is working - take advantage of it!
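As a sketch of the measurement half of this process, assuming you capture per-frame durations yourself (e.g. with std::chrono::steady_clock), you can flag frames that take much longer than the median frame time - those spikes are the stutters to track down while you comment things out:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Return the indices of frames whose duration exceeds spikeFactor times
// the median frame time. The factor of 2.0 is an arbitrary starting point.
std::vector<std::size_t> FindSpikes(const std::vector<double>& frameTimesMs,
                                    double spikeFactor = 2.0) {
    if (frameTimesMs.empty()) return {};

    std::vector<double> sorted = frameTimesMs;
    std::sort(sorted.begin(), sorted.end());
    double median = sorted[sorted.size() / 2];

    std::vector<std::size_t> spikes;
    for (std::size_t i = 0; i < frameTimesMs.size(); ++i)
        if (frameTimesMs[i] > spikeFactor * median)
            spikes.push_back(i);
    return spikes;
}
```

Run this over a few seconds of captured frames after each change; when the spike list goes empty, the last thing you commented out is your culprit.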
Could you please explain why this is actually happening?
I have no idea, because I don't know anything about what you are doing. Did you check the frame time with a separate tool as I mentioned above? If so, what were the results? The nature of the changes in frame time will probably provide clues for you to figure out what the issue is. Trying to guess about the cause will not help - you need to take a factual set of measurements and logically come to a conclusion about what is affecting your program.
Have you tried using a separate tool to identify if your application is the cause of the jerkiness? You can use the performance analysis tools in VS2013 or VS2015 to see if your frame time is actually changing or not. If it isn't, then the issue is probably in your time calculation like Buckeye has been describing. If the frame time is changing and that is causing the stuttering, then you have something else to track down.
The description that you gave sounds perfectly fine, so why not just implement it and see how it goes? Especially when using a pre-built engine, it should be pretty quick to get the thing up and running and see what works well and what doesn't. Then you will have a little experience building that type of application, and you will figure out what needs to be improved and what doesn't.
If my guess is right, a PC game with broad HW support would either use DX12 plus DX11 for old windows, or would use Vulkan plus GL3 for old hardware... Which kinda sucks.
I actually think that there will be lots of mixes between D3D11 and D3D12. Even with all the extra power in D3D12, if you don't need cutting edge performance then D3D11 is going to be easier to use (plus there are lots of existing code bases out there already...). Over time I'm sure this will shift more and more towards D3D12, but I think it will take longer than most people are thinking right now...
Hi MJP, I'm actually using a very similar technique to the one in your book. Do you know of an efficient way to sort all those particles? I might be able to use a different render mode than standard alpha blending as C0lumbo suggested, these particles are representational more than realistic.
If you will be using append / consume buffers, then it is probably not going to be very easy to keep a maintained sorted list. You would be better off with a structured buffer in which you maintain the sorted order yourself, or using a round-robin approach on the structured buffer to keep the particles in roughly the correct order.
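A rough sketch of the round-robin idea, using a hypothetical ParticleRing as a CPU-side model of the structured buffer: new particles are emitted at the head and the oldest live at the tail, so the buffer stays approximately ordered by age without an explicit per-frame sort:

```cpp
#include <cstddef>

// CPU-side model of a ring over a fixed-size structured buffer. The GPU
// version would carry the same head/count bookkeeping in a small constant
// or structured buffer; names here are illustrative.
struct ParticleRing {
    std::size_t capacity;
    std::size_t head = 0;  // next slot to emit into
    std::size_t count = 0; // live particles

    // Claim the next slot, wrapping around and overwriting the oldest
    // particle once the ring is full.
    std::size_t Emit() {
        std::size_t slot = head;
        head = (head + 1) % capacity;
        if (count < capacity) ++count; // else the oldest was just overwritten
        return slot;
    }

    // Index of the oldest live particle (the tail of the ring).
    std::size_t OldestSlot() const {
        return (head + capacity - count) % capacity;
    }
};
```

Walking from OldestSlot() forward for count slots visits particles in emission order, which is usually a good enough back-to-front approximation for particles that all move away from the emitter at similar speeds.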
You can avoid redundant calls in your code, in any case.
if( texPtr != lastSetTexPtr ) // lastSetTexPtr is a static or class variable initialized to nullptr
{
    Device->SetTexture( texPtr );
    lastSetTexPtr = texPtr;
    // or combine the two statements: Device->SetTexture( lastSetTexPtr = texPtr );
}
Also, this is the type of thing that is completely trivial to test out on your own - give it a try and see what type of results you can get. If you manage to identify a measurable difference, then that would make a much better discussion here than simply asking which one is faster.
EDIT: With that said, I would guess that they are equally fast since you are only setting a pointer in either case...
What feature level does your card support? For some reason, I seem to remember that series of graphics cards only supporting a subset of the complete D3D11 compute shader features. What is the response when you query for support of compute shaders like this:
D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS Options;
m_pDevice->CheckFeatureSupport( D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS, &Options, sizeof(Options) );

if ( Options.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x )
    Log::Get().Write( L"Device supports compute shaders plus raw and structured buffers via shader 4.x" );
Have you tried to run your original code on the reference device? That would eliminate the driver from the equation. Also, keep in mind that just because it runs on another computer doesn't put you in the clear - it could be that the other computer's driver is more lenient than it should be. Many, many things can affect the system's stability, including what other processes are running, how many processors you have, which GPU, and so on. There could still be an issue in your code, but verifying with the reference device is a good start.
Do you happen to do any multithreading in your code?