
DX11 multithreading - why bother?


bvanevery    174
DX11 allows a graphics pipeline to be multithreaded. But why do I want to use my available CPU parallelism on feeding the graphics pipeline? Games have got other jobs to do, like physics simulation and AI. Coarse parallelism would seem to do fine here, and the app would be easier to write, debug, and port to other platforms.

Maybe you say you want to use the GPU for physics simulation and AI, and so you need a tighter coupling between producer threads and the consuming graphics pipeline. Fine, but then you've locked yourself into DX11 HW. My 2.5 year old laptop is DX10 class HW, for instance. Also, your physics and AI code would be API specific. Not only does this limit you to Microsoft platforms, but GPUs do not have the nicest set of programming languages and tools available. We put up with GPUs when we want things to be fast; they're pretty much a detriment to programmer productivity.

What am I missing here? Does anyone have a compelling rationale for bothering with more tightly coupled multithreading? Cynically, this seems like a way for Microsoft / NVIDIA / ATI to push perceived bells and whistles and sell HW "upgrades". Maybe they really can show a pure graphics benefit on high end HW with a lot of CPU cores. But most consumers don't have high end HW, and there's more to games than pure graphics.

DX11 is way ahead of the installed base. Last I checked, consumers are only just now getting around to Vista / Windows 7 and DX10 class HW, and that took ~3 years. Do you want to waste all your time chasing around the top tier of game players? Some games have lost a lot of money doing that, like Crysis.

Also, the performance results I've seen on my midrange consumer HW are not compelling:

MultiThreadedRendering11 demo
D3D11 Vsync off (640x480), R8G8B8A8_UNORM_SRGB (MS1, Q0)
NVIDIA 8600M GT laptop, 256MB dedicated memory, driver 195.62 (this is DX10 class HW)
windowed, with mouse focus in window

~22 fps Immediate
~20 fps Single Threaded, Deferred per Scene
~21 fps Multi Threaded, Deferred per Scene
~20 fps Single Threaded, Deferred per Chunk
~18 fps Multi Threaded, Deferred per Chunk

Methodology: I manually observed the demo window. I picked fps values that seem to occur most frequently. I went through all the settings twice, just in case some system process happened to slow something down. These values seem reasonably stable. I didn't worry much about fractions. I wouldn't regard a difference of ~1 fps as significant, as it's probably a 0.5 fps difference. ~2 fps is observable, however.

To the extent that multithreading matters at all, it seems to slow things down slightly. This demo does not make a compelling case for bothering with DX11 multithreading on midrange consumer HW. Does anyone have some code that demonstrates an actual benefit?

darkelf2k5    286
DX11 multithreading needs to be supported by the hardware, otherwise it's just a software fallback and it's slower that way than immediate mode, obviously. AFAIK no pre-DX11 card supports it.

http://msdn.microsoft.com/en-us/library/ff476893%28VS.85%29.aspx

Pyrogame    106
The other point is: FPS is an old metric. It is OK for single-threaded games, but for multithreading it only shows you how many frames the graphics device can render a scene. In the background, the game can run at whatever speed it wants and perform much more complex things. The more complex a scene becomes, the more important multithreading will be.

DieterVW    724
Multithreading the graphics pipeline is nothing new to DirectX. DX9 had some parallel abilities that many game companies made use of.

Why go to all of the dev and test effort to make a parallel API if no one wants it? Well, people do want it; game companies want it. DX10 had no multithreading abilities, and many, many requests came in asking for it. So let's look at some of the reasons why.

Object creation is slow. It can stall your rendering thread any time your app discovers that it needs to create a new object. These calls are slow enough that MS, ATI, and NVIDIA all wrote white papers telling developers to avoid creating and destroying resources during the application runtime. The API supports multithreaded creates so that you can defer to the driver to pick the best times to create objects -- for instance when it has a few spare cycles -- which lets your rendering thread continue its real work: getting stuff drawn to the screen.
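A minimal sketch of that idea, assuming an already-created ID3D11Device and using invented names (CreateVertexBufferAsync and the onReady callback are not part of the API or any SDK sample). The device itself is free-threaded, so the Create* call can happen on a worker thread while the render thread keeps drawing:

#include <d3d11.h>
#include <functional>
#include <thread>
#include <vector>

// Create a vertex buffer on a worker thread so the render thread is never
// stalled by the (potentially slow) create call. Only the immediate context
// must stay on a single thread; ID3D11Device calls may come from anywhere.
void CreateVertexBufferAsync(ID3D11Device* device,
                             std::vector<float> vertices,
                             std::function<void(ID3D11Buffer*)> onReady)
{
    std::thread([device, vertices = std::move(vertices), onReady]()
    {
        D3D11_BUFFER_DESC desc = {};
        desc.ByteWidth = static_cast<UINT>(vertices.size() * sizeof(float));
        desc.Usage     = D3D11_USAGE_IMMUTABLE;
        desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

        D3D11_SUBRESOURCE_DATA init = {};
        init.pSysMem = vertices.data();

        ID3D11Buffer* buffer = nullptr;
        if (SUCCEEDED(device->CreateBuffer(&desc, &init, &buffer)))
            onReady(buffer);   // hand the finished buffer back to the renderer
    }).detach();
}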

Next, DX11 supports deferred contexts. These allow multiple threads to build command lists at the same time and let the DX runtime perform validation on separate threads in advance. DX10 was an API redesign where one of the many goals was to reduce the CPU overhead of API calls. CPU overhead was a huge problem for DX9 -- many game companies were limited in what they could get on screen just because the API ate too much CPU. DX10 reduced that cost significantly, in some places by a factor of 10-100. However, there are some calls that were difficult to trim down because the validation was necessary, or perhaps the driver had a lot of work to do. Being able to build command lists on separate CPU threads allows some of that work to take place in parallel and in advance of actually drawing the data. Several game studios are already taking advantage of deferred contexts and are seeing performance improvements, even when using the CPU fallback for lack of driver support.
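For reference, the deferred-context pattern boils down to something like the following sketch (RecordChunk and SubmitChunk are illustrative names, and error handling is omitted):

#include <d3d11.h>

// Worker thread: record commands into a deferred context and bake them into
// a command list. Runtime validation happens here instead of on the render thread.
ID3D11CommandList* RecordChunk(ID3D11Device* device /*, per-chunk scene data */)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... set state and issue draws on 'deferred' exactly as you would on the
    // immediate context, e.g. deferred->Draw(...) ...

    ID3D11CommandList* commandList = nullptr;
    deferred->FinishCommandList(FALSE, &commandList);
    deferred->Release();
    return commandList;
}

// Render thread: replay the pre-built list. This is the only part that has to
// stay on the thread that owns the immediate context.
void SubmitChunk(ID3D11DeviceContext* immediate, ID3D11CommandList* commandList)
{
    immediate->ExecuteCommandList(commandList, FALSE);
    commandList->Release();
}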

So, DX9 would allow roughly 2000 API calls per frame before the API would become a bottleneck, DX10 is around 12000, and DX11 should be even higher when using deferred contexts. These are call limits based on using the whole API to do actual work, not just calling something like SetPrimitiveTopology() x number of times. The trouble is that studios are trying to put more and more stuff on the screen and would surely take advantage of anything that could be provided performance-wise.

Plus, your engine has to do a lot of CPU work anyway to make draw calls. It has to build matrix transformations, sort objects and draw calls, and make all sorts of decisions about what to draw and how to draw it. All of this could be done in parallel with big wins -- provided that your app actually has enough work to do that these things become bottlenecks.

A consumer won't need high end hardware to take advantage of multithreading. It's all about preventing the GPU from being starved of data to crunch.

I don't think there are many drivers out yet that fully support the multithreading APIs. This feature requires a lot of effort to get right and is a huge test burden -- but they will come out eventually.

The DX10.1 feature level supports hardware multithreading. This means that there is a reasonably sized slice of hardware out there already that can support this stuff once drivers arrive.

AAA Games take 2-4 years to develop - about the time span you pointed out required to adopt a new technology. Interesting how that works.

Vendor lock-in is not an insurmountable problem for developers. The reality is that there are lots of game engines that wrap the graphics system in a layer so that they can run on Xbox, PC, or PlayStation. These problems have been solved over and over again and are just part of reality. These same game engines have multithreaded deferred contexts built in because it makes a difference. DX11 gives them a way to map their engine API more closely to the hardware, which results in a bigger win. There's no reason why an API should really lock you to any vendor if you layer your software. If you want to support someone else, then target them too.

GPU tools and languages have been getting better and better over the years. Sure, they're not as nice as native tools, but they're getting there. With the spreading use of DX, compute, CUDA, OpenCL, etc., more and more people invest in GPU technologies, which means the whole infrastructure continues to improve. Lack of perfect tools shouldn't stop you from leveraging the amazing power of the GPU -- I admit there are areas of debugging that are still frustrating, but they will get better. People with a lot of practice writing shaders can actually get a lot done. It's not Python, but it's also not asm.

A new API or hardware rev will always be ahead of the install base at launch time. This is not new.

Not all of the available APIs are needed by every developer. Multithreading probably falls into one of those categories of optimization -- why do it if you don't have a problem? Granted, multithreading normally requires a lot more forethought in code design, but I guarantee that if you're not seeing a win, it's because you're not running a scenario that it was designed to fix -- which is CPU and DX API bottlenecking.

Hodgman    51234
Quote:
Original post by bvanevery
why do I want to use my available CPU parallelism on...
Why *don't* you want to use available parallelism on *everything*?
The game I'm writing at the moment is based on a SPMD (single program multiple data) type architecture, where essentially the same code is executed on every thread, with each thread processing a different range of the data. Every thread does physics together, then they all do AI together, then they all do rendering together, etc...
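My interpretation of that loop as a sketch (not Hodgman's actual code; C++20's std::barrier is used for brevity, and the stage functions are placeholders):

#include <barrier>
#include <thread>
#include <vector>

constexpr int kThreads = 4;
std::barrier gStageBarrier(kThreads);

// Placeholder stage functions; each operates on [begin, end) of the entity data.
void UpdatePhysics(int begin, int end)   { /* ... */ }
void UpdateAI(int begin, int end)        { /* ... */ }
void BuildDrawCalls(int begin, int end)  { /* ... */ }

// Every thread runs the same frame code on its own slice of the data.
void FrameWorker(int threadIndex, int entityCount)
{
    const int chunk = entityCount / kThreads;
    const int begin = threadIndex * chunk;
    const int end   = (threadIndex == kThreads - 1) ? entityCount : begin + chunk;

    UpdatePhysics(begin, end);
    gStageBarrier.arrive_and_wait();   // all threads finish physics first

    UpdateAI(begin, end);
    gStageBarrier.arrive_and_wait();   // ...then AI...

    BuildDrawCalls(begin, end);
    gStageBarrier.arrive_and_wait();   // ...then rendering
}

int main()
{
    std::vector<std::thread> workers;
    for (int i = 0; i < kThreads; ++i)
        workers.emplace_back(FrameWorker, i, /*entityCount*/ 1000);
    for (auto& w : workers)
        w.join();
}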

namar777    488
Well, there is a point in that you don't have to use multithreading if the situation doesn't require it. Just be flexible and pick the best suited tools/options/solutions for your project.

bvanevery    174
Quote:
Original post by darkelf2k5
DX11 multithreading needs to be supported by the hardware, otherwise it's just a software fallback and it's slower that way than immediate mode, obviously. AFAIK no pre-DX11 card supports it.

http://msdn.microsoft.com/en-us/library/ff476893%28VS.85%29.aspx


That's not a HW support issue, that's a driver support issue. Theoretically, a DX11 multithreading application architecture should benefit a DX10 class card, if the drivers have been updated. In practice, I don't know if IHVs have updated their drivers, or will update them. It's quite possible that they'll be cheap bastards and expect people to just buy DX11 HW. If that happens in practice, then DX11 multithreading will have no benefit whatsoever on older HW.

I suppose I'll have to check my own driver. NVIDIA's support of older laptop HW has been notoriously poor. They dumped the problem in OEMs' laps for some silly reason. For quite some time, their stock drivers refused to install on laptops; you had to get your driver from the OEM. Of course, the OEMs don't care about updating their drivers very often, so you end up with really old drivers that don't have current features and fixes. Only recently did NVIDIA start to offer a stock driver that will work on laptops. There is still a disconnect as far as their most current drivers go; for instance, the recently released OpenGL 3.3 driver will not install by default on my laptop. I have been getting around these problems using laptopvideo2go.com, a website that adds .inf files to enable the drivers on laptops. This doesn't help the general deployment situation, however.

bvanevery    174
Quote:
Original post by Pyrogame
The other point is: FPS is an old metric. It is OK for single-threaded games, but for multithreading it only shows you how many frames the graphics device can render a scene.


There is no readout for "CPU load" in the MultiThreadedRendering11 demo. This is unfortunate as it would be useful diagnostic information. That's part of why I asked if anyone had code that demonstrates an actual benefit.

Quote:
In the background, the game can run at whatever speed it wants and perform much more complex things. The more complex a scene becomes, the more important multithreading will be.


I think you may have missed the point. You don't need DX11 multithreading to do multithreading in your app. You can have an AI thread, a physics thread, or whatever. Your multithreading architecture will be simpler to write and debug, and it will not be tied to DX11.
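The coarse version I'm describing is nothing more exotic than one thread per subsystem; a minimal sketch, with made-up loop functions, where only the render thread ever touches the graphics API:

#include <atomic>
#include <thread>

std::atomic<bool> gRunning{true};

// Placeholder subsystem loops; each spins until gRunning is cleared.
void PhysicsLoop() { while (gRunning) { /* step simulation */ } }
void AILoop()      { while (gRunning) { /* update agents   */ } }
void RenderLoop()  { /* owns the D3D device/context exclusively; issues all draws */ }

int main()
{
    std::thread physics(PhysicsLoop);
    std::thread ai(AILoop);

    RenderLoop();       // graphics stays single-threaded on this thread
    gRunning = false;   // tell the other subsystems to shut down

    ai.join();
    physics.join();
}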

bvanevery    174
Quote:
Original post by Hodgman
Quote:
Original post by bvanevery
why do I want to use my available CPU parallelism on...
Why *don't* you want to use available parallelism on *everything*?


Because the debugging will drive you nuts.

Because it can easily become premature optimization.

Promit    13246
A current high end desktop CPU has 8 hardware threads, and that number is only going to rise in the future. What possible reason could MS have for not improving multithreaded support? Coarse parallelism in games is okay up to 4 threads, maybe 6. Moving past that will require us to move beyond the rather naive approach of one graphics thread.
Quote:
Quote:
Quote:
Original post by bvanevery
why do I want to use my available CPU parallelism on...
Why *don't* you want to use available parallelism on *everything*?


Because the debugging will drive you nuts.
Jeez, it's not like these are problems never tackled before. People in other segments of software have been dealing with these issues for ages.

bvanevery    174
Quote:
Original post by DieterVW
AAA Games take 2-4 years to develop - about the time span you pointed out required to adopt a new technology. Interesting how that works.


For an indie working on shorter development cycles, these adoption timelines make no sense. Yes, the way it works is whatever "heavyweight" development wants: NVIDIA / ATI / Microsoft / EA all pushing their core product, using lots of programmer worker bees to do it. It's mainly for selling more HW, more OSes, and more AAA games. Except that it clearly doesn't sell AAA games if you get on the tech bandwagon too early, as happened with Crysis. So it's mainly about selling more HW and more OSes... except that most consumers have wised up.

bvanevery    174
Quote:
Original post by Promit
A current high end desktop CPU has 8 hardware threads, and that number is only going to rise in the future. What possible reason could MS have for not improving multithreaded support? Coarse parallelism in games is okay up to 4 threads, maybe 6. Moving past that will require us to move beyond the rather naive approach of one graphics thread.
Quote:
Quote:
Quote:
Original post by bvanevery
why do I want to use my available CPU parallelism on...
Why *don't* you want to use available parallelism on *everything*?


Because the debugging will drive you nuts.
Jeez, it's not like these are problems never tackled before. People in other segments of software have been dealing with these issues for ages.


It's been tackled before, and it will be tackled over and over again forever. It will still drive you nuts. As in, it will make development more expensive and time consuming.


Promit    13246
Everything makes development more expensive and time consuming. That's why major projects now have 30M-50M budgets. What on earth does any of it have to do with DX11 multithreading? Or indies?

bvanevery    174
Quote:
Original post by Promit
Everything makes development more expensive and time consuming. That's why major projects now are 30M-50M budgets. What on earth does any of it have to do with DX11 multithreading? Or indies?


Indies don't spend 30M..50M, DUH. It's about what API investments make sense from a money standpoint, and what's a trap / treadmill.

Promit    13246
So you're saying that DX11 is a terrible choice for indies because the optional multithreading support doesn't work well on your laptop?

Pyrogame    106
Quote:
Original post by bvanevery
Quote:
Original post by Pyrogame
The other point is: FPS is an old metric. It is OK for single-threaded games, but for multithreading it only shows you how many frames the graphics device can render a scene.


There is no readout for "CPU load" in the MultiThreadedRendering11 demo. This is unfortunate as it would be useful diagnostic information. That's part of why I asked if anyone had code that demonstrates an actual benefit.


My current engine runs a test scene on a single HT (hardware thread) at ~80 FPS with 16% global CPU load. If I enable all 8 threads, this boosts the engine to ~3k FPS with nearly 50% global CPU load. Because the CPU uses Hyper-Threading, 50% is a very good value. With only 2 HTs enabled on different cores, I get 2.5k FPS with 24% load.

Of course my engine does not only render things; it also calculates some other stuff (zero-gravity physics without collision detection, no AI). It renders a GUI, which renders the world in a window. The entire engine is based on a job manager, which creates at least 4 job workers (for example on a single HT). If the system has more than 4 HTs, more job workers are created. Then all the work is done by jobs: if the engine wants to calculate something, a job is created and attached to a job worker at runtime. Every camera (the world camera, GUI camera, shadow camera, etc.) has its own rendering job. Each job can have a state machine, which can pause the calculation if the job depends on another job. Because of this, I do not have any job or thread that is called "the main renderer". All the rendering jobs can use the immediate context to execute their prepared deferred contexts. But you have to synchronize access to the device to do this, because the immediate context does not have a real multithreading API (the driver blocks the calls, so you get an exception, but not a self-destructing graphics card ^^).
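A minimal sketch of that synchronization, assuming the general pattern rather than my actual engine code: many jobs record deferred contexts in parallel, and a mutex serializes the moment they touch the immediate context.

#include <d3d11.h>
#include <mutex>

std::mutex gImmediateMutex;   // guards the single immediate context

// Called by any rendering job once its deferred context has been finished
// into a command list.
void SubmitFromJob(ID3D11DeviceContext* immediate, ID3D11CommandList* list)
{
    std::lock_guard<std::mutex> lock(gImmediateMutex);
    immediate->ExecuteCommandList(list, FALSE);
    list->Release();
}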

DX11 delivers support for multithreading in the form of deferred contexts, which in my opinion is a very nice feature.

bvanevery    174
Quote:
Original post by Promit
So you're saying that DX11 is a terrible choice for indies because the optional multithreading support doesn't work well on your laptop?


Not just my laptop, probably 90% of the installed base.

_the_phantom_    11250
So, what you are saying is that because 90% (which sounds like a bullstat to me) of people currently can't use it, we shouldn't come up with technology to use in the future now?

So what, in your world do we wait until everyone has at least 8 cores, then dump this tech on people and say 'hey! get good at it now!'? That's just madness, and doing so would just stop progress.

No one says 'because you are using DX11 you must use multithreading', yet at the same time, if you are targeting high end systems (my current target hardware is DX11 cards and 4-core/8-thread systems) then it gives you a wonderful chunk of flexibility.

Oh, and if you are careful then frankly MT code is easy to write; hell, back when I was 21 and pretty green I wrote an application which would query 50K game servers in less than 3 minutes using a multi-threaded app. At the time I had practically zero experience with networking and threads and wrote the whole thing in about 8 weeks; when it went live it never once crashed or gave the wrong output despite running every 3 minutes, 24 hours a day.

And that was with raw threads; with task-based systems and existing libraries (be it MS's Concurrency Runtime, Intel's Threading Building Blocks, or the .NET concurrency stuff) it is even easier these days to do MT.

Sure, you'll get bugs but if you are careful at what you write they won't be that hard to figure out... so either it's easier than people make out or I'm some sort of coding/design/multithreading god... come to think of it I'm good with either answer [grin]

bvanevery    174
Quote:
Original post by phantom
So, what you are saying is that because 90% (which sounds like a bullstat to me)


Indeed. I was being too kind.

Quote:
of people currently can't use it, we shouldn't come up with technology to use in the future now?


I've watched the DX10 API impasse for ~3 years. Have fun watching the paint dry with DX11 for ~3 years as well. The reality is that most games start life on consoles and they have DX9 HW specs.

_the_phantom_    11250
Yes, your link just proved my point somewhat; if you drill down into the numbers then you'll see that 26.54% of people have 4 core CPUs, which is an increase of 3.5% over Jan's numbers.

Now, maybe my maths isn't too hot so remind me; what is 100 - 26.54? Is it 90? I can't recall?

As for DX10, it was an API strangled by the FUD thrown at Vista; DX11, on the other hand, has had games AT LAUNCH which support it.

And you also didn't answer my question; how are we, as game programmers, meant to test out multi-threaded designs without the API support there? Because I'd lay money on MS's next console supporting DX11 style multi-threaded submission and more cores in general so by learning how to do things NOW means we'll be better positioned in the future.

But hey, if you want to stay with single threaded stuff here is the scoop; no one is going to stop you. Carry on as you were and all that.. the rest of us will be over here, trying to advance the state of the art instead of holding back advancement...

MJP    19754
Quote:
Original post by bvanevery
The reality is that most games start life on consoles and they have DX9 HW specs.


On consoles you can multithread your command buffer generation. PC was the odd man out in this regard until D3D11 came along.

bvanevery    174
Quote:
Original post by phantom
Yes, your link just proved my point somewhat; if you drill down into the numbers then you'll see that 26.54% of people have 4 core CPUs, which is an increase of 3.5% over Jan's numbers.


3.29% are DX11 systems. Read my original post. I'm not against multithreading, I'm against multithreading that's tied to the DX11 API. Most of the installed base does not have enough cores to waste them on 3D graphics. Games have got other things they need to do.

Quote:
the rest of us will be over here, trying to advance the state of the art instead of holding back advancement...


You mean like Crysis? You learn slowly.
