Tasche

DX11 Rendertarget switching with deferred lighting

hello again, this time im doing deferred lighting following this concept:

http://kayru.org/articles/deferred-stencil/

 

ive got it implemented, but some questions remain...

- in the first pass you need to turn off color writes, while the second pass draws to the rendertarget. at the moment i achieve this by setting the rendertarget to 0 (null) during the first pass. however, rendertarget switching is expensive, so i wonder if i should maybe do this with alpha blending instead. i know i can simply try it out, but i don't want a day's worth of bughunting just to find out my original solution was better, and would rather profit from someone else's experience. are there any other clever (i.e. fast) ways to mask out color drawing?

 

- also, when doing multiple lights, i save a light color and a specular factor in the rendertarget and accumulate the values for all lights in the same target. is this the right way to do it?

 

in pseudocode, i do a loop over my lights which looks something like this:

for (all lights)
{
     // pass 1: depth/stencil only, no color target bound
     set_rendertarget(null, depthstencil)
     renderfrontfaces()

     // pass 2: draw into the color target
     set_rendertarget(colorbuffer, depthstencil)
     renderbackfaces()
}

 

so my loop always switches between those targets, which feels kinda expensive. atm my fps drops from 200 to 60 with 50 lights each covering ~30% of the screen, even though their pixel shader does almost nothing, only a tex lookup (well, overdraw also kills fps, but that much?). doing the first pass for all lights and then the second pass for all lights obviously doesn't work, because the backfaces of one light may coincide with the frontfaces of another drawn earlier, and produce a draw area which should've been stenciled out.

 

im using dx11 btw, if it matters...

 

any thoughts on how to do this properly are appreciated

 

cheers,

   tasche

that is a lot of rendertarget switching. why not drop the stencil buffer and just handle it in the light shader? unproject from the screen space coord and depth value to a world space position and discard the fragment if it's outside the bounds of the light (super easy to calculate for boxes, spheres and cones, and you're probably doing it anyway when computing attenuation). then you can just set the render target once and render all your lights in one go.
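
A rough sketch of the unproject-and-discard idea, written as plain C++ instead of HLSL so it can be run on the CPU. The `Camera` struct, the function names and the D3D-style left-handed projection with depth in [0, 1] are assumptions for illustration, none of them come from the thread. It rebuilds a view space position (world space would need one more multiply by the inverse view matrix) and tests it against the light's bounding sphere:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Hypothetical camera parameters for a D3D-style perspective projection
// (left-handed, depth mapped to [0, 1]).
struct Camera {
    double tanHalfFovY;  // tan(fovY / 2)
    double aspect;       // width / height
    double zNear, zFar;
};

// Forward projection, only here so the round trip can be checked.
// Returns NDC x, y in [-1, 1] and depth in [0, 1].
Vec3 projectToNdc(const Camera& c, const Vec3& v) {
    double depth = c.zFar * (v.z - c.zNear) / ((c.zFar - c.zNear) * v.z);
    return { v.x / (v.z * c.tanHalfFovY * c.aspect),
             v.y / (v.z * c.tanHalfFovY),
             depth };
}

// Inverse: rebuild the view space position from NDC x, y and stored depth.
Vec3 unprojectFromNdc(const Camera& c, const Vec3& ndc) {
    double z = c.zFar * c.zNear / (c.zFar - ndc.z * (c.zFar - c.zNear));
    return { ndc.x * z * c.tanHalfFovY * c.aspect,
             ndc.y * z * c.tanHalfFovY,
             z };
}

// True if the point lies outside the light's bounding sphere, i.e. the
// fragment could be discarded. Compares squared lengths only.
bool outsideLight(const Vec3& p, const Vec3& lightPos, double radius) {
    assert(radius >= 0.0);
    double dx = p.x - lightPos.x;
    double dy = p.y - lightPos.y;
    double dz = p.z - lightPos.z;
    return dx * dx + dy * dy + dz * dz > radius * radius;
}
```

In a real light shader the same math would run per pixel, with `ndc.z` read from the depth buffer and the light position given in the same space as the reconstructed point. The sphere test itself is three subtractions, a dot product and a compare.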

If you have to switch render targets anyways (i.e. from the previous final frame rendering) then you aren't really losing anything by only binding the depth buffer and then after the first step binding the render target. 

 

However, if you really want to try something else, you could always use the RenderTargetWriteMask in the blend state.  I'm pretty sure using the render target set to null is going to be significantly faster than this though, since this will allow your pixel shader to continue operating and just throws away the results (whereas with no render target the pixel shader probably won't execute at all).
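
To make the write mask option concrete, here is a small CPU model of what the mask does to a render target value. It is only a sketch: `applyWriteMask`, `Rgba` and the `k...` constants are made-up names, although the bit values match the `D3D11_COLOR_WRITE_ENABLE` enum. In actual D3D11 code the mask is the `RenderTargetWriteMask` field of `D3D11_BLEND_DESC::RenderTarget[0]`, set to 0 for the stencil pass and to `D3D11_COLOR_WRITE_ENABLE_ALL` for the lighting pass.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Same bit values as the D3D11_COLOR_WRITE_ENABLE enum in d3d11.h.
enum ColorWrite : std::uint8_t {
    kRed = 1, kGreen = 2, kBlue = 4, kAlpha = 8, kAll = 15
};

using Rgba = std::array<float, 4>;

// CPU model of the mask: `src` is the pixel shader result and `dst` the value
// already in the render target. Only channels whose bit is set in `mask` are
// overwritten. `src` is an input here, mirroring the point made above that
// the shader may still run and have its result thrown away.
Rgba applyWriteMask(const Rgba& dst, const Rgba& src, std::uint8_t mask) {
    assert(mask <= kAll);  // only the low four bits are meaningful
    Rgba out = dst;
    for (int channel = 0; channel < 4; ++channel) {
        if (mask & (1u << channel)) {
            out[channel] = src[channel];
        }
    }
    return out;
}
```

With a mask of 0 the target comes back unchanged no matter what the shader produced, which is the "color off" behaviour the first pass needs. Whether the driver skips the shader in that case is not something this model can show.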

so first thanks for those quick responses...

> that is a lot of rendertarget switching. why not drop the stencil buffer and just handle it in the light shader. unproject from screen space coord and depth value to world space position and discard fragment if its outside the bounds of the light (super easy to calculate for boxes, spheres, cones and probably doing it anyway when computing attenuation). then you can just set the render target and render all your lights in one go.

hm, i heard about people doing something like this, but to be honest i couldn't figure out how. doing a bounding volume test on a pixel's world position seems rather expensive compared to rendering a low-poly sphere twice, since it involves trigonometry and has to be done for every pixel covered by the sphere (which still has to be rendered), not only the lit areas. but i'm probably missing some integral part of the algorithm. got any links? googling something like 'deferred lighting without stencil' just lists a bunch of stencil algorithms -.-' but as soon as i find some info on this i'll definitely look into it.

 

> If you have to switch render targets anyways (i.e. from the previous final frame rendering) then you aren't really losing anything by only binding the depth buffer and then after the first step binding the render target.
>
> However, if you really want to try something else, you could always use the RenderTargetWriteMask in the blend state. I'm pretty sure using the render target set to null is going to be significantly faster than this though, since this will allow your pixel shader to continue operating and just throws away the results (whereas with no render target the pixel shader probably won't execute at all).

to the first part: but the color buffer always gets bound and unbound (the depthstencil stays bound) for every light iteration. if the card/driver is clever, it will optimize this to just skip the pixel shader, in which case i get optimal performance (best case). if not, it will move the entire light accumulation buffer into cache and out again (worst case), which i think is what's actually happening.

to the second part: hm, so you are saying the way i've got it at the moment is faster? that word 'allow' confuses me, since it implies the opposite =)

 

pls post back once you read this guys! thanks again by the way...

That's right - I meant that using the blend state would allow your pixel shader to run --> meaning it will be slower than just enabling and disabling the whole render target. I know you don't want to hear this, but the best way is just to try it out - it should be very easy to test, and you can verify that you are doing things correctly with PIX / Graphics Debugger too.

so for a point light rendered as a sphere, you're probably already computing the attenuation with some linear falloff that hits zero at the light radius. if the distance between the light pos and the world pos is greater than the light radius, then you don't want to light that point, so you either fully attenuate or discard (depending on the expense of the rest of your shader).
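
As a minimal sketch of that falloff, again in plain C++ so it can be checked on the CPU: the function name and the exact curve (1 at the light, 0 at the radius) are assumptions, since the post only says "linear".

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Linear falloff: 1 at the light position, 0 at `radius` and beyond.
// A result of 0 is the "fully attenuate" case; a shader could instead
// discard the fragment at that point.
double linearAttenuation(const Vec3& lightPos, const Vec3& worldPos, double radius) {
    assert(radius > 0.0);
    double dx = worldPos.x - lightPos.x;
    double dy = worldPos.y - lightPos.y;
    double dz = worldPos.z - lightPos.z;
    double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return std::max(0.0, 1.0 - dist / radius);
}
```

The only non-trivial operation is one square root per pixel, so the bounds check falls out of the attenuation term with no trigonometric functions involved.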

consider the following situation (picture done in pov-ray, just for demo purposes):

brick wall + blue stuff = a floor and a ...well ... a wall in the gbuffer

white sphere = my pointlight bounding object

small ellipses = intersections of the bounding object with wall and floor

 

http://imagebin.org/248058 (sry somehow i cant post images directly)

 

my code will only run the pixel shader for the small ellipses inside the white sphere. if i understand your suggestion correctly, the pixel shader will run for every pixel covered by the white sphere, and do at least a distance test (trigonometry) before exiting.

 

since the sphere covers nearly the entire screen, this will be very expensive compared to my method. of course it's a constructed situation, but i wouldn't say it's uncommon.

my way will always shade the same number of pixels or fewer (since it only runs the pixel shader on intersections with the light sphere).

true, the sphere has to be rendered twice, but a modern GPU tears through a low-vertex-count vertex shader like a knife through butter, and aside from that rendertarget switch and some depthstencil settings, the data remains in cache for both passes. it's definitely the target switch that is painful (provided that it is needed at all).

i'm still not 100% sure i got your method right, because i know a lot of people do deferred lighting in one pass, i just can't find proper info on it.

 

guess i'll just have to try some alpha technique... if anyone knows anything else on how to mask the color buffer (i may need it for something else, you never know :D) pls share!

 

on a side note, i noticed my severe frame rate drops were due to me loading and unloading the gbuffer as a resource (3 full-HD 32-bit textures) for every light. after fixing that, i can render up to 250 lights at otherwise the same settings and resulting fps. stupid me. i'm still very interested in answers to my question, but with this i can work.

ah sry to dig out this old one, but just for completeness: i tried the alpha = 0 version, and just setting the first pass target to 0 (null) is marginally quicker.

so if anyone ever wondered, go for a nulltarget =)
