Cube Map Rendering only depth ! No color writes. - BUG


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

7 replies to this topic

#1 MegaPixel   Members   -  Reputation: 175

Posted 04 September 2012 - 02:59 AM

Hi all,

I'm trying to render to a cubemap using the geometry shader (GS). I want to render just depth, so I need to set a null render target and bind a depth stencil view over a Texture2DArray, selecting the slice with SV_RenderTargetArrayIndex in the GS.
The problem is that whenever I launch PIX to verify the rendering results, the application crashes under PIX (it doesn't crash if I launch it without PIX).
I tried disabling the shadow generation code and it still crashed. Then I disabled the code that creates the depth stencil view and the shader resource view for the cubemap, and magically PIX stopped crashing.
So I think the problem might be in how I create those resources. I started digging on the internet to make sure I was doing everything correctly, but I couldn't find any contradiction with my code, so everything seems correct. It's also true that I couldn't find any reference or code sample showing a depth stencil view used for cubemap rendering with the render target view set to null and color writes off.

btw here is my code to create DSV:
// Create a texture array to hold cube map data
FdkGfxTexture2DDesc texDesc;
ZeroMemory(&texDesc, sizeof(FdkGfxTexture2DDesc));
texDesc.width              = 1024;
texDesc.height             = 1024;
texDesc.mipLevels          = 1;
texDesc.arraySize          = 6;
texDesc.format             = FDK_FORMAT_R32_TYPELESS;
texDesc.sampleDesc.count   = 1;
texDesc.sampleDesc.quality = 0;
texDesc.usage              = FDK_USAGE_DEFAULT;
texDesc.bindFlags          = FDK_BIND_DEPTH_STENCIL | FDK_BIND_SHADER_RESOURCE;
texDesc.CPUAccessFlags     = 0;
texDesc.miscFlags          = FDK_RESOURCE_MISC_TEXTURECUBE;
FdkGfxTexture2DId texId    = fdkGfxCreateTexture2D(device, &texDesc, NULL);

// Create the depth stencil view desc for cube depth render
FdkGfxDepthStencilResourceViewDesc descDSV;
ZeroMemory(&descDSV, sizeof(descDSV));
descDSV.format                         = FDK_FORMAT_D32_FLOAT;
descDSV.viewDimension                  = FDK_DSV_DIMENSION_TEXTURE2DARRAY;
descDSV.texture2DArray.firstArraySlice = 0;
descDSV.texture2DArray.arraySize       = 6;
descDSV.texture2DArray.mipSlice        = 0;
mDeferredData->mPointLightShadowMapBufferId = fdkGfxCreateDepthStencilTarget2D(device, &descDSV, texId);

// Create the shader resource view desc for the cube depth texture
FdkGfxShaderResourceViewDesc srvDesc;
ZeroMemory(&srvDesc, sizeof(srvDesc));
srvDesc.format                      = FDK_FORMAT_R32_FLOAT;
srvDesc.viewDimension               = FDK_SRV_DIMENSION_TEXTURECUBE;
srvDesc.textureCube.mipLevels       = 1;
srvDesc.textureCube.mostDetailedMip = 0;
mDeferredData->mPointLightShadowMapShaderResource = fdkGfxCreateShaderResource(device, &srvDesc, texId);
fdkGfxDestroyTexture2D(texId);

fdk is just my own API; you can replace it with D3D11_blabla ;)

When it comes time to render, I expect to do something like:

OMSetRenderTargets(0,0,cubeMapDSV); //rtViewCount == 0, rtView == NULL

which means no render target view, just depth. I guess the rendering should be faster if we render depth with no color writes.

Could someone shed some light on this ?

Thanks in advance

Edited by MegaPixel, 04 September 2012 - 03:01 AM.


#2 MegaPixel   Members   -  Reputation: 175

Posted 04 September 2012 - 05:20 AM

From "Programming Vertex, Geometry and Pixel shaders":

There are some things to keep in mind though: As of Direct3D 10, you can
only set one depth stencil surface to the device at any given time. This
means that you need to store the depth values in the color data of the
render targets. Fortunately, D3D10 pixel shaders let us to handle
arbitrary values (including colors) as depth via views to typeless
resources, so this isn’t a problem.

I'm actually using DX10.1, so the problem might be related to that.

If that is the problem: isn't it slower to generate the shadow map with color writes on? Maybe in DX11 I can bind more than one depth stencil view ...

tbh on the internet I've found some people outputting linear depth in world space or view space to get more precision out of it, and in that case a color buffer is compulsory if you don't want the z/w hyperbolic falloff.

Any thoughts on this ?


#3 Lightness1024   Members   -  Reputation: 638

Posted 05 September 2012 - 03:27 PM

Well, there are various schools of thought. Some people want to render purely for depth, with no actual pixel shader output, like a Z prepass; afterwards the Z buffer is bound through a view that makes it readable in the later lighting/shadowing stage.
But with that you get Z/W storage, usually as 32-bit float. It's good, but not always the best.
Some people advocate writing SV_Depth from the pixel shader and outputting 1 - Z/W. This counteracts the wasted precision near the near plane, where both float storage and the asymptotic Z/W curve are at their densest: the high Z/W precision ends up at the back, where float precision is low, and the low Z/W precision ends up in front, where float precision is high. This roughly "relinearizes" the precision over the depth range.
People against that point out that fiddling with the Z output disables early Z culling.
Others reply, "not if you use the DX11 conservative depth semantics such as SV_DepthLessEqual". It was measured that the performance you get then is about halfway between the two.

Then, for simplicity and compatibility with DX9, we often simply output to color. The best option is an R32F target with logarithmic depth storage in view space (remapped to the depth range of the bounding sphere of the visible scene, so to speak).
You can also output linear depth to RGBA8 using MSB/LSB packing tricks.
But doing this is like having two render targets, the depth-stencil buffer and the color buffer, so you necessarily use more bandwidth. That will surely not be the limiting factor compared to rasterization, though.
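
As a rough illustration of the MSB/LSB idea mentioned above, here is a minimal CPU-side sketch in C++ that splits a linear depth value in [0, 1] into four 8-bit channels and reassembles it. The function names packDepthRGBA8 and unpackDepthRGBA8 are hypothetical, and a real shader would typically do the equivalent with floating-point multiplies and frac() rather than integer shifts, so treat this as a sketch of the principle only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: split a linear depth in [0, 1] into four bytes,
// most significant byte first (R = MSB ... A = LSB).
std::array<std::uint8_t, 4> packDepthRGBA8(float depth) {
    assert(depth >= 0.0f && depth <= 1.0f);
    // Scale to 32-bit fixed point; double keeps every bit of the product.
    const auto fixedPoint = static_cast<std::uint32_t>(
        static_cast<double>(depth) * 4294967295.0);
    return {{ static_cast<std::uint8_t>(fixedPoint >> 24),
              static_cast<std::uint8_t>((fixedPoint >> 16) & 0xFFu),
              static_cast<std::uint8_t>((fixedPoint >> 8) & 0xFFu),
              static_cast<std::uint8_t>(fixedPoint & 0xFFu) }};
}

// Hypothetical helper: rebuild the depth value from the four channels.
float unpackDepthRGBA8(const std::array<std::uint8_t, 4>& rgba) {
    const std::uint32_t fixedPoint =
        (static_cast<std::uint32_t>(rgba[0]) << 24) |
        (static_cast<std::uint32_t>(rgba[1]) << 16) |
        (static_cast<std::uint32_t>(rgba[2]) << 8) |
        static_cast<std::uint32_t>(rgba[3]);
    return static_cast<float>(static_cast<double>(fixedPoint) / 4294967295.0);
}
```

Packing only works for values that are already normalized, which is one reason linear (rather than raw view-space) depth is usually remapped to [0, 1] over the light's range first.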

Oh, and don't count on PIX too much :)

#4 MJP   Moderators   -  Reputation: 5416

Posted 06 September 2012 - 01:07 AM

Depth-only rendering is definitely good when you can do it. In some cases the advantage will be nullified because you'll become vertex or triangle setup bound, but I'd still recommend doing it.

You don't have to use SV_Depth to output 1 - z/w. You just need to tweak your projection matrix. In most math libraries you can just reverse the near and far clip planes and you'll get the desired result. I wouldn't recommend any kind of SV_Depth output unless you really have to. Even the conservative depth stuff will still have some performance impact.
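
The near/far swap described above can be sketched as follows. This is a minimal C++ illustration, assuming a D3D-style perspective projection in which only two terms affect depth (z_clip = a * z_view + b, w_clip = z_view). The struct and function names are hypothetical and do not come from any particular math library.

```cpp
#include <cassert>

// Depth-related terms of a D3D-style perspective projection:
//   z_clip = a * z_view + b,   w_clip = z_view
struct DepthTerms {
    float a;
    float b;
};

// Hypothetical helper. zAtDepth0 is the view-space distance that should land
// on depth 0, zAtDepth1 the distance that should land on depth 1.
// (near, far) gives the usual mapping; (far, near) gives the reversed one.
DepthTerms makeDepthTerms(float zAtDepth0, float zAtDepth1) {
    assert(zAtDepth0 != zAtDepth1);
    const float range = zAtDepth1 - zAtDepth0;
    return DepthTerms{ zAtDepth1 / range, -zAtDepth0 * zAtDepth1 / range };
}

// Depth buffer value after the perspective divide.
float projectedDepth(const DepthTerms& terms, float zView) {
    assert(zView != 0.0f);
    return (terms.a * zView + terms.b) / zView;
}
```

Calling makeDepthTerms(zFar, zNear) yields exactly 1 - z/w of the standard mapping without touching SV_Depth. With the reversed mapping, the depth test has to flip to greater (or greater-equal) and the depth buffer has to be cleared to 0 instead of 1.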

#5 MegaPixel   Members   -  Reputation: 175

Posted 06 September 2012 - 03:13 AM

Hi thanks all for the tips,

I've read somewhere about the trick of reversing the far and near plane ;)

I have two questions:

1) Currently I'm using a depth buffer to store six shadow maps (i.e. a depth buffer atlas) that I use to approximate point lights (six spotlights, see S.T.A.L.K.E.R.). I found that the pure cubemap method + geometry shader is quite slow. The geometry shader itself doesn't seem that efficient when it has to output loads of geometry (except if you cull aggressively, but even then it doesn't scale if the output is big).
The format that I use for my depth buffer atlas is D32_FLOAT.
One thing I've noticed is that neither PIX nor the Intel frame analyzer lets me see the contents of a depth stencil view created that way (it looks like only the usual D24_UNORM_S8_UINT format is recognized by those two profilers, and by the other common ones).
I create a shader resource view on that atlas to get read access during the lighting stage, so I created the underlying texture as R32_TYPELESS. Could that be the reason why PIX and the other analyzers don't show the depth stencil contents?
I output just depth, so color writes are disabled and the render target slot is null.

2) As for the cubemap method, shouldn't I output the linear light-to-fragment distance instead of depth? The cubemap is supposed to represent the whole environment, so we can't just output planar z; we need the radial distance (or the squared distance, which is cheaper) to the surfaces seen by the cubemap faces (i.e. something like length(lightpos-position)).
In the end, the only advantages I see in using cubemaps are the more even error distribution and the fact that I can render the whole cube in one pass with the geometry shader (but it is still very easy to go slow with the geometry shader, see NVIDIA about number of input scalars times maxvertexcount <= 20 ....). I don't find this method very scalable in the long run (I mean in a real-case scenario).

I'd like to hear some opinions on which of those two approaches is best in a real-case scenario.

In the STALKER article (GPU Gems 2, page 155, Table 9-3) there is a clear comparison between the cubemap method, the virtual shadow depth cube texture and the six-spotlights one, and it seems that the last one is the best in most cases ... So ... where is the geometry shader goodness in all of that?

PS: MJP, I've got your DX11 book. I found it very useful and informative; I still haven't found a better book on DX11 than yours.

Thanks in advance for any reply

#6 MJP   Moderators   -  Reputation: 5416

Posted 06 September 2012 - 01:36 PM

Using a geometry shader will pretty much always decrease GPU performance, and this is especially true when doing any kind of geometry amplification (which is the case when rendering to all 6 faces of a cubemap simultaneously). In the end it's really just a CPU performance optimization, so if you're concerned with GPU performance then just traditional rendering to 6 faces separately is the preferred option.

I've also noticed that about D32_FLOAT depth buffers, for whatever reason PIX won't visualize them. PIX is pretty much EOL at this point so it's not going to get fixed, and I haven't checked to see if it works correctly in the Visual Studio 2012 graphics debugger. Parallel Nsight will visualize it correctly, if you have access to that.

Planar z + the XY texel position is enough information to reconstruct radial distance. If you want you could render to a depth buffer first, then have a conversion step where you converted from Z to radial distance and wrote the result to the appropriate cube map face. This might be a bit faster than just rendering to the distance as a render target, but it's probably not a huge gain. Or you can project the surface position using the appropriate projection for a given face, and then compare planar z directly. But this requires determining which face a surface should use before sampling the cubemap, in which case there's not much point in using cubemap.
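
The conversion from planar z to radial distance described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: each cube face is rendered with a 90-degree field of view and aspect ratio 1, the depth buffer uses the standard (non-reversed) D3D mapping, and (u, v) is the texel position on the face in [0, 1]. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Recover view-space z from a depth buffer value d in [0, 1], assuming
// d = zFar / (zFar - zNear) - zNear * zFar / ((zFar - zNear) * z).
float linearizeDepth(float d, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear);
    return zNear * zFar / (zFar - d * (zFar - zNear));
}

// Radial distance from the light for the texel at face position (u, v).
// With a 90-degree FOV, tan(fov / 2) = 1, so the view-space position is
// (x * z, y * z, z) where x and y are the texel's coordinates in [-1, 1].
float radialDistance(float d, float u, float v, float zNear, float zFar) {
    const float z = linearizeDepth(d, zNear, zFar);
    const float x = 2.0f * u - 1.0f;
    const float y = 1.0f - 2.0f * v;
    return z * std::sqrt(x * x + y * y + 1.0f);
}
```

At the center of a face the radial distance equals planar z, while at a corner it is larger by a factor of sqrt(3), which is the discrepancy the conversion step has to account for.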

Thank you for the kind words about the book, we worked really hard on it. I'm glad that you find it useful.

#7 MegaPixel   Members   -  Reputation: 175

Posted 10 September 2012 - 11:15 AM

Is it possible to generate an atlas of six projectors (instead of a cubemap) and render the whole shadow in just one pass? Or do I have to make six passes with varying UVs?

At the moment I'm calculating the results in a shadow collector, and I need to draw a fullscreen quad to sample the depth of the current view, so I can't instance.

This is my shadow atlas:

[Image: shadow atlas]

Six passes means drawing a fullscreen quad six times ... I don't like that ... (or maybe there is no other way than marking the receiving areas with stencil ...).

Plus, I think I have to re-send all six view matrices as well, so that the light projection space UVs are consistent with each face when it's time to look up the depth for shadow testing ... am I correct?

thanks in advance

#8 MegaPixel   Members   -  Reputation: 175

Posted 10 September 2012 - 04:08 PM



I think I figured out how to do it. I just instance the 6 projectors, treating them as independent units, and add an offset to the u texture coordinate corresponding to the i-th projector.
That would be something like:

u+InstanceId*(shadowMapWidth/AtlasWidth)

So the u coordinate will always sample the right portion of the atlas for a given projector.
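
A minimal C++ sketch of that offset, written as a CPU-side function for clarity. It assumes the six shadow maps sit side by side in a single row of the atlas, and that the incoming coordinate uLocal spans [0, 1] across one projector's shadow map, so it is scaled down to one tile before the offset is added (the formula above would then correspond to a u that has already been scaled). The name atlasU is hypothetical.

```cpp
#include <cassert>

// Map a per-projector coordinate uLocal in [0, 1] to the atlas u coordinate.
// instanceId selects the tile; tiles are assumed to be laid out in one row.
float atlasU(float uLocal, int instanceId, float shadowMapWidth, float atlasWidth) {
    assert(shadowMapWidth > 0.0f && atlasWidth >= shadowMapWidth);
    // Width of one tile expressed in atlas UV units.
    const float tileScale = shadowMapWidth / atlasWidth;
    return (uLocal + static_cast<float>(instanceId)) * tileScale;
}
```

For a 6144-wide atlas of 1024-wide shadow maps, instance 0 covers u in [0, 1/6] and instance 5 covers [5/6, 1]. If filtering is used, clamping uLocal slightly inside the tile may be needed to avoid bleeding into the neighbouring projector.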




