Everything posted by savail

  1. Hey, I've come across an odd problem. I am using DirectX 11 and I've tested my results on 2 GPUs: GeForce GTX660M and GTX1060. Surprisingly, the strange behaviour occurs on the newer GPU, the GTX1060. I am loading an HDR texture into DirectX and creating its shader resource view with the DXGI_FORMAT_R32G32B32_FLOAT format:

         D3D11_SUBRESOURCE_DATA texData;
         texData.pSysMem = data; //hdr data in as a float array with rgb channels
         texData.SysMemPitch = width * (4 * 3); //size of texture row in bytes (4 bytes per each channel rgb)
         DXGI_FORMAT format = DXGI_FORMAT_R32G32B32_FLOAT;

         //the remaining (not set below) attributes have default DirectX values
         Texture2dConfigDX11 conf;
         conf.SetFormat(format);
         conf.SetWidth(width);
         conf.SetHeight(height);
         conf.SetBindFlags(D3D11_BIND_SHADER_RESOURCE);
         conf.SetCPUAccessFlags(0);
         conf.SetUsage(D3D11_USAGE_DEFAULT);

         D3D11_TEX2D_SRV srv;
         srv.MipLevels = 1;
         srv.MostDetailedMip = 0;

         ShaderResourceViewConfigDX11 srvConf;
         srvConf.SetFormat(format);
         srvConf.SetTexture2D(srv);

     I'm sampling this texture using a linear sampler with D3D11_FILTER_MIN_MAG_MIP_LINEAR and the addressing mode D3D11_TEXTURE_ADDRESS_CLAMP. This is how I sample the texture in a pixel shader:

         SamplerState linearSampler : register(s0);
         Texture2D tex;
         ...
         float4 psMain(in PS_INPUT input) : SV_TARGET
         {
             float3 color = tex.Sample(linearSampler, input.uv).rgb;
             return float4(color, 1);
         }

     First of all, I'm not getting any errors at runtime in release, and my shader using this texture gives the correct result on both GPUs. In debug mode I'm also getting correct results on both GPUs, but I also get the following DX error (in the Visual Studio output log) when debugging the app, and only on the GTX1060:

         D3D11 ERROR: ID3D11DeviceContext::DrawIndexed: The Shader Resource View in slot 0 of the Pixel Shader unit is using the Format (R32G32B32_FLOAT). This format does not support 'Sample', 'SampleLevel', 'SampleBias' or 'SampleGrad', at least one of which may being used on the Resource by the shader. The exception is if the corresponding Sampler object is configured for point filtering (in which case this error can be ignored). This also only applies if the shader actually uses the view (e.g. it is not skipped due to shader code branching). [ EXECUTION ERROR #371: DEVICE_DRAW_RESOURCE_FORMAT_SAMPLE_UNSUPPORTED]

     Despite this error, the result of the shader is correct... This doesn't seem to make any sense. Is it possible that my graphics driver on the GTX1060 (I updated to the newest version) doesn't support sampling R32G32B32 textures in a pixel shader? This sounds like pretty basic functionality to support... The R32G32B32A32 format works flawlessly in debug/release on both GPUs.
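Since the R32G32B32A32_FLOAT variant reportedly works in both configurations, one possible workaround is to pad the RGB data to RGBA on the CPU before creating the texture. As far as I know, filtered sampling of R32G32B32_FLOAT is optional hardware support in D3D11 and can be queried through ID3D11Device::CheckFormatSupport (D3D11_FORMAT_SUPPORT_SHADER_SAMPLE), which might explain the debug layer message. Below is a minimal sketch of the padding step only; expandRgbToRgba and rgbaRowPitch are hypothetical helper names, not part of DirectX or Hieroglyph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a DirectX API): pads tightly packed RGB floats
// to RGBA so the data can be uploaded as DXGI_FORMAT_R32G32B32A32_FLOAT.
std::vector<float> expandRgbToRgba(const float* rgb, std::size_t width,
                                   std::size_t height, float alpha = 1.0f)
{
    assert(rgb != nullptr || width * height == 0);
    const std::size_t pixelCount = width * height;
    std::vector<float> rgba(pixelCount * 4);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        rgba[i * 4 + 0] = rgb[i * 3 + 0]; // R
        rgba[i * 4 + 1] = rgb[i * 3 + 1]; // G
        rgba[i * 4 + 2] = rgb[i * 3 + 2]; // B
        rgba[i * 4 + 3] = alpha;          // padding channel
    }
    return rgba;
}

// Hypothetical helper: row pitch in bytes for the padded texture
// (4 channels of one float each); this value would go into SysMemPitch.
std::size_t rgbaRowPitch(std::size_t width)
{
    return width * 4 * sizeof(float);
}
```

With this, pSysMem would point at the padded vector's data and SysMemPitch would come from rgbaRowPitch(width), at the cost of one third more texture memory.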
  2. Thanks a lot! This is exactly what I was missing. I didn't know that an attribute marked with SV_POSITION is already converted to raster space automagically in the pixel shader.
  3. Hey, I have to cast camera rays through the near plane of the camera. The first approach in the code below is the one I came up with, and I understand it precisely. However, I've come across a much more elegant and shorter solution which appears to give exactly the same results (at least visually in my app); this is the "Second approach" below.

         struct VS_INPUT
         {
             float3 localPos : POSITION;
         };

         struct PS_INPUT
         {
             float4 screenPos : SV_POSITION;
             float3 localPos : POSITION;
         };

         PS_INPUT vsMain(in VS_INPUT input)
         {
             PS_INPUT output;
             output.screenPos = mul(float4(input.localPos, 1.0f), WorldViewProjMatrix);
             output.localPos = input.localPos;
             return output;
         }

         float4 psMain(in PS_INPUT input) : SV_Target
         {
             //First approach
             {
                 const float3 screenSpacePos = mul(float4(input.localPos, 1.0f), WorldViewProjMatrix).xyw;
                 const float2 screenPos = screenSpacePos.xy / screenSpacePos.z; //divide by w taken above as third argument
                 const float2 screenPosUV = screenPos * float2(0.5f, -0.5f) + 0.5f; //invert Y axis for the shadow map look up in future

                 //fov is vertical
                 float nearPlaneHeight = TanHalfFov * 1.0f; //near = 1.0f
                 float nearPlaneWidth = AspectRatio * nearPlaneHeight;

                 //position of rendered point projected on the near plane
                 float3 cameraSpaceNearPos = float3(screenPos.x * nearPlaneWidth, screenPos.y * nearPlaneHeight, 1.0f);

                 //transform the direction from camera to world space
                 const float3 direction = mul(cameraSpaceNearPos, (float3x3)InvViewMatrix).xyz;
             }

             //Second approach
             {
                 //UV for shadow map look up later in code
                 const float2 screenPosUV = input.screenPos.xy * rcp( renderTargetSize );
                 const float2 screenPos = screenPosUV * 2.0f - 1.0f; // transform range 0->1 to -1->1

                 // Ray's direction in world space, VIEW_LOOK/RIGHT/UP are camera basis vectors in world space
                 //fov is vertical
                 const float3 direction = (VIEW_LOOK + TanHalfFov * (screenPos.x*VIEW_RIGHT*AspectRatio - screenPos.y*VIEW_UP));
             }
             ...
         }

     I cannot understand what happens in the first 2 lines of the second approach. input.screenPos.xy is calculated in the VS and interpolated here, but it's still before the perspective divide, right? So, for example, the y coordinate of input.screenPos should be in the range -|w| <= y <= |w|, where w is the z coordinate of the point in camera space, so w can be at most Far and at least Near, right? How can dividing y by renderTargetSize yield a result supposedly in the <0,1> range? Also, screenPosUV seems to already have an inverted Y axis for some reason I don't understand, which is probably why there is a minus sign in the calculation of direction.

     In my setup, for example, renderTargetSize is (1280, 720), Far = 100, Near = 1.0f, I use a LH coordinate system, and the camera by default looks towards the positive Z axis. Both approaches give me the same results, but I would like to understand the second one. I would be very grateful for any help!
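The key point, as far as I understand it, is that a value tagged SV_POSITION is no longer the raw vertex shader output by the time the pixel shader reads it: the rasterizer has already applied the perspective divide and the viewport transform, so input.screenPos.xy holds pixel coordinates with the Y axis pointing down. A small C++ sketch of the arithmetic in the second approach follows, using the (1280, 720) target from the post; Float2, pixelToUv and uvToScreen are hypothetical names used only for illustration.

```cpp
#include <cassert>

struct Float2 { float x, y; };

// Pixel-space SV_POSITION.xy -> UV in [0,1], Y pointing down (texture style).
Float2 pixelToUv(Float2 pixel, Float2 renderTargetSize)
{
    assert(renderTargetSize.x > 0.0f && renderTargetSize.y > 0.0f);
    return { pixel.x / renderTargetSize.x, pixel.y / renderTargetSize.y };
}

// UV in [0,1] -> [-1,1]. Y still points down here, which is why the
// second approach subtracts the VIEW_UP term instead of adding it.
Float2 uvToScreen(Float2 uv)
{
    return { uv.x * 2.0f - 1.0f, uv.y * 2.0f - 1.0f };
}
```

For the pixel position (640, 360) this gives a UV of (0.5, 0.5) and a screen position of (0, 0), so the ray points straight along VIEW_LOOK; the top-left corner (0, 0) maps to (-1, -1), and negating y there gives the +1 that the first approach gets from the perspective divide.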
  4. Hey, this is a very strange problem... I've got a compute shader that's supposed to fill a 3D texture (voxels in a metavoxel) with color, based on the particles that cover the given metavoxel. This is the code:

         static const int VOXEL_WIDTH_IN_METAVOXEL = 32;
         static const int VOXEL_SIZE = 1;
         static const float VOXEL_HALF_DIAGONAL_LENGTH_SQUARED = (VOXEL_SIZE * VOXEL_SIZE + 2.0f * VOXEL_SIZE * VOXEL_SIZE) / 4.0f;
         static const int MAX_PARTICLES_IN_METAVOXEL = 32;

         struct Particle
         {
             float3 position;
             float radius;
         };

         cbuffer OccupiedMetavData : register(b6)
         {
             float3 occupiedMetavWorldPos;
             int numberOfParticles;
             Particle particlesBin[MAX_PARTICLES_IN_METAVOXEL];
         };

         RWTexture3D<float4> metavoxelTexUav : register(u5);

         [numthreads(VOXEL_WIDTH_IN_METAVOXEL, VOXEL_WIDTH_IN_METAVOXEL, 1)]
         void main(uint2 groupThreadId : SV_GroupThreadID)
         {
             float4 voxelColumnData[VOXEL_WIDTH_IN_METAVOXEL];
             float particleRadiusSquared;
             float3 distVec;

             for (int i = 0; i < VOXEL_WIDTH_IN_METAVOXEL; i++)
                 voxelColumnData[i] = float4(0.0f, 0.0f, 1.0f, 0.0f);

             for (int k = 0; k < numberOfParticles; k++)
             {
                 particleRadiusSquared = particlesBin[k].radius * particlesBin[k].radius + VOXEL_HALF_DIAGONAL_LENGTH_SQUARED;
                 distVec.xy = (occupiedMetavWorldPos.xy + groupThreadId * VOXEL_SIZE) - particlesBin[k].position.xy;

                 for (int i = 0; i < VOXEL_WIDTH_IN_METAVOXEL; i++)
                 {
                     distVec.z = (occupiedMetavWorldPos.z + i * VOXEL_SIZE) - particlesBin[k].position.z;
                     if (dot(distVec, distVec) < particleRadiusSquared)
                     {
                         //given voxel is covered by particle
                         voxelColumnData[i] += float4(0.0f, 1.0f, 0.0f, 1.0f);
                     }
                 }
             }

             for (int i = 0; i < VOXEL_WIDTH_IN_METAVOXEL; i++)
                 metavoxelTexUav[uint3(groupThreadId.x, groupThreadId.y, i)] = clamp(voxelColumnData[i], 0.0, 1.0);
         }

     It works well in debug mode. This is the correct-looking result obtained after raymarching one metavoxel from the camera: As you can see, the particle only covers the top right corner of the metavoxel.

     However, in release mode the result obtained looks like this: It looks as if the upper half of the metavoxel was not filled at all, not even with the ambient blue-ish color from the first "for" loop... I narrowed it down to one line of code in the above shader. When I replace "numberOfParticles" in the "for" loop with a constant value such as 1 (which is the value uploaded to the GPU anyway), the result finally looks the same as in debug mode.

     This is the shader compile method from the Hieroglyph Rendering Engine (awesome engine). It looks fine to me, but maybe something's wrong? My only modification was adding include functionality.

         ID3DBlob* ShaderFactoryDX11::GenerateShader( ShaderType type, std::wstring& filename, std::wstring& function,
             std::wstring& model, const D3D_SHADER_MACRO* pDefines, bool enablelogging )
         {
             HRESULT hr = S_OK;
             std::wstringstream message;
             ID3DBlob* pCompiledShader = nullptr;
             ID3DBlob* pErrorMessages = nullptr;

             char AsciiFunction[1024];
             char AsciiModel[1024];
             WideCharToMultiByte(CP_ACP, 0, function.c_str(), -1, AsciiFunction, 1024, NULL, NULL);
             WideCharToMultiByte(CP_ACP, 0, model.c_str(), -1, AsciiModel, 1024, NULL, NULL);

             // TODO: The compilation of shaders has to skip the warnings as errors
             // for the moment, since the new FXC.exe compiler in VS2012 is
             // apparently more strict than before.
             UINT flags = D3DCOMPILE_PACK_MATRIX_ROW_MAJOR;
         #ifdef _DEBUG
             flags |= D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION; // | D3DCOMPILE_WARNINGS_ARE_ERRORS;
         #endif

             // Get the current path to the shader folders, and add the filename to it.
             FileSystem fs;
             std::wstring filepath = fs.GetShaderFolder() + filename;

             // Load the file into memory
             FileLoader SourceFile;
             if ( !SourceFile.Open( filepath ) ) {
                 message << "Unable to load shader from file: " << filepath;
                 EventManager::Get()->ProcessEvent( EvtErrorMessagePtr( new EvtErrorMessage( message.str() ) ) );
                 return( nullptr );
             }

             LPCSTR s;
             if ( FAILED( hr = D3DCompile(
                 SourceFile.GetDataPtr(),
                 SourceFile.GetDataSize(),
                 GlyphString::wstringToString(filepath).c_str(), //!!!! - this must be pointing to a concrete shader file!!! - only directory would work as well but in that case graphics debugger crashes when debugging shaders
                 pDefines,
                 D3D_COMPILE_STANDARD_FILE_INCLUDE,
                 AsciiFunction,
                 AsciiModel,
                 flags,
                 0,
                 &pCompiledShader,
                 &pErrorMessages ) ) )
             //if ( FAILED( hr = D3DX11CompileFromFile(
             //    filename.c_str(),
             //    pDefines,
             //    0,
             //    AsciiFunction,
             //    AsciiModel,
             //    flags,
             //    0,//UINT Flags2,
             //    0,
             //    &pCompiledShader,
             //    &pErrorMessages,
             //    &hr
             //    ) ) )
             {
                 message << L"Error compiling shader program: " << filepath << std::endl << std::endl;
                 message << L"The following error was reported:" << std::endl;

                 if ( ( enablelogging ) && ( pErrorMessages != nullptr ) )
                 {
                     LPVOID pCompileErrors = pErrorMessages->GetBufferPointer();
                     const char* pMessage = (const char*)pCompileErrors;
                     message << GlyphString::ToUnicode( std::string( pMessage ) );
                     Log::Get().Write( message.str() );
                 }

                 EventManager::Get()->ProcessEvent( EvtErrorMessagePtr( new EvtErrorMessage( message.str() ) ) );

                 SAFE_RELEASE( pCompiledShader );
                 SAFE_RELEASE( pErrorMessages );
                 return( nullptr );
             }

             SAFE_RELEASE( pErrorMessages );
             return( pCompiledShader );
         }

     Could the shader crash for some reason midway through execution? The question is also what the compiler could possibly do to the shader code in release mode so that "numberOfParticles" suddenly becomes invalid, and how to fix this issue. Or maybe it's something even deeper that results in numberOfParticles being invalid? I checked my constant buffer values with the Graphics Debugger in debug and release modes, and both had the correct value of 1 for numberOfParticles...
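One way to narrow a problem like this down is to compare the GPU output against a CPU reference of the same coverage test, so that a wrong release-mode result can be told apart from a logic error. A minimal sketch mirroring the shader's inner loops for a single voxel column follows. The constants are copied from the shader; the Particle layout and fillColumnAlpha are simplified, hypothetical stand-ins, and only the alpha (coverage) channel is tracked.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

constexpr int   kVoxelWidthInMetavoxel = 32;
constexpr float kVoxelSize = 1.0f;
// Same value as VOXEL_HALF_DIAGONAL_LENGTH_SQUARED: (s*s + 2*s*s) / 4.
constexpr float kVoxelHalfDiagonalSq = 3.0f * kVoxelSize * kVoxelSize / 4.0f;

struct Particle { float x, y, z, radius; };

// CPU reference for one shader thread: returns the clamped alpha for each
// of the 32 voxels in the column at thread coordinates (tx, ty).
std::array<float, kVoxelWidthInMetavoxel> fillColumnAlpha(
    float originX, float originY, float originZ, int tx, int ty,
    const std::vector<Particle>& particles)
{
    assert(tx >= 0 && tx < kVoxelWidthInMetavoxel);
    assert(ty >= 0 && ty < kVoxelWidthInMetavoxel);
    std::array<float, kVoxelWidthInMetavoxel> alpha{}; // starts at 0, like the shader
    for (const Particle& p : particles) {
        const float radiusSq = p.radius * p.radius + kVoxelHalfDiagonalSq;
        const float dx = (originX + tx * kVoxelSize) - p.x;
        const float dy = (originY + ty * kVoxelSize) - p.y;
        for (int i = 0; i < kVoxelWidthInMetavoxel; ++i) {
            const float dz = (originZ + i * kVoxelSize) - p.z;
            if (dx * dx + dy * dy + dz * dz < radiusSq)
                alpha[i] += 1.0f; // voxel is covered by this particle
        }
    }
    for (float& a : alpha)
        a = std::min(std::max(a, 0.0f), 1.0f); // same clamp as the final write
    return alpha;
}
```

Reading the 3D texture back and diffing its alpha channel against this reference, column by column, would show exactly which voxels the release build leaves unwritten.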
  5. Hey, thanks for your feedback! I agree with most of your points, but I wonder if this solution is really that bad in my specific case, at least on the GTX660M. I've also run this app on the GTX1060, and there this solution was indeed horrible, but on the GTX660M the situation is reversed: it proved to be the fastest solution. I didn't know that registers are accessed like this (it's very valuable information, thanks!), but in my case you can see that the loop in which I access voxelColumnData executes a constant number of times, so the compiler should be smart enough to unroll the loop and predict the registers from the array, right? The current approach (which runs 4 groups of 32x8 threads, with each thread processing 32 voxels in depth sequentially) takes about 6 ms to fill about 8 metavoxels (each of size 32x32x32), while the approach with shared memory (I tried a few configurations) yielded something like 7 ms. Another approach, running 32 groups of 32x32 threads per metavoxel (each thread sets up the color for exactly one voxel based on the particles), is significantly faster on the GTX1060 but takes about 20 ms on the GTX660M. Unfortunately, horrible solutions on one GPU might not be horrible on another : P though I guess I should care more about the newer hardware than my GTX660M ; ]
  6. Alright... this is the first time compiler warnings became really important in my life. Especially the warnings generated by the HLSL compiler with the flag D3DCOMPILE_WARNINGS_ARE_ERRORS. This is the warning I got with the above compute shader: Though, I thought the driver would handle this case appropriately and set up a sequential queue of threads if there weren't enough registers for all threads to execute... It also appears that this limitation might apply per group of threads, because when I replaced 1 group of 32x32 threads with 4 groups of 32x8 threads, everything finally worked as expected in release mode. I'm really surprised the driver doesn't handle this automatically in release mode. Could it be that it does this in debug but not in release? Is there some way to force the correct behaviour in release mode without manually dividing the threads? It's probably also driver specific, right? Any comments or insights would be really welcome! Thanks for your time anyway, guys
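A rough back-of-the-envelope count shows why the group size matters here: each thread keeps a float4[32] column, and the original group is 32x32 threads. The sketch below only does that arithmetic with hypothetical helper names. How a given driver reacts when a group needs more temporaries than the hardware provides is not something I can state for certain, so treat the numbers as an illustration rather than a hardware limit.

```cpp
#include <cassert>
#include <cstddef>

// Per-thread temporary storage for `float4 voxelColumnData[columnLength]`.
constexpr std::size_t columnBytesPerThread(std::size_t columnLength)
{
    return columnLength * 4 * sizeof(float); // float4 = 4 x 32-bit values
}

// Total temporary storage one thread group needs for those columns.
constexpr std::size_t columnBytesPerGroup(std::size_t threadsX,
                                          std::size_t threadsY,
                                          std::size_t columnLength)
{
    return threadsX * threadsY * columnBytesPerThread(columnLength);
}

// Assumes a 4-byte float, which holds on the usual desktop targets.
static_assert(columnBytesPerThread(32) == 512, "32 float4 values per thread");
```

With these numbers a 32x32 group needs 512 KiB of column temporaries, while a 32x8 group needs 128 KiB, a quarter of that, which is consistent with the smaller groups behaving better.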
  7. Hey, I can't find this information anywhere on the web and I'm wondering about a specific optimization... Let's say I have hundreds of 3D textures which I need to process separately in a compute shader. Each invocation needs different data in the constant buffer, BUT many of the 3D textures don't need to update their CB contents every frame. Would it be better to create just one CB resource, bind it once at startup, and map the data in a loop for each consecutive shader invocation? Or would it be better to create hundreds of separate CB resources, map them only when needed, and just bind the appropriate CB before each shader invocation? This depends on how exactly those resources are managed internally in DirectX and what binding actually does... I would be very grateful if somebody shared their experience!
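Whichever way the buffers are laid out, the CPU side can at least skip uploads for textures whose constants did not change. A minimal sketch of that bookkeeping, assuming one constant-buffer slot per texture, is shown below. ConstantSlots, MetavoxelConstants and the upload callback are all hypothetical names; the callback stands in for whatever Map/Unmap or UpdateSubresource call the engine actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical CPU-side mirror of the per-texture constant buffer contents.
struct MetavoxelConstants { float worldPos[3]; int numberOfParticles; };

class ConstantSlots {
public:
    // Every slot starts dirty so the first flush uploads everything once.
    explicit ConstantSlots(std::size_t count)
        : data_(count), dirty_(count, true) {}

    void set(std::size_t index, const MetavoxelConstants& value) {
        assert(index < data_.size());
        data_[index] = value;
        dirty_[index] = true;
    }

    // Calls `upload` only for slots changed since the last flush and
    // returns how many uploads were issued.
    std::size_t flush(
        const std::function<void(std::size_t, const MetavoxelConstants&)>& upload) {
        std::size_t uploads = 0;
        for (std::size_t i = 0; i < data_.size(); ++i) {
            if (!dirty_[i]) continue;
            upload(i, data_[i]);
            dirty_[i] = false;
            ++uploads;
        }
        return uploads;
    }

private:
    std::vector<MetavoxelConstants> data_;
    std::vector<bool> dirty_;
};
```

This only works with the "many separate CBs" layout, since a single shared CB has to be rewritten before every dispatch regardless; measuring both variants on the target GPUs is probably the only reliable way to choose.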
  8. Hey, there are a few things which confuse me regarding DirectX 11 and HLSL shaders in general. I would be very grateful for your advice!

     1. Let's take for example a scene which invokes 2 totally separate pipeline render passes interchangeably. I understand I need to bind the correct shaders for each render pass, and potentially the blend/depth or rasterizer state, but what about resources such as Constant Buffers, Shader Resource Views and Unordered Access Views? Assuming that the second render pass uses none of the resources used by the first pass, do I still need to unbind the resources and clean the pipeline state after the first pass? Or is it OK to leave the pipeline with stale bindings, since anything I'd need to bind for the second pass would overwrite the contents of the appropriate register slots anyway?
     2. Is it good practice to assign register slots manually to all resources in HLSL?
     3. I thought about manually assigning register slots for every distinct render pass, up to the maximum slot limit if necessary. For example, in 1 render pass I invoke 3 CSs, 2 VSs and 2 PSs, and for all resources used by those shaders I try to fill as many register slots as necessary, potentially reusing the same slot many times in shaders that share the same resource. I was wondering if there is any performance penalty or gain when I bind all of my needed resources at the start of the render pass and never have to do it again until the next render pass. This means potentially binding a lot of registers and having an excessive number of bound resources for every shader that is run.
     4. Is it good practice to create a separate include file for every resource that occurs in >= 2 shader files, or is it better to duplicate the declarations? In the first case, the code is IMO easier to maintain and edit but might be harder to read if there are too many includes. I've come up with a compromise between the two: create a separate include file for every CB that occurs in >= 2 shader files and a separate include file for every sampler I ever need to use. All other resources, like SRVs and UAVs, I prefer to duplicate in multiple shaders because they take much less space than a CB, for example... I'm not sure, however, if that's good practice.
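For the manual slot assignment in questions 2 and 3, one option would be to keep the slots in a single table that the C++ side (and, through a shared include, the HLSL side) agrees on, with compile-time checks against the D3D11 slot limits. A minimal sketch follows. All slot names are hypothetical, b6 and u5 are taken from the metavoxel compute shader, and the limits are the D3D11 values as far as I know (14 API constant buffer slots, 16 samplers, 8 UAVs on feature level 11_0), so they are worth verifying against d3d11.h.

```cpp
#include <cassert>

// D3D11 per-stage slot limits as I understand them; verify against the
// corresponding constants in d3d11.h before relying on them.
constexpr unsigned kMaxConstantBufferSlots = 14;
constexpr unsigned kMaxSamplerSlots = 16;
constexpr unsigned kMaxUavSlots = 8; // feature level 11_0

// Hypothetical slot assignments shared by every shader in a render pass.
enum ConstantBufferSlot : unsigned { kCbPerFrame = 0, kCbOccupiedMetavData = 6 };
enum SamplerSlot : unsigned { kSamplerLinear = 0 };
enum UavSlot : unsigned { kUavMetavoxelTex = 5 };

constexpr bool isValidConstantBufferSlot(unsigned slot) { return slot < kMaxConstantBufferSlots; }
constexpr bool isValidSamplerSlot(unsigned slot) { return slot < kMaxSamplerSlots; }
constexpr bool isValidUavSlot(unsigned slot) { return slot < kMaxUavSlots; }

// A bad assignment fails the build instead of silently binding nothing.
static_assert(isValidConstantBufferSlot(kCbOccupiedMetavData), "CB slot out of range");
static_assert(isValidSamplerSlot(kSamplerLinear), "sampler slot out of range");
static_assert(isValidUavSlot(kUavMetavoxelTex), "UAV slot out of range");
```

Keeping the table in one place also makes slot reuse between shaders explicit, which is the main maintenance benefit of the shared-include approach described in question 4.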