About savail

  1. Hey, I've come across an odd problem. I am using DirectX 11 and I've tested my results on 2 GPUs: a GeForce GTX 660M and a GTX 1060. The strange behaviour surprisingly occurs on the newer GPU, the GTX 1060. I am loading an HDR texture into DirectX and creating its shader resource view with the DXGI_FORMAT_R32G32B32_FLOAT format:

```cpp
D3D11_SUBRESOURCE_DATA texData;
texData.pSysMem = data;                // HDR data as a float array with RGB channels
texData.SysMemPitch = width * (4 * 3); // size of a texture row in bytes (4 bytes per channel, 3 channels)

DXGI_FORMAT format = DXGI_FORMAT_R32G32B32_FLOAT;

// the remaining (not set below) attributes have default DirectX values
Texture2dConfigDX11 conf;
conf.SetFormat(format);
conf.SetWidth(width);
conf.SetHeight(height);
conf.SetBindFlags(D3D11_BIND_SHADER_RESOURCE);
conf.SetCPUAccessFlags(0);
conf.SetUsage(D3D11_USAGE_DEFAULT);

D3D11_TEX2D_SRV srv;
srv.MipLevels = 1;
srv.MostDetailedMip = 0;

ShaderResourceViewConfigDX11 srvConf;
srvConf.SetFormat(format);
srvConf.SetTexture2D(srv);
```

I'm sampling this texture with a linear sampler (D3D11_FILTER_MIN_MAG_MIP_LINEAR) and the D3D11_TEXTURE_ADDRESS_CLAMP addressing mode. This is how I sample the texture in a pixel shader:

```hlsl
SamplerState linearSampler : register(s0);
Texture2D tex;
...
float4 psMain(in PS_INPUT input) : SV_TARGET
{
    float3 color = tex.Sample(linearSampler, input.uv).rgb;
    return float4(color, 1);
}
```

First of all, I'm not getting any errors during runtime in release mode, and the shader using this texture gives the correct result on both GPUs. In debug mode I also get correct results on both GPUs, but I additionally get the following DX error in the Visual Studio output log when debugging the app, and only on the GTX 1060:

D3D11 ERROR: ID3D11DeviceContext::DrawIndexed: The Shader Resource View in slot 0 of the Pixel Shader unit is using the Format (R32G32B32_FLOAT). This format does not support 'Sample', 'SampleLevel', 'SampleBias' or 'SampleGrad', at least one of which may being used on the Resource by the shader. The exception is if the corresponding Sampler object is configured for point filtering (in which case this error can be ignored). This also only applies if the shader actually uses the view (e.g. it is not skipped due to shader code branching). [ EXECUTION ERROR #371: DEVICE_DRAW_RESOURCE_FORMAT_SAMPLE_UNSUPPORTED]

Despite this error, the result of the shader is correct... This doesn't seem to make any sense. Is it possible that the graphics driver on the GTX 1060 (I updated to the newest version) doesn't support sampling R32G32B32 textures in a pixel shader? This sounds like pretty basic functionality to support... The R32G32B32A32 format works flawlessly in debug/release on both GPUs.
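For context on the debug-layer message: in D3D11, linear filtering of DXGI_FORMAT_R32G32B32_FLOAT is an optional hardware feature (it can be queried with ID3D11Device::CheckFormatSupport), while filtering of R32G32B32A32_FLOAT is required at feature level 11_0, which matches the observation that the 4-channel format works everywhere. A portable workaround is to pad the loaded HDR data to four channels on the CPU before creating the texture. A minimal sketch, where PadRgbToRgba is a made-up helper name and data/width/height are assumed to match the snippet above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expand tightly packed RGB float texels to RGBA so the texture can be
// created as DXGI_FORMAT_R32G32B32A32_FLOAT (filterable on all D3D11 hardware).
std::vector<float> PadRgbToRgba(const float* rgb, std::size_t width, std::size_t height)
{
    std::vector<float> rgba(width * height * 4);
    for (std::size_t i = 0; i < width * height; ++i)
    {
        rgba[i * 4 + 0] = rgb[i * 3 + 0];
        rgba[i * 4 + 1] = rgb[i * 3 + 1];
        rgba[i * 4 + 2] = rgb[i * 3 + 2];
        rgba[i * 4 + 3] = 1.0f; // opaque alpha
    }
    return rgba;
}
```

The subresource pitch then becomes width * (4 * 4) bytes, and both the texture and SRV formats become DXGI_FORMAT_R32G32B32A32_FLOAT.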
  2. Thanks a lot! This is exactly what I was missing. I didn't know that an attribute marked with SV_POSITION is already converted to raster space automagically in the pixel shader.
  3. Hey, I have to cast camera rays through the near plane of the camera. The first approach in the code below is the one I came up with, and I understand it precisely. However, I've come across a much more elegant and shorter solution which seems to give exactly the same results (at least visually in my app), and that is the "Second approach" below.

```hlsl
struct VS_INPUT
{
    float3 localPos : POSITION;
};

struct PS_INPUT
{
    float4 screenPos : SV_POSITION;
    float3 localPos : POSITION;
};

PS_INPUT vsMain(in VS_INPUT input)
{
    PS_INPUT output;
    output.screenPos = mul(float4(input.localPos, 1.0f), WorldViewProjMatrix);
    output.localPos = input.localPos;
    return output;
}

float4 psMain(in PS_INPUT input) : SV_Target
{
    //First approach
    {
        const float3 screenSpacePos = mul(float4(input.localPos, 1.0f), WorldViewProjMatrix).xyw;
        const float2 screenPos = screenSpacePos.xy / screenSpacePos.z; //divide by w, taken above as the third component
        const float2 screenPosUV = screenPos * float2(0.5f, -0.5f) + 0.5f; //invert Y axis for the shadow map look-up later

        //fov is vertical
        float nearPlaneHeight = TanHalfFov * 1.0f; //near = 1.0f
        float nearPlaneWidth = AspectRatio * nearPlaneHeight;

        //position of the rendered point projected onto the near plane
        float3 cameraSpaceNearPos = float3(screenPos.x * nearPlaneWidth, screenPos.y * nearPlaneHeight, 1.0f);

        //transform the direction from camera to world space
        const float3 direction = mul(cameraSpaceNearPos, (float3x3)InvViewMatrix).xyz;
    }

    //Second approach
    {
        //UV for the shadow map look-up later in the code
        const float2 screenPosUV = input.screenPos.xy * rcp(renderTargetSize);
        const float2 screenPos = screenPosUV * 2.0f - 1.0f; // transform range 0->1 to -1->1

        // Ray's direction in world space; VIEW_LOOK/RIGHT/UP are camera basis vectors in world space
        //fov is vertical
        const float3 direction = (VIEW_LOOK + TanHalfFov * (screenPos.x * VIEW_RIGHT * AspectRatio - screenPos.y * VIEW_UP));
    }
    ...
}
```

I cannot understand what happens in the first 2 lines of the second approach. input.screenPos.xy is calculated in the VS and interpolated here, but it's still before the perspective divide, right? So for example the y coordinate of input.screenPos should be in the range -|w| <= y <= |w|, where w is the z coordinate of the point in camera space, so w can be at most Far and at least Near, right? How come dividing y by renderTargetSize above yields a result supposedly in the <0,1> range? Also, screenPosUV seems to already have the Y axis inverted, for some reason I don't understand either; that's probably why there's the minus sign in the calculation of direction. In my setup renderTargetSize is (1280, 720), Far = 100, Near = 1.0f, I use a LH coordinate system, and the camera by default looks towards the positive Z axis. Both approaches give me the same results, but I would like to understand the second one. I would be very grateful for any help!
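A note that may explain the second approach (this is standard D3D11 behaviour, sketched here on the CPU): by the time the pixel shader runs, the rasterizer has already applied the perspective divide and the viewport transform to the SV_POSITION attribute, so input.screenPos.xy arrives in pixel coordinates with Y pointing down, not in pre-divide clip space. A minimal sketch of that fixed-function math, assuming a viewport with origin at (0, 0):

```cpp
#include <cassert>

struct float2 { float x, y; };

// What the rasterizer does to a clip-space position before the pixel shader
// sees SV_POSITION: perspective divide to NDC, then the viewport transform
// to pixels. Note the Y flip: NDC +Y is up, pixel-space +Y is down.
float2 ClipToPixel(float clipX, float clipY, float clipW,
                   float viewportW, float viewportH)
{
    const float ndcX = clipX / clipW;              // -1 .. 1, right is +
    const float ndcY = clipY / clipW;              // -1 .. 1, up is +
    return { (ndcX * 0.5f + 0.5f) * viewportW,     // 0 .. viewportW
             (-ndcY * 0.5f + 0.5f) * viewportH };  // 0 .. viewportH, Y down
}
```

Dividing such a value by renderTargetSize therefore lands in 0..1 with Y already inverted, which would explain both the <0,1> range and why the second approach subtracts along VIEW_UP.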
  4. Hey, thanks for your feedback! I agree with most of your points, but I wonder if this solution is really that bad in my specific case, at least on the GTX 660M. I've run this app on the GTX 1060 as well, and there this solution was indeed horrible, but on the GTX 660M the situation is reversed: it proved to be the fastest solution. I didn't know that registers are accessed like this (it's very valuable information, thanks!), but in my case, as you can see, the loop in which I access voxelColumnData executes a constant number of times, so the compiler should be smart enough to unroll the loop and map the array onto registers, right? The current approach (which runs 4 groups of 32x8 threads, each thread processing 32 voxels in depth sequentially) takes about 6 ms to fill about 8 metavoxels (each of size 32x32x32), while the approach with shared memory (I tried a few configurations) yielded something like 7 ms. Another approach, running 32 groups of 32x32 threads per metavoxel (each thread setting up the color for exactly one voxel based on particles), is significantly faster on the GTX 1060 but takes about 20 ms on the GTX 660M. Unfortunately, horrible solutions on one GPU might not be horrible on another : P though I guess I should care more about the newer hardware than my GTX 660M ; ]
  5. Alright... this is the first time compiler warnings became really important in my life, especially the warnings generated by the HLSL compiler with the D3DCOMPILE_WARNINGS_ARE_ERRORS flag, which surfaced a warning for the above compute shader. I thought the driver would handle this case appropriately and schedule the threads sequentially if there weren't enough registers for all of them to execute at once... It also appears that this limitation might be per group of threads, because when I replaced 1 group of 32x32 threads with 4 groups of 32x8 threads, everything finally works as expected in release mode. I'm really surprised the driver doesn't handle this automatically in release mode. Could it be that it does in debug but not in release? Is there some way to force correct behaviour in release mode without manually dividing the threads? It's probably also driver specific, right? Any comments or insights would be really welcome! Thanks for your time guys anyway.
  6. Hey, this is a very strange problem... I've got a compute shader that's supposed to fill a 3D texture (the voxels in a metavoxel) with color, based on the particles that cover the given metavoxel. This is the code:

```hlsl
static const int VOXEL_WIDTH_IN_METAVOXEL = 32;
static const int VOXEL_SIZE = 1;
static const float VOXEL_HALF_DIAGONAL_LENGTH_SQUARED =
    (VOXEL_SIZE * VOXEL_SIZE + 2.0f * VOXEL_SIZE * VOXEL_SIZE) / 4.0f;
static const int MAX_PARTICLES_IN_METAVOXEL = 32;

struct Particle
{
    float3 position;
    float radius;
};

cbuffer OccupiedMetavData : register(b6)
{
    float3 occupiedMetavWorldPos;
    int numberOfParticles;
    Particle particlesBin[MAX_PARTICLES_IN_METAVOXEL];
};

RWTexture3D<float4> metavoxelTexUav : register(u5);

[numthreads(VOXEL_WIDTH_IN_METAVOXEL, VOXEL_WIDTH_IN_METAVOXEL, 1)]
void main(uint2 groupThreadId : SV_GroupThreadID)
{
    float4 voxelColumnData[VOXEL_WIDTH_IN_METAVOXEL];
    float particleRadiusSquared;
    float3 distVec;

    for (int i = 0; i < VOXEL_WIDTH_IN_METAVOXEL; i++)
        voxelColumnData[i] = float4(0.0f, 0.0f, 1.0f, 0.0f);

    for (int k = 0; k < numberOfParticles; k++)
    {
        particleRadiusSquared = particlesBin[k].radius * particlesBin[k].radius + VOXEL_HALF_DIAGONAL_LENGTH_SQUARED;
        distVec.xy = (occupiedMetavWorldPos.xy + groupThreadId * VOXEL_SIZE) - particlesBin[k].position.xy;
        for (int i = 0; i < VOXEL_WIDTH_IN_METAVOXEL; i++)
        {
            distVec.z = (occupiedMetavWorldPos.z + i * VOXEL_SIZE) - particlesBin[k].position.z;
            if (dot(distVec, distVec) < particleRadiusSquared)
            {
                //given voxel is covered by the particle
                voxelColumnData[i] += float4(0.0f, 1.0f, 0.0f, 1.0f);
            }
        }
    }

    for (int i = 0; i < VOXEL_WIDTH_IN_METAVOXEL; i++)
        metavoxelTexUav[uint3(groupThreadId.x, groupThreadId.y, i)] = clamp(voxelColumnData[i], 0.0, 1.0);
}
```

And it works well in debug mode. After raymarching one metavoxel from the camera, I get the correct-looking result: the particle only covers the top right corner of the metavoxel. In release mode, however, the upper half of the metavoxel looks as if it was not filled at all, not even with the ambient blue-ish color from the first "for" loop... I nailed it down to one line of code in the above shader: when I replace "numberOfParticles" in the "for" loop with a constant value such as 1 (which is what gets uploaded to the GPU anyway), the result finally looks the same as in debug mode. This is the shader compile method from the Hieroglyph Rendering Engine (awesome engine) and it looks fine to me, but maybe something's wrong? My only modification was adding include functionality.

```cpp
ID3DBlob* ShaderFactoryDX11::GenerateShader( ShaderType type, std::wstring& filename, std::wstring& function,
    std::wstring& model, const D3D_SHADER_MACRO* pDefines, bool enablelogging )
{
    HRESULT hr = S_OK;

    std::wstringstream message;

    ID3DBlob* pCompiledShader = nullptr;
    ID3DBlob* pErrorMessages = nullptr;

    char AsciiFunction[1024];
    char AsciiModel[1024];
    WideCharToMultiByte(CP_ACP, 0, function.c_str(), -1, AsciiFunction, 1024, NULL, NULL);
    WideCharToMultiByte(CP_ACP, 0, model.c_str(), -1, AsciiModel, 1024, NULL, NULL);

    // TODO: The compilation of shaders has to skip the warnings as errors
    //       for the moment, since the new FXC.exe compiler in VS2012 is
    //       apparently more strict than before.
    UINT flags = D3DCOMPILE_PACK_MATRIX_ROW_MAJOR;
#ifdef _DEBUG
    flags |= D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION; // | D3DCOMPILE_WARNINGS_ARE_ERRORS;
#endif

    // Get the current path to the shader folders, and add the filename to it.
    FileSystem fs;
    std::wstring filepath = fs.GetShaderFolder() + filename;

    // Load the file into memory
    FileLoader SourceFile;
    if ( !SourceFile.Open( filepath ) )
    {
        message << "Unable to load shader from file: " << filepath;
        EventManager::Get()->ProcessEvent( EvtErrorMessagePtr( new EvtErrorMessage( message.str() ) ) );
        return( nullptr );
    }

    if ( FAILED( hr = D3DCompile(
        SourceFile.GetDataPtr(),
        SourceFile.GetDataSize(),
        GlyphString::wstringToString(filepath).c_str(), //!!!! - this must point to a concrete shader file! A directory alone would work too, but then the graphics debugger crashes when debugging shaders
        pDefines,
        D3D_COMPILE_STANDARD_FILE_INCLUDE,
        AsciiFunction,
        AsciiModel,
        flags,
        0,
        &pCompiledShader,
        &pErrorMessages ) ) )
    {
        message << L"Error compiling shader program: " << filepath << std::endl << std::endl;
        message << L"The following error was reported:" << std::endl;

        if ( ( enablelogging ) && ( pErrorMessages != nullptr ) )
        {
            LPVOID pCompileErrors = pErrorMessages->GetBufferPointer();
            const char* pMessage = (const char*)pCompileErrors;
            message << GlyphString::ToUnicode( std::string( pMessage ) );
            Log::Get().Write( message.str() );
        }

        EventManager::Get()->ProcessEvent( EvtErrorMessagePtr( new EvtErrorMessage( message.str() ) ) );

        SAFE_RELEASE( pCompiledShader );
        SAFE_RELEASE( pErrorMessages );

        return( nullptr );
    }

    SAFE_RELEASE( pErrorMessages );

    return( pCompiledShader );
}
```

Could the shader crash for some reason midway through execution? The question also is: what could the compiler possibly do to the shader code in release mode that suddenly makes "numberOfParticles" invalid, and how do I fix this issue? Or maybe it's something even deeper that results in numberOfParticles being invalid? I checked my constant buffer values with the graphics debugger in both debug and release modes, and both had the correct value of numberOfParticles, set to 1...
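In case it helps anyone reproduce this outside the GPU: the per-voxel test the shader performs is just a squared-distance check against the particle radius padded by the voxel's half-diagonal. Below is a CPU reference of one thread's column loop (plain C++, no D3D; the function name and the Float3 struct are made up for this sketch), which can be used to sanity-check a single column against the release-mode output:

```cpp
#include <cassert>

constexpr int VOXEL_WIDTH_IN_METAVOXEL = 32;
constexpr float VOXEL_SIZE = 1.0f;
// half of the voxel's space diagonal, squared: (1^2 + 2 * 1^2) / 4 for a unit voxel
constexpr float VOXEL_HALF_DIAGONAL_LENGTH_SQUARED =
    (VOXEL_SIZE * VOXEL_SIZE + 2.0f * VOXEL_SIZE * VOXEL_SIZE) / 4.0f;

struct Float3 { float x, y, z; };

// CPU reference for one thread of the compute shader: counts how many voxels
// in the column at group-thread coordinates (tx, ty) are covered by a particle.
int CoveredVoxelsInColumn(Float3 metavoxelOrigin, int tx, int ty,
                          Float3 particlePos, float particleRadius)
{
    const float rSq = particleRadius * particleRadius + VOXEL_HALF_DIAGONAL_LENGTH_SQUARED;
    const float dx = (metavoxelOrigin.x + tx * VOXEL_SIZE) - particlePos.x;
    const float dy = (metavoxelOrigin.y + ty * VOXEL_SIZE) - particlePos.y;

    int covered = 0;
    for (int i = 0; i < VOXEL_WIDTH_IN_METAVOXEL; ++i)
    {
        const float dz = (metavoxelOrigin.z + i * VOXEL_SIZE) - particlePos.z;
        if (dx * dx + dy * dy + dz * dz < rSq) // same test as the shader's dot(distVec, distVec)
            ++covered;
    }
    return covered;
}
```

If the CPU reference and the debug-mode GPU output agree but the release build doesn't, the data side (constant buffer packing and upload) can be ruled out and the suspicion falls on shader compilation.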
  7. Hey, I can't find this information anywhere on the web, and I'm wondering about a specific optimization... Let's say I have hundreds of 3D textures which I need to process separately in a compute shader. Each invocation needs different data in its constant buffer, BUT many of the 3D textures don't need their CB contents updated every frame. Would it be better to create just one CB resource, bind it once at startup, and in a loop map the data for each consecutive shader invocation? Or would it be better to create hundreds of separate CB resources, map each only when needed, and just bind the appropriate CB before each shader invocation? This depends on how exactly those resources are managed internally by DirectX and what binding actually does... I would be very grateful if somebody shared their experience!
  8. Hey, there are a few things which confuse me regarding DirectX 11 and HLSL shaders in general. I would be very grateful for your advice!

1. Let's take for example a scene which invokes 2 totally separate pipeline render passes interchangeably. I understand I need to bind the correct shaders for each render pass, and potentially blend/depth or rasterizer state, but what about resources such as Constant Buffers, Shader Resource Views and Unordered Access Views? Assuming the second render pass uses none of the resources used by the first pass, do I still need to unbind the resources and clean the pipeline state after the first pass? Or is it ok to leave the pipeline with unbound garbage, since anything I'd need to bind for the second pass would overwrite the contents of the appropriate register slots anyway?

2. Is it a good practice to assign register slots manually to all resources in HLSL?

3. I thought about manually assigning register slots for every distinct render pass, up to the maximum slot limit if necessary. For example, in 1 render pass I invoke 3 CS's, 2 VS's and 2 PS's, and for all resources used by those shaders I try to fill as many register slots as necessary, potentially reusing the same slot across shaders sharing the same resource. I was wondering if there is any performance penalty or gain if I bind all of my needed resources at the start of a render pass and never have to do it again until the next render pass? This means potentially binding a lot of registers and having an excessive number of bound resources for every shader that runs.

4. Is it a good practice to create a separate include file for every resource that occurs in >= 2 shader files, or is it better to duplicate the declarations? In the first case the code is, imo, easier to maintain and edit, but it might be harder to read if there are too many includes. I've come up with a compromise between these 2: create a separate include file for every CB that occurs in >= 2 shader files, and a separate include file for every sampler I ever need to use. All other resources, like SRVs and UAVs, I prefer to duplicate in multiple shaders because they take much less space than a CB, for example... I'm not sure however if that's a good practice.
  9. Hey, I'm wondering what would be the best approach to make a character completely interactive with the environment, just like in the "Happy Wheels" game. How are the characters in "Happy Wheels" made, anyway? Are they 3D models, or is each body part a separate object, with everything linked into 1 character in game? I would like to achieve a similar effect, but in a completely 2D game (without 3D models), so that the character would bleed exactly where he was shot, could be cut in half in many places of his body, etc... ; P
  10. Hey, I already have quite good knowledge of C++. I've written a 2D game in DirectX 9, and played a bit with network programming and WinAPI. Now I would like to start learning 3D game programming, and I wonder what the best approach would be. Should I learn DirectX 11 and try to create my own engine? I've already tried to learn DX 11, but it seems pointless: going through DX 11 tutorials is a pain for 1 person and takes too much time in comparison to the knowledge gained. So should I instead learn some 3D library like Ogre, or download a game engine like Unity or Unreal? I would like to go with the solution that would give me the most abilities and knowledge used in professional game development. I would be very grateful if someone with experience could share their thoughts!
  11. So a common view matrix looks like this:

          | u_x     v_x     n_x     0 |
      V = | u_y     v_y     n_y     0 |
          | u_z     v_z     n_z     0 |
          | -(u*c)  -(v*c)  -(n*c)  1 |

where n is a vector acting as the "z" axis of the camera, u as the "x" axis, and v as the "y" axis. u_x, v_x, n_x etc. are the coordinates of each vector, c is a vector representing the offset of the camera from (0, 0, 0), and u*c, v*c and n*c are dot products. Does somebody know a detailed article which explains precisely how to create such a matrix from scratch? Or maybe someone could explain it to me here?

First, my assumptions: when applying this matrix to every object in the scene, the matrix is the first factor of the multiplication and a vertex is the second one? So for example:

    V * | x |
        | y |
        | z |
        | 1 |

If my assumptions are correct, then I don't understand a few things. Why must the 4th row of the matrix contain a vector representing how much every vertex should be moved? The 4th coordinate of any vertex is "w", right? So what meaning does it have here? I thought it was actually useless, defined only to enable expressing translation as part of the multiplication. With the multiplication as above, I would get the following vertex transformation:

    | x * u_x + y * v_x + z * n_x + 0 |
    | x * u_y + y * v_y + z * n_y + 0 |
    | x * u_z + y * v_z + z * n_z + 0 |
    | -x * (u*c) - y * (v*c) - z * (n*c) + 1 |

It seems as if the "w" component of the vertex was the one being moved, and that doesn't make any sense to me : (.

My second issue is the rotation in the view matrix, i.e. the first 3 rows. I completely don't understand why we can put the coordinates of the camera axis vectors in as the rotation factors.

So if anyone could lend me a hand here, I would be really grateful!
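Not an authoritative answer, but a matrix with the translation in the fourth row follows the row-vector convention used by D3D: the point is multiplied on the left, p' = [x y z 1] * V, not V * p. Then the camera basis vectors form the columns of the upper 3x3 (that sub-matrix projects the point onto the camera axes), and the fourth row is simply the translation that gets added because the input point's w is 1; w itself is not being moved. A small CPU sketch (Vec3/Mat4 are made-up helper types for this example):

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Mat4 { float m[4][4]; };

// Row-vector convention: p' = [x y z 1] * V, so u, v, n are the *columns*
// of the upper 3x3 and the -dot-product translation sits in row 4.
Mat4 MakeViewMatrix(Vec3 u, Vec3 v, Vec3 n, Vec3 c)
{
    return { {
        { u.x,        v.x,        n.x,        0.0f },
        { u.y,        v.y,        n.y,        0.0f },
        { u.z,        v.z,        n.z,        0.0f },
        { -Dot(u, c), -Dot(v, c), -Dot(n, c), 1.0f },
    } };
}

// p' = [p 1] * V (row vector times matrix); each component works out to
// dot(axis, p) - dot(axis, c) = dot(axis, p - c), i.e. the point relative
// to the camera, expressed in the camera's basis.
Vec3 TransformPoint(Vec3 p, const Mat4& V)
{
    return {
        p.x * V.m[0][0] + p.y * V.m[1][0] + p.z * V.m[2][0] + V.m[3][0],
        p.x * V.m[0][1] + p.y * V.m[1][1] + p.z * V.m[2][1] + V.m[3][1],
        p.x * V.m[0][2] + p.y * V.m[1][2] + p.z * V.m[2][2] + V.m[3][2],
    };
}
```

With this convention the camera position c maps to (0, 0, 0), and a point one unit in front of the camera along n maps to (0, 0, 1), which is exactly what a view transform should do.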
  12. Thank you very much! Indeed, I had to change the format of my position in the layout from DXGI_FORMAT_R32G32B32_FLOAT to DXGI_FORMAT_R32G32B32A32_FLOAT, and now everything works correctly! Thanks a lot again... there is so much to learn in DX 11 that one can easily get confused :(.
  13. So I'm following a tutorial http://www.rastertek.com/dx11tut04.html and managed to convert it from D3DX10math.h to DirectXMath.h, but I can't get the same color values when using XMVECTOR instead of XMFLOATs for my vertices. I want to draw a green triangle, and this is how I set the positions and colors of the vertices using XMVECTOR:

```cpp
// vertices::position and vertices::color are XMVECTORs aligned to 16 bytes
vertices[0].position = DirectX::XMVectorSet(-1.0f, -1.0f, 0.0f, 0.0f); // Bottom left.
vertices[0].color = DirectX::XMVectorSet(0.0f, 1.0f, 0.0f, 1.0f);
vertices[1].position = DirectX::XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f);   // Top middle.
vertices[1].color = DirectX::XMVectorSet(0.0f, 1.0f, 0.0f, 1.0f);
vertices[2].position = DirectX::XMVectorSet(1.0f, -1.0f, 0.0f, 0.0f);  // Bottom right.
vertices[2].color = DirectX::XMVectorSet(0.0f, 1.0f, 0.0f, 1.0f);
```

This gives me a totally blue triangle, whereas this version:

```cpp
// vertices::position is XMFLOAT3 and vertices::color is XMFLOAT4
vertices[0].position = XMFLOAT3(-1.0f, -1.0f, 0.0f); // Bottom left.
vertices[0].color = XMFLOAT4(0.0f, 1.0f, 0.0f, 1.0f);
vertices[1].position = XMFLOAT3(0.0f, 1.0f, 0.0f);   // Top middle.
vertices[1].color = XMFLOAT4(0.0f, 1.0f, 0.0f, 1.0f);
vertices[2].position = XMFLOAT3(1.0f, -1.0f, 0.0f);  // Bottom right.
vertices[2].color = XMFLOAT4(0.0f, 1.0f, 0.0f, 1.0f);
```

with the types changed to XMFLOAT3 for position and XMFLOAT4 for color, gives me the correct green triangle.

I've played a bit with these coordinates and tried to make a green triangle using XMVECTORs, but it doesn't seem to make any sense: the XMVECTOR version returns wrong colors no matter what I set the coordinates to... I know it might be hard to guess the problem without more details, but maybe somebody has an idea of what might be wrong?

PS: When I change the last parameter of the XMVECTOR position to 1.0f, it changes the color near that vertex to pink, which is completely weird to me, as position shouldn't have an impact on color?
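A hedged guess at the cause, since the vertex struct and input layout aren't shown: XMVECTOR is a 16-byte aligned SIMD type, so a position stored as an XMVECTOR occupies 16 bytes, while the tutorial's input layout expects a 12-byte DXGI_FORMAT_R32G32B32_FLOAT position with the COLOR element at offset 12. The COLOR element then starts reading at the position's fourth float, which would also explain why changing the position's w changes the color. A plain C++ sketch of the offset mismatch (Vec4Simd is a stand-in for XMVECTOR so the snippet builds without the DirectXMath headers):

```cpp
#include <cstddef>

// Stand-in for DirectX::XMVECTOR: a 16-byte aligned four-float SIMD type.
struct alignas(16) Vec4Simd { float v[4]; };

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Layout the tutorial's input layout expects:
// POSITION at offset 0 (12 bytes), COLOR at offset 12 (16 bytes).
struct VertexFloats
{
    Float3 position;
    Float4 color;
};

// Layout actually produced when XMVECTOR is used for both members:
// POSITION occupies 16 bytes, so COLOR really starts at offset 16, not 12.
struct VertexSimd
{
    Vec4Simd position;
    Vec4Simd color;
};
```

With the XMVECTOR struct, a COLOR element declared at offset 12 reads (position.w, color.r, color.g, color.b). For the values above, that is (0, 0, 1, 0), a blue, and with position.w = 1.0f it becomes (1, 0, 1, 0), a magenta/pink, matching both symptoms. Keeping XMFLOAT3/XMFLOAT4 storage, or padding the position to a four-component format as in the follow-up post below, lines the offsets back up.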
  14. Thanks for the answer!

> A lot of libs (I'm using Bullet Math) follow that pattern. You should be very careful again, with passing aligned values, type casting etc. (http://msdn.microsoft.com/en-us/library/83ythb65.aspx)

So it's the better solution if I want my program / game to run as smoothly and quickly as possible?

Regarding my 2nd question: I have an XMFLOAT3A Position in my class. Should I use something like this:

```cpp
DirectX::XMStoreFloat3A(&Position, DirectX::XMVectorSet(-1.0f, -1.0f, 0.0f, 0.0f));
```

or:

```cpp
Position.x = -1.0f;
Position.y = -1.0f;
Position.z = 0.0f;
```

to fill the Position vector? Which is faster and better to use? Or does it not matter at all?
  15. Hey, I'm wondering how I should get around with these types. I've read that I should use XMFLOATs in classes for storage purposes, but convert to local XMVECTOR objects for calculations.

1. Doesn't converting from one type to the other, whenever I need to calculate something, slow down performance? Isn't it better to properly align the class and then always use XMVECTOR in it?

2. If I should use XMFLOAT for storage purposes, does that mean I should use XMFLOAT for filling in the data as well? Or is it faster and preferred to use XMVECTOR to fill the data and then store it in an XMFLOAT?

I would be very grateful for answers to my 2 questions!