Community Reputation

283 Neutral

About Jalibr

  1. It is known that *writing* to an out-of-bounds location in a UAV will invalidate all later reads, but reading should be well defined. I imagine that if you try this on REF (the reference rasterizer), it should work.
  2. Jalibr

    .fx compilation slow

    Without seeing the actual shader code I can't diagnose any perf issues, but for now there are really three cases that will cause the compiler to take a lot of time. We know compiler perf problems can be super painful, so we're working on solutions for these. A lot of this stems from the fact that the compiler was initially intended for far simpler programs (on the order of tens to hundreds of instructions on a few inputs/outputs), so algorithms were chosen that are not appropriate for larger programs.

    The first is loop simulation. If you have a complex loop, the compiler generally tries to give up as soon as possible (for 4_0+; earlier shader models have more restrictions), but if you have a loop like for( i = 0; i < 1024; i++ ), the compiler will try to analyze whether it's worth unrolling. This gets progressively worse as you nest other flow control inside the loop. If your shader doesn't benefit from this analysis, you can use the [fastopt] attribute on the loop, i.e.: [fastopt] for(...).

    The second is array analysis: the larger your arrays, the slower the compile will be. If you can shrink your arrays to only what is necessary, this can help significantly. If you have a ton of large arrays that you're accessing often, the compiler will be slow.

    The last is program size. Unfortunately, there are still parts of the compiler that are N^2 in the size of the program, so if you have long loops with complex bodies that are unrolled, or simply a ton of code, it can take a while to compile.
  3. Jalibr

    [HLSL] Never EVER unroll?

    In the next compiler release, we've added an attribute called [fastopt] that tells the compiler not to bother simulating the loop. It will work on any target that supports the break instruction, though there is a limitation on SM3 pixel shaders, due to complexities involved with gradient operations. If your pixel shader does a gradient operation (something that requires implicit derivative calculation, such as a tex2D call, but not a tex2Dlod call), then the compiler will continue simulating the loop.
  4. The number of iterations a loop can run for is limited to 255 for shader model 3 (http://msdn.microsoft.com/en-us/library/bb174715(VS.85).aspx). If this is too limiting, you'll have to use D3D10, which has no such limits.
  5. Jalibr

    ID3DXBaseEffect::SetValue Bug?

    It looks like you should be getting this debug spew: "ID3DXEffect::SetValue: Data size mismatch". So this looks to be expected behavior: ::SetValue expects the entire data type to be set. Note that you can pass D3DX_DEFAULT for the size and it will assume that you passed in enough data to fill the value.
  6. All of the reflection code is hosted in d3dcompiler.dll, so there shouldn't be any requirements on Vista SP1.
  7. That's the version I just checked in for March, so yes it's a modified version of the one in August/November.
  8. I was mistaken about the include handler not getting called for the top-level file. As it turns out, if you convert the passed in string from UTF-8 to a unicode string, it works as expected. Here is our default include handler:

    class CD3DInclude : public ID3D10Include
    {
    protected:
        WCHAR m_pIncludePath[MAX_PATH];

    public:
        HRESULT STDMETHODCALLTYPE Open(D3D10_INCLUDE_TYPE IncludeType, LPCSTR pOrigFileName, LPCVOID pParentData, LPCVOID *ppData, UINT *pBytes)
        {
            HANDLE hFile = INVALID_HANDLE_VALUE;
            HRESULT hr = S_OK;
            LARGE_INTEGER fileSize;
            void *pData = NULL;
            DWORD bytesRead = 0;
            WCHAR pFileName[MAX_PATH];
            WCHAR fullFileName[MAX_PATH];

            VB( ppData && pBytes );

            MultiByteToWideChar(CP_UTF8, 0, pOrigFileName, -1, pFileName, ARRAYSIZE(pFileName));
            pFileName[ARRAYSIZE(pFileName)-1] = 0;

            StringCchPrintfW(fullFileName, ARRAYSIZE(fullFileName), L"%s%s", m_pIncludePath, pFileName);

            hFile = CreateFileW(fullFileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, NULL, NULL);
            if( hFile == INVALID_HANDLE_VALUE )
            {
                // Try opening the file with no directory added.
                LPWSTR pFilePart;
                DWORD returnVal = GetFullPathNameW(pFileName, ARRAYSIZE(fullFileName), fullFileName, &pFilePart);
                if (returnVal == 0 || returnVal > MAX_PATH)
                {
                    hr = E_FAIL;
                    goto lExit;
                }
                else if(pFilePart == NULL)
                {
                    CloseHandle(hFile);
                    hr = E_FAIL;
                    goto lExit;
                }
                hFile = CreateFileW(fullFileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, NULL, NULL);
            }

            VB( hFile != INVALID_HANDLE_VALUE );
            VB( GetFileSizeEx(hFile, &fileSize) );
            VB( fileSize.HighPart == 0);
            VN( pData = NEW BYTE[fileSize.LowPart] );
            VB( ReadFile(hFile, pData, fileSize.LowPart, &bytesRead, NULL) );
            if (bytesRead != fileSize.LowPart)
            {
                VH( E_FAIL );
            }

            *ppData = pData;
            *pBytes = fileSize.LowPart;
            pData = NULL;

        lExit:
            SAFE_DELETE_ARRAY(pData);
            CloseHandle(hFile);
            return hr;
        }

        CD3DInclude()
        {
            m_pIncludePath[0] = 0;
        }

        HRESULT STDMETHODCALLTYPE Close(LPCVOID pData)
        {
            SAFE_DELETE_ARRAY(pData);
            return S_OK;
        }

        HRESULT Initialize(LPCSTR pIncludePath)
        {
            m_pIncludePath[0] = 0;
            if (pIncludePath)
            {
                WCHAR *pszNew = NULL;
                WCHAR pFileName[MAX_PATH];
                MultiByteToWideChar(CP_UTF8, 0, pIncludePath, -1, pFileName, ARRAYSIZE(pFileName));
                pFileName[ARRAYSIZE(pFileName)-1] = 0;
                if (GetFullPathNameW(pFileName, ARRAYSIZE(m_pIncludePath), m_pIncludePath, &pszNew) == 0)
                {
                    return HRESULT_FROM_WIN32(GetLastError());
                }
                if (pszNew)
                {
                    *pszNew = 0;
                }
            }
            return S_OK;
        }

        HRESULT Initialize(LPCWSTR pIncludePath)
        {
            m_pIncludePath[0] = 0;
            if (pIncludePath)
            {
                WCHAR *pszNew = NULL;
                if (GetFullPathNameW(pIncludePath, ARRAYSIZE(m_pIncludePath), m_pIncludePath, &pszNew) == 0)
                {
                    return HRESULT_FROM_WIN32(GetLastError());
                }
                if (pszNew)
                {
                    *pszNew = 0;
                }
            }
            return S_OK;
        }
    };
  9. Are you, by any chance, including the file that is resulting in an error getting printed? It looks like the top-level file should get created properly, but the default include handler just converts the wide string to a character string. If this is the case, then you should be able to work around this issue by using your own include handler. I'll look into fixing our include handler, and I'll post the source to this thread when I've verified that it's working correctly. This probably won't make the November release, though, since we're way past lockdown.
  10. Try this:

    float4 vs_main( float4 pos : POSITION ) : POSITION
    {
        return pos;
    }

    texture2D diffuse_tex0 : register( t0 );
    texture2D diffuse_tex1 : register( t1 );

    sampler diffuse_samp0 : register( s0 ) = sampler_state
    {
        Texture = <diffuse_tex0>;
    };

    sampler diffuse_samp1 : register( s1 ) = sampler_state
    {
        Texture = <diffuse_tex1>;
    };

    float4 ps_main() : SV_Target0
    {
        return diffuse_tex1.Sample( diffuse_samp1, float2( 0.5f, 0.5f ) ) *
               diffuse_tex0.Sample( diffuse_samp0, float2( 0.5f, 0.5f ) );
    }

    You're not referencing the textures at all, so it shouldn't be surprising that your register selections aren't doing anything. Instead you're hitting the compiler's DX9 compatibility path (where you can still specify resource registers, but you have to put the t binding on the sampler).
  11. Jalibr

    Error X3539

    It means that you're compiling to a target (ps_1_x) that we've deprecated in the compiler. The error message gives the two workarounds we've provided: change to the lowest supported target (ps_2_0), or use an older DLL (and we've conveniently provided a compiler switch/flag that does this for you).
  12. http://msdn.microsoft.com/en-us/library/bb205441(VS.85).aspx It says that D3DXSHADER_ENABLE_BACKWARDS_COMPATIBILITY is only supported by the D3D10 version of the compiler (that is, a version that supports D3D10 targets). This is essentially every compiler that has been released since December '06.
  13. Jalibr

    DirectX SDK August 2008

    Just in case you wanted to know about the next release: internally we've been calling it the November release, though there is a distinct possibility that we may delay it by a month.
  14. Jalibr

    Directx Internal Optimization

    The D3D10 runtime generally (and in this case, definitely) turns redundant calls like this into a no-op before it gets to the driver. There is still some overhead in any API call, but it's probably as fast or faster than you checking yourself to see if the handles match.
  15. The D3DX implementation of the adjacency calculator is a two-step process. First it decides which vertices are equivalent to each other (aka point reps), then for each point rep, it gets a list of triangles that surround that point. Then if two points are connected by two triangles, those triangles are set to be adjacent to each other. You'd do a very similar thing, except that instead of marking the triangle that is adjacent, you just insert the opposing point.
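
The two-step process described in the last post can be sketched roughly as follows. This is a minimal illustration, not the D3DX source: the names `ComputeAdjacency`, `EdgeNeighbor`, and `kNone` are hypothetical, the point reps are assumed to have been computed already (step one), and where more than two triangles share an edge the sketch simply takes the first match. Each result records both the adjacent triangle and the opposing point, so it covers the plain adjacency case as well as the "insert the opposing point" variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel meaning "no neighbor across this edge".
constexpr uint32_t kNone = 0xFFFFFFFFu;

struct EdgeNeighbor {
    uint32_t triangle;      // index of the adjacent triangle, or kNone
    uint32_t opposingPoint; // point rep of the neighbor's corner that is not on the shared edge
};

// indices:  three vertex indices per triangle.
// pointRep: pointRep[v] is the representative vertex for v (assumed precomputed).
// Returns three entries per triangle; entry e covers the edge from corner e to corner (e + 1) % 3.
std::vector<EdgeNeighbor> ComputeAdjacency(const std::vector<uint32_t>& indices,
                                           const std::vector<uint32_t>& pointRep)
{
    assert(indices.size() % 3 == 0);
    const std::size_t triCount = indices.size() / 3;

    // Replace every corner with its point rep so equivalent vertices compare equal.
    std::vector<uint32_t> rep(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i)
        rep[i] = pointRep[indices[i]];

    // For each point rep, collect the triangles that surround it.
    std::vector<std::vector<uint32_t>> trisAround(pointRep.size());
    for (std::size_t t = 0; t < triCount; ++t)
        for (int c = 0; c < 3; ++c)
            trisAround[rep[3 * t + c]].push_back(static_cast<uint32_t>(t));

    std::vector<EdgeNeighbor> result(indices.size(), EdgeNeighbor{kNone, kNone});

    // Two points connected by two triangles make those triangles adjacent.
    for (std::size_t t = 0; t < triCount; ++t) {
        for (int e = 0; e < 3; ++e) {
            const uint32_t a = rep[3 * t + e];
            const uint32_t b = rep[3 * t + (e + 1) % 3];
            for (uint32_t other : trisAround[a]) {
                if (other == t) continue;
                const uint32_t* o = &rep[3 * other];
                if (o[0] != b && o[1] != b && o[2] != b) continue; // does not share edge a-b
                EdgeNeighbor& out = result[3 * t + e];
                out.triangle = other;
                // The opposing point is the neighbor's corner that is neither a nor b.
                for (int c = 0; c < 3; ++c)
                    if (o[c] != a && o[c] != b) out.opposingPoint = o[c];
                break; // first match only
            }
        }
    }
    return result;
}
```

For two triangles (0,1,2) and (2,1,3) with identity point reps, the edge 1-2 of the first triangle reports triangle 1 with opposing point 3, and every other edge of that quad reports no neighbor.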