DieterVW
Members
  • Content count: 556
  • Joined
  • Last visited

Community Reputation: 724 Good

About DieterVW
  • Rank: Advanced Member
  1. DX11

    It depends on your render target format. An SNORM format stores values in [-1.0f, 1.0f], a UNORM format stores [0.0f, 1.0f], and a FLOAT format covers [minfloat, maxfloat] as well as NaN and INF. So you can do it if you use the right format.
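    To illustrate those ranges, here is a minimal sketch in plain C++ (not tied to any D3D API; the helper names are my own) of how 8-bit UNORM and SNORM values quantize and recover floats, including the clamping that happens at the edges of each range:

    ```cpp
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    // UNORM8: stores [0.0f, 1.0f] as the integers 0..255
    uint8_t FloatToUnorm8(float f) {
        f = std::min(std::max(f, 0.0f), 1.0f);       // out-of-range values are clamped
        return static_cast<uint8_t>(f * 255.0f + 0.5f);
    }
    float Unorm8ToFloat(uint8_t u) { return u / 255.0f; }

    // SNORM8: stores [-1.0f, 1.0f] as -127..127 (-128 also decodes to -1.0f)
    int8_t FloatToSnorm8(float f) {
        f = std::min(std::max(f, -1.0f), 1.0f);      // out-of-range values are clamped
        return static_cast<int8_t>(std::round(f * 127.0f));
    }
    float Snorm8ToFloat(int8_t s) { return std::max(s / 127.0f, -1.0f); }

    int main() {
        // a negative value cannot survive a round trip through UNORM...
        std::printf("%.3f\n", Unorm8ToFloat(FloatToUnorm8(-0.5f)));  // clamps to 0.000
        // ...but it does survive a round trip through SNORM
        std::printf("%.3f\n", Snorm8ToFloat(FloatToSnorm8(-0.5f))); // approximately -0.5
        return 0;
    }
    ```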
  2. My speculation for the flush command is that the runtime or driver may see the acquire/release as being very costly, and so return early rather than blocking the caller for such a long period of time. Other basic commands may not suffer from this. I honestly couldn't say for sure, but the documentation does say it's not a reliable mechanism. Perhaps in practice most people find it reliable, but it's not guaranteed, even between releases or patches. The query is just a simple object that can indicate when a command has finished. You'll create an ID3D11Query object using the ID3D11Device::CreateQuery method, with a description that uses D3D11_QUERY_EVENT. Then in the code you'll call ID3D11DeviceContext::End() with the query object once the drawing of all D3D11 content is complete. Now, before DX9 can continue with its work, you'll have to keep calling ID3D11DeviceContext::GetData() until it returns TRUE: the function will return S_OK, and the out data, which is a BOOL, will read TRUE once the pipeline has completed the query and all previously submitted D3D commands; FALSE means rendering has not completed yet. This call lets you check the status immediately, or it can block and flush the pipeline until the result is TRUE. Spinning on this will be costly and could drive CPU usage to 100%, so I recommend either making the blocking call or having the thread do some other work and check the result periodically.
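  Structurally, the poll-or-block choice described above looks roughly like this. This is a generic C++ analogue using std::future as a stand-in for the D3D11 event query (the real thing needs a Windows device and context, so this sketch only models the control flow, not the API):

  ```cpp
  #include <chrono>
  #include <cstdio>
  #include <future>
  #include <thread>

  int main() {
      // stand-in for the GPU finishing submitted work; in D3D11 this would be
      // the event query signaling once all prior commands have completed
      std::future<void> gpuDone = std::async(std::launch::async, [] {
          std::this_thread::sleep_for(std::chrono::milliseconds(50));
      });

      // option 1: poll -- analogous to repeatedly calling GetData and doing
      // other useful work between checks instead of spinning at 100% CPU
      while (gpuDone.wait_for(std::chrono::milliseconds(0)) !=
             std::future_status::ready) {
          std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }

      // option 2 (alternative): block until complete, analogous to the
      // blocking form of GetData:
      // gpuDone.wait();

      std::printf("done\n");
      return 0;
  }
  ```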
  3. So I'm not sure about this approach to using D3D11 with DX9. It looks as though the DX11 call to Flush is expected to block and therefore enforce synchronization, so that all DX11 content is rendered before proceeding. However, as described [url="http://msdn.microsoft.com/en-us/library/ff476425(v=VS.85).aspx"]here[/url], the call to Flush is asynchronous: it may or may not return before all rendering is actually done. It looks like the lock/release on the other resource has enough overhead to cause Flush to return before the triangle is drawn. When that happens, a small percentage of the time, D3D9 presents the incomplete frame. The main trouble is that the shared resource on the DX11 side has no locking mechanism to ensure that rendering completes before DX9 grabs the data. And since Flush doesn't block until rendering is guaranteed to be complete, you're left with having to poll the device with a query instead, which will eat time. This is a good example of why the keyed mutex approach was added.
  4. I think we'll need more code to work out what the problem is. The code blocks from the first post don't show the usage scenario that you describe in the second post. Otherwise, it sounds like you have the right approach in mind. Be sure that the keyed mutex is used throughout all usages of the shared keyed mutex resource:

    [code]// create a keyed-mutex shared render target in D3D11
    // get the D2D handle from the above render target
    // get the mutex for both versions of the render target

    mutex10.acquire(0);
    // D2D or other device rendering
    mutex10.release(1);

    mutex11.acquire(1);
    // D3D11 rendering -- do one of the following:
    // 1. use as a shader resource
    // 2. copy to the swap chain back buffer
    // 3. call present if this is the swap chain's buffer
    mutex11.release(0);[/code]

    I don't know if you're having both render to the same shared render target, or if one is rendering to it and the other is using it as a shader resource for drawing. Either way, provided that the resource is always protected when being used, there shouldn't be any sync issues like you're describing.
  5. DX11

    3. In addition to the above, new DX10+ drivers released with/after Win7 may have updates to support downlevel features such as multithreading and compute shader. That means older hardware can support some of the newer features if the driver was updated. Cap flags were added to the D3D11 API to check if there is support for these optional downlevel features.
  6. Using FXC.exe to compile your shader, you'll see that the array will be split across multiple registers, where each uses the same semantic with an incremented postfix index. Catching it in the next stage just requires the same semantic approach.

    [code]void main( inout float4 pos : SV_Position, out uint3 data[3] : MyData )
    {
        data[0] = uint3( 4, 6, 8 );
        data[1] = uint3( 5, 2, 9 );
        data[2] = uint3( 2, 7, 1 );
    }[/code]

    asm from above shader:

    [code]// fxc d:\repro\r041.fx /Tvs_4_0
    //
    // Input signature:
    //
    // Name                 Index   Mask Register SysValue Format  Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // SV_Position              0   xyzw        0     NONE  float   xyzw
    //
    // Output signature:
    //
    // Name                 Index   Mask Register SysValue Format  Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // SV_Position              0   xyzw        0      POS  float   xyzw
    // MyData                   0   xyz         1     NONE   uint   xyz
    // MyData                   1   xyz         2     NONE   uint   xyz
    // MyData                   2   xyz         3     NONE   uint   xyz
    //
    vs_4_0
    dcl_input v0.xyzw
    dcl_output_siv o0.xyzw, position
    dcl_output o1.xyz
    dcl_output o2.xyz
    dcl_output o3.xyz
    mov o1.xyz, l(4,6,8,0)
    mov o2.xyz, l(5,2,9,0)
    mov o3.xyz, l(2,7,1,0)
    mov o0.xyzw, v0.xyzw
    ret
    // Approximately 5 instruction slots used[/code]

    Pixel shader to catch the array:

    [code]uint4 main2( uint3 data[3] : MyData ) : SV_Target
    {
        return uint4( data[0].x, data[1].y, data[2].z, 1 );
    }

    // or do this
    uint4 main3( uint3 data0 : MyData0, uint3 data1 : MyData1, uint3 data2 : MyData2 ) : SV_Target
    {
        return uint4( data0.x, data1.y, data2.z, 1 );
    }[/code]

    asm from pixel shader:

    [code]// fxc d:\repro\r041.fx /Tps_4_0 /Emain2
    //
    // Input signature:
    //
    // Name                 Index   Mask Register SysValue Format  Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // MyData                   0   xyz         0     NONE   uint   x
    // MyData                   1   xyz         1     NONE   uint   y
    // MyData                   2   xyz         2     NONE   uint   z
    //
    // Output signature:
    //
    // Name                 Index   Mask Register SysValue Format  Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // SV_Target                0   xyzw        0   TARGET   uint   xyzw
    //
    ps_4_0
    dcl_input_ps constant v0.x
    dcl_input_ps constant v1.y
    dcl_input_ps constant v2.z
    dcl_output o0.xyzw
    mov o0.x, v0.x
    mov o0.y, v1.y
    mov o0.z, v2.z
    mov o0.w, l(1)
    ret
    // Approximately 5 instruction slots used

    // fxc d:\repro\r041.fx /Tps_4_0 /Emain3
    //
    // Input signature:
    //
    // Name                 Index   Mask Register SysValue Format  Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // MyData                   0   xyz         0     NONE   uint   x
    // MyData                   1   xyz         1     NONE   uint   y
    // MyData                   2   xyz         2     NONE   uint   z
    //
    // Output signature:
    //
    // Name                 Index   Mask Register SysValue Format  Used
    // -------------------- ----- ------ -------- -------- ------- ------
    // SV_Target                0   xyzw        0   TARGET   uint   xyzw
    //
    ps_4_0
    dcl_input_ps constant v0.x
    dcl_input_ps constant v1.y
    dcl_input_ps constant v2.z
    dcl_output o0.xyzw
    mov o0.x, v0.x
    mov o0.y, v1.y
    mov o0.z, v2.z
    mov o0.w, l(1)
    ret
    // Approximately 5 instruction slots used[/code]
  7. Yes, moving the points to infinity will cause them to be clipped by the rasterizer. It still comes down to the fact that neither the assembler nor the vertex shader has a way to discard geometry, thereby reducing the work the rest of the pipeline would need to do. Enabling more parts of the pipeline does have overhead. You can use the tessellation pipeline, but I wouldn't do that unless it was already in use. It may also not be worth using the geometry shader unless it's already in use, or the culling algorithm is more complicated than culling against the view frustum. The GS is generally only "slow" when something about the algorithm prevents all the hardware from being used efficiently. Examples of that are using too many registers, or an output size that's too large. Simple culling isn't likely to be one of these cases.
  8. DX11

    The goal is to make feature groupings that cover a reasonable amount of hardware. That's a hard task given that there is such a wide variety of dx9 hardware and given that most of the dx9 features are optional. So yes, there could be a classification of cards that are sm3, but that group of cards would be so small as to be useless to anyone. So instead the feature scope was scaled back so that the classification would cover enough hardware to be meaningful. Of course that means some things like instruction counts didn't get the sm3 value. So the whole thing will require a lot of referencing of the 10Level9 documentation to use without surprises.
  9. You can't discard geometry from the vertex shader stage. For one, there'd be no way for the pipeline to enforce or ensure that you did it in groups of 3 for triangles, etc. Plus, what would it mean in the case of strips? Most of the time this sort of culling is done on the CPU side, or using simple occlusion culling on the GPU. For a more fine-tuned approach, you can discard geometry in the geometry shader by simply not streaming out the primitives that you want to cull. You can also use the rasterizer to cull/clip geometry.
  10. It's perfectly possible to use UpdateSubresource() to change the contents of a constant buffer. More details can be found <a href="http://msdn.microsoft.com/en-us/library/ff476486%28v=vs.85%29.aspx">here</a>. I think the only thing you don't have set correctly is the source size of the data. I believe you need to set SrcRowPitch to the byte size of the data, which also has to be the same size as the constant buffer itself. You can't partially update a constant buffer; the whole thing has to be updated.
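  One detail worth spelling out when sizing that update: constant buffer byte widths must be multiples of 16, so the size you pass is typically the CPU-side struct size rounded up. A small sketch of that arithmetic (the struct and helper names are my own, for illustration only):

  ```cpp
  #include <cstdio>

  // constant buffers must be sized in multiples of 16 bytes, so round the
  // CPU-side struct size up before creating or updating the buffer
  constexpr unsigned RoundUpTo16(unsigned byteSize) {
      return (byteSize + 15u) & ~15u;
  }

  struct PerFrameConstants {   // hypothetical constant buffer layout
      float time;              // 4 bytes
      float viewport[2];       // 8 bytes
  };                           // sizeof == 12, but the buffer must be 16

  int main() {
      std::printf("%u\n", RoundUpTo16(sizeof(PerFrameConstants))); // 16
      std::printf("%u\n", RoundUpTo16(72));                        // 80
      return 0;
  }
  ```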
  11. I think you are looking for D3D10_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT, which is #defined in the DX10 header and set to the value 16. Why did you think it was less than that? Here you have used 7 streams and have 9 left.
  12. I'd guess that you're mentioning 5 dynamic buffers due to different layouts. I'd suggest using 5 static buffers instead, each containing all of the necessary data, and maintaining information on the offsets into those buffers for submitting the different draw calls. That should get you the best of both worlds for most situations. Buffer changes, and in particular layout changes, are often more costly than texture changes. You may also be able to reduce texture changes by using texture arrays.
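  The offset bookkeeping suggested above can be as simple as recording where each mesh's vertices start in the shared static buffer and issuing each draw with that base offset. A sketch of the idea in plain C++ (the struct and names are mine, not a D3D API):

  ```cpp
  #include <cstdio>
  #include <initializer_list>
  #include <vector>

  // one combined static buffer holds every mesh; per mesh we only keep
  // where its data starts and how many vertices it has
  struct DrawRange {
      unsigned firstVertex;   // offset into the shared vertex buffer, in vertices
      unsigned vertexCount;
  };

  int main() {
      unsigned cursor = 0;
      std::vector<DrawRange> ranges;
      // three hypothetical meshes packed back to back into one buffer
      for (unsigned meshVerts : { 36u, 1200u, 8u }) {
          ranges.push_back({ cursor, meshVerts });
          cursor += meshVerts;
      }
      // at draw time you'd issue something like
      // context->Draw(range.vertexCount, range.firstVertex) per range
      for (const DrawRange& r : ranges)
          std::printf("first=%u count=%u\n", r.firstVertex, r.vertexCount);
      return 0;
  }
  ```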
  13. We need to see the code snippet and all of the parameter values you are passing to the function. At first guess, I'd say that you are passing in the adapter but marking the driver type as hardware instead of unknown. The driver type must be unknown if you already have an adapter picked out.
  14. I don't believe that you can get the variable name. The semantic provides most of the value since that holds the meaning and is also uniquely distinguishable.
  15. You have to use Map/Unmap with a dynamic or staging surface. UpdateSubresource will only work with a default usage.