WARP vs UAVs in PS

7 comments, last by pcmaster 6 years, 2 months ago

Hi all, I have another "niche" architecture error :(

On our build servers, we're using headless machines on which we're running DX11 WARP in a console session, that is D3D_DRIVER_TYPE_WARP plus D3D_FEATURE_LEVEL_11_0. It's Windows 7 or Windows Server 2008 R2 with "Platform Update for Windows 7". Everything's been fine so far: it's running all kinds of complex rendering, compute shaders and UAVs without problems, and even fast.

The problem: writes from a pixel shader through a UAV to a specific array slice and mip level of a cubemap array seem to be dropped.

Do note that with D3D_DRIVER_TYPE_HARDWARE it works correctly; I can reproduce the bug on any normal workstation (also Windows 7 x64) with D3D_DRIVER_TYPE_WARP.

The shader in question is a simple average 4->1 mipmapping PS, which samples a source SRV texture and writes into a UAV like this:
 


RWTexture2DArray<float4> array2d;

array2d[int3(xy, arrayIdx)] = avg_float4_value;

The output merger is set to do no RT writes, the only output is via that one UAV.
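
For checking what the pass should have written, a CPU reference can be compared against a readback of the destination mip. The following is only a minimal sketch, assuming a plain 2x2 box average over one slice with even source dimensions; the actual shader source is not shown in this thread, and the names Float4 and downsample2x2 are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Float4 = std::array<float, 4>;  // one RGBA texel (hypothetical helper type)

// CPU reference for a 4->1 average: each destination texel is the mean of the
// corresponding 2x2 block of source texels. 'src' is one array slice stored
// row-major with srcW x srcH texels; both dimensions are assumed to be even.
std::vector<Float4> downsample2x2(const std::vector<Float4>& src,
                                  std::size_t srcW, std::size_t srcH)
{
    assert(srcW % 2 == 0 && srcH % 2 == 0 && src.size() == srcW * srcH);
    const std::size_t dstW = srcW / 2;
    const std::size_t dstH = srcH / 2;
    std::vector<Float4> dst(dstW * dstH);
    for (std::size_t y = 0; y < dstH; ++y) {
        for (std::size_t x = 0; x < dstW; ++x) {
            Float4 sum{};  // value-initialized to zero
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    const Float4& s = src[(2 * y + dy) * srcW + (2 * x + dx)];
                    for (std::size_t c = 0; c < 4; ++c) sum[c] += s[c];
                }
            }
            for (std::size_t c = 0; c < 4; ++c) sum[c] *= 0.25f;
            dst[y * dstW + x] = sum;
        }
    }
    return dst;
}
```

If the readback from WARP still holds the clear value while the readback from the hardware driver matches this reference within a small tolerance, that would point at the writes being dropped rather than at the filter itself.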

Note again that with a normal HW driver (GeForce) it works right, but with WARP it doesn't.

Any ideas how I could debug this, to be sure it's really WARP causing this? :) Do you think RenderDoc will also capture a WARP application (using their StartFrameCapture/EndFrameCapture API of course, since there's no window and no swap chain)? EDIT: RenderDoc does make a capture even with WARP, wow :o

Thanks!


Well, the bad news is that I made two RenderDoc captures: the one made on the real driver renders fine, while the one made with WARP does not, and I can clearly see the draw calls in question not doing anything there :(

To be even more specific, the DXBC goes like this:


ps_5_0
      dcl_globalFlags refactoringAllowed
      dcl_constantbuffer cb0[2], immediateIndexed
      dcl_sampler s0, mode_default
      dcl_resource_texture2darray (float,float,float,float) t0
      dcl_uav_typed_texture2darray (float,float,float,float) u1
      dcl_input_ps_siv v0.xy, position
      dcl_input_ps linear v1.xy
      dcl_output o0.xyzw
      dcl_temps 5
  
   0: ftoi r0.xy, v0.xyxx		// pixel XY
   ...
   2: ftoi r0.zw, cb0[1].xxxx	// array slice
   ...
   // sample the SRV
   23: sample_indexable(texture2darray)(float,float,float,float) r2.xyzw, r1.xyzx, t0.xyzw, s0
   ..
   // output (r0.xyz will be used as the address -- I hope)
   26: store_uav_typed u1.xyzw, r0.xyzw, r2.xyzw
   // fake output
   27: mov o0.xyzw, l(0, 0, 0, 0)
   28: ret

The only weird thing I can think of is that my UAV is u1 (not u0) and there are no RTVs, i.e. OMSetRenderTargetsAndUnorderedAccessViews(D3D11_KEEP_RENDER_TARGETS_AND_DEPTH_STENCIL, nullptr, nullptr, 0, 2, { null, myUAV }, nullptr);

So maybe WARP doesn't like this?

Note that Windows 7 likely won't get any updates for WARP at this point. I do seem to recall an issue with UAV-only draw calls not working correctly. Are you able to try your app on a more recent OS's version of WARP?

Unfortunately I can't test it on Windows 10 :( I will try to give it a fake RT (and a fake u0 UAV) to see if I can fool it.

Okay. Filling the first unused UAV slot doesn't help, however binding a fake RT to slot 0 does help. It's fine with WARP then. I want to cry for almost a whole day wasted figuring this out :(

Thank you Microsoft :(

 

What is your UAV start slot when calling OMSetRenderTargetsAndUnorderedAccessViews? If you use u1 as a register, it is likely to be 1, but if you have no RT bound, I could imagine you set it to 0 by mistake, creating a mismatch.

4 hours ago, pcmaster said:

Okay. Filling the first unused UAV slot doesn't help, however binding a fake RT to slot 0 does help. It's fine with WARP then. I want to cry for almost a whole day wasted figuring this out :(

Thank you Microsoft :(

 

You lost a single day on an issue and want to cry? Good luck when you get a month-long unresolved bug :)

Over the years and platforms this is far from the only bug that makes me wanna cry :(

My UAV start slot is 1 (u1) indeed, but I do not have a mismatch there. As I said, it works fine with normal drivers, and it now works fine with WARP when I set a dummy (big enough) RTV.
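
The slot bookkeeping discussed here can be sketched in a few lines. This is a simplified model, not code from the application: it assumes the documented D3D11 behaviour that pixel-shader RTVs and UAVs share output-merger slots, so the UAV start slot must not be lower than the number of bound RTVs, and that feature level 11_0 exposes 8 such slots (D3D11_PS_CS_UAV_REGISTER_COUNT). The names OmBinding, isValidOmBinding and uavRegister are hypothetical.

```cpp
// Number of shared RTV/UAV slots at feature level 11_0
// (mirrors D3D11_PS_CS_UAV_REGISTER_COUNT from d3d11.h).
constexpr unsigned kUavSlotCount = 8;

// Simplified description of one OMSetRenderTargetsAndUnorderedAccessViews call.
struct OmBinding {
    unsigned numRTVs;       // render targets bound in slots [0, numRTVs)
    unsigned uavStartSlot;  // first slot used for UAVs
    unsigned numUAVs;       // UAVs bound starting at uavStartSlot
};

// True if the UAVs start after the bound RTVs and fit into the slot range.
constexpr bool isValidOmBinding(const OmBinding& b)
{
    return b.uavStartSlot >= b.numRTVs &&
           b.uavStartSlot + b.numUAVs <= kUavSlotCount;
}

// Register index (the N in uN) under which the i-th bound UAV is visible.
constexpr unsigned uavRegister(const OmBinding& b, unsigned i)
{
    return b.uavStartSlot + i;
}
```

Under this model both OmBinding{0, 1, 1} (no RTV, the case WARP drops) and OmBinding{1, 1, 1} (dummy RTV in slot 0) pass the check and put the UAV at u1, which is consistent with the hardware driver accepting both.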

 
