DX9 Creating Many Threads

11 comments, last by MysteryX 8 years, 6 months ago
You could do the conversion in pixel shader.

Niko Suni


If for some strange reason you can't use the GPU for the conversion, recent CPUs have intrinsics that convert between float and half quickly. See http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.asp

The intrinsics are _mm_cvtps_ph() and _mm_cvtph_ps(), which require F16C instruction set support.

Even if you can't use those intrinsics, the other DirectXMath functions may be quicker than the functions you're currently using.
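To make the suggestion concrete, here is a minimal sketch of a buffer conversion that uses _mm_cvtps_ph() when the compiler targets F16C and otherwise falls back to a scalar routine. The names convert_float_to_half and float_to_half_scalar are made up for this example (they are not DirectXMath or D3DX functions), and the check is compile-time only via the GCC/Clang __F16C__ macro, so a real build would also want a runtime CPU check.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__F16C__)
#include <immintrin.h>
#endif

// Hypothetical scalar fallback: IEEE 754 binary32 -> binary16,
// round-to-nearest-even. Returns the raw 16-bit pattern.
inline std::uint16_t float_to_half_scalar(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint16_t sign = static_cast<std::uint16_t>((x >> 16) & 0x8000u);
    const std::uint32_t abs = x & 0x7FFFFFFFu;

    if (abs > 0x7F800000u) return sign | 0x7E00u;   // NaN -> quiet NaN
    if (abs >= 0x477FF000u) return sign | 0x7C00u;  // >= 65520 (or Inf) -> Inf
    if (abs <= 0x33000000u) return sign;            // <= 2^-25 rounds to zero

    const std::uint32_t exp = abs >> 23;            // biased float exponent
    std::uint32_t mant = abs & 0x007FFFFFu;
    std::uint32_t h, rem, halfway;

    if (exp >= 113) {
        // Normal half: rebias exponent (127 -> 15), keep top 10 mantissa bits.
        h = ((exp - 112) << 10) | (mant >> 13);
        rem = mant & 0x1FFFu;
        halfway = 0x1000u;
    } else {
        // Subnormal half: restore the implicit 1 and shift into place.
        mant |= 0x00800000u;
        const std::uint32_t shift = 126 - exp;      // 14..24
        h = mant >> shift;
        rem = mant & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1);
    }
    // Round to nearest, ties to even; a carry may bump the exponent.
    if (rem > halfway || (rem == halfway && (h & 1u))) ++h;
    return static_cast<std::uint16_t>(sign | h);
}

// Hypothetical helper: converts `count` floats to half in one pass.
void convert_float_to_half(const float* src, std::uint16_t* dst, std::size_t count) {
    std::size_t i = 0;
#if defined(__F16C__)
    // F16C path: four floats per iteration, result in the low 64 bits.
    for (; i + 4 <= count; i += 4) {
        const __m128 v = _mm_loadu_ps(src + i);
        const __m128i h = _mm_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT);
        _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + i), h);
    }
#endif
    // Scalar path: the tail, or the whole buffer without F16C.
    for (; i < count; ++i) dst[i] = float_to_half_scalar(src[i]);
}
```

Calling convert_float_to_half(src, dst, width * height) once per frame, rather than converting one pixel per call, is the kind of batching meant here.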

I was doing the conversion pixel by pixel, which was taking forever. The updated code processes them all at once through a buffer, which is much faster.

Is there a performance difference between DirectXMath and the DX9 conversion function?

Or is there a way to convert straight from INT to half-float? Converting within the shader could be a good idea, but converting UINT16 to float16 would cause cropping.
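As a rough illustration of that cropping concern: a half-float has 11 significant bits, so every integer up to 2048 is exact, but above that only every 2nd, 4th, ... 32nd integer is representable, and anything from 65520 up rounds to infinity. The sketch below computes the value a UINT16 would land on after a round-to-nearest-even conversion. The function nearest_half_value and the constant kHalfOverflow are names invented for this example, not part of any library.

```cpp
#include <cstdint>

// Hypothetical sentinel: the input would round to +Inf in half precision.
const std::uint32_t kHalfOverflow = 0xFFFFFFFFu;

// Hypothetical helper: the integer a UINT16 becomes after conversion to
// IEEE 754 binary16 with round-to-nearest-even.
std::uint32_t nearest_half_value(std::uint16_t v) {
    if (v < 2048) return v;                 // fits in 11 significant bits

    int top = 15;                           // index of the highest set bit
    while (((v >> top) & 1u) == 0) --top;   // 11..15 for v >= 2048

    const std::uint32_t step = 1u << (top - 10);  // spacing between halves
    const std::uint32_t rem = v % step;
    std::uint32_t q = v / step;

    // Round to nearest multiple of `step`, ties to the even multiple.
    if (rem > step / 2 || (rem == step / 2 && (q & 1u))) ++q;

    const std::uint32_t result = q * step;
    return result > 65504u ? kHalfOverflow : result;  // 65504 is the max half
}
```

For example, 2049 comes back as 2048 and 4097 as 4096, so the low bits of a full-range 16-bit value do not survive the conversion.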

As for the memory usage and the many threads and devices being created, I'll see if I can chain the various shaders one after the other in the same instance, reconfiguring the same device with a different shader at each step in the chain. This would probably drastically improve memory usage and performance.

Edit: I replaced the DX9 function with DirectXMath. Performance went up from 18.5 fps to 20 fps.

This topic is closed to new replies.
