Reviewing Memory Transfers


I have written some code that takes video frames, runs a series of HLSL pixel shaders on them, and returns the processed video.

I would like to review the overall memory transfers to see whether they are done properly and whether they could be optimized, and then I have some questions.

In this sample, we have 3 input textures out of 9 slots, and we run 3 shaders.
D = D3DPOOL_DEFAULT, S = D3DPOOL_SYSTEMMEM, R = D3DUSAGE_RENDERTARGET, - = empty slot

m_InputTextures
Index  0  1  2  3  4  5  6  7  8  9  10 11
CPU    D  D  D  -  -  -  -  -  -  -  -  S
GPU    R  R  R  -  -  -  -  -  -  R  R  R

m_RenderTargets contains one R texture per output resolution. The render target then gets copied into the next available index: Command1 renders to RenderTarget[0] and is then copied to index 9, Command2 to index 10, Command3 to index 11, and so on. Only the final output needs to be copied from the GPU back to the CPU, which requires a SYSTEMMEM texture.

All the processing is done with D3DFMT_A16B16G16R16 format.
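To illustrate the copy chain, here is a minimal sketch of one GPU-to-GPU copy plus the final GPU-to-CPU readback. The member names (m_pDevice, m_RenderTargets, m_InputTextures) and the HR macro follow the post, but the exact types, the slot indexes and the dstBuffer/dstPitch variables are assumptions:

// Copy the shader output (a render target) into the next available GPU slot.
IDirect3DSurface9* pSrc = NULL;
IDirect3DSurface9* pDst = NULL;
HR(m_RenderTargets[0]->GetSurfaceLevel(0, &pSrc));
HR(m_InputTextures[9]->GetSurfaceLevel(0, &pDst));
HR(m_pDevice->StretchRect(pSrc, NULL, pDst, NULL, D3DTEXF_POINT));

// Read the final output back to the CPU. GetRenderTargetData requires a
// D3DPOOL_SYSTEMMEM surface with the same size and format as the render target.
IDirect3DSurface9* pReadback = NULL;
HR(m_pDevice->CreateOffscreenPlainSurface(width, height, D3DFMT_A16B16G16R16,
    D3DPOOL_SYSTEMMEM, &pReadback, NULL));
HR(m_pDevice->GetRenderTargetData(pDst, pReadback));

// Lock and copy out row by row, honoring the pitch returned by the driver.
D3DLOCKED_RECT lock;
HR(pReadback->LockRect(&lock, NULL, D3DLOCK_READONLY));
for (UINT y = 0; y < height; y++)
    memcpy(dstBuffer + y * dstPitch,                 // dstBuffer/dstPitch: hypothetical output buffer
           (BYTE*)lock.pBits + y * lock.Pitch,
           width * 8);                               // 8 bytes per pixel for D3DFMT_A16B16G16R16
HR(pReadback->UnlockRect());

pReadback->Release();
pDst->Release();
pSrc->Release();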

The full code is here

Any comments so far?

Here are a few questions.

1. Do all GPU-side textures need to be created as render targets? If I don't, the StretchRect calls I'm using to move the data around fail (see the sketch after these questions).

2. Can I do the processing in D3DFMT_A16B16G16R16 and then return the result in D3DFMT_X8R8G8B8 to avoid converting from 16-bit to 8-bit on the CPU? If so, which textures do I have to change?

3. If I want to work with half-float data, can the input textures be D3DFMT_A32B32G32R32F while all the processing is done in D3DFMT_A16B16G16R16F, to avoid doing the float-to-half-float conversion on the CPU? If so, which textures do I have to change? Similarly, I could pass input frames as D3DFMT_X8R8G8B8 and process them in D3DFMT_A16B16G16R16.
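Related to question 1, here is roughly how a GPU-side slot has to be created so its top level can be used as a StretchRect destination, as far as I understand the D3D9 StretchRect restrictions. The member names and the HR macro follow the post; treat this as a sketch, not a definitive answer:

// A plain D3DPOOL_DEFAULT texture is not a valid StretchRect destination;
// the texture needs D3DUSAGE_RENDERTARGET (an off-screen plain surface in
// D3DPOOL_DEFAULT should also work as a destination).
IDirect3DTexture9* pSlot = NULL;
HR(m_pDevice->CreateTexture(width, height, 1, D3DUSAGE_RENDERTARGET,
    D3DFMT_A16B16G16R16, D3DPOOL_DEFAULT, &pSlot, NULL));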

Edit: To be more specific, I support 3 pixel formats: D3DFMT_X8R8G8B8, D3DFMT_A16B16G16R16 and D3DFMT_A16B16G16R16F. I'd like to be able to take the input in D3DFMT_X8R8G8B8, do the processing in D3DFMT_A16B16G16R16F and give back the result in D3DFMT_A16B16G16R16 (or any other combination). How can I do that?

Edit2: By changing the last RenderTarget and the last m_InputTextures slot to D3DFMT_A16B16G16R16F, I'm able to read the result back as half-float successfully. If I change them to D3DFMT_X8R8G8B8, however, the R and B channels are reversed and the image is repeated twice side by side. Other than that, if I compare the output with the regular 16-bit processing, the image is exactly the same, which indicates it was internally processed as D3DFMT_A16B16G16R16. Creating the texture as D3DFMT_A8B8G8R8 fails.

Then I've also tried changing the input textures to D3DPOOL_SYSTEMMEM and using UpdateSurface instead of StretchRect. It works. However, performance is not better and memory usage is higher.
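For comparison, a sketch of that UpdateSurface path, assuming both surfaces are off-screen plain surfaces of the same size and format (m_formatIn and the HR macro are from the post; the rest is illustrative):

// UpdateSurface copies from a D3DPOOL_SYSTEMMEM source to a D3DPOOL_DEFAULT
// destination; it performs no stretching and no format conversion.
IDirect3DSurface9* pCpuSurface = NULL;
IDirect3DSurface9* pGpuSurface = NULL;
HR(m_pDevice->CreateOffscreenPlainSurface(width, height, m_formatIn,
    D3DPOOL_SYSTEMMEM, &pCpuSurface, NULL));
HR(m_pDevice->CreateOffscreenPlainSurface(width, height, m_formatIn,
    D3DPOOL_DEFAULT, &pGpuSurface, NULL));

// ... fill pCpuSurface through LockRect/UnlockRect ...

HR(m_pDevice->UpdateSurface(pCpuSurface, NULL, pGpuSurface, NULL));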


As an update, it appears that both the "inverted pixels" and "duplicated width" problems were bugs in my own code. I fixed those, and I'm now able to process 16-bit data and get an 8-bit result. I suppose the same will work for the input textures.

And considering my (failed) test with UpdateSurface, it seems the way I'm doing it with StretchRect is the right way.

Edit: Got it working. It can now take 8-bit frames as input, process with half-float data and get the output as 8-bit frames, and all the data conversion is done on the GPU instead of the CPU.
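In case it helps anyone else, here is a sketch of how the formats can differ per stage with this approach, with the render pass and StretchRect doing the conversions on the GPU. Variable names are illustrative, and whether each conversion is supported depends on the driver:

IDirect3DSurface9* pInput = NULL;     // 8-bit input frame, uploaded from the CPU
IDirect3DTexture9* pWork = NULL;      // half-float intermediate, where the shaders run
IDirect3DTexture9* pFinal = NULL;     // final render target, back to 8-bit
IDirect3DSurface9* pReadback = NULL;  // CPU-side readback surface, same format as pFinal

HR(m_pDevice->CreateOffscreenPlainSurface(width, height, D3DFMT_X8R8G8B8,
    D3DPOOL_DEFAULT, &pInput, NULL));
HR(m_pDevice->CreateTexture(width, height, 1, D3DUSAGE_RENDERTARGET,
    D3DFMT_A16B16G16R16F, D3DPOOL_DEFAULT, &pWork, NULL));
HR(m_pDevice->CreateTexture(width, height, 1, D3DUSAGE_RENDERTARGET,
    D3DFMT_X8R8G8B8, D3DPOOL_DEFAULT, &pFinal, NULL));
HR(m_pDevice->CreateOffscreenPlainSurface(width, height, D3DFMT_X8R8G8B8,
    D3DPOOL_SYSTEMMEM, &pReadback, NULL));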

I have a problem with the memory texture redesign. Changing the last RenderTarget and the textures used to read the data back to D3DFMT_X8R8G8B8 works fine.

However, changing the very first input textures to D3DFMT_X8R8G8B8 causes image distortion:

HR(m_pDevice->CreateOffscreenPlainSurface(width, height, m_formatIn, D3DPOOL_DEFAULT, &Obj->Memory, NULL));

Here are screenshots comparing the results of using the first input texture as D3DFMT_X8R8G8B8 vs. D3DFMT_A16B16G16R16. All other textures in the processing chain are D3DFMT_A16B16G16R16.

[Screenshots: Texture_In1.png, Texture_In2.png]

Is there a way to do the pixel conversion on the GPU without having such distortion?
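Related note: when m_formatIn changes, the CPU-side copy into that surface also has to change, since the bytes per pixel differ and the pitch returned by LockRect must be honored. A minimal sketch, with srcData, srcPitch and bytesPerPixel as hypothetical variables:

// Sketch of a pitch-aware upload into Obj->Memory. bytesPerPixel must match
// m_formatIn: 4 for D3DFMT_X8R8G8B8, 8 for D3DFMT_A16B16G16R16(F).
// srcData and srcPitch are hypothetical (the incoming frame buffer and its stride).
D3DLOCKED_RECT lock;
HR(Obj->Memory->LockRect(&lock, NULL, 0));
for (int y = 0; y < height; y++)
    memcpy((BYTE*)lock.pBits + y * lock.Pitch,
           srcData + y * srcPitch,
           width * bytesPerPixel);
HR(Obj->Memory->UnlockRect());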
