Use Buffer or Texture, PS or CS for GPU Image Processing?

22 comments, last by Dingleberry 7 years, 11 months ago

How do you undo the Morton pattern? Would you create it with something like D3D12_TEXTURE_LAYOUT_STANDARD_SWIZZLE? That doesn't seem to be the intended usage of the flag, but it also implies that an undefined swizzle will be, well, undefined.

For the unknown swizzles, you can only guess standard patterns, and attempt to see if they're faster (since it's GPU specific). Obviously on console it's way easier since the HW is fixed.
Check out this tweet history.

If the hardware is automatically translating texture indices for you, could you maybe alias the texture memory as a buffer, write a known pattern, and then undo the pattern?

I suppose you could but I'm not sure if it's legal to do that. Might not work on future hardware? Honestly I don't know.

I have a Maxwell GPU at home, so I only have tier 1 resource heaps and can't alias them like that :(. But I think it should work if you created a buffer and a texture out of the same heap memory?

I did some testing: if you create a buffer with sequential data, copy it to a texture that was created using CreatePlacedResource, and then call ReadFromSubresource on the texture, it returns a really funny pattern:


2, 64, 258, 320, 6, 68, 262, 324 ...

Kind of weird, because doing this on a texture created with CreateCommittedResource yields sequential data. I don't think the access pattern it's giving me back is a useful one. I'm not really sure what I'm looking at.

You're probably looking at a vendor-specific D3D12_TEXTURE_LAYOUT_UNKNOWN ordering.

What texture layout and resource state arguments did you give to those functions?

It was an unknown layout and copy-dest since it was in a readback heap. I tried putting it into the default heap too but that didn't change anything. So it's just upload buffer -> readback texture, results vary based on whether it's made with CreateCommittedResource/CreatePlacedResource.

Interesting :) Well, now you're probing the undefined internal behaviors of your vendor's specific d3d driver logic :lol:

Converting from the initial data format into the optimized "unknown layout" has a cost associated with it -- drivers must guess whether to pay this cost at all (in order to make later memory accesses faster) and, if so, at what point to pay it. You've discovered that your driver decides to perform this transformation sooner if the user is doing their own memory management, and later if the user asks the driver to do the memory management... I guess the driver assumes that a placed resource will be longer lived than a driver-managed resource?

BTW, yes, those numbers you posted do seem like some kind of Morton order / z-order curve, possibly unique to your GPU.
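For reference, here is a minimal sketch of what a plain 2D Morton (Z-order) encode/decode looks like in C++. This is the textbook bit-interleaving scheme, not the layout any particular GPU or driver uses: the function names (`part1by1`, `morton_encode`, and so on) are made up for illustration, and a vendor's D3D12_TEXTURE_LAYOUT_UNKNOWN ordering may tile, mix, or permute bits quite differently.

```cpp
#include <cstdint>

// Spread the low 16 bits of v apart so a zero bit sits between each
// original bit (bit i moves to bit 2*i).
inline uint32_t part1by1(uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Inverse of part1by1: gather every other bit back into the low 16 bits.
inline uint32_t compact1by1(uint32_t v) {
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

// Interleave x (even bits) and y (odd bits) into one Morton index.
inline uint32_t morton_encode(uint32_t x, uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

// Recover the coordinates from a Morton index.
inline uint32_t morton_decode_x(uint32_t code) { return compact1by1(code); }
inline uint32_t morton_decode_y(uint32_t code) { return compact1by1(code >> 1); }
```

"Undoing" a swizzle like this just means running the decode on each linear index to find which (x, y) texel it holds. That only works if the real layout matches the assumed bit interleaving, which is exactly what an unknown layout does not promise.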

Hey Guys,

I have an interesting finding:

When reading from a linear buffer, a compute shader is around 15% faster than a pixel shader on my GTX 680M.

To be more specific: my DX12 program creates a permanently mapped buffer in an upload heap and copies the camera's image data into it. I can then use either a CS or a PS to copy (render) this image into a texture (a swizzled resource in the default heap). The CS and PS are simple and almost identical: just read and write/output, with no TGSM (thread group shared memory) used in the CS.

My expectation was that, since I don't use TGSM in the CS, the PS should run a little faster. I observed such a PS speedup a long time ago when copying one texture to another (but both of those textures were swizzled and in the default heap). However, the result suggests that for reading an unswizzled buffer in an upload heap, the CS may be faster...

I can't find a reasonable explanation for this. I would greatly appreciate it if someone could confirm this (it's possible that I messed something up in the PS and got a worse result than with the CS) and explain why the CS is faster.

Thanks in advance

Peng

Why aren't you just using a copy function to copy the texture data? Shouldn't that ideally be the fastest way since it doesn't involve pipeline state?

Also, yeah, pixel shaders probably won't be as fast at reading linear data because they render in sets of 2x2 quads, not like 32x1 lines. But your typical texture isn't going to be stored linearly -- even in this case you're really copying from a buffer and storing to a texture, not reading and writing to/from a texture, which is the more common gpu image operation.

https://developer.nvidia.com/sites/default/files/akamai/gameworks/images/lifeofatriangle/fermipipeline_distribution.png

Basically just note that stuff gets shaded in blocks -- if it were a linear distribution of shaders, you'd see it shaded as horizontal line strips.
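To make the quads-versus-lines point a bit more concrete, here is a rough C++ sketch that counts how many memory lines a group of 32 threads would touch when reading a row-major (linear) image, depending on how the group is laid out over the pixels. Everything numeric here is an assumption for illustration: the 128-byte line size, the 4 bytes per texel, and the 8x4 block shape are guesses, and the function name `lines_touched` is made up. Real hardware maps threads to pixels in vendor-specific ways.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Count the distinct memory lines touched when a group_w x group_h block of
// threads, anchored at pixel (0, 0), each reads one texel from a row-major
// image. line_bytes is an assumed cache/transaction line size.
inline std::size_t lines_touched(uint32_t group_w, uint32_t group_h,
                                 uint32_t image_width,
                                 uint32_t bytes_per_texel,
                                 uint32_t line_bytes) {
    std::set<uint64_t> lines;
    for (uint32_t y = 0; y < group_h; ++y) {
        for (uint32_t x = 0; x < group_w; ++x) {
            // Byte offset of texel (x, y) in a tightly packed linear image.
            const uint64_t offset =
                (static_cast<uint64_t>(y) * image_width + x) * bytes_per_texel;
            lines.insert(offset / line_bytes);
        }
    }
    return lines.size();
}
```

With a 1024-texel-wide image, 4-byte texels, and 128-byte lines, `lines_touched(32, 1, 1024, 4, 128)` is 1, while `lines_touched(8, 4, 1024, 4, 128)` is 4. Under these assumptions the same 32 threads pull in four times as many lines when arranged as a block, which is one plausible reason a linear-friendly compute dispatch could beat a pixel shader on unswizzled data.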

Aha, I should be honest: I am not just reading and outputting. I convert the data before output to satisfy my data processing requirements, which is why I never thought of using the copy function.

And thanks for your reply, that really helps. A follow-up question: the output is a texture, so the write is not linear. Will the PS benefit from that, since the PS should be optimized for swizzled and compressed output to a render target?

Thanks again

Are you rasterizing the colors using your pixel shader or writing them to a UAV texture?
