lomateron

texture resource as shader input and render target


What is the reason a texture cannot be bound as a shader input and as a render target at the same time when drawing?


---

Rendering is not physically immediate, so if you were to sample the render target while rendering to that same target, some texels would be ready and some would not. The results would be highly nondeterministic. Also take into account that even if the GPU did implement synchronization for this scenario, each texel could potentially have to wait for each and every other texel of the resource; with nanosecond-scale time budgets per texel, that is simply not feasible.

 

Essentially, the limitation is there to prevent this race condition.
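For reference, here is roughly what this looks like at the D3D11 API level. This is a minimal hypothetical sketch, not from the original post; error handling and device setup are omitted, and the behavior described in the final comment is what the debug layer reports when you try it:

```cpp
#include <d3d11.h>

// Minimal sketch: 'device' and 'context' are assumed to be an already
// created ID3D11Device / ID3D11DeviceContext pair.
void DemonstrateHazard(ID3D11Device* device, ID3D11DeviceContext* context)
{
    // Creating a texture with BOTH bind flags is perfectly legal...
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = 256;
    desc.Height           = 256;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;
    desc.BindFlags        = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;

    ID3D11Texture2D*          tex = nullptr;
    ID3D11RenderTargetView*   rtv = nullptr;
    ID3D11ShaderResourceView* srv = nullptr;
    device->CreateTexture2D(&desc, nullptr, &tex);
    device->CreateRenderTargetView(tex, nullptr, &rtv);
    device->CreateShaderResourceView(tex, nullptr, &srv);

    // ...but binding both views at the same time is not. The runtime
    // detects the read/write hazard and force-unbinds the SRV (the debug
    // layer warns about it), so the shader would sample a null resource.
    context->PSSetShaderResources(0, 1, &srv);
    context->OMSetRenderTargets(1, &rtv, nullptr);

    srv->Release(); rtv->Release(); tex->Release();
}
```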

Edited by Nik02

---

Can you explain that in more detail please, the part where you say some texels would be ready and some would not?

 

Remember that the GPU is a highly parallel processor, and it could be executing the pixel shader for many pixels at the exact same time in parallel. Let's say your GPU is executing the pixel shader for pixels (20,30) and (21,31) at the same time. Now, if your render target is also bound as a shader input, it would be possible for the shader being executed for pixel (20,30) to try and sample a texel at location (21,31), because the pixel shader can sample any location in a texture. So when you sample the texel at (21,31), what value should be returned?

 

This is what Nik02 was referring to when he said some texels would be ready and some would not. (21,31) is not ready, because it's currently being worked on alongside (20,30). You might think that the original value at (21,31) should be returned, but what if the pixel shader for (21,31) finishes before the shader for (20,30) requests that texel? Then it would be impossible to get the original value without having copied it somewhere safe ahead of time, which would have a significant performance cost. If you want the "new" value at (21,31), then you'll have to stall the shader for (20,30) until (21,31) is written, which is also a big performance cost because it serializes the work that needs to be done. 

 

The gist of the problem is this: if you have a texture bound as a render target and as a shader input, then you have a major data hazard problem because you have dozens of parallel processes reading and writing to that shared resource (the texture). The only way around that data hazard is to perform some kind of synchronization, which is far too expensive.
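You can reproduce the same hazard on the CPU. The following is a hypothetical C++ simulation, not GPU code: each thread plays the role of one pixel-shader invocation, reading its neighbor from, and writing its own pixel to, the same shared buffer. The output can change from run to run depending on thread scheduling:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    // Shared "texture": alternating dark (0) and bright (100) pixels.
    // std::atomic keeps the demo well-defined C++ while leaving the
    // *ordering* racy, which is exactly the hazard in question.
    static std::atomic<int> image[8] = {0, 100, 0, 100, 0, 100, 0, 100};

    std::vector<std::thread> threads;
    for (int i = 0; i < 7; ++i) {
        threads.emplace_back([i] {
            // Each "pixel shader" averages itself with its right neighbor.
            // Whether that neighbor is "ready" (already averaged) or still
            // holds its original value depends on thread scheduling.
            image[i].store((image[i].load() + image[i + 1].load()) / 2);
        });
    }
    for (auto& t : threads) t.join();

    for (auto& v : image) std::printf("%d ", v.load());
    std::printf("\n"); // Different runs can print different images.
}
```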

---

The GPU works by computing the values of lots of pixels at the same time in parallel, in the same way that on the CPU you might spread work across multiple threads.

 

To make sure the results of rendering are deterministic, you can't have one thread reading a pixel from a texture while another thread is writing to that same pixel as a render target.

 

To enforce that, D3D doesn't allow the same texture to be bound for both purposes at the same time.

 

The standard way to work around this is to make a copy of the render target, which you can then read from later on.
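In D3D11 that copy is a single GPU-side call. A minimal sketch, assuming the two textures and views were created elsewhere with matching size, format, and sample count:

```cpp
#include <d3d11.h>

// Sketch: read from a snapshot of the render target instead of the target
// itself. 'copyTex' must match 'renderTarget' in size, format, and sample
// count for CopyResource to be valid.
void BindSnapshotForSampling(ID3D11DeviceContext* context,
                             ID3D11Texture2D* renderTarget,
                             ID3D11Texture2D* copyTex,
                             ID3D11ShaderResourceView* copySrv,
                             ID3D11RenderTargetView* rtv)
{
    // One GPU-side copy; no CPU round trip.
    context->CopyResource(copyTex, renderTarget);

    // Sample the copy while still rendering to the original target.
    context->PSSetShaderResources(0, 1, &copySrv);
    context->OMSetRenderTargets(1, &rtv, nullptr);
}
```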

 

I believe compute shaders can also work around the limitation by using a UAV and some manual synchronization, but if you need much synchronization it will be slow.
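For reference, a hypothetical sketch of that compute route in D3D11; the compute shader itself (not shown) would declare the texture as a RWTexture2D in HLSL, and is assumed here to use an 8x8 thread group:

```cpp
#include <d3d11.h>

// Sketch: bind the texture as a UAV so a compute shader can both read
// and write it. Correctness of in-place access then becomes the shader
// author's problem; the API no longer prevents the hazard.
void DispatchInPlace(ID3D11Device* device, ID3D11DeviceContext* context,
                     ID3D11Texture2D* tex, ID3D11ComputeShader* cs,
                     UINT width, UINT height)
{
    // The texture must have been created with D3D11_BIND_UNORDERED_ACCESS.
    ID3D11UnorderedAccessView* uav = nullptr;
    device->CreateUnorderedAccessView(tex, nullptr, &uav);

    context->CSSetShader(cs, nullptr, 0);
    context->CSSetUnorderedAccessViews(0, 1, &uav, nullptr);
    // Round up so partial tiles at the edges are still covered.
    context->Dispatch((width + 7) / 8, (height + 7) / 8, 1);

    uav->Release();
}
```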

---

And this is not limited to GPUs: even most old-school CPU-based image processing algorithms have this requirement.

 

In order for an algorithm that runs on an image to be able to use the same input and output buffer, it pretty much must operate deterministically on a single pixel: the output value must depend only on the input value at that very same coordinate, and must not depend on the exact moment in time at which it is computed. A few simple algorithms meet this requirement, but most require an output buffer separate from (and the same size as) the input buffer.
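For instance, a simple inversion satisfies the requirement, while anything that reads neighbors does not. A small CPU-side illustration:

```cpp
#include <cstdint>
#include <vector>

// Safe to run in place: each output pixel depends only on the input
// pixel at the same coordinate, so evaluation order is irrelevant.
void InvertInPlace(std::vector<uint8_t>& pixels)
{
    for (uint8_t& p : pixels)
        p = 255 - p;
}

// By contrast, a blur reads neighboring pixels, so running it over the
// same buffer it writes to would mix processed and unprocessed values.
```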

 

Luckily, this is not a practical problem: just use render targets. Don't be afraid to use as many as you need, but not more :). In practice I have found that I need about 2-3 full-sized render targets, two half-sized and two quarter-sized, and that's without using deferred rendering. And reuse buffers as needed. Implementing a simple rolling-buffer system, where you dynamically determine which buffer you get as the default render target, allows you to create a very simple system that supports any number of post-processing steps with ease.
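A hypothetical sketch of that rolling-buffer (ping-pong) idea in D3D11; all names here are made up for illustration:

```cpp
#include <d3d11.h>

// Two full-size targets trade roles every post-processing pass: last
// pass's output becomes this pass's input, and vice versa.
struct PingPong
{
    ID3D11RenderTargetView*   rtv[2] = {};
    ID3D11ShaderResourceView* srv[2] = {};
    int current = 0;

    void NextPass(ID3D11DeviceContext* context)
    {
        int src = current;      // last pass's output becomes the input
        int dst = 1 - current;  // render into the other buffer

        // Unbind the old SRV first so the runtime never sees the same
        // texture on both sides, then bind read and write to DIFFERENT
        // textures.
        ID3D11ShaderResourceView* nullSrv = nullptr;
        context->PSSetShaderResources(0, 1, &nullSrv);
        context->OMSetRenderTargets(1, &rtv[dst], nullptr);
        context->PSSetShaderResources(0, 1, &srv[src]);

        current = dst;          // roles swap for the next pass
    }
};
```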

---
So it all comes down to this: the GPU can't read and write the same place in memory. Is that because the hardware can't do it, or because of the undetermined values the memory would hold as you read and write? For example, suppose a process was in the middle of writing to a float: it had changed the first of its 4 bytes when suddenly another process read the float, so the value it read was wrong.

---


It's worth pointing out that sampling from and rendering to the same target is problematic even without parallelism, and that this also has implications for parallel processors.

Even if the GPU were completely serial, processing only one pixel at a time, there would still be issues arising from the simple fact that at any given time some pixels would contain data that has already been processed by the shader, while others would still contain unprocessed data. Sampling pixels that have already been processed ultimately means that pixels processed earlier have more influence over the final image than those processed later, and can possibly produce wildly different results than anticipated; only the pixels that have not yet been processed are "safe" to sample. (Conversely, all the pixels that have been processed are "altered": they will produce potentially unexpected results if you sample them.)

As an example of how the results can vary greatly when sampling already-processed pixels, let's look at a simple shader that averages the current pixel with one of its neighbors, and an image (which will be both our source and destination) that consists of alternating 1-pixel-wide vertical black and white lines, running on a GPU that processes pixels serially from left to right. If the shader averages each pixel with the pixel to its right, you get what one would expect: a flat field of 50% gray (with the exception of the rightmost pixel column). But if it averages each pixel with the pixel to its left (which will have already been processed), the result is a series of 1-pixel-wide vertical lines alternating between 1/3 gray and 2/3 gray, with a handful of lines at the left edge closer to white or black depending on the starting color of the left edge (or the right edge, if the texture coordinates wrap).
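You can check those numbers with a few lines of plain C++ simulating the serial, in-place, left-neighbor pass:

```cpp
#include <cstdio>

int main() {
    // Alternating black (0.0) and white (1.0) vertical lines, processed
    // serially left to right, averaging each pixel with its LEFT neighbor,
    // in place -- so the neighbor has already been overwritten.
    float row[16];
    for (int i = 0; i < 16; ++i) row[i] = (i % 2) ? 1.0f : 0.0f;

    for (int i = 1; i < 16; ++i)
        row[i] = (row[i] + row[i - 1]) * 0.5f;

    for (int i = 0; i < 16; ++i) std::printf("%.3f ", row[i]);
    std::printf("\n");
    // After a few pixels the values settle to ~0.333 and ~0.667: the
    // alternating 1/3 and 2/3 gray lines described above.
}
```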

In a purely serial GPU, this can be dealt with through proper shader design if the GPU processes pixels in a purely deterministic manner. For example, if the GPU always draws pixels from left to right, and from top to bottom, a shader could avoid these issues by simply never sampling pixels to the left of or above the current pixel. On the other hand, you could sample "altered" pixels as well, just so long as you are aware of how that will affect the output.

---

However, in a parallel-processing system, the regions of "safe" pixels (those that have not yet been processed) and "altered" pixels can no longer be determined solely from the location of the current pixel; they are also affected by the number of threads running in parallel, the dimensions of the texture, how the GPU allocates pixels to threads, and whether or not parallel threads run in lock-step with one another. Some of these factors are not available to the programmer, or can change over time. (Furthermore, there is now a region of pixels that are neither "safe" nor "altered": the pixels currently being processed, which means that sampling them can cause data-dependency hazards.)

Because the boundary between "safe" and "altered" is unpredictable on a parallel-processing GPU, it is basically impossible to write a pixel shader that samples its own render target and produces a predictable result.

The last thing to point out is that multi-pipeline (i.e. parallel-processing) GPUs predate programmable pixel shaders, and so every GPU with programmable pixel shaders is a parallel processor, meaning that predictable target-sampling shaders cannot be written for any device that actually supports programmable shaders.

Which is another reason, on top of the data dependency hazard you mentioned, why it's not allowed.
