Viik

Non-interleaved Deferred Shading of Interleaved Sample Patterns


Recommended Posts

"Non-interleaved Deferred Shading of Interleaved Sample Patterns"

This technique is widely used in some of the realtime GI implementations. Numerous papers refer to it when it comes to optimizing the calculation of the influence of VPLs on a particular fragment of the G-buffer. But I'm having trouble understanding how to efficiently implement their two-pass method.

From what I understood, in the first pass the G-buffer is split into, let's say, 8x8 parts; then during the second pass each sub-part is sampled to create the interleaved G-buffer. This makes the second pass coherent. What I don't understand is what splitting during the first pass means - do they mean that you create 8x8 = 64 separate textures and sample from them during the second pass? In that case the second pass would need to be done in several sub-passes, as you can't simply bind 64 textures at the same time.

Would appreciate any tip on how to do this effectively/properly.
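One way to picture the rearrangement (a sketch in my own notation, not code from the paper): the interleaved layout is just a permutation inside a single full-resolution buffer, so no extra textures are strictly required. With an n x n split, the texel at (x, y) lands in tile (x % n, y % n) at local position (x // n, y // n):

```python
# Sketch (my notation, not from the paper): where a texel ends up after
# the interleaved rearrangement of a w x h buffer split into n x n tiles.
def interleaved_pos(x, y, w, h, n):
    tile_w, tile_h = w // n, h // n
    # texel (x, y) belongs to tile (x % n, y % n) ...
    # ... at local position (x // n, y // n) inside that tile
    return ((x % n) * tile_w + x // n,
            (y % n) * tile_h + y // n)

# neighbouring texels scatter into different tiles
assert interleaved_pos(0, 0, 8, 8, 2) == (0, 0)
assert interleaved_pos(1, 0, 8, 8, 2) == (4, 0)   # next tile over
```

So the "split" can be written as a single fullscreen pass into one render target rather than 64 separate textures; the two-pass variant only changes how far apart the texels sampled by each pass are, not the final layout.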

Quote:
Original post by Viik
"Non-interleaved Deferred Shading of Interleaved Sample Patterns"


This technique is widely used in some of the realtime GI implementations. Numerous papers refer to it when it comes to optimizing the calculation of the influence of VPLs on a particular fragment of the G-buffer.

Ahh, interesting article :)

Quote:
Original post by Viik
But I'm having trouble understanding how to efficiently implement their two-pass method.

From what I understood, in the first pass the G-buffer is split into, let's say, 8x8 parts; then during the second pass each sub-part is sampled to create the interleaved G-buffer. This makes the second pass coherent.

Yes, the trick is to keep the sampling range needed to fill a small target buffer as small as possible, to benefit from the GPU's texture caching mechanisms.


Quote:
Original post by Viik
What I don't understand is what splitting during the first pass means - do they mean that you create 8x8 = 64 separate textures and sample from them during the second pass? In that case the second pass would need to be done in several sub-passes, as you can't simply bind 64 textures at the same time.

As I understand it, each pass targets one buffer of the same size:

buffer_A = createGBufferOfScreensize();
buffer_B = createGBufferOfScreensize();

// rendering
renderSceneToBufferA()
// first pass splitting: this will create smaller blocks of
// predefined regions of the source buffer A
renderBufferAToBufferB()

// second pass splitting: this will interleave the small blocks to the final
// region size
renderBufferBToBufferA()

// Result: buffer A holds the original G-buffer subdivided into n x m blocks
renderDifferentEffectsBasedOnBufferA()

// Final composition:
...


Quote:

Would appreciate any tip on how to do this effectively/properly.

It seems that you are just shifting pixels of the original G-buffer to other positions, which will then be used to perform other postprocessing effects. The benefit of this approach is that you can work with just one buffer and get a more or less automatic "downscaling" of the source G-buffer. Still, it will only give you a performance benefit if you use many different postprocessing passes in your engine. If you are just rendering 50 lights and don't use something like SSAO or similar, this approach would be overkill. In that case just use a downsampled version of the original G-buffer to increase the performance of some passes.

It's not downsampling - the split G-buffer has the same size as the original G-buffer. The pixels are just arranged in a different way: they form smaller blocks, in an 8x8 grid. Each such block contains one texel from each 8x8 neighbourhood of pixels in the original G-buffer.
Let's say you have 1024 lights. Instead of calculating their influence on all pixels, you take 16 lights from the set of 1024 and apply them to one block, then the next 16 to the next block, and so on. After all lights are calculated, the split G-buffer is converted back to the original layout. Now all the original texels are back in their places and you can filter them, basically finding the average influence of all 1024 lights for every texel on screen. It removes high frequencies, but GI is low frequency anyway.
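The bookkeeping for that light split can be sketched like this (my code; the numbers follow the example above):

```python
# Sketch: 1024 VPLs split evenly over an 8x8 grid of blocks, so each
# block shades its pixels with its own disjoint subset of 16 VPLs.
num_vpls, n = 1024, 8
per_block = num_vpls // (n * n)            # 1024 / 64 = 16 VPLs per block
subsets = [list(range(b * per_block, (b + 1) * per_block))
           for b in range(n * n)]

# every VPL is used exactly once per frame, just not on every pixel
assert per_block == 16
assert sum(len(s) for s in subsets) == num_vpls
```

After converting back to the original layout, an 8x8 pixel neighbourhood contains one result from each subset, so an 8x8 filter effectively averages in the contribution of all 1024 VPLs per pixel.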


Quote:
this will create smaller blocks of
predefined regions of the source buffer A

A and B have the same size in your example; I'm not sure how that creates smaller blocks.

It is not downsampling, but kind of :) Your light calculations will be done on smaller-sized versions of your original screen-sized G-buffer. Ignoring quality for a moment, you could run multiple light passes on a downsampled G-buffer version with almost the same performance.

The benefit of using an interleaved vs. a downsampled G-buffer could be better quality. Still, you have to fight light bleeding while compositing the final image. A bilateral upsampling approach could reduce light bleeding, but high frequencies in intensity (i.e. encountered when using highly detailed normal mapping) could be a problem.

As I understand it, having a really large number of fullscreen light sources will result in (much) better performance, whereas other postprocessing like SSAO will not really benefit from it.

It sounds really interesting; still, normal mapping could be a showstopper (I encountered these kinds of problems while testing SSAO in combination with a "normal-mapped" G-buffer). It would be amazing if you would post some results of your testing :)

Quote:

A and B has the same size in your example, not sure how does that creates smaller blocks.

Think of old television interlaced mode. Your final image has a height of 480 pixels (or whatever), but it is composed of two half images with a height of 240 pixels each. The television displays first all even lines, then all odd lines (interleaved). If you would just display the first half image in the upper screen space and the second half image in the bottom screen space, you would see two images of the same scene (even if they do not contain the same pixels!) with just half the height. You could do the same for horizontal space and repeat it many times, which would result in many smaller "blocks", all showing the same scene. But your television still has the same resolution. As already said, it is a kind of downsampling.
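The interlacing analogy in one step (a toy sketch of mine, treating the frame as a list of lines):

```python
# Toy version of the interlacing analogy: split a 480-line frame into two
# 240-line half images (even lines, then odd lines) and stack them.
frame = [f"line {i}" for i in range(480)]
even_half = frame[0::2]     # lines 0, 2, 4, ...
odd_half = frame[1::2]      # lines 1, 3, 5, ...
stacked = even_half + odd_half

assert len(even_half) == len(odd_half) == 240
assert stacked[0] == "line 0" and stacked[240] == "line 1"
```

Doing the same split along the horizontal axis, and repeating it, gives the n x m grid of small "blocks" that all show the same scene.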

[Edited by - Ashaman73 on April 14, 2010 4:19:18 AM]

Quote:
But high frequencies in intensity (i.e. encountered when using highly detailed normal mapping) could be a problem.

As mentioned, it will be used for GI - low-frequency lighting.

Quote:
The television displays first all even lines, then all odd lines (interleaved). If you would just display the first half image in the upper screen space and the second half image in the bottom screen space, you would see two images of the same scene (even if they do not contain the same pixels!)

This won't improve coherency and doesn't lead to the desired pattern. In terms of performance it's the same as using a single-pass approach and just reading all texels from the same texture at once. That's why I'm trying to figure out how to build the two-pass approach that they describe.

I'm not looking for alternatives, just advice from somebody who has already implemented such a technique. Anyway, thanks for the help - I'll go with a simple one-pass approach and see how that works.

Little disclaimer: This is how I understand the article :-)

Quote:
Original post by Viik
This won't improve coherency and doesn't lead to the desired pattern. In terms of performance it's the same as using a single-pass approach and just reading all texels from the same texture at once. That's why I'm trying to figure out how to build the two-pass approach that they describe.

I'm not looking for alternatives, just advice from somebody who has already implemented such a technique. Anyway, thanks for the help - I'll go with a simple one-pass approach and see how that works.



This is not an alternative way of doing it, this is the technique they use :-) You could shift the pixels in just one pass, but to do this, you have to sample pixels which are far away, which leads to texture cache thrashing (bad for performance).

To avoid texture cache thrashing they do a two-pass shift, each pass using a smaller sampling region, each pass working on a buffer of the same original size. It is really just shifting pixels :)

Let's say you want to divide your buffer into 2x2 regions:


source buffer
S1 | S2
-------
S3 | S4

target buffer
T1 | T2
-------
T3 | T4

In the one-pass solution you would shift pixels from S1, S2, S3, S4 into T1. In this case you have to sample the "whole" S buffer to build up T1.

If we take a closer look at the newly built region T1, it looks like:

S1a | S2a
-------
S3a | S4a

Where S1a, S2a, S3a, S4a contain pixels from the corresponding source regions. T2, T3, T4 will contain the corresponding "blocks" of the source buffer. This means that the source region S1 is divided into blocks S1a, S1b, S1c, S1d, with T1 containing S1a, T2 containing S1b, ...

The simple trick is to first build up these S1a, ... blocks in the first pass; this means the source region S1 will be divided into:

S1a | S1b
-------
S1c | S1d

For this you only need to sample the S1 region - just 1/4 of the frame buffer.

In the second pass you just shift these blocks to their final positions, so that your target region T1 looks like:

S1a | S2a
-------
S3a | S4a

To do this you only have to sample a very small area!

I hope it is clearer now :-)
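My attempt to pin both shufflings down as code - a CPU-side sketch of my reading of the posts above (my own names, not code from the paper; it assumes a square buffer whose side is divisible by n*n so the sub-blocks come out whole):

```python
def one_pass(src, n):
    # Tile (i, j) gathers every n-th pixel at offset (i, j): the naive
    # shuffle, whose sampling range spans the whole source buffer.
    w = len(src)
    t = w // n                                # tile size
    dst = [[None] * w for _ in range(w)]
    for j in range(n):
        for i in range(n):
            for v in range(t):
                for u in range(t):
                    dst[j * t + v][i * t + u] = src[v * n + j][u * n + i]
    return dst


def first_pass(src, n):
    # De-interleave each of the n x n regions locally (S1 -> S1a..S1d);
    # filling a region only reads from that same region.
    w = len(src)
    r = w // n                                # region size
    b = r // n                                # sub-block size
    mid = [[None] * w for _ in range(w)]
    for q in range(n):                        # region (p, q)
        for p in range(n):
            for bj in range(n):               # sub-block (bi, bj)
                for bi in range(n):
                    for t in range(b):        # position inside sub-block
                        for s in range(b):
                            mid[q * r + bj * b + t][p * r + bi * b + s] = \
                                src[q * r + t * n + bj][p * r + s * n + bi]
    return mid


def second_pass(mid, n):
    # Shift the small sub-blocks to their final tiles (T1 = S1a, S2a,
    # S3a, S4a); each output sub-block reads one tiny b x b area.
    w = len(mid)
    r = w // n
    b = r // n
    dst = [[None] * w for _ in range(w)]
    for j in range(n):                        # target tile (i, j)
        for i in range(n):
            for v in range(r):                # position inside tile
                for u in range(r):
                    p, q = u // b, v // b     # which source region
                    dst[j * r + v][i * r + u] = \
                        mid[q * r + j * b + v % b][p * r + i * b + u % b]
    return dst


# 8x8 buffer labelled like the diagrams above: pixel (x, y) -> digits "cr"
src = [[(x + 1) * 10 + (y + 1) for x in range(8)] for y in range(8)]
assert second_pass(first_pass(src, 2), 2) == one_pass(src, 2)
```

The point of the split shows up in the sampling ranges: `one_pass` reads from the whole buffer to fill one tile, while `first_pass` only reads within one region and each output block of `second_pass` only reads one tiny b x b area.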







Quote:
The simple trick is to first build up these S1a, ... blocks in the first pass; this means the source region S1 will be divided into:

S1a | S1b
-------
S1c | S1d

This is the original content of the buffer; I don't see what you divide here.

Source:

S1a | S1b | S2a | S2b
---------------------
S1c | S1d | S2c | S2d
---------------------
S3a | S3b | S4a | S4b
---------------------
S3c | S3d | S4c | S4d


Result after one-pass method:

S1a | S2a | S1b | S2b
---------------------
S3a | S4a | S3b | S4b
---------------------
S1c | S2c | S1d | S2d
---------------------
S3c | S4c | S3d | S4d

As I understood it, in the two-pass method the first pass is a division of the buffer into separate parts (textures):

S1a | S1b ..... S2a | S2b
--------- ..... ---------
S1c | S1d ..... S2c | S2d
. .
. .
S3a | S3b ..... S4a | S4b
--------- ..... ---------
S3c | S3d ..... S4c | S4d

And then the final image is combined; as these are separate textures, the corresponding texels can be read from all of them at the same time.

Quote:
Original post by Viik
Quote:
The simple trick is to first build up these S1a, ... blocks in the first pass; this means the source region S1 will be divided into:

S1a | S1b
-------
S1c | S1d

This is the original content of the buffer; I don't see what you divide here.


It is hard to explain without the use of screenshots :/ Still, I'm not giving up :-)

S1a is the "downsampled" version of S1 at a quarter of the original size. It contains every other pixel from every other row. If we want to divide a buffer of pixel size 8x8 into 2x2 regions (first pass):

source buffer S
11 21 31 41 | 51 61 71 81
12 22 32 42 | 52 62 72 82
13 23 33 43 | 53 63 73 83
14 24 34 44 | 54 64 74 84 S1 | S2
------------------------- = -------
15 25 35 45 | 55 65 75 85 S3 | S4
16 26 36 46 | 56 66 76 86
17 27 37 47 | 57 67 77 87
18 28 38 48 | 58 68 78 88

first pass:
divide the regions S1...S4 into smaller blocks; here is an example of dividing S1 into the smaller blocks S1a ... S1d
11 21 | 31 41 11 31 | 21 41
12 22 | 32 42 13 33 | 23 43 S1a | S1b
------------- => ------------- = ---------
13 23 | 33 43 12 32 | 22 42 S1c | S1d
14 24 | 34 44 14 34 | 24 44

After the first pass, all blocks contain their pixels in the right and final(!) order, but the blocks themselves need to be shifted. This is done in the second pass, so that the final target region T1 will look like:

11 31 | 51 71
13 33 | 53 73 S1a | S2a
------------- = ---------
15 35 | 55 75 S3a | S4a
17 37 | 57 77
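For what it's worth, the target region can be checked mechanically (a small standalone sketch of mine; since both methods produce the same final layout, it is enough to apply the direct mapping):

```python
# Check the worked example above (my code, not from the post): label
# pixel (x, y) with the digits "cr" used in the diagrams, apply the
# rearrangement with a 2x2 split, and read back the target region T1.
n, w = 2, 8
src = [[(x + 1) * 10 + (y + 1) for x in range(w)] for y in range(w)]
t = w // n                                    # tile / region size
dst = [[None] * w for _ in range(w)]
for j in range(n):
    for i in range(n):
        for v in range(t):
            for u in range(t):
                # tile (i, j) holds every n-th pixel at offset (i, j)
                dst[j * t + v][i * t + u] = src[v * n + j][u * n + i]

t1 = [row[:t] for row in dst[:t]]
# t1 == [[11, 31, 51, 71],
#        [13, 33, 53, 73],
#        [15, 35, 55, 75],
#        [17, 37, 57, 77]]
```

Note that the right-hand half of T1 (S2a, S4a) comes from columns 5-8 of the source, i.e. from regions S2 and S4.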


Quote:
Original post by Viik
Source:

S1a | S1b | S2a | S2b
---------------------
S1c | S1d | S2c | S2d
---------------------
S3a | S3b | S4a | S4b
---------------------
S3c | S3d | S4c | S4d


Result after one-pass method:

S1a | S2a | S1b | S2b
---------------------
S3a | S4a | S3b | S4b
---------------------
S1c | S2c | S1d | S2d
---------------------
S3c | S4c | S3d | S4d

Yep, that is correct.

Quote:

As I understood it, in the two-pass method the first pass is a division of the buffer into separate parts (textures):

S1a | S1b ..... S2a | S2b
--------- ..... ---------
S1c | S1d ..... S2c | S2d
. .
. .
S3a | S3b ..... S4a | S4b
--------- ..... ---------
S3c | S3d ..... S4c | S4d

No, it is not a division into separate buffers! You just need one buffer (one texture); in the first pass you already reorder the pixels into the correct order, BUT only within a much smaller region of the source buffer instead of the whole buffer.

Quote:

And then the final image is combined; as these are separate textures, the corresponding texels can be read from all of them at the same time.

You don't need different textures to sample more than one texel at a time. You can just sample them all from the same texture.


The problem is not using one or more textures. The issue with the hardware is that when a shader samples a texture (reading its texels), the whole texture will not be cached, only a small window of it. So, when many shader invocations process the same small texture region, there will be many texture cache hits and this will be fast. But when many shader invocations access one or, worse, many textures at positions which are far apart, many cache lookups will miss, which leads to cache thrashing (the cache has to be refilled many times). This is slow and will have a major impact on performance.

*Hope I got it right*
