Most efficient way to draw 1000+ textures w/ shader


AshleysBrain    162
VS2008 / Unmanaged C++ / D3D 9 / 2D drawing with quads

I'm trying to optimise a particle effect system where hundreds of quads are drawn with a 'Screen' pixel shader, as below:
texture ForegroundTexture;

texture BackgroundTexture;

// Foreground sampler
sampler2D foreground = sampler_state {
    Texture = (ForegroundTexture);
    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = Linear;
};

// Background sampler
sampler2D background = sampler_state {
    Texture = (BackgroundTexture);
    MinFilter = Point;
    MagFilter = Point;
    MipFilter = Point;
};

float2 bgStart;
float2 bgEnd;

// Effect function
float4 EffectProcess( float2 Tex : TEXCOORD0 ) : COLOR0
{
    // Screen formula
    float4 front = tex2D(foreground, Tex.xy);
    float4 back = tex2D(background, lerp(bgStart.xy, bgEnd.xy, Tex.xy));
    front.rgb = 1.0 - ((1.0 - front.rgb) * (1.0 - back.rgb * front.a));
    return front;
}

technique MyTechnique
{
    pass p0
    {
        VertexShader = null;
        PixelShader = compile ps_2_0 EffectProcess();
    }
}


I have a render target texture that everything draws to (BackgroundTexture) and a 32x32 texture for a particle (ForegroundTexture), and I want to draw lots and lots of particles with this shader. My current system is set up like so:

1. There is a temporary texture the size of the window.
2. Clear a 32x32 space on the temporary texture where the particle will go on the screen.
3. Copy the particle texture into this space.
4. Begin the effect.
5. Transfer the 32x32 area on the temporary texture to the display (i.e. the render target texture that will be copied to the backbuffer).
6. End the effect.

Is there not a faster way to achieve this? I have tried drawing directly with the effect, i.e.:

1. Begin the effect.
2. Draw the 32x32 quad to the display.
3. End the effect.

...but this renders with flickering artefacts, and Direct3D complains with the "Can not render to a render target that is also used as a texture" error - even though I've checked and verified that the render target is indeed the display texture and the current texture is the particle texture. Am I right in assuming you cannot render with a shader to the same texture it samples from, even if the sampling is 1:1 with the background pixels? If that isn't supported, is there a more efficient way of using an intermediate texture?

Also, what effect do the currently set render target and texture have on running a shader? I've passed the textures for the shader to use as parameters, so why should changing the set texture or render target make a difference?

MJP    19791
Yes, you can't render to a surface that you're simultaneously sampling from. This is undefined behaviour in D3D9, and if you want to avoid problems you'll want to stay away from it. Also make sure that when you're done sampling from a texture, you set that texture stage to NULL, so the texture isn't still bound to a sampler when you later use it as a render target.

As for drawing thousands of particles, you're going to want to batch as many as you can and keep your pixel shader cheap. You may want to check out this excellent sample by Humus to see what techniques are available to you in D3D9.

AshleysBrain    162
Thanks, I thought that might be the case. But there must be a way to do this efficiently even with an intermediate texture: the performance penalty of clearing the intermediate texture once for each particle has got to hurt. Is geometry instancing the way to go? How would that solve the need for an intermediate texture? I'm not too hot with my matrices and I haven't used it before...

And every particle has to blend with the already drawn particles, so the intermediate texture has to be transferred to display for every particle too... it sounds like a problem somebody must have solved before.

MJP    19791
Quote:
Original post by AshleysBrain
Thanks, I thought that might be the case. But there must be a way to do this efficiently even with an intermediate texture: the performance penalty of clearing the intermediate texture once for each particle has got to hurt. Is geometry instancing the way to go? How would that solve the need for an intermediate texture? I'm not too hot with my matrices and I haven't used it before...

And every particle has to blend with the already drawn particles, so the intermediate texture has to be transferred to display for every particle too... it sounds like a problem somebody must have solved before.


Instancing has nothing to do with an intermediate texture, in fact I'm not quite sure why you're using one. The only reason you'd want to render to an intermediate surface would be if you made the surface smaller than the screen, in order to save fillrate and pixel-processing for all of your particles. If that's not a concern, you should be rendering directly to the back-buffer or primary render-target.

AshleysBrain    162
Sorry, the link you gave me mentioned geometry instancing and I was wondering if that would apply to my situation. The reason I use an intermediate texture is because the render target is sampled from in the shader, so I have to render to a different target first, then copy the result to the actual render target afterwards. Is there a better way of circumventing this limitation in D3D?

MJP    19791
Okay, I think I see where you went wrong here... sorry, I wasn't quite understanding what you were trying to do. Anyway, there's no need for you to sample both the background texture and your particle texture and blend them manually in the pixel shader. Just set your device render states to use alpha blending, and then you can render your particle texture straight to the background texture. No clearing or intermediate surfaces necessary. You can enable alpha blending by adding this to your technique:


technique MyTechnique
{
    pass p0
    {
        AlphaBlendEnable = TRUE;
        SrcBlend = SRCALPHA;
        DestBlend = INVSRCALPHA;

        VertexShader = null;
        PixelShader = compile ps_2_0 EffectProcess();
    }
}




As for instancing... definitely use it. Thousands of DrawPrimitive calls will make you CPU-bound very quickly, so it's important to batch them.

MJP    19791
Quote:
Original post by hikikomori-san
Why not use a dynamic vertex buffer and point sprites instead of instancing? Am I missing something here?


Point sprites are extremely limited. As for dynamic vertex buffer vs. instancing, go ahead and run the demo I linked to earlier and see which performs better.

Adam_42    3629
What's being missed here is that standard alpha blending isn't the effect that's wanted. What's wanted is 'screen' blending ('screen' being the name of a layer blend mode in Photoshop).

That means: dest = 1 - ((1-src) * (1-dest))

One option is to set the blend modes to:

pDevice->SetRenderState(D3DRS_BLENDOP, D3DBLENDOP_ADD);
pDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
pDevice->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_INVDESTCOLOR);
pDevice->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ZERO);

You then need to draw your sprite (make the pixel shader output the inverse of the source colour).

After that you set blend modes to:

pDevice->SetRenderState(D3DRS_BLENDOP, D3DBLENDOP_SUBTRACT);
pDevice->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_ONE);
pDevice->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);

Then draw a white square to re-invert the destination buffer.

If I'm right that should get you what you need.

It may also help to batch up sprites to avoid repeatedly switching render states. You can obviously draw all non-overlapping ones together.

Edit: Fixed second lot of blend modes...

[Edited by - Adam_42 on April 23, 2008 5:57:53 AM]

AshleysBrain    162
Thanks Adam_42, that's definitely a very interesting way of doing it. However, I was hoping for a general solution, since I'm using a variety of shaders. So I guess the question is, assuming you are using a shader which samples from the rendertarget, what's the most efficient way of drawing?

MJP    19791
Quote:
Original post by AshleysBrain
Thanks Adam_42, that's definitely a very interesting way of doing it. However, I was hoping for a general solution, since I'm using a variety of shaders. So I guess the question is, assuming you are using a shader which samples from the rendertarget, what's the most efficient way of drawing?


If you really need to sample from the render target, you'd have to use an intermediate render target like you've already been doing. Of course, when you're doing this you should render everything to the intermediate target first, then copy that to the back-buffer or wherever it's going.

However like I said before, you don't need to manually perform alpha blending in the pixel shader. The only time you should be doing such a thing is if you need to do some sort of blending that you can't achieve with render-states, or if the device doesn't support blending for the render-target's surface format.

A slight change to adam's idea can batch all the sprites.

temptexture = inverse backbuffer. We now have 1-dest as a texture.
Use temptexture as a rendertarget

Draw sprites with SRCBLEND = ZERO and DESTBLEND = INVSRCCOLOR. Because dest is already inverted, this is the same as (1 - src) * (1 - dest).

We ignore the initial "1 -" part of the equation, so our result will be inverted... which is what we already have in our buffer, and still want for our result.

After all particles are drawn, invert the buffer again.

AshleysBrain    162
Quote:
Original post by MJP
Of course when you're doing this you should render everything to the intermediate target first, then copy that to the back-buffer or wherever its going.


This is a good idea, except most of the particles overlap (they are 32x32 textures, after all), and if they are ALL drawn to the intermediate texture while sampling from a different texture, they all sample from a texture that has no particles drawn on it at all. That'd look odd; ideally, each particle should be blending on top of all the already drawn particles.

Using an intermediate buffer, I can't see any way to achieve this without sampling from the render target, other than the system outlined in my original post. The problem with that is a clear call for each and every particle - plus copying a small area from a large texture, which I assume is a less efficient memory access pattern.

Thoughts?

