Most efficient way to draw 1000+ textures w/ shader

Started by
11 comments, last by AshleysBrain 16 years ago
VS2008 / Unmanaged C++ / D3D 9 / 2D drawing with quads

I'm trying to optimise a particle effect system where hundreds of quads are drawn with a 'Screen' pixel shader, as below:

texture ForegroundTexture;

texture BackgroundTexture;

// Foreground sampler
sampler2D foreground = sampler_state {
    Texture = (ForegroundTexture);
    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = Linear;
};

// Background sampler
sampler2D background = sampler_state {
    Texture = (BackgroundTexture);
    MinFilter = Point;
    MagFilter = Point;
    MipFilter = Point;
};

float2 bgStart;
float2 bgEnd;

// Effect function
float4 EffectProcess( float2 Tex : TEXCOORD0 ) : COLOR0
{
    // Screen formula
    float4 front = tex2D(foreground, Tex.xy);
    float4 back = tex2D(background, lerp(bgStart.xy, bgEnd.xy, Tex.xy));
    front.rgb = 1.0 - ((1.0 - front.rgb) * (1.0 - back.rgb * front.a));
    return front;
}

technique MyTechnique
{
    pass p0
    {
        VertexShader = null;
        PixelShader = compile ps_2_0 EffectProcess();
    }
}


I have a render target texture everything is drawing to (BackgroundTexture) and a 32x32 texture for a particle (ForegroundTexture), and I want to draw lots and lots of them with this shader. My current system is set up like so:

1. There is a temporary texture the size of the window
2. Clear a 32x32 space on the temporary texture where the particle will go on the screen
3. Copy the particle texture to this space
4. Begin the effect
5. Transfer the 32x32 area on the temporary texture to the display (i.e. the render target texture that will be copied to the backbuffer)
6. End the effect

Is there not a faster way to achieve this? I have tried drawing directly with the effect, i.e.:

1. Begin the effect
2. Draw the 32x32 to the display
3. End the effect

but this renders with flickering artefacts and Direct3D complains with the "Can not render to a render target that is also used as a texture" error... but I've checked and verified that the render target is indeed the display texture and the current texture is the particle texture.

Am I right in assuming you cannot render shaders to the same texture they sample from, even if they have 1:1 sampling with the background pixels? If this isn't supported, is there a more efficient way of using an intermediate texture?

Also, what effect do the set render target and texture have on running a shader? I've passed the textures for the shader to use as parameters, so why should changing the set texture or render target make a difference?
Construct (Free open-source game creator)
Yes, you can't render to a surface that you're simultaneously sampling from. This is considered undefined behavior in D3D9, and if you want to avoid problems you'll probably want to stay away from it. When you're done sampling from a texture, make sure you set that texture to NULL so that it's not still bound to a sampler when you use it as a render-target later.

As for drawing thousands of particles, you're going to want to batch as many as you can and keep your pixel shader cheap. You may want to check out this excellent sample by Humus to see what techniques are available to you in D3D9.
Thanks, I thought that might be the case. But there must be a way to do this efficiently even with an intermediate texture: the performance penalty of clearing the intermediate texture once for each particle has got to hurt. Is geometry instancing the way to go? How would that solve the need for an intermediate texture? I'm not too hot with my matrices and I haven't used it before...

And every particle has to blend with the already drawn particles, so the intermediate texture has to be transferred to display for every particle too... it sounds like a problem somebody must have solved before.
Instancing has nothing to do with an intermediate texture; in fact, I'm not quite sure why you're using one. The only reason you'd want to render to an intermediate surface would be if you made the surface smaller than the screen, in order to save fillrate and pixel-processing for all of your particles. If that's not a concern, you should be rendering directly to the back-buffer or primary render-target.

Sorry, the link you gave me mentioned geometry instancing and I was wondering if that would apply to my situation. The reason I use an intermediate texture is because the render target is sampled from in the shader, so I have to render to a different target first, then copy the result to the actual render target afterwards. Is there a better way of circumventing this limitation in D3D?
Okay, I think I see where you went wrong here... sorry, I wasn't quite understanding what you were trying to do. Anyway, there's no need for you to sample both the background texture and your particle texture and blend them manually in the pixel shader. Just set your device render states to use alpha blending, and then you can render your particle texture to the background texture. No clearing or intermediate surfaces necessary. You can enable alpha-blending by adding this to your technique:

technique MyTechnique
{
    pass p0
    {
        AlphaBlendEnable = TRUE;
        SrcBlend = SRCALPHA;
        DestBlend = INVSRCALPHA;
        VertexShader = null;
        PixelShader = compile ps_2_0 EffectProcess();
    }
}
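
For reference, the SRCALPHA / INVSRCALPHA pair set in this technique corresponds to the usual alpha-blend equation per colour channel. A minimal CPU-side sketch of that arithmetic follows; `alphaBlend` is a hypothetical helper name used only for illustration, not a D3D function:

```cpp
#include <cassert>

// One colour channel of standard alpha blending (hypothetical helper):
//   result = src * srcAlpha + dest * (1 - srcAlpha)
// which is what SrcBlend = SRCALPHA, DestBlend = INVSRCALPHA computes
// with the default additive blend operation.
float alphaBlend(float src, float srcAlpha, float dest)
{
    assert(srcAlpha >= 0.0f && srcAlpha <= 1.0f);
    return src * srcAlpha + dest * (1.0f - srcAlpha);
}
```

Note that this is a weighted average of source and destination, so it can darken a bright background where the particle is dark and opaque.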



As for instancing... definitely use it. Thousands of DrawPrimitive calls will make you CPU-bound very quickly, so it's important to batch them.
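
To illustrate the batching advice, here is a minimal CPU-side sketch, assuming pre-transformed quads drawn as a triangle list. `QuadVertex`, `Particle`, `makeVertex` and `buildQuadBatch` are hypothetical names, not part of D3D or of the original poster's code. It only builds the vertex data that a single draw call would consume, and it is plain batching rather than hardware instancing:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: screen-space position plus texture coordinate.
struct QuadVertex { float x, y, u, v; };

// Hypothetical particle: top-left corner of its quad, in pixels.
struct Particle { float x, y; };

static QuadVertex makeVertex(float x, float y, float u, float v)
{
    QuadVertex q = { x, y, u, v };
    return q;
}

// Appends two triangles (six vertices) per particle, so the whole set
// can be submitted in one draw call instead of one call per particle.
std::vector<QuadVertex> buildQuadBatch(const std::vector<Particle>& particles,
                                       float size)
{
    assert(size > 0.0f);
    std::vector<QuadVertex> verts;
    verts.reserve(particles.size() * 6);
    for (std::size_t i = 0; i < particles.size(); ++i)
    {
        const float l = particles[i].x;
        const float t = particles[i].y;
        const float r = l + size;
        const float b = t + size;
        // Triangle 1: top-left, top-right, bottom-left
        verts.push_back(makeVertex(l, t, 0.0f, 0.0f));
        verts.push_back(makeVertex(r, t, 1.0f, 0.0f));
        verts.push_back(makeVertex(l, b, 0.0f, 1.0f));
        // Triangle 2: top-right, bottom-right, bottom-left
        verts.push_back(makeVertex(r, t, 1.0f, 0.0f));
        verts.push_back(makeVertex(r, b, 1.0f, 1.0f));
        verts.push_back(makeVertex(l, b, 0.0f, 1.0f));
    }
    return verts;
}
```

The returned array could then be copied into a dynamic vertex buffer and submitted with a single DrawPrimitive(D3DPT_TRIANGLELIST, 0, particleCount * 2) call, rather than one call per particle.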
Why not use a dynamic vertex buffer and point sprites instead of instancing? Am I missing something here?
Point sprites are extremely limited. As for dynamic vertex buffer vs. instancing, go ahead and run the demo I linked to earlier and see which performs better.

What is clearly being missed is that standard alpha blending isn't the effect wanted. What's wanted is 'screen' blending (screen is the name of a layer blend mode in Photoshop).

That means: dest = 1 - ((1-src) * (1-dest))

One option is to set the blend modes to:

pDevice->SetRenderState(D3DRS_BLENDOP, D3DBLENDOP_ADD);
pDevice->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
pDevice->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_INVDESTCOLOR);
pDevice->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ZERO);

You then need to draw your sprite (make the pixel shader output the inverse of the source colour).

After that you set blend modes to:

pDevice->SetRenderState(D3DRS_BLENDOP, D3DBLENDOP_SUBTRACT);
pDevice->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_ONE);
pDevice->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);

Then draw a white square to re-invert the destination buffer.

If I'm right that should get you what you need.

It may also help to batch up sprites to avoid repeatedly switching render states. You can obviously draw all non-overlapping ones together.

Edit: Fixed second lot of blend modes...

[Edited by - Adam_42 on April 23, 2008 5:57:53 AM]
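
The two-pass blend setup described above can be sanity-checked on the CPU. The following is only a sketch of the per-channel arithmetic, assuming colour values in [0, 1] and ignoring alpha. The function names are hypothetical, and it models the blend equations rather than calling Direct3D:

```cpp
#include <cassert>
#include <cmath>

// Reference 'screen' blend for one colour channel.
float screenReference(float src, float dest)
{
    return 1.0f - (1.0f - src) * (1.0f - dest);
}

// Pass 1: BLENDOP = ADD, SRCBLEND = INVDESTCOLOR, DESTBLEND = ZERO.
// The pixel shader is assumed to output the inverted source colour.
float passOne(float src, float dest)
{
    const float shaderOut = 1.0f - src;
    return shaderOut * (1.0f - dest) + dest * 0.0f;
}

// Pass 2: BLENDOP = SUBTRACT, SRCBLEND = ONE, DESTBLEND = ONE, drawing white.
// D3DBLENDOP_SUBTRACT computes source minus destination.
float passTwo(float dest)
{
    const float white = 1.0f;
    return white * 1.0f - dest * 1.0f;
}

// Both passes applied in sequence to the same pixel.
float screenTwoPass(float src, float dest)
{
    assert(src >= 0.0f && src <= 1.0f);
    assert(dest >= 0.0f && dest <= 1.0f);
    return passTwo(passOne(src, dest));
}
```

Under these assumptions, `screenTwoPass(src, dest)` agrees with `screenReference(src, dest)` up to floating-point rounding: pass one leaves (1 - src) * (1 - dest) in the destination, and pass two re-inverts it.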
Thanks Adam_42, that's definitely a very interesting way of doing it. However, I was hoping for a general solution, since I'm using a variety of shaders. So I guess the question is, assuming you are using a shader which samples from the rendertarget, what's the most efficient way of drawing?
