# D3D9, batching sprites gives no speed increase


It was my understanding that batching sprites into large groups would be faster than drawing them one by one. However, having implemented a batching system in my sprite class, I can't see any gain at all in the time taken to render large groups of sprites (such as those in a particle system).

Non-batched method:
//d3d init
...
device->CreateVertexBuffer(sizeof(Vertex)*4, 0, Vertex::FVF, D3DPOOL_MANAGED, &quadVB, NULL);
Vertex *vertices;
quadVB->Lock(0,0, (void**)&vertices, 0); //the buffer must be locked before it is written to
vertices[0] = Vertex(0.0f,0.0f, 0.0f,0.0f);
vertices[1] = Vertex(1.0f,0.0f, 1.0f,0.0f);
vertices[2] = Vertex(1.0f,1.0f, 1.0f,1.0f);
vertices[3] = Vertex(0.0f,1.0f, 0.0f,1.0f);
quadVB->Unlock();
...
//render one sprite
void D3D9Sprite::DoDraw(D3DXMATRIXA16 &transform)
{
d3d->GetDevice()->SetTransform (D3DTS_WORLD, &transform);
d3d->GetDevice()->SetTexture   (0, tex);
d3d->GetDevice()->DrawPrimitive(D3DPT_TRIANGLEFAN, 0,2);
}


batched method:
//d3d init
//batchSize is the max number of sprites in a single batch, 100
device->CreateVertexBuffer(sizeof(VertexRHW)*4*batchSize, D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY, VertexRHW::FVF, D3DPOOL_DEFAULT, &batchVB, NULL);
batchCount=0;

device->CreateIndexBuffer(batchSize*6*2, D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_MANAGED, &batchIB, NULL);
short *indices;
batchIB->Lock(0,0, (void**)&indices, 0);
for(unsigned i=0; i<batchSize; ++i)
{
indices[i*6 +0] = i*4 + 0;
indices[i*6 +1] = i*4 + 1;
indices[i*6 +2] = i*4 + 2;

indices[i*6 +3] = i*4 + 0;
indices[i*6 +4] = i*4 + 2;
indices[i*6 +5] = i*4 + 3;
}
batchIB->Unlock();
...

void D3D9::FlushBatch()
{
if(batchVBLocked)
{
batchVBLocked=false;
batchVB->Unlock();
}
if(batchCount)
{
device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, batchCount*4, 0, batchCount*2);
batchCount=0;
}
}
VertexRHW *D3D9::BatchGetNewVerts()
{
if(batchCount == batchSize)
{
FlushBatch();
}
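if (!batchVBLocked)
{
//lock the dynamic buffer so batchVerts points at writable memory;
//D3DLOCK_DISCARD requests a fresh region instead of waiting on the GPU
batchVB->Lock(0,0, (void**)&batchVerts, D3DLOCK_DISCARD);
batchVBLocked=true;
}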
return batchVerts+4*batchCount++;
}
void D3D9Sprite::DoDraw(D3DXMATRIXA16 &transform)
{
if(tex != d3d->currentTex)
{
d3d->FlushBatch();
d3d->currentTex = tex;
d3d->GetDevice()->SetTexture(0,tex);
}
D3DXVECTOR4
v1(0.0f, 0.0f,0.0f,1.0f),
v2(1.0f, 0.0f,0.0f,1.0f),
v3(1.0f, 1.0f,0.0f,1.0f),
v4(0.0f, 1.0f,0.0f,1.0f);

D3DXVec4Transform(&v1,&v1, &transform);
D3DXVec4Transform(&v2,&v2, &transform);
D3DXVec4Transform(&v3,&v3, &transform);
D3DXVec4Transform(&v4,&v4, &transform);

VertexRHW *verts = d3d->BatchGetNewVerts();
verts[0] = VertexRHW(v1.x,v1.y, 0.0f,0.0f);
verts[1] = VertexRHW(v2.x,v2.y, 1.0f,0.0f);
verts[2] = VertexRHW(v3.x,v3.y, 1.0f,1.0f);
verts[3] = VertexRHW(v4.x,v4.y, 0.0f,1.0f);
}


I've been testing this on Windows Vista, with an Intel Core 2 Duo @ 2GHz and a 256MB DX10.1 graphics card, if that makes any difference...
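
The flush conditions in the batched code (texture change, or batch full) decide how many draw calls are actually saved. As a rough sketch that is independent of Direct3D, the function below counts the draw calls those two conditions would produce for a given sprite order. `TextureId` and `countDrawCalls` are hypothetical names introduced only for this illustration, and the sentinel `-1` assumes real texture ids are non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a texture pointer; only identity matters here.
using TextureId = int;

// Counts the draw calls a batcher would issue for a sequence of sprites,
// flushing when the texture changes or when the batch is full.
// This mirrors the flush conditions in the post, not Direct3D itself.
std::size_t countDrawCalls(const std::vector<TextureId>& sprites,
                           std::size_t batchSize)
{
    assert(batchSize > 0);
    std::size_t drawCalls = 0;
    std::size_t batchCount = 0;   // sprites waiting in the current batch
    TextureId currentTex = -1;    // assumes real ids are non-negative

    for (TextureId tex : sprites) {
        const bool textureChanged = (tex != currentTex);
        const bool batchFull = (batchCount == batchSize);
        if ((textureChanged || batchFull) && batchCount > 0) {
            ++drawCalls;          // flush the pending batch
            batchCount = 0;
        }
        currentTex = tex;
        ++batchCount;
    }
    if (batchCount > 0) {
        ++drawCalls;              // final flush at the end of the frame
    }
    return drawCalls;
}
```

Under this model, sprites that alternate between two textures produce one draw call per sprite, so the batcher saves nothing unless sprites sharing a texture are drawn consecutively.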

##### Share on other sites
Batching only helps reduce the CPU overhead of submitting a draw call. While any batching system (that actually does batch calls) will reduce your CPU use, you won't see an effective performance increase unless the CPU was your limiting factor. That is, if the CPU was already waiting for the GPU each frame before you implemented batching, then after implementing batching it will simply wait longer.
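
The argument above can be written down as a deliberately simplified model: if the CPU and GPU work in parallel, a frame takes as long as the slower of the two. The function names and the millisecond figures in the note below are made up for illustration, and real pipelines (buffered frames, sync points) are more complicated than this.

```cpp
#include <algorithm>
#include <cassert>

// Simplified model: CPU and GPU run in parallel, so the frame takes
// as long as the slower of the two. Real pipelines are more complex.
double frameTimeMs(double cpuMs, double gpuMs)
{
    assert(cpuMs >= 0.0 && gpuMs >= 0.0);
    return std::max(cpuMs, gpuMs);
}

// Time the CPU spends idle waiting for the GPU in this model.
double cpuWaitMs(double cpuMs, double gpuMs)
{
    assert(cpuMs >= 0.0 && gpuMs >= 0.0);
    return std::max(0.0, gpuMs - cpuMs);
}
```

With hypothetical numbers, a GPU cost of 16 ms and a CPU cost of 10 ms give a 16 ms frame. If batching cuts the CPU cost to 4 ms, the frame still takes 16 ms in this model, and the CPU wait grows from 6 ms to 12 ms.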

If you're interested in getting improved performance, you've got to profile, identify the bottleneck, fix the bottleneck, rinse, and repeat. Any other attempt to improve performance is likely to result in inconsistent improvements and sometimes even in worse performance.
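
As a first, crude profiling step, the CPU time spent submitting sprites can be measured with a scoped timer built on `std::chrono`. This is only a sketch: `ScopedTimer` and `measureMs` are hypothetical helpers, and they capture only the wall-clock time the CPU spends inside the timed code. Draw calls are generally queued for the GPU, so this does not measure GPU cost; a tool such as PIX is better suited to that.

```cpp
#include <cassert>
#include <chrono>

// Adds the wall-clock time spent inside a scope to a caller-owned total.
class ScopedTimer {
public:
    explicit ScopedTimer(double& totalMs)
        : totalMs_(totalMs), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        const auto end = std::chrono::steady_clock::now();
        totalMs_ +=
            std::chrono::duration<double, std::milli>(end - start_).count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& totalMs_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical helper: times any callable, for example the loop that
// calls DoDraw() on every sprite, and returns the elapsed milliseconds.
template <typename Fn>
double measureMs(Fn&& work)
{
    double total = 0.0;
    {
        ScopedTimer timer(total);  // stops when this inner scope ends
        work();
    }
    return total;
}
```

Comparing this figure for the batched and non-batched paths shows whether batching reduced CPU submission time, even when the frame rate itself did not change.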

Of course, if performance is very slow, some other issue might be responsible.

[Edited by - sirob on November 15, 2008 7:32:25 AM]

##### Share on other sites
Quote:
 Original post by sirob
 Batching only helps reduce the CPU overhead of submitting a draw call. While any batching system (that actually does batch calls) will reduce your CPU use, you won't see an effective performance increase unless the CPU was your limiting factor. That is, if the CPU was already waiting for the GPU each frame before you implemented batching, then after implementing batching it will simply wait longer.
 If you're interested in getting improved performance, you've got to profile, identify the bottleneck, fix the bottleneck, rinse, and repeat. Any other attempt to improve performance is likely to result in inconsistent improvements and sometimes even in worse performance.
 Of course, if performance is very slow, there might be some other issue causing significant performance issues.

Completely true. In other words, he's saying that batching only helps when your CPU is the bottleneck.

However, if your game becomes more CPU heavy because of physics, game logic, or AI, you may start seeing a performance difference that you can't notice right now.

You have a very fast CPU. Try a much slower CPU with the same GPU, and you may also see the difference.

Cheers
Dark Sylinc

##### Share on other sites
One thing to check is CPU percent. Sometimes rendering in small batches can even be slightly faster than large ones (at least in my experience), but it does take more CPU. If your program isn't CPU bound you won't notice this unless you look at the task manager or another place which measures CPU usage.

I assume you're using an ATI card, and I remember that ATI had a tool similar to NVIDIA's PerfKit, but I can't find it now. You can use PIX to attempt to find bottlenecks in your app.

##### Share on other sites
Quote:
 Original post by ET3D
 One thing to check is CPU percent. Sometimes rendering in small batches can even be slightly faster than large ones (at least in my experience), but it does take more CPU.

~~That's impossible~~ It can happen.
There's a simple explanation for that. Batches are all-or-nothing: either all of the sprites in a batch are submitted, or none of them are.
So if you're rendering 80 objects and only one of them is visible, you still send all 80 to the GPU.
Without batching, you would make more draw calls (hence more CPU use), but the other 79 will most likely be culled and won't be sent to the GPU. That will make the FPS grow despite the higher CPU usage.

That isn't true for most GUIs, because we can control what can be seen. Things like "the camera rotated and now it isn't visible anymore" don't happen, because GUI sprites are always visible.
And if we want to hide something, we just modify the batch.
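
One way to soften the all-or-nothing behaviour described above is to test each sprite against the viewport on the CPU and only append the visible ones to the batch. The sketch below assumes sprites have axis-aligned screen-space bounds; `Rect`, `overlaps`, and `visibleSprites` are hypothetical names, not part of the code posted earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned sprite bounds in screen space (pixels).
struct Rect {
    float left, top, right, bottom;
};

// True when the two rectangles overlap (touching edges do not count).
bool overlaps(const Rect& a, const Rect& b)
{
    return a.left < b.right && a.right > b.left &&
           a.top < b.bottom && a.bottom > b.top;
}

// Returns the indices of the sprites worth adding to the batch.
std::vector<std::size_t> visibleSprites(const std::vector<Rect>& sprites,
                                        const Rect& viewport)
{
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        if (overlaps(sprites[i], viewport)) {
            visible.push_back(i);
        }
    }
    return visible;
}
```

The test costs a few comparisons per sprite on the CPU, so whether it pays off depends on how many sprites are typically off screen.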

Cheers
Dark Sylinc