Performance issues with static buffers



I seem to be having trouble rendering my static buffers with optimal performance. I create my static buffers with the write-only flag and lock them only once, to store the geometry passed in during program initialization. Here is the code that I use to transfer geometry to a static buffer:
HRESULT CFNGED3D::CreateBuffer(DWORD dwFormat, int nStride, int nVertices, void* pVertices, int nIndices, void* pIndices, int nMaterialID, int* pBufferID)
{
// First check to see if the passed in material id is valid.
if (nMaterialID < 0 || nMaterialID >= (int)m_materials.size())
{
return E_INVALIDARG;
}

FNGE_D3D_BUFFER tempBuffer;

tempBuffer.m_dwFormat    = dwFormat;
tempBuffer.m_nStride     = nStride;
tempBuffer.m_nVertices   = nVertices;
tempBuffer.m_nIndices    = nIndices;
tempBuffer.m_nMaterialID = nMaterialID;

void*			pData;

// Now create the vertex buffer based on the stride and number of vertices specified.
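if (FAILED(m_pD3DDevice->CreateVertexBuffer(tempBuffer.m_nVertices * tempBuffer.m_nStride,
D3DUSAGE_WRITEONLY,
tempBuffer.m_dwFormat,
D3DPOOL_DEFAULT,
&tempBuffer.m_pVB,
NULL)))
{
// Nothing was created, so there is nothing to release.
return E_FAIL;
}

// Passing 0 for both offset and size locks the entire buffer.
if (FAILED(tempBuffer.m_pVB->Lock(0, 0, (void**)&pData, 0)))
{
tempBuffer.m_pVB->Release();
return E_FAIL;
}

memcpy(pData, pVertices, tempBuffer.m_nVertices * tempBuffer.m_nStride);

tempBuffer.m_pVB->Unlock();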

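if (tempBuffer.m_nIndices > 0)
{
// Indices are 16-bit, so pIndices must point to WORD values.
if (FAILED(m_pD3DDevice->CreateIndexBuffer(tempBuffer.m_nIndices * sizeof(WORD),
D3DUSAGE_WRITEONLY,
D3DFMT_INDEX16,
D3DPOOL_DEFAULT,
&tempBuffer.m_pIB,
NULL)))
{
tempBuffer.m_pVB->Release();
return E_FAIL;
}
if (FAILED(tempBuffer.m_pIB->Lock(0, 0, (void**)&pData, 0)))
{
tempBuffer.m_pIB->Release();
tempBuffer.m_pVB->Release();
return E_FAIL;
}
memcpy(pData, pIndices, tempBuffer.m_nIndices * sizeof(WORD));

tempBuffer.m_pIB->Unlock();
}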

(*pBufferID) = m_staticBuffers.size();

m_staticBuffers.push_back(tempBuffer);

m_Log << "A new static buffer has been created and added to the engine.";
return S_OK;
}


Here is the code I use to render them:
void CFNGED3D::Render(int nBufferID)
{
// Valid IDs run from 0 to size() - 1.
if (nBufferID < 0 || nBufferID >= (int)m_staticBuffers.size())
return;

// Determine whether or not this computer supports vertex shaders.
{
m_pD3DDevice->SetFVF(NULL);
}
else
{
m_pD3DDevice->SetFVF(m_staticBuffers[nBufferID].m_dwFormat);
}

//D3DMATERIAL9 mat;

//mat.Diffuse.a  = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Diffuse.A;
//mat.Diffuse.r  = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Diffuse.R;
//mat.Diffuse.g  = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Diffuse.G;
//mat.Diffuse.b  = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Diffuse.B;
//mat.Ambient.a  = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Ambient.A;
//mat.Ambient.r  = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Ambient.R;
//mat.Ambient.g  = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Ambient.G;
//mat.Ambient.b  = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Ambient.B;
//mat.Specular.a = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Specular.A;
//mat.Specular.r = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Specular.R;
//mat.Specular.g = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Specular.G;
//mat.Specular.b = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Specular.B;
//mat.Emissive.a = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Emissive.A;
//mat.Emissive.r = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Emissive.R;
//mat.Emissive.g = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Emissive.G;
//mat.Emissive.b = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Emissive.B;
//mat.Power      = m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_Power;

//m_pD3DDevice->SetMaterial(&mat);
//
//// Assign each texture assigned to this material to each respective stage.
//for (int i = 0; i < 5; i++)
//{
//	if (m_materials[nBufferID].m_textureID > -1)
//	{
//		LPDIRECT3DTEXTURE9 pTexture = (LPDIRECT3DTEXTURE9)m_textures[m_materials[m_staticBuffers[nBufferID].m_nMaterialID].m_textureID].m_pData;

//		m_pD3DDevice->SetTexture(i, pTexture);
//	}
//}
m_pD3DDevice->SetStreamSource(0, m_staticBuffers[nBufferID].m_pVB, 0, m_staticBuffers[nBufferID].m_nStride);

if (m_staticBuffers[nBufferID].m_nIndices > 0)
{
m_pD3DDevice->SetIndices(m_staticBuffers[nBufferID].m_pIB);
// For a triangle list the primitive count is the index count divided by 3, not the vertex count.
m_pD3DDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, m_staticBuffers[nBufferID].m_nVertices, 0, m_staticBuffers[nBufferID].m_nIndices / 3);
}
else
{
m_pD3DDevice->DrawPrimitive(D3DPT_TRIANGLELIST, 0, m_staticBuffers[nBufferID].m_nVertices / 3);
}
}


With the code I am using right now for managing static buffers, I can only render 12k vertices at a frame rate of 20 fps. I know that frames per second is a poor measure of program performance because it scales non-linearly with frame time. However, I have seen many programs that render 100x as many vertices as I do and still get a better frame rate. Does anybody know why I might be seeing this massive performance drop?
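To make the non-linearity concrete, here is a minimal sketch that converts a frame rate into time per frame. The function name FrameTimeMs is just an illustrative name, not part of the engine code above.

```cpp
// Convert a frame rate (frames per second) into the time spent on
// each frame, in milliseconds. FrameTimeMs is a hypothetical helper.
double FrameTimeMs(double fps)
{
    return 1000.0 / fps;
}
```

By this measure, 20 fps is 50 ms per frame. Dropping from 1000 fps to 500 fps costs only 1 ms per frame, while dropping from 40 fps to 20 fps costs 25 ms, which is why comparing frame times is usually more informative than comparing fps.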

Share on other sites
If you change D3DPOOL_DEFAULT to D3DPOOL_MANAGED and remove the D3DUSAGE_WRITEONLY flag, do you get an increase in performance?

Also, are you sure you're not calling CreateBuffer every frame?

Also, 12k triangles isn't too little. Unless you have a high-end card, it's possible you'd need other techniques to increase performance. What card is this on?

Share on other sites
No, I do not see a performance increase when I change the pool to managed and remove the write-only flag. As a matter of fact, I see a performance decrease when I do this.

Also, I am running this on a fairly modern video card. It's an ATI Radeon 9600XT 128MB DDR.

Share on other sites
How many times do you draw it per second?
(And how much space does it take on screen?)

Share on other sites
Also, what happens if you only draw one polygon? (just the first one from the buffer)

Share on other sites
I am drawing it only once per frame. So if I am getting ~20 frames per second, then I am drawing it 20 times per second. When I only draw 1 triangle I get roughly 3000-5000 fps.

Share on other sites
Back to my previous question: how large is the object that you draw, and what resolution are you running at? It is possible that you are limited by your fillrate.
Screenshot perhaps?

Share on other sites
The object I am drawing is not that big. It fits perfectly within the confines of the screen. Also, the resolution I am running at is 640x480x16.

Share on other sites
Sounds kinda strange, I must say. I really don't know about this, but could it be that the card dislikes drawing 12k tris in one batch? I've heard some talk about that... could be bullshit too.

Otherwise it sounds really strange, unless you have some costly overhead such as lighting.

Share on other sites
Yes, are you using an unusual number of texture stages, lights, or any costly effect that might be slowing you down?

If you make the model smaller (scale it down), do you gain much fps?

Share on other sites
No, I am not using any textures, lights, or materials. However, the way I am storing the 12k vertices is rather odd: I just copied the 12 vertices of a pyramid 1000 times into my vertex array. Perhaps I am lagging due to overdraw? Is there a better way of rendering massive amounts of vertices without making an object loader?

Share on other sites
I'm going to make some assumptions about what you're doing in this post...

If that's a 12 vertex pyramid that fills the viewport, drawn 1000 times over, then you're likely hitting a fillrate limit.

640 * 480 = ~300K pixels per frame
300K * 1000 times overdraw = 300M pixels per frame
300M * 20 FPS = 6000M pixels per second

The peak fill rate of a 9600XT is 2100M pixels per second according to one site on the web, which runs us into a bit of a problem, since you've described getting nearly 3 times that.

Let's assume that limit is a memory bus limit, and that using 16-bit color allows us to double it to 4200M pixels per second. Let us further assume you're not quite filling the viewport, and that you rounded to 20 FPS since it was a nice round number, bringing your 6000M pixels per second down somewhat, hopefully to near 4200M.
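The same back-of-the-envelope estimate as a small sketch, using exact figures rather than rounded ones. The function names are made up for illustration, and the inputs are only the assumptions stated above (a full-viewport pyramid, 1000 layers of overdraw), not measured values.

```cpp
#include <cstdint>

// Pixels written per second if every frame covers the whole render
// target `overdraw` times. Hypothetical helper for the estimate above.
std::uint64_t PixelsPerSecond(std::uint64_t width, std::uint64_t height,
                              std::uint64_t overdraw, std::uint64_t fps)
{
    return width * height * overdraw * fps;
}

// Highest frame rate a given fill rate (pixels per second) could sustain
// under the same full-coverage assumption.
double MaxFps(double fillRate, std::uint64_t width, std::uint64_t height,
              std::uint64_t overdraw)
{
    return fillRate / static_cast<double>(width * height * overdraw);
}
```

With 640x480, 1000x overdraw and 20 FPS, PixelsPerSecond gives 6,144,000,000. MaxFps with a 4200M fill rate comes out at roughly 13.7 FPS, so the reported 20 FPS is only plausible if the pyramids cover somewhat less than the full viewport.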

The problem that comes up in 99% of the posts about only being able to draw x polys is that they're all drawing lots of fullscreen quads with alpha blending and hitting fill rate limitations. If this isn't what you're doing, and there is some other explanation for the low frame rate, I'd be quite interested to hear about it. Post more details and we'll see if we can't get to the bottom of it.

Share on other sites
I have discovered my problem, and it was related to copying the pyramid 1000 times. I was probably getting intense overdraw. I can now render 120k vertices at 1000 fps.
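The thread does not say how the test geometry was changed. One possible way to build a large test mesh without stacking every copy on the same pixels is to offset each copy onto a grid. This is only a sketch under that assumption: Vertex and TileOnGrid are hypothetical names, and a real vertex would need to match the buffer's FVF and stride.

```cpp
#include <cstddef>
#include <vector>

// Position-only vertex, assumed here for illustration.
struct Vertex { float x, y, z; };

// Returns `copies` translated copies of `mesh`, laid out row by row on
// a square grid in the XZ plane with `spacing` units between copies.
std::vector<Vertex> TileOnGrid(const std::vector<Vertex>& mesh,
                               std::size_t copies, float spacing)
{
    // Smallest grid side length that can hold all copies.
    std::size_t side = 1;
    while (side * side < copies)
        ++side;

    std::vector<Vertex> out;
    out.reserve(mesh.size() * copies);
    for (std::size_t i = 0; i < copies; ++i)
    {
        const float dx = static_cast<float>(i % side) * spacing;
        const float dz = static_cast<float>(i / side) * spacing;
        for (const Vertex& v : mesh)
            out.push_back({v.x + dx, v.y, v.z + dz});
    }
    return out;
}
```

Tiling a 12-vertex pyramid 1000 times this way still yields 12,000 vertices for the static buffer, but how much the copies overlap on screen then depends on the spacing and the camera rather than being a guaranteed 1000 layers.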