Optimising my renderer

Started by
41 comments, last by Hodgman 10 years ago

Is there a specific reason you place the buffer in D3DPOOL_SYSTEMMEM instead of D3DPOOL_DEFAULT? Default resources can be locked when dynamic, I always thought that was the preferred pool to store dynamic resources to. You know, since you have to write to the buffer from CPU-side, but the GPU also has to render from it afterwards.

You are correct that it should be D3DPOOL_DEFAULT.
Of note to the original poster is that in any case you should always use the actual enumerated value, not 0 (and especially not NULL; it’s not a pointer and that is extremely misleading), even though D3DPOOL_DEFAULT is 0.


Also, L. Spiro, did you profile double/triple-buffering the vertex buffers against simply locking with discard (plus no-overwrite) after rendering, and then locking with no-overwrite when submitting the sprites? I've never heard anyone recommend manual buffering; I have only ever read about locking the way I described, which should have a similar effect to manual double-buffering. I don't have any data comparing the two, so that's why I'm asking. It would be interesting to hear whether there really is an additional performance gain from this.

We use double-buffering at work, and they did (not I) profile it on Xbox 360.
These days the two may perform very similarly, but locking with discard is more likely to behave like orphaning in OpenGL, because they are basically the same process. Orphaning is slower than double-buffering (I have tested that, as have others): if the driver tries to secretly double-buffer behind the scenes, the magic that makes that happen implicitly costs more cycles than double-buffering manually.
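To make the manual double-buffering idea concrete, here is a minimal sketch. `FrameBufferRing` and its members are hypothetical names, not anything from Direct3D; in a real D3D9 renderer the `Buffer` type would wrap one dynamic vertex buffer per slot, and this sketch only shows the per-frame rotation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: keeps N buffers and hands out a different one each
// frame, so the CPU never writes into the buffer the GPU may still be
// reading from the previous frame.
template <typename Buffer, std::size_t N = 2>
class FrameBufferRing {
    static_assert(N >= 2, "double-buffering needs at least two slots");

public:
    // Buffer the CPU should fill during the current frame.
    Buffer& writeTarget() { return slots_[index_]; }

    // Index of the slot currently used for writing.
    std::size_t index() const { return index_; }

    // Call once per frame, after the draw calls have been submitted.
    void endFrame() { index_ = (index_ + 1) % N; }

private:
    std::array<Buffer, N> slots_{};
    std::size_t index_ = 0;
};
```

With `N = 2` the writer alternates between slot 0 and slot 1; raising `N` to 3 gives triple-buffering at the cost of more memory.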


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


It's worth noting however that you can draw from a vertex buffer in D3DPOOL_SYSTEMMEM.

Dynamic buffers are more suitable for cases where you're using the discard/no-overwrite pattern, i.e. you're continually appending to the buffer and never overwriting a region that has been written to since the last discard. A system memory buffer might be more useful if you're jumping around in the buffer and overwriting smaller regions of it in a more random-access fashion.
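The append-then-discard bookkeeping described above can be sketched as follows. This is only an illustration under assumptions: `AppendCursor`, `LockRequest` and `LockMode` are made-up names, and the two `LockMode` values stand in for the `D3DLOCK_NOOVERWRITE` and `D3DLOCK_DISCARD` flags you would pass to `Lock` on a buffer created with `D3DUSAGE_DYNAMIC`.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the D3D9 lock flags (assumption: mapped to the real flags
// by the caller when it actually locks the buffer).
enum class LockMode { NoOverwrite, Discard };

struct LockRequest {
    std::size_t offset;  // byte offset to lock at
    LockMode mode;       // flag to lock with
};

// Hypothetical helper that tracks the write position in a dynamic buffer.
class AppendCursor {
public:
    explicit AppendCursor(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    // Reserve 'bytes' for writing. Appends after the previous data while
    // there is room; otherwise wraps to the start and asks for a discard,
    // so nothing written since the last discard is ever overwritten.
    LockRequest reserve(std::size_t bytes) {
        assert(bytes <= capacity_);
        LockMode mode = LockMode::NoOverwrite;
        if (cursor_ + bytes > capacity_) {
            cursor_ = 0;
            mode = LockMode::Discard;
        }
        const LockRequest request{cursor_, mode};
        cursor_ += bytes;
        return request;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_ = 0;
};
```

Each draw call would then read from the returned `offset`, which is why the regions handed out between two discards never overlap.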

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

It's more the case that if you have depth and stencil, and if you're clearing depth, then you should clear stencil at the same time. This is because we typically see depth and stencil interleaved in a D24S8 format, so they're not separate buffers: they're a single buffer that contains the data for both, and clearing both at the same time allows the hardware to do a fast clear (which may be as fast as just swapping out a pointer or setting a flag).

Yea, that's probably it. And it makes perfect sense. I somehow confused it in my memory.

for(texture in textures)
    list.clear()    // start a fresh batch for each texture
    for(sprite in sprites)
        if(sprite.texture == texture)
            list.append(sprite)
    if(list.size > 0)
        list.draw()

That's the basics. Implementation is up to you.
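One possible implementation of the per-texture grouping, as a rough sketch rather than the poster's own method: instead of scanning every sprite once per texture, it sorts the sprites by texture and emits one batch per run. `Sprite`, `Batch` and `buildBatches` are hypothetical names, and the integer `textureId` is an assumed stand-in for a texture pointer or handle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Sprite {
    int textureId;  // assumed stand-in for a texture pointer/handle
    float x, y;
};

// A contiguous run of sprites sharing one texture: one draw call each.
struct Batch {
    int textureId;
    std::size_t first;  // index of the first sprite in the sorted array
    std::size_t count;  // number of sprites in the run
};

// Sorts sprites by texture (stable, so sprites with the same texture keep
// their relative order) and returns one batch per distinct texture.
std::vector<Batch> buildBatches(std::vector<Sprite>& sprites) {
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) {
                         return a.textureId < b.textureId;
                     });

    std::vector<Batch> batches;
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        if (batches.empty() || batches.back().textureId != sprites[i].textureId) {
            batches.push_back(Batch{sprites[i].textureId, i, 0});
        }
        ++batches.back().count;
    }
    return batches;
}
```

Note that sorting changes the draw order between textures, which matters if the sprites are alpha-blended and overlap; in that case you would need to batch within layers instead.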

Thanks guys. I have been away a few days and should be able to have another go at this today.
I have had another crack at this.

Even though what I have got here is mainly in the 'create' phase of my app (I'll fix this up later), is this more or less a working version of instancing? I do now have two quads displaying from the one draw call.


void *pVertexBuffer=NULL;
LPDIRECT3DVERTEXBUFFER9 pVertexObject=NULL;
LPDIRECT3DINDEXBUFFER9 pIndexBuffer=NULL; // the pointer to the index buffer

struct D3DVERTEX{float x,y,z,rhw;DWORD color;float u;float v;};

D3DVERTEX vertices[8]={ 0,256,0,1.0f,0xffffff,0.0,1.0,
0,0,0,1.0f,0xffffff,0.0,0.0,
256,256,0,1.0f,0xffffff,1.0,1.0,
256,0,0,1.0f,0xffffff,1.0,0.0,

512,256,0,1.0f,0xffffff,0.0,1.0,
512,0,0,1.0f,0xffffff,0.0,0.0,
768,256,0,1.0f,0xffffff,1.0,1.0,
768,0,0,1.0f,0xffffff,1.0,0.0};

// 2nd param was D3DUSAGE_WRITEONLY
//8 = 2x quads
if(FAILED(mRenderer->getDevice()->CreateVertexBuffer(8*sizeof(D3DVERTEX),NULL,D3DFVF_XYZRHW|D3DFVF_DIFFUSE|D3DFVF_TEX1,D3DPOOL_MANAGED,&pVertexObject,NULL)))
return(0);

if(FAILED(pVertexObject->Lock(0,8*sizeof(D3DVERTEX),&pVertexBuffer,0)))
return(0);

memcpy(pVertexBuffer,vertices,8*sizeof(D3DVERTEX));
pVertexObject->Unlock();

mRenderer->getDevice()->SetStreamSource(0,pVertexObject,0,sizeof(D3DVERTEX));
mRenderer->getDevice()->SetFVF(D3DFVF_XYZRHW|D3DFVF_DIFFUSE|D3DFVF_TEX1);

mRenderer->getDevice()->SetTexture(0,mRenderer->pTexture);

// Do the indicies
void *pVoid;

short indices[]={ 0,1,2,
2,1,3,

4,5,6,
6,5,7,};

mRenderer->getDevice()->CreateIndexBuffer(12*sizeof(short),0,D3DFMT_INDEX16,D3DPOOL_MANAGED,&pIndexBuffer,NULL);
pIndexBuffer->Lock(0,0,(void**)&pVoid,0);
memcpy(pVoid, indices, sizeof(indices));
pIndexBuffer->Unlock();

mRenderer->getDevice()->SetIndices(pIndexBuffer);
And the render phase


mRenderer->getDevice()->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 8, 0, 4);
I plan on cleaning this up to make it more 'dynamic', but is this the basics of how it is done?
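For the 'dynamic' version, the index list can be generated for any number of quads instead of being typed out by hand. The sketch below follows the same 0,1,2 / 2,1,3 pattern as the hard-coded array above; `buildQuadIndices` is a hypothetical helper name, and the 16384-quad limit assumes 16-bit indices (D3DFMT_INDEX16) covering the full 0..65535 range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds triangle-list indices for 'quadCount' quads, where quad q owns
// vertices 4q..4q+3 laid out as in the vertex array above.
std::vector<std::uint16_t> buildQuadIndices(std::size_t quadCount) {
    // 16-bit indices can address at most 65536 vertices = 16384 quads.
    assert(quadCount <= 16384);

    static const std::uint16_t kCorners[6] = {0, 1, 2, 2, 1, 3};

    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;
        for (std::size_t i = 0; i < 6; ++i) {
            indices.push_back(static_cast<std::uint16_t>(base + kCorners[i]));
        }
    }
    return indices;
}
```

The matching draw call would then use `quadCount * 4` as the vertex count and `quadCount * 2` as the primitive count.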
So, now I have managed to draw 100 quads in one draw call

Complete render loop

mRenderer->getDevice()->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,0,0,sprites*4,0,sprites*2);
But, I am finding that the results are identical to before at ~300 FPS.

With an identical scene in GM:S I am still getting ~1100 FPS.

I am still somewhat bewildered. Drawing all of the sprites in one call didn't seem to make any difference whatsoever (compared to having a loop and drawing each quad individually).

I am surprised that instancing made no difference at all.

When you are writing the sprites to the vertex buffer, are you locking and unlocking it once, or for every sprite separately? I found that locking multiple times can have a very bad impact on performance. Also:


if(FAILED(pVertexObject->Lock(0,8*sizeof(D3DVERTEX),&pVertexBuffer,0)))

You shouldn't lock with "0" as the flag (last parameter). Since it does not appear that you are double-buffering as L. Spiro suggested, you should lock once with "D3DLOCK_DISCARD | D3DLOCK_NOOVERWRITE" after rendering, and then with "D3DLOCK_NOOVERWRITE", ideally only once for all sprites.
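As a rough sketch of "lock once for all sprites": the function below writes every quad in a single pass into one block of memory. In a D3D9 renderer `dst` would be the pointer returned by a single `Lock` call, followed by a single `Unlock`; note that the discard and no-overwrite flags are only valid on buffers created with `D3DUSAGE_DYNAMIC`. `SpriteRect` and `writeQuads` are hypothetical names, `Vertex` mirrors the `D3DVERTEX` layout from the earlier post, and opaque white is an assumed colour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same layout as the D3DVERTEX struct posted earlier.
struct Vertex {
    float x, y, z, rhw;
    std::uint32_t color;
    float u, v;
};

// Hypothetical sprite description: screen-space rectangle.
struct SpriteRect {
    float left, top, width, height;
};

// Writes 4 vertices per sprite into 'dst' (capacity in vertices) and
// returns the number of vertices written. Sprites that do not fit are
// skipped rather than overflowing the buffer.
std::size_t writeQuads(Vertex* dst, std::size_t capacity,
                       const std::vector<SpriteRect>& sprites) {
    assert(dst != nullptr);
    const std::uint32_t white = 0xffffffffu;  // assumed colour: opaque white
    std::size_t written = 0;
    for (const SpriteRect& s : sprites) {
        if (written + 4 > capacity) break;
        const float right = s.left + s.width;
        const float bottom = s.top + s.height;
        // Corner order matches the posted data:
        // bottom-left, top-left, bottom-right, top-right.
        dst[written + 0] = Vertex{s.left, bottom, 0.0f, 1.0f, white, 0.0f, 1.0f};
        dst[written + 1] = Vertex{s.left, s.top, 0.0f, 1.0f, white, 0.0f, 0.0f};
        dst[written + 2] = Vertex{right, bottom, 0.0f, 1.0f, white, 1.0f, 1.0f};
        dst[written + 3] = Vertex{right, s.top, 0.0f, 1.0f, white, 1.0f, 0.0f};
        written += 4;
    }
    return written;
}
```

Keeping the fill loop separate from the lock/unlock pair makes it easy to check that the buffer is only locked once per frame, however many sprites there are.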

I only have the single line of code (as above) in the entire render loop (I am just seeing how much I can throw at the renderer).

The locking is done before the render loop and never gets called again. So, it is literally a one liner in the render cycle, nothing more than that. So, no state changes or anything (I'll do all that later on once I am happy with the renderer).

I'll try the double-buffering and see what happens.

I'll try the double-buffering and see what happens.

Double-buffering doesn’t do anything unless you are updating the buffers every frame.
Most of the advice you have gotten has been under the assumption that you are.

If you aren’t, as I said, go back to static buffers and draw all the sprites in a single call. Your bottleneck would be only the fact that you make 1,000 render calls instead of 1.


L. Spiro


