[XNA] 1000 cubes, 4 Textures = 55fps... should I expect more?

Started by
16 comments, last by Zoner 14 years, 11 months ago
To summarize and throw in some new issues:

- each draw call incurs overhead in the driver
-> keep the number of draw calls per frame down to a few hundred
- do not dynamically allocate memory each frame; create it at initialization time and reuse it
-> it looks like you are creating a new VB for each iteration of the outer loop
- if a block is not visible, do not issue a draw call for it
-> do a simple visibility check before drawing
- switching VBs involves overhead
-> put all vertex and index data into a single VB and IB and use the DrawPrimitives / DrawIndexedPrimitives parameters to select ranges instead
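To make the last two points concrete, here is a minimal sketch (plain C#, no XNA calls) of how visible blocks could be turned into a handful of index ranges inside one shared index buffer. It assumes every block is a cube of 36 indices stored back to back in block order; DrawRange and SharedBufferRanges are made-up names for illustration, not part of XNA.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type: describes one draw call into the shared index buffer.
public struct DrawRange
{
    public int StartIndex;      // first index to read from the shared index buffer
    public int PrimitiveCount;  // number of triangles to draw

    public DrawRange(int startIndex, int primitiveCount)
    {
        StartIndex = startIndex;
        PrimitiveCount = primitiveCount;
    }
}

public static class SharedBufferRanges
{
    // Assumption: each block is a cube with 12 triangles (36 indices).
    public const int IndicesPerBlock = 36;

    // Merges runs of neighbouring visible blocks into one range each,
    // so invisible blocks are skipped and visible runs share a draw call.
    public static List<DrawRange> Build(bool[] visible)
    {
        List<DrawRange> ranges = new List<DrawRange>();
        int i = 0;
        while (i < visible.Length)
        {
            if (!visible[i])
            {
                i++;
                continue;
            }

            int runStart = i;
            while (i < visible.Length && visible[i])
            {
                i++;
            }

            int blocksInRun = i - runStart;
            ranges.Add(new DrawRange(runStart * IndicesPerBlock,
                                     blocksInRun * IndicesPerBlock / 3));
        }
        return ranges;
    }
}
```

Each resulting range would map onto the start index and primitive count arguments of one DrawIndexedPrimitives call, so a run of neighbouring visible blocks costs one draw call instead of one per block.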
------------------------------------I always enjoy being rated up by you ...
Just to add to the previous list: with instancing there is a very thin line between optimizing your code and making it slower. Too many vertices with too few instances will be costly, but too many instances with too few vertices will be costly as well (or something like that).

I once read that hardware instancing (with two vertex buffers) is 20% more costly per draw call, so the savings have to be worth it...
Waterwalker - you pretty much summed it all up. Great!
I think I will pursue the chunked vb approach. It shouldn't be too hard with the setup I have. And I can visualise the output.
If I understand it correctly, if I create a fixed size vb on init, then use SetData to fill it and call DrawPrimitives, I should remove the overhead of creating dynamic vbs for each texture loop. I can see the benefit of this.

The one large static vb with index buffer method I can see further benefit to, but I am not sure about having all the vertex data in GPU memory for access by the index buffer... or is this not how it works? From my reading, it seems that the vb is sent to the GPU and then the DrawIndexedPrimitives calls lower the required bandwidth by only sending the index values and not the full vertex information. Does this mean that the GPU needs all of the vertex information on board, regardless of my visibility culling? Or is there something smarter going on that I don't get?
It depends. When using static buffers, all of the data resides on the GPU as long as there is enough VRAM available. If there is too much data, the driver moves the buffers that were not used during the last calls to AGP memory.

With dynamic buffers it also depends on the driver where it places the data but usually this should be AGP memory which requires the driver to upload the data to the graphics card for each rendering call if the data is not already in the VRAM from the last call.

With data not on the VRAM at the time of the draw call the driver should only copy the vertices from the vertex buffer that are used by the drawing calls (or rather all vertices from the first to the last vertex being used). Similarly the driver would only copy the indices from the index buffer that are used as defined by the primitive count and the start index.

Also, the graphics card would only transform those vertices that are referenced by indices when calling DrawIndexedPrimitive for performance's sake.

So using one big static vertex buffer is generally a good idea to prevent copying the vertex data for each rendering call. I have left out that there is of course a limit to the size you should use: if your buffer gets too big, it becomes slow whenever the driver has to move the buffer from VRAM to AGP memory or back. But in your case you won't run into such dimensions.
I finally had a chance to refactor my code to use one vb and an index buffer.
And it has made a BIG difference.
Same scene, was 55fps, now 300+fps.
I have one query, though, about the creation of the index buffer for the DrawIndexedPrimitive call.
During my Draw code, I get the indices of the Blocks that I can see for each texture and have found that I need to do a
IndexBuffer ib = new IndexBuffer(GraphicsDevice, typeof(int), _iIBMaxSize, BufferUsage.WriteOnly);

for each texture loop (ie fetch of index values).
<source lang="c#">
private void DrawBlocks(Matrix currentViewMatrix)
{
    _beBlock.Begin();
    foreach (DictionaryEntry de in _htTextures)
    {
        string sTexName = de.Key.ToString();
        // Get all the triangles with this texture
        List<int> lIdxs = new List<int>();
        List<Block> al = GetVisibleBlocks();
        foreach (Block b in al) {
            lIdxs.AddRange(b.Get_Texture_Triangles(sTexName));
        }
        if (lIdxs.Count > 0)
        {
            _beBlock.World = World;
            _beBlock.View = View;
            _beBlock.Projection = Projection;
            _beBlock.Texture = (Texture2D)de.Value;
            _beBlock.TextureEnabled = true;
            _beBlock.DiffuseColor = new Vector3(1.0f, 1.0f, 1.0f);
            _beBlock.AmbientLightColor = new Vector3(0.75f, 0.75f, 0.75f);
            _beBlock.DirectionalLight0.Enabled = true;
            _beBlock.DirectionalLight0.DiffuseColor = Vector3.One;
            _beBlock.DirectionalLight0.Direction = Vector3.Normalize(new Vector3(1.0f, -1.0f, 1.0f));
            _beBlock.DirectionalLight0.SpecularColor = Vector3.One;
            _beBlock.LightingEnabled = true;
            //_dvbBlock = new DynamicVertexBuffer(GraphicsDevice, lVerts.Count * VertexPositionNormalTexture.SizeInBytes, BufferUsage.WriteOnly);
            //_dvbBlock.SetData(lVerts.ToArray(),0,lVerts.Count);
            //_vbBlock = new VertexBuffer(GraphicsDevice, lVerts.Count * VertexPositionNormalTexture.SizeInBytes, BufferUsage.WriteOnly);
            //_vbBlock.SetData(lVerts.ToArray(), 0, lVerts.Count);
            GraphicsDevice.Vertices[0].SetSource(_vbAllBlocks, 0, VertexPositionNormalTexture.SizeInBytes);
            //_ibAllBlocks.SetData(lIdxs.ToArray(), 0, lIdxs.Count);
            IndexBuffer ib = new IndexBuffer(GraphicsDevice, typeof(int), _iIBMaxSize, BufferUsage.WriteOnly);
            ib.SetData(lIdxs.ToArray(), 0, lIdxs.Count);
            GraphicsDevice.Indices = ib; // _ibAllBlocks;
            foreach (EffectPass ep in _beBlock.CurrentTechnique.Passes)
            {
                ep.Begin();
                //GraphicsDevice.Vertices[0].SetSource(_vbBlock, 0, VertexPositionNormalTexture.SizeInBytes);
                //GraphicsDevice.Vertices[0].SetSource(_dvbBlock, 0, VertexPositionNormalTexture.SizeInBytes);
                GraphicsDevice.VertexDeclaration = _vdBlock;
                GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.TriangleList, 0, 0, _iVBActualSize, 0, lIdxs.Count / 3);
                //GraphicsDevice.DrawPrimitives(PrimitiveType.TriangleList, 0, lVerts.Count/3);
                //GraphicsDevice.DrawPrimitives(PrimitiveType.TriangleList, 0, 2);
                ep.End();
            }
        }
    }
    _beBlock.End();
}
</source>

When I tried having just a member variable that I could reuse over and over, I got this error:
You may not modify a resource that has been set on a device, or after it has been used within a tiling bracket
Is what I am doing ok? Or should I be using a single member variable DynamicIndexBuffer?
Thanks again for the help so far. I have learned a lot.
Quote:Original post by Crow-knee
I finally had a chance to refactor my code to use one vb and an index buffer.
And it has made a BIG difference.
Same scene, was 55fps, now 300+fps.
I have one query, though, about the creation of the index buffer for the DrawIndexedPrimitive call.
During my Draw code, I get the indices of the Blocks that I can see for each texture and have found that I need to do a
<source lang="c#">
IndexBuffer ib = new IndexBuffer(GraphicsDevice, typeof(int), _iIBMaxSize, BufferUsage.WriteOnly);
</source>
for each texture loop (ie fetch of index values).
When I tried having just a member variable that I could reuse over and over, I got this error:
You may not modify a resource that has been set on a device, or after it has been used within a tiling bracket
Is what I am doing ok? Or should I be using a single member variable DynamicIndexBuffer?
Thanks again for the help so far. I have learned a lot.
You should never be creating a resource in your render loop. At best it'll be slow, and at worst it'll fragment VRAM and cause your app to fail with an out of video memory error / exception.
If you can get away with making the index buffer static (i.e. only lock it very infrequently, less often than once per second), then do so. That's what the ID3DXSprite class does, and I assume the SpriteBatch class does too. The order of the vertices required to draw a cube will never change, so this can be precalculated.
Otherwise, create a large dynamic index buffer, and update its contents when you need to (you don't have to use the whole buffer every time).
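As a rough illustration of precalculating the cube indices once at load time: the sketch below assumes each cube is stored as 24 vertices (4 per face, 6 faces) and that the two triangles of a face are (0,1,2) and (0,2,3). That layout and winding, and the CubeIndices name, are assumptions; adjust them to however your vertices are actually ordered.

```csharp
using System;

public static class CubeIndices
{
    // Assumed layout: 4 vertices per face, 6 faces per cube.
    public const int VerticesPerCube = 24;
    public const int IndicesPerCube = 36;

    // Builds the index list for cubeCount cubes whose vertices are stored
    // back to back in one vertex buffer. Meant to run once at load time.
    public static int[] Build(int cubeCount)
    {
        int[] indices = new int[cubeCount * IndicesPerCube];
        int write = 0;

        for (int cube = 0; cube < cubeCount; cube++)
        {
            for (int face = 0; face < 6; face++)
            {
                // First vertex of this face inside the shared vertex buffer.
                int v = cube * VerticesPerCube + face * 4;

                // Two triangles per face (assumed winding).
                indices[write++] = v;
                indices[write++] = v + 1;
                indices[write++] = v + 2;

                indices[write++] = v;
                indices[write++] = v + 2;
                indices[write++] = v + 3;
            }
        }
        return indices;
    }
}
```

The resulting array only needs to be uploaded to a static index buffer once; per frame you would then just choose which ranges of it to draw.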
Ahh... yes. Well, I kinda thought that creating an index buffer for every texture for every frame was a bad idea - but I am not quite sure of what I should do then.
As I see it, using the BasicEffect, I need DrawIndexedPrimitives to use a (Dynamic)IndexBuffer that contains only the vertex indices of the triangles that use that texture AND are visible. How do I feed the IndexBuffer for each texture when, the second time through the loop, I get the error I mentioned above?
You may not modify a resource that has been set on a device, or after it has been used within a tiling bracket
You mention locking - I am not at my home PC (with XNA) but is that a method of the IndexBuffer? Is this something I should do in the Update method, locking the IndexBuffer and feeding in the texture indices, taking note of the offset and length of each texture's range within the one IndexBuffer to use in the draw call?
If so, I could see that, with a more relaxed FOV culling method, the IndexBuffer updates could be done only a couple of times a second...
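The "one index buffer, several ranges" idea can be sketched in plain C#. This is only the CPU-side bookkeeping: it packs the per-texture index lists into one preallocated array and records where each texture's range starts. TextureRange and IndexPacker are hypothetical names, and uploading the packed array to the GPU is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of where one texture's triangles live in the packed array.
public struct TextureRange
{
    public string Texture;
    public int StartIndex;
    public int PrimitiveCount;

    public TextureRange(string texture, int startIndex, int primitiveCount)
    {
        Texture = texture;
        StartIndex = startIndex;
        PrimitiveCount = primitiveCount;
    }
}

public static class IndexPacker
{
    // Copies every texture's indices into 'packed' (allocated once, reused each
    // time) and fills 'ranges' with one entry per non-empty texture.
    // Returns the total number of indices written.
    public static int Pack(IEnumerable<KeyValuePair<string, List<int>>> indicesByTexture,
                           int[] packed,
                           List<TextureRange> ranges)
    {
        ranges.Clear();
        int write = 0;

        foreach (KeyValuePair<string, List<int>> entry in indicesByTexture)
        {
            List<int> indices = entry.Value;
            if (indices.Count == 0)
            {
                continue; // nothing visible with this texture
            }
            if (write + indices.Count > packed.Length)
            {
                throw new InvalidOperationException("Packed index array is too small.");
            }

            indices.CopyTo(packed, write);
            ranges.Add(new TextureRange(entry.Key, write, indices.Count / 3));
            write += indices.Count;
        }
        return write;
    }
}
```

The packed array would be uploaded once per update, and each TextureRange would then supply the start index and primitive count for that texture's draw call.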
Quote:You mention locking - I am not at my home PC (with XNA) but is that a method of the IndexBuffer? Is this something I should do in the Update method, locking the IndexBuffer and feeding in the texture indices, taking note of the offset and length of each texture's range within the one IndexBuffer to use in the draw call?
If so, I could see that, with a more relaxed FOV culling method, the IndexBuffer updates could be done only a couple of times a second...


Crash course in the GPU Command Buffer:

Every D3D state management and drawing API call creates a data packet that goes into a buffer. This is the GPU command buffer. The GPU reads from this buffer to know what to do next. Certain commands write fences into the buffer (this is up to D3D; sadly we don't get to do it ourselves), and a fence is also written whenever a threshold number of bytes has been written into the buffer. The fence ID is just a serial number. When resources (textures, index buffers, vertex buffers, render targets, and shaders) are referenced, they are tagged with the current fence serial number, and are considered in use until the CPU can determine that the GPU has read and processed the commands up to that fence ID.

Now for the major part:

When you lock a resource that has an outstanding fence, it stalls until the GPU has processed up to that fence. If this resource is currently set on the hardware, it will stall up until the GPU is completely idle.

Now, from the code you have, it still looks like you are constructing some buffers in the render loop, so they are somewhat immune to this, but you would see it if you tried to recycle them. This is generally fixed by allocating dynamic index buffers and locking them with D3DLOCK_DISCARD. Half the time the drivers are really just allocating more memory and queuing the original memory to be freed, so the cost of allocating a brand new index or vertex buffer isn't that bad, despite the dire warnings in the documentation or from other forum users :) The real advantage is in the number of API calls needed: Release, Create, and Lock is going to cost more than just a Lock with a different parameter or two (D3DLOCK_DISCARD and D3DLOCK_NOOVERWRITE), as on the PC the context switches to the OS layer are pretty expensive.
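Here is a toy model of the fence bookkeeping described above, in plain C#. It is nothing like a real driver; ToyGpu and ToyBuffer are invented for illustration, and it only shows why a plain lock on an in-flight buffer waits while a discard lock does not. For what it's worth, XNA exposes the same hints as SetDataOptions.Discard and SetDataOptions.NoOverwrite on the SetData overloads of its dynamic buffers.

```csharp
using System;

// Invented type: stands in for an index or vertex buffer.
public sealed class ToyBuffer
{
    public int StorageId;      // which block of memory currently backs the buffer
    public int LastUsedFence;  // serial of the last fence that references it (0 = never used)
}

// Invented type: tracks fence serial numbers the way the post describes.
public sealed class ToyGpu
{
    public int SubmittedFence;  // serial of the last fence written to the command buffer
    public int CompletedFence;  // serial of the last fence the GPU has processed
    public int Stalls;          // how many times the CPU had to wait for the GPU
    public int Renames;         // how many times a discard handed out fresh memory

    // Queue a draw that references the buffer and tag it with a new fence serial.
    public void Draw(ToyBuffer buffer)
    {
        SubmittedFence++;
        buffer.LastUsedFence = SubmittedFence;
    }

    // Plain lock: if the GPU has not passed the buffer's fence yet, wait for it.
    public void Lock(ToyBuffer buffer)
    {
        if (buffer.LastUsedFence > CompletedFence)
        {
            Stalls++;
            CompletedFence = buffer.LastUsedFence; // CPU blocks until the GPU gets here
        }
    }

    // Discard lock: if the buffer is still in flight, swap in new storage
    // instead of waiting; the old storage would be freed once its fence passes.
    public void LockDiscard(ToyBuffer buffer)
    {
        if (buffer.LastUsedFence > CompletedFence)
        {
            Renames++;
            buffer.StorageId++;        // pretend the driver allocated a fresh block
            buffer.LastUsedFence = 0;  // the new storage has never been seen by the GPU
        }
    }
}
```

In this model, recycling one buffer with plain locks stalls once per reuse, while the discard path trades each stall for a rename, which matches the "really just allocating more memory" behaviour mentioned above.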
http://www.gearboxsoftware.com/

This topic is closed to new replies.
