[XNA] 1000 cubes, 4 Textures = 55fps... should I expect more?

16 comments, last by Zoner 14 years, 12 months ago
Hi all, I am just starting an XNA trial to see what it's all about. I want to build a 2D game with 3D graphics - like Mario, but in a world of cubes rather than flat blocks. I have put together a simple test framework to check some performance ideas: I create a camera with an animated .x character and instance 1000 cubes, and I get about 55fps. Is this to be expected, or does it point to me doing something wrong?

The cubes have a 128x128 texture applied to each face, with a total of 4 different textures across the 1000 cubes. Each cube builds its VertexPositionNormalTexture array from a given position and size, and tracks which faces use which texture. I then use this to do the rendering:

private void DrawBlocks(Matrix currentViewMatrix)
        {
            _beBlock.Begin();
            foreach (DictionaryEntry de in _htTextures)
            {
                string sTexName = de.Key.ToString();

                // Get all the triangles with this texture
                List<VertexPositionNormalTexture> lVerts = new List<VertexPositionNormalTexture>();
                foreach (Block b in _alBlocks) {
                    lVerts.AddRange(b.Get_Texture_Triangles(sTexName));
                }

                if (lVerts.Count > 0) {
                    
                    _beBlock.World = World;
                    _beBlock.View = View;
                    _beBlock.Projection = Projection;
                    _beBlock.Texture = (Texture2D)de.Value;
                    _beBlock.TextureEnabled = true;
                    _beBlock.DiffuseColor = new Vector3(1.0f, 1.0f, 1.0f);
                    _beBlock.AmbientLightColor = new Vector3(0.75f, 0.75f, 0.75f);
                    _beBlock.DirectionalLight0.Enabled = true;
                    _beBlock.DirectionalLight0.DiffuseColor = Vector3.One;
                    _beBlock.DirectionalLight0.Direction = Vector3.Normalize(new Vector3(1.0f, -1.0f, 1.0f));
                    _beBlock.DirectionalLight0.SpecularColor = Vector3.One;
                    _beBlock.LightingEnabled = true;


                    _dvbBlock = new DynamicVertexBuffer(GraphicsDevice, lVerts.Count * VertexPositionNormalTexture.SizeInBytes, BufferUsage.WriteOnly);
                    _dvbBlock.SetData(lVerts.ToArray(),0,lVerts.Count);
                    //_vbBlock = new VertexBuffer(GraphicsDevice, lVerts.Count * VertexPositionNormalTexture.SizeInBytes, BufferUsage.WriteOnly);

                    foreach (EffectPass ep in _beBlock.CurrentTechnique.Passes)
                    {
                        ep.Begin();

                        //GraphicsDevice.Vertices[0].SetSource(_vbBlock, 0, VertexPositionNormalTexture.SizeInBytes);
                        GraphicsDevice.Vertices[0].SetSource(_dvbBlock, 0, VertexPositionNormalTexture.SizeInBytes);
                        GraphicsDevice.VertexDeclaration = _vdBlock;
                        GraphicsDevice.DrawPrimitives(PrimitiveType.TriangleList, 0, lVerts.Count/3);

                        //GraphicsDevice.DrawPrimitives(PrimitiveType.TriangleList, 0, 2);
                        ep.End();
                    }
                }
                
            }
            _beBlock.End();
        }


Now, am I on the right track with this rendering idea? I gather all the block faces that use a given texture, create and fill a DynamicVertexBuffer with the data, and draw it for each effect pass. Is there a better way? I have never used shaders before and am learning about them at the same time as XNA and its components. Thanks for any comments. Steele.
It looks like you are computing vertices at runtime. This is quite expensive by itself, as is using a dynamic vertex buffer. Using static buffers should run quite well, until the buffer starts to contain too much stuff that is offscreen (i.e. transformed but not visible).

Also, drawing indexed primitives will be quite a bit more efficient due to the caches involved. You could probably also compute index buffers instead of vertex buffers with a bit of work, if the data has to remain dynamic.
http://www.gearboxsoftware.com/
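A rough sketch of the indexed approach Zoner describes might look like the following (XNA 3.x API; the buffer fields and BuildBuffers are hypothetical names, while _vdBlock and GraphicsDevice come from the original post). The buffers are built once at load time, then drawn with indices each frame:

```csharp
// Hedged sketch, untested: static vertex + index buffers built once,
// then reused every frame. With an index buffer, a cube needs only 24
// unique vertices instead of 36, and shared vertices can hit the
// post-transform cache instead of re-running the vertex shader.
VertexBuffer _vb;
IndexBuffer _ib;
int _vertexCount;
int _triangleCount;

void BuildBuffers(VertexPositionNormalTexture[] vertices, short[] indices)
{
    _vb = new VertexBuffer(GraphicsDevice,
        vertices.Length * VertexPositionNormalTexture.SizeInBytes,
        BufferUsage.WriteOnly);
    _vb.SetData(vertices);

    _ib = new IndexBuffer(GraphicsDevice,
        indices.Length * sizeof(short),
        BufferUsage.WriteOnly, IndexElementSize.SixteenBits);
    _ib.SetData(indices);

    _vertexCount = vertices.Length;
    _triangleCount = indices.Length / 3;
}

void DrawIndexed()
{
    GraphicsDevice.VertexDeclaration = _vdBlock;
    GraphicsDevice.Vertices[0].SetSource(_vb, 0,
        VertexPositionNormalTexture.SizeInBytes);
    GraphicsDevice.Indices = _ib;
    GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.TriangleList,
        0, 0, _vertexCount, 0, _triangleCount);
}
```

With 16-bit indices the buffer is limited to 65536 vertices, so a 36000-vertex scene like the poster's still fits in a single buffer.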
The vertices aren't being computed, just referenced at runtime.
The reason I was going with dynamic vertex buffers was that for the Mario-esque type game, the screen is only about 1/30 of the whole level. So I was trying to add in the ability to cull down the vertices to the current view (not yet implemented).

Are you saying that instead of building the vb from separate Block objects, I should put all Block vertices into one big vb and build the index buffers to draw instead? Making my Blocks object responsible for vertex and index information, and each block just containing information about textures/face and bounds information?

I guess I might be getting ahead of myself here and should just test out how it works for my end purpose (probably no more than 100 blocks on screen at a time), but I just expected more from my 8800 GT...
In addition to what Zoner said, are you visibility culling blocks at all? If all 1000 blocks are not on screen you should not draw them all. There are plenty of easy ways to do visibility culling on blocks in a Mario style side scroller.
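As one illustration of how easy this can be in a side scroller, culling can reduce to a 1D overlap test against the camera's horizontal extent (MinX/MaxX are assumed bounds members on the poster's Block class, not something shown in the original code):

```csharp
// Hypothetical sketch: in a Mario-style side scroller, visibility
// culling can be a simple horizontal range test. MinX/MaxX are assumed
// to be world-space bounds exposed by the Block class.
static List<Block> VisibleBlocks(List<Block> blocks,
                                 float camLeft, float camRight)
{
    List<Block> visible = new List<Block>();
    foreach (Block b in blocks)
    {
        // Keep a block only if its x-range overlaps the camera's.
        if (b.MaxX >= camLeft && b.MinX <= camRight)
            visible.Add(b);
    }
    return visible;
}
```

If the screen covers roughly 1/30 of the level, as mentioned later in the thread, a test like this cuts the drawn geometry by a similar factor before any buffer work happens.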
With that many DrawPrimitive calls you're likely CPU-bound. Calls to the graphics API will often result in a lot of CPU overhead, and this applies particularly to DP calls. For a 2D platformer you'll probably never have to draw anywhere close to that many objects on screen at once, so you won't have to worry about it, but if for some reason you do, you'll need to use some form of batching. Static geometry is typically batched by putting it all into a large vertex buffer, while dynamic geometry has to be batched using instancing techniques.
Besides all the above how are you computing your framerate? Is the SynchronizeWithVerticalRetrace member of the graphics device set to true?

Former Microsoft XNA and Xbox MVP | Check out my blog for random ramblings on game development

Thanks guys. This is very helpful.
Zoner - I will try the index primitive way. My only confusion is with handling the static vertexbuffer (or should that be vertexbuffers?). How do I know how big the vertexbuffer needs to be before I know the size of the geometry in view? Or do I just declare one of a large size and chunk as needed? If I'm using indexes, I imagine that once the vertexbuffer is built, it needs to remain the same regardless of size or scope. I take it the contents of this will remain the same and the whole thing passed to GPU and just the indexes get updated depending on visibility culling.
MJP - The number of DrawPrimitive calls is what I was attempting to reduce in my draw code by batching all of the Block faces that use a particular Texture into one DrawPrimitive call. Again, I am not too cluey on Effects (BasicEffect), so what I am doing may not be what I thought - please let me know! And what counts as a large vertexbuffer nowadays? In my test example I have 36 * 1000 vertices of type VertexPositionNormalTexture. Does this pose a problem?
Machaira - Both SynchronizeWithVerticalRetrace = false and IsFixedTimeStep = false.
1000 DrawPrimitive() calls per frame will take a heavy toll on any graphics card. As the others have said, the solution is to use vertex batching - collecting a larger number of vertices and issuing just a handful of DrawPrimitive() calls. It's exactly what the SpriteBatch class in XNA does for 2D graphics, only that you seem to need an equivalent for 3D graphics.

One way to do this would be to create a vertex buffer manager that allows each cube to allocate vertices in a larger (or multiple larger) vertex buffers. Certainly a bit complicated if the lifetime of your cubes varies.

Another approach would be to simply write vertices into an array instead of calling DrawPrimitive() and when a certain threshold is reached, write them into a dynamic vertex buffer, make the DrawPrimitive() call and continue filling up the array anew. On the PC, make sure you call SetData() with SetDataOptions.Discard to avoid stalling the graphics pipeline.
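A rough sketch of that fill-and-flush pattern, assuming the dynamic buffer is created once at startup rather than per frame (BatchSize, _batch, and _batchVb are illustrative names, not from the original post):

```csharp
// Hedged sketch of the fill-and-flush pattern (XNA 3.x API).
// _batchVb is created once at load time, never inside the draw loop.
const int BatchSize = 4096; // tune per target card
VertexPositionNormalTexture[] _batch =
    new VertexPositionNormalTexture[BatchSize];
int _batchCount;
DynamicVertexBuffer _batchVb;

void Flush()
{
    if (_batchCount == 0)
        return;
    // Discard lets the driver hand back a fresh region of memory instead
    // of stalling until the GPU has finished reading the old contents.
    _batchVb.SetData(_batch, 0, _batchCount, SetDataOptions.Discard);
    GraphicsDevice.Vertices[0].SetSource(_batchVb, 0,
        VertexPositionNormalTexture.SizeInBytes);
    GraphicsDevice.DrawPrimitives(PrimitiveType.TriangleList,
        0, _batchCount / 3);
    _batchCount = 0; // start filling the array anew
}
```

The callers simply append vertices to _batch and invoke Flush() whenever the array fills up or the texture/effect state changes.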

For one of my XNA projects, I've written a PrimitiveBatch class that does this, analogous to the SpriteBatch class, only that it works for vertices of any type. If you're interested, you can check out the sources here.

Getting accurate numbers on the optimal vertex batch size is hard and your best bet is to choose a target graphics card that you consider standard amongst your users and tune the vertex batch size to perform optimally on that card. The idea is to use small batches to regularly feed the graphics card (which is rendering asynchronously while the CPU already puts together the next batch), but not so small that the call overhead is bigger than the gain. It probably doesn't say much, but batches of 24K vertices seemed to perform optimally for me on a GeForce 8800 GTS 512.
Professional C++ and .NET developer trying to break into indie game development.
Follow my progress: http://blog.nuclex-games.com/ or Twitter - Topics: Ogre3D, Blender, game architecture tips & code snippets.
This is not XNA related, but I have a question dealing with dynamic buffers in general. Is it almost as expensive to update vertices for a batched collection of meshes as it is to do many draw calls if the meshes were not batched into one buffer? I am using shader instancing, so batching is limited to X instances at a time, according to the shader model used (to render Y instances I'll have to use Y/X draw calls). But I would have to lock the buffer every frame to update the location of every instanced mesh.
Electronic Meteor - My experiences with XNA and game development
Quote:Original post by JustChris
This is not XNA related, but I have a question dealing with dynamic buffers in general. Is it almost as expensive to update vertices for a batched collection of meshes as it is to do many draw calls if the meshes were not batched into one buffer? I am using shader instancing so batching is limited to X amount of instances at a time, according to the shader model used (To render Y instances I'll have to use Y/X amount of draw calls). But I would have to lock the buffer every frame to update the location of every instanced mesh.


That depends on the size of your batch, but generally I think updating a vertex buffer with 1000 positions is not as expensive as using 1000 draw calls.

Keep in mind that writing to shader parameters is not free either.

This topic is closed to new replies.
