[DX11] Tile map performance

6 comments, last by Dawoodoz 12 years, 7 months ago
Hi

I have been working on 2D tile map loading and rendering, and I'm noticing very bad performance with my simple approach.

Basically I have a 2D array that specifies the tiles in the map, using a number to represent each type of tile (water, rock, etc.). Each of these is a sprite of its own with a vertex buffer and a texture. All tiles are 32x32.

This is the render function of the map:


// Fetch per-frame state once, outside the tile loops.
XMMATRIX VPMatrix = g_Game->GetViewProjectionMatrix();
ID3D11Buffer* MVPBufferHandle = g_Game->GetMVPBuffer();
ID3D11DeviceContext* D3DContext = g_Game->GetContext();
XMMATRIX world = GetWorldMatrix();

for(int i = 0; i < m_TileList.size(); i++)
{
    for(int j = 0; j < m_TileList[i].size(); j++)
    {
        // Per-tile transform: the map's world matrix, then the tile's grid offset.
        XMMATRIX offset = XMMatrixMultiply(world, XMMatrixTranslation(j * m_TileSize, i * m_TileSize, 0.0f));
        XMMATRIX mvp = XMMatrixMultiply(offset, VPMatrix);
        mvp = XMMatrixTranspose(mvp);

        // One constant-buffer update, one bind and one draw per tile.
        D3DContext->UpdateSubresource(MVPBufferHandle, 0, 0, &mvp, 0, 0);
        D3DContext->VSSetConstantBuffers(0, 1, &MVPBufferHandle);
        m_SpriteList[m_TileList[i][j]].Render();
    }
}



Simply put, I get the pointer to my constant buffer so I can send my final matrix to the shader. I multiply the location of each tile by the location of the map and the view-projection matrix and send it off for rendering. The render function of a sprite just binds the vertex buffer and the texture resource and renders a quad (using a triangle list).

As an example I tried to render 100 tiles with only 2 types of sprites:
without map = 500 fps
with map = ~250 fps

This looks to me like very bad performance, probably due to my approach, but I have no idea how I can couple things together to save draw calls or texture binds :(

Can anyone guide me on how I can increase the performance?

Thanks
Yes. Add more work. Surprised?
Modern drivers are not just "hardware translators" of API commands. They go through extensive buffering and mangling, based on statistical analysis of real-world workloads. If you don't give them a "real world" workload, or you give them some "really odd" pattern of commands, they won't gear up, much less drive the hardware properly.

Therefore, don't even start talking about performance unless you're below 100 fps (and even this is a real stretch). Framerates such as 500 fps, or 200 fps for that matter, are just ridiculous and hardly indicative of a "real world" performance problem. Going from 500 fps to 250 fps is only about 2 ms of extra time per frame.

But if I were you, I'd just pre-transform all tiles into one big batch, as I hardly believe they move at all with regard to each other. No tile is an island.
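
A minimal sketch of what such pre-transforming could look like, assuming a grid of tile IDs like the one in the original post. The `TileVertex` layout and the `BuildTileBatch` name are hypothetical, not taken from the poster's code. The idea is to bake each tile's offset into its vertices on the CPU, so the whole map could be uploaded once and drawn with a single world matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: position in map space plus texture coordinates.
struct TileVertex {
    float x, y, z;
    float u, v;
};

// Bakes every tile's grid offset into its vertices, so no per-tile matrix
// is needed at draw time. Emits 6 vertices (two triangles) per tile.
// Winding order may need flipping to match your rasterizer state.
std::vector<TileVertex> BuildTileBatch(const std::vector<std::vector<int>>& tiles,
                                       float tileSize)
{
    assert(tileSize > 0.0f);
    std::vector<TileVertex> out;
    for (std::size_t row = 0; row < tiles.size(); ++row) {
        for (std::size_t col = 0; col < tiles[row].size(); ++col) {
            const float x0 = static_cast<float>(col) * tileSize;
            const float y0 = static_cast<float>(row) * tileSize;
            const float x1 = x0 + tileSize;
            const float y1 = y0 + tileSize;
            // First triangle of the quad.
            out.push_back({x0, y0, 0.0f, 0.0f, 0.0f});
            out.push_back({x1, y0, 0.0f, 1.0f, 0.0f});
            out.push_back({x0, y1, 0.0f, 0.0f, 1.0f});
            // Second triangle of the quad.
            out.push_back({x1, y0, 0.0f, 1.0f, 0.0f});
            out.push_back({x1, y1, 0.0f, 1.0f, 1.0f});
            out.push_back({x0, y1, 0.0f, 0.0f, 1.0f});
        }
    }
    return out;
}
```

With one texture per tile type, as in the original post, you would probably build one such batch per texture (filtering on the tile ID) and rebuild it only when the map changes.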

Previously "Krohm"

For 2D tile layers, use a texture of tile indices with NN sampling to decide which part of a texture atlas to render. Then you can have thousands of tiles by rendering just 2 triangles with an HLSL shader.
When using a single texture atlas, you can also draw everything with a single draw call if you use hardware instancing.
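
To make the index-texture idea more concrete, here is a CPU-side reference in C++ of the lookup a pixel shader could perform. It is only a sketch under assumptions: the `TileLayer` struct, the `MapUvToAtlasUv` name and the grid-shaped atlas are all hypothetical, and a real version would live in HLSL and read the index texture with a point sampler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float2 { float x, y; };

// Hypothetical description of one tile layer and its atlas.
struct TileLayer {
    int mapWidth, mapHeight;            // map size in tiles
    int atlasCols, atlasRows;           // atlas size in tiles
    std::vector<std::uint8_t> indices;  // mapWidth * mapHeight tile IDs, row-major
};

// Given a UV in [0,1) across the whole map quad, returns the UV to sample
// in the atlas. Reading `indices` stands in for the NN texture fetch.
Float2 MapUvToAtlasUv(const TileLayer& layer, Float2 uv)
{
    assert(uv.x >= 0.0f && uv.x < 1.0f && uv.y >= 0.0f && uv.y < 1.0f);
    const float tx = uv.x * layer.mapWidth;
    const float ty = uv.y * layer.mapHeight;
    const int cellX = static_cast<int>(tx);  // truncation equals floor for tx >= 0
    const int cellY = static_cast<int>(ty);
    const int id = layer.indices[cellY * layer.mapWidth + cellX];
    // Position inside the current tile, in [0,1).
    const float fx = tx - cellX;
    const float fy = ty - cellY;
    // Column and row of the tile image inside the atlas grid.
    const int ax = id % layer.atlasCols;
    const int ay = id / layer.atlasCols;
    return { (ax + fx) / layer.atlasCols, (ay + fy) / layer.atlasRows };
}
```

Because the tile ID is read per pixel, the whole layer needs only one quad, regardless of how many tiles the map contains.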
Thanks for the suggestions. I now pre-transform my sprites on load and any time the map is moved, which gave me a decent fps boost (~30 fps).
Unfortunately I'm not using sprite sheets; I have a separate image for each entity to keep it simple. Later, when I create a map editor, I can build a system that compiles my individual sprites into one large image.

Also, what is NN sampling?
There are a lot of things you can improve, though as Krohm already mentioned, I'd only worry about it if performance actually becomes an issue.
Here's a more detailed list of things you can do to (greatly) improve your performance:

  • Use a single quad (2 triangles) to render all your tiles. That's really all you need.
    Your quad should have the size of a single tile and be created at the origin of your world space.
    Whenever you draw a tile, use a vertex shader to move the quad to the appropriate location by providing a WVP matrix.

    This reduces your total vertex count and removes any need to update your vertex buffer every frame.
    You can then also create the vertex buffer with default or immutable usage.
  • Use frustum culling.
    You only need to draw the tiles that are actually visible on screen.
  • Put all your tile images into one big texture (texture atlas).
    I understand that you might not want to manually do that yet, but you can easily have your program do it for you at startup.
    Just calculate the texture size needed to hold all of your tiles and create your texture atlas with that size.
    Then render every tile to your new texture atlas, and keep track of the UV location for every single tile.
    Now you can render all of your tiles using the very same texture.
    Instead of switching textures you switch UV coordinates.

    This greatly cuts down your state changes.
  • If you want even more performance, do everything in a single draw call using hardware instancing.
    You'll need to create a second (dynamic) vertex buffer that holds the WVP matrix and UV data for every tile to be rendered.

    This will reduce your draw calls down to one.
    At this point you can easily render over 10k tiles without performance issues.
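
The culling and instancing steps above could be prepared on the CPU roughly like this. This is a sketch, not a definitive implementation: `TileRange`, `TileInstance`, `VisibleTiles` and `BuildInstances` are hypothetical names, the atlas is assumed to be a regular grid, and a 2D offset per instance is used here in place of the full WVP matrix mentioned in the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Inclusive range of tile indices; empty if last < first.
struct TileRange { int firstCol, firstRow, lastCol, lastRow; };

// 2D culling: converts the camera's view rectangle (map space) into the
// range of tiles that can be visible, clamped to the map bounds.
TileRange VisibleTiles(float viewLeft, float viewTop, float viewWidth, float viewHeight,
                       float tileSize, int mapCols, int mapRows)
{
    assert(tileSize > 0.0f);
    TileRange r;
    r.firstCol = std::max(0, static_cast<int>(std::floor(viewLeft / tileSize)));
    r.firstRow = std::max(0, static_cast<int>(std::floor(viewTop / tileSize)));
    r.lastCol = std::min(mapCols - 1,
                         static_cast<int>(std::floor((viewLeft + viewWidth) / tileSize)));
    r.lastRow = std::min(mapRows - 1,
                         static_cast<int>(std::floor((viewTop + viewHeight) / tileSize)));
    return r;
}

// Hypothetical per-instance record: tile offset plus the top-left atlas UV.
struct TileInstance { float offsetX, offsetY; float u0, v0; };

// Builds one record per visible tile. The range must lie inside `tiles`.
std::vector<TileInstance> BuildInstances(const std::vector<std::vector<int>>& tiles,
                                         const TileRange& range, float tileSize,
                                         int atlasCols, int atlasRows)
{
    assert(atlasCols > 0 && atlasRows > 0);
    std::vector<TileInstance> instances;
    for (int row = range.firstRow; row <= range.lastRow; ++row) {
        for (int col = range.firstCol; col <= range.lastCol; ++col) {
            const int id = tiles[row][col];
            instances.push_back({col * tileSize, row * tileSize,
                                 static_cast<float>(id % atlasCols) / atlasCols,
                                 static_cast<float>(id / atlasCols) / atlasRows});
        }
    }
    return instances;
}
```

The resulting array is what you would copy into the dynamic instance buffer each frame (for example with `Map` and `D3D11_MAP_WRITE_DISCARD`) before issuing a single `DrawInstanced` or `DrawIndexedInstanced` call.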

Thanks, I'm going to implement everything you've suggested soon, but first I have to decide what counts as a sprite and what counts as an entity so I can conveniently change texture and buffer states. I guess that's a design problem.

Anyway, many thanks to everyone who helped.
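NN sampling is when you sample a texture with "Nearest Neighbour" interpolation, so each sample returns the single closest texel and the texture looks blocky, like in Wolfenstein 3D. It does less work than bilinear interpolation, which blends the 4 closest texels, although on current GPUs the speed difference is usually small. For a texture of tile indices it also matters for correctness, since blending two tile IDs would give a meaningless value.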
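
To illustrate the difference, here is a small CPU-side sketch of both sampling modes on a single-channel image. The `GrayImage` type and the function names are made up for this example, and the half-texel convention (texel centres at `(i + 0.5) / width`) is an assumption that matches common GPU behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical single-channel image with clamp-to-edge addressing.
struct GrayImage {
    int width, height;
    std::vector<float> texels;  // row-major, width * height values

    float At(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return texels[y * width + x];
    }
};

static float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// Nearest neighbour: return the one texel whose cell contains (u, v).
float SampleNearest(const GrayImage& img, float u, float v)
{
    const int x = static_cast<int>(std::floor(u * img.width));
    const int y = static_cast<int>(std::floor(v * img.height));
    return img.At(x, y);
}

// Bilinear: blend the 4 texels whose centres surround (u, v).
float SampleBilinear(const GrayImage& img, float u, float v)
{
    const float fx = u * img.width - 0.5f;
    const float fy = v * img.height - 0.5f;
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float tx = fx - x0;
    const float ty = fy - y0;
    const float top = Lerp(img.At(x0, y0), img.At(x0 + 1, y0), tx);
    const float bottom = Lerp(img.At(x0, y0 + 1), img.At(x0 + 1, y0 + 1), tx);
    return Lerp(top, bottom, ty);
}
```

On a two-texel image holding 0 and 1, sampling exactly on the border between the texels returns one of the stored values with nearest neighbour, but a blend of both with bilinear filtering. That blend is what you want to avoid when the texels are tile IDs.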
