[DX11] Tile map performance

Hi,

I have been working on 2D tile map loading and rendering, and I'm noticing very poor performance with my simple approach.

Basically, I have a 2D array that specifies the tiles in the map, using a number to represent each type of tile (water, rock, etc.). Each tile type is a sprite of its own, with its own vertex buffer and texture. All tiles are 32x32.

This is the render function of the map:


XMMATRIX VPMatrix = g_Game->GetViewProjectionMatrix();
ID3D11Buffer* MVPBufferHandle = g_Game->GetMVPBuffer();
ID3D11DeviceContext* D3DContext = g_Game->GetContext();
XMMATRIX world = GetWorldMatrix();

for (int i = 0; i < (int)m_TileList.size(); i++)            // rows
{
    for (int j = 0; j < (int)m_TileList[i].size(); j++)     // columns
    {
        // Offset the tile to its grid position, then apply the map's world transform.
        XMMATRIX offset = XMMatrixMultiply(world, XMMatrixTranslation(j * m_TileSize, i * m_TileSize, 0.0f));
        XMMATRIX mvp = XMMatrixMultiply(offset, VPMatrix);
        mvp = XMMatrixTranspose(mvp); // HLSL constant buffers default to column-major

        D3DContext->UpdateSubresource(MVPBufferHandle, 0, nullptr, &mvp, 0, 0);
        D3DContext->VSSetConstantBuffers(0, 1, &MVPBufferHandle);
        m_SpriteList[m_TileList[i][j]].Render();
    }
}



Simply put, I get the pointer to my constant buffer to send the final matrix to the shader: I multiply each tile's translation by the map's world matrix and the view-projection matrix, then send it off for rendering. The render function of a sprite just binds the vertex buffer and the texture resource and renders a quad (using a triangle list).

As an example, I tried to render 100 tiles with only 2 types of sprites:
without map = ~500 fps
with map = ~250 fps

This looks to me like very bad performance, probably due to my approach, but I have no idea how I can batch things together to save draw calls or texture binds.

Can anyone guide me on how I can increase the performance?

Thanks

Can anyone guide me on how I can increase the performance?
Yes. Add more work. Surprised?
Modern drivers are not just "hardware translators" of API commands. They go through extensive buffering and command reordering, based on statistical analysis of real-world workloads. If you don't give them a "real world" workload, or you give them a "really odd" pattern of commands, they won't gear up, much less drive the hardware properly.

Therefore, don't even start talking about performance unless you're below 100 fps (and even that is a real stretch). Framerates such as 500 fps, or 200 fps for that matter, are just ridiculous and hardly indicative of a "real world" performance problem.

But if I were you, I'd just pre-transform all the tiles into one big batch, as I hardly believe they move at all with respect to each other; no tile is an island.
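Pre-transforming into one big batch might look something like the sketch below; the `Vertex` struct and function names are my assumptions (not the poster's code), and the actual upload into a D3D11 vertex buffer is omitted:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side vertex: position (x, y) plus texture coordinates (u, v).
struct Vertex { float x, y, u, v; };

// Build one big vertex array containing every tile quad, already offset to its
// grid position, so the whole map can be drawn with a single world matrix
// and a single draw call per texture.
std::vector<Vertex> BuildTileBatch(std::size_t rows, std::size_t cols, float tileSize)
{
    std::vector<Vertex> batch;
    batch.reserve(rows * cols * 6); // two triangles (6 vertices) per tile

    for (std::size_t i = 0; i < rows; ++i)
    {
        for (std::size_t j = 0; j < cols; ++j)
        {
            const float x = j * tileSize;
            const float y = i * tileSize;
            // First triangle
            batch.push_back({x,            y,            0.0f, 0.0f});
            batch.push_back({x + tileSize, y,            1.0f, 0.0f});
            batch.push_back({x,            y + tileSize, 0.0f, 1.0f});
            // Second triangle
            batch.push_back({x + tileSize, y,            1.0f, 0.0f});
            batch.push_back({x + tileSize, y + tileSize, 1.0f, 1.0f});
            batch.push_back({x,            y + tileSize, 0.0f, 1.0f});
        }
    }
    return batch;
}
```

The resulting array only needs to be rebuilt when the map itself changes, not every frame.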

Share this post


Link to post
Share on other sites
For 2D tile layers, store the tile indices in a texture sampled with NN filtering, and use it in the shader to decide which part of a texture atlas to render. Then you can have thousands of tiles while rendering only 2 triangles with an HLSL shader.
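The per-pixel lookup that shader would perform can be sketched on the CPU like this (all names and the single-row atlas layout are my assumptions, purely to illustrate the idea):

```cpp
#include <cstddef>
#include <vector>

struct Uv { float u, v; };

// Emulates the shader-side lookup: a "map texture" holds one tile index per
// cell (sampled with nearest-neighbour filtering, i.e. truncation), and that
// index selects a slice of a horizontal texture atlas of square tiles.
Uv AtlasUvForWorldPos(const std::vector<int>& mapIndices, std::size_t mapCols,
                      float worldX, float worldY, float tileSize,
                      std::size_t atlasTileCount)
{
    // Nearest-neighbour "sample" of the map: truncate to the containing cell.
    const std::size_t cellX = static_cast<std::size_t>(worldX / tileSize);
    const std::size_t cellY = static_cast<std::size_t>(worldY / tileSize);
    const int tile = mapIndices[cellY * mapCols + cellX];

    // Fractional position inside the cell becomes the UV inside the atlas tile.
    const float fracU = (worldX - cellX * tileSize) / tileSize;
    const float fracV = (worldY - cellY * tileSize) / tileSize;

    // Each tile occupies a 1/atlasTileCount-wide horizontal slice of the atlas.
    return { (tile + fracU) / atlasTileCount, fracV };
}
```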

Share this post


Link to post
Share on other sites
When using a single texture atlas, you can also draw everything with a single draw call if you use hardware instancing.
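The per-instance data for that single draw call might be built like this (a CPU-side sketch; the struct layout and names are assumptions, and the upload into a per-instance vertex buffer plus the `DrawInstanced` call are omitted):

```cpp
#include <cstddef>
#include <vector>

// One record per tile: where the shared unit quad goes, and which atlas tile
// it shows. With hardware instancing this array becomes a second (per-instance)
// vertex buffer, and the whole map is a single draw call.
struct TileInstance { float x, y; float atlasU, atlasV; };

std::vector<TileInstance> BuildInstances(const std::vector<std::vector<int>>& map,
                                         float tileSize, std::size_t atlasCols,
                                         float atlasTileUvSize)
{
    std::vector<TileInstance> instances;
    for (std::size_t i = 0; i < map.size(); ++i)
    {
        for (std::size_t j = 0; j < map[i].size(); ++j)
        {
            const std::size_t t = static_cast<std::size_t>(map[i][j]);
            // Convert the tile index into a (column, row) position in the atlas.
            instances.push_back({ j * tileSize, i * tileSize,
                                  (t % atlasCols) * atlasTileUvSize,
                                  (t / atlasCols) * atlasTileUvSize });
        }
    }
    return instances;
}
```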

Share this post


Link to post
Share on other sites
Thanks for the suggestions. I now pre-transform my sprites on load and any time the map is moved, which gave me a decent FPS boost (~30 fps).
Unfortunately I'm not using sprite sheets; I have a separate image for each entity to keep it simple. Later I can create a system where my individual sprites are compiled into a large image, when I create a map editor.

Also, what is NN sampling?

Share this post


Link to post
Share on other sites
There are a lot of things you can improve, though as Krohm already mentioned, I'd only worry about it if performance actually becomes an issue.
Here's a more detailed list of things you can do to (greatly) improve your performance:

  • Use a single quad (2 triangles) to render all your tiles. That's really all you need.
    Your quad should have the size of a single tile and be created at the origin of your world space.
    Whenever you draw a tile, use a vertex shader to move the quad to the appropriate location by providing a WVP matrix.

    This will reduce your total vertex count and eliminate the need to update your vertex buffer every frame.
    You can now also create the vertex buffer with default or immutable usage.
  • Use frustum culling.
    You only need to draw the tiles that are actually visible on screen.
  • Put all your tile images into one big texture (texture atlas).
    I understand that you might not want to do that manually yet, but you can easily have your program do it for you at startup.
    Just calculate the texture size needed to hold all of your tiles, and create your texture atlas using it.
    Then render every tile into your new texture atlas, and keep track of the UV location of every single tile.
    Now you can render all of your tiles using the very same texture.
    Instead of switching textures, you switch UV coordinates.

    This greatly cuts down your state changes.
  • If you want even more performance, do everything in a single draw call using hardware instancing.
    You'll need to create a second (dynamic) vertex buffer that holds the WVP matrix and UV data for every tile to be rendered.

    This will reduce your draw calls down to one.
    At this point you can easily render over 10k tiles without performance issues.
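For an axis-aligned tile grid, the frustum-culling step reduces to clamping a visible range of rows and columns (a sketch assuming an unrotated 2D orthographic camera; the names are mine):

```cpp
#include <algorithm>
#include <cstddef>

// Which rows/columns of the tile grid overlap the view rectangle.
struct TileRange { std::size_t firstCol, lastCol, firstRow, lastRow; };

// For an unrotated 2D camera, "frustum culling" a tile map is just computing
// the tile indices covered by the view rectangle and drawing only those.
TileRange VisibleTiles(float camX, float camY, float viewW, float viewH,
                       float tileSize, std::size_t mapCols, std::size_t mapRows)
{
    const auto clampIdx = [](float v, std::size_t maxIdx) {
        if (v < 0.0f) return std::size_t{0};          // left/top of the map
        const std::size_t idx = static_cast<std::size_t>(v);
        return std::min(idx, maxIdx);                 // right/bottom of the map
    };
    return {
        clampIdx(camX / tileSize,           mapCols - 1),
        clampIdx((camX + viewW) / tileSize, mapCols - 1),
        clampIdx(camY / tileSize,           mapRows - 1),
        clampIdx((camY + viewH) / tileSize, mapRows - 1),
    };
}
```

Only the tiles inside the returned range need to be drawn (or written into the instance buffer) each frame.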


Thanks, I'm going to implement everything you've said soon, but first I have to decide what counts as a sprite versus an entity, so I can conveniently change texture and buffer states. I guess that's a design problem.

Anyway, many thanks to everyone that helped.

NN sampling is when you sample a texture with "nearest neighbour" interpolation, so that textures look like they do in Wolfenstein 3D. It is faster than bilinear interpolation, which blends the 4 closest texels.
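On the CPU the difference between the two filters can be sketched over a 1D row of texels (a minimal illustration, not how the GPU implements it):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Nearest-neighbour: pick the single closest texel. For tile maps this keeps
// edges crisp and avoids bleeding between neighbouring atlas tiles.
float SampleNN(const std::vector<float>& texels, float u)
{
    const std::size_t i = static_cast<std::size_t>(u * texels.size());
    return texels[std::min(i, texels.size() - 1)];
}

// Linear (bilinear in 2D): blend the two closest texels by distance,
// treating each texel's value as sitting at its center.
float SampleLinear(const std::vector<float>& texels, float u)
{
    const float pos = std::max(u * texels.size() - 0.5f, 0.0f);
    const std::size_t i0 = static_cast<std::size_t>(pos);
    const std::size_t i1 = std::min(i0 + 1, texels.size() - 1);
    const float t = pos - i0;
    return texels[i0] * (1.0f - t) + texels[i1] * t;
}
```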
