Thinking of making a tilemap with DirectX? Read this.

Started by
16 comments, last by Bimble Bob 18 years ago
My rogue-like game is undergoing fundamental restructuring to allow all entities to be read in from XML files. I wouldn't call it a rewrite, really. Only the base classes have required refactoring, thanks to what I now feel is a pretty good OO design. Anyway, in the process of doing this, I had an excellent opportunity to rework my MapCtrl, a component that attaches to a Map and renders it using Direct3D. I thought I would share my experience trying to get performant tilemap rendering out of DirectX, in case anyone else is tempted to go this route now or in the future.

DirectX is unwieldy for making simple 2D games because DirectDraw has been deprecated, so to do anything graphical you need to use Direct3D. Originally I just wanted to get something working, so I was rendering my tilemap in a very straightforward way: I had a vertex buffer holding a single quad that I would translate over the map, drawing one tile at a time. This gave me awful performance (1.5 FPS with a 100x100 map fully in view).

After much research, I learned that changing render states is very expensive in D3D, specifically SetTexture and DrawPrimitive calls. To get a fast tilemap in D3D, you need to do the following:

1. Batch all your textures. To minimize texture changes, put all your tiles onto one (or possibly several) larger textures.
2. Use dynamic vertex buffers to blast several thousand polys to the screen at once. For each tile you want to add, append the geometry to the buffer and set the texture coordinates to correspond to your batched tile. Using an index buffer is a good idea: you will use 33% less bandwidth piping geometry to the GPU.
3. When the dynamic buffer fills up, or you need to make a texture swap, write out the buffer with DrawIndexedPrimitives and start filling it up again (see the sketch at the end of this post).
4. Obviously, only do this for onscreen tiles.
5. If you have a ton of textures, sort your tiles to minimize texture swaps. This probably isn't an issue, since even my GeForce 440 Go supports 2048x2048 textures and you can fit a lot of tiles into a bitmap that large.

I just got this working today, after fighting with DirectX for the past couple of days. I do get really good performance: 70 FPS rendering a 100x100 tilemap, fully in view (at arbitrary scale). But it took more effort than it should have. I needed this level of performance so that I can run animations at 30 FPS without resorting to dirty-rectangle techniques and without driving my CPU to 100% (my computer sounds like a jet plane taking off when it gets going). But you will probably be much happier with OpenGL or SDL if you want to make a simple graphical tilemap.

My 2c.

---

PS - Thanks to Barakus and EDI for helping me fix the issues I was having with my tilemap.
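Here's a rough sketch of the fill-and-flush loop from steps 2 and 3, not my actual code. TileVertex, BATCH_QUADS, the visible-range variables, map[][], GetTileUV() and AppendQuad() are all hypothetical names, and error checking is omitted:

```cpp
#include <d3d9.h>

struct TileVertex { float x, y, z; float u, v; };   // placeholder vertex layout
#define TILE_FVF (D3DFVF_XYZ | D3DFVF_TEX1)

const int BATCH_QUADS = 1024;                        // quads written before each flush

// Hypothetical helpers: look up a tile's UV rectangle in the batched texture,
// and write the four corner vertices of one tile into the buffer.
struct UVRect { float u0, v0, u1, v1; };
UVRect GetTileUV(int tileId);
void   AppendQuad(TileVertex* out, int tileX, int tileY, const UVRect& uv);

// Hypothetical map data and visible range (step 4: onscreen tiles only).
extern int map[100][100];
extern int firstVisibleX, firstVisibleY, lastVisibleX, lastVisibleY;

// vb is a D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY vertex buffer sized for
// BATCH_QUADS * 4 vertices; ib is a static index buffer with 6 indices per quad.
void DrawVisibleTiles(IDirect3DDevice9* dev, IDirect3DVertexBuffer9* vb,
                      IDirect3DIndexBuffer9* ib, IDirect3DTexture9* atlas)
{
    dev->SetFVF(TILE_FVF);
    dev->SetStreamSource(0, vb, 0, sizeof(TileVertex));
    dev->SetIndices(ib);
    dev->SetTexture(0, atlas);                       // one SetTexture for the whole batch

    TileVertex* v = NULL;
    vb->Lock(0, 0, (void**)&v, D3DLOCK_DISCARD);
    int quads = 0;

    for (int ty = firstVisibleY; ty <= lastVisibleY; ++ty)
    {
        for (int tx = firstVisibleX; tx <= lastVisibleX; ++tx)
        {
            if (quads == BATCH_QUADS)                // buffer full: flush and start refilling
            {
                vb->Unlock();
                dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, quads * 4, 0, quads * 2);
                vb->Lock(0, 0, (void**)&v, D3DLOCK_DISCARD);
                quads = 0;
            }
            AppendQuad(v + quads * 4, tx, ty, GetTileUV(map[ty][tx]));
            ++quads;
        }
    }

    vb->Unlock();
    if (quads > 0)                                   // flush whatever is left
        dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, quads * 4, 0, quads * 2);
}
```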

Shedletsky's Bits: A Blog | ROBLOX | Twitter
Time held me green and dying
Though I sang in my chains like the sea...

This is what I do in my 2D sidescroller too. The artist creates 256x256 textures, and the map editor uses these to paint tiles on the map. When the game loads the map, it gets a list of all the textures it needs, and works out the best size to use (512x512, 1024x1024, 2048x2048 or 4096x4096) based on the number of textures required, the amount of space that would be wasted, and the maximum texture size the card supports.
Example: one map requires 7 different 256x256 textures. That means the game can choose to use a single 1024x1024 texture and waste 9 256x256 areas, or it can choose to use two 512x512 textures and waste only 1 256x256 area.
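The selection logic boils down to something like this (just a sketch; it only weighs wasted 256x256 cells, and maxTextureSize would come from D3DCAPS9::MaxTextureWidth via GetDeviceCaps):

```cpp
#include <d3d9.h>
#include <climits>

// Pick the least wasteful atlas size for a given number of 256x256 tiles.
int ChooseAtlasSize(int numTiles, int maxTextureSize)
{
    int bestSize = 512, bestWaste = INT_MAX;
    for (int size = 512; size <= maxTextureSize && size <= 4096; size *= 2)
    {
        int perSheet = (size / 256) * (size / 256);           // 256x256 cells per atlas
        int sheets   = (numTiles + perSheet - 1) / perSheet;  // atlases needed
        int waste    = sheets * perSheet - numTiles;          // unused cells
        if (waste < bestWaste) { bestWaste = waste; bestSize = size; }
    }
    return bestSize;   // for 7 tiles this picks 512 (two sheets, 1 cell wasted)
}
```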
Also note that most ATi cards only support up to 2048x2048, apart from the latest ones (X1xxx cards).

Another thing you could try (I don't, but I intend to): if your visible map area is 30x20 tiles or so, you can fill the VB with a 32x22 block of tiles and then just use the world matrix to translate the map. That way you only need to refill the vertex buffer when a new tile comes into view, which should give you a bit of a performance boost.
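Something along these lines (untested sketch, using the D3DX matrix helpers; cameraX/cameraY are the scroll offset in world units):

```cpp
#include <d3d9.h>
#include <d3dx9.h>

// The VB holds a 32x22 block of tiles; instead of refilling it every frame,
// translate it by the camera offset and only refill when a new row/column
// of tiles scrolls into view.
void ScrollMap(IDirect3DDevice9* dev, float cameraX, float cameraY)
{
    D3DXMATRIX world;
    D3DXMatrixTranslation(&world, -cameraX, -cameraY, 0.0f);
    dev->SetTransform(D3DTS_WORLD, &world);
}
```

Note that this only works with untransformed (XYZ) vertices; pretransformed (XYZRHW) vertices skip the world matrix entirely.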

Using index buffers is pointless for 2D tile engines, since each vertex is used exactly once (Due to texture coordinates being different where vertices could be shared). So you're just as well to render as much as possible (often the whole screen) in one DrawPrimitive() call.
I just use DirectX's sprite class...
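Roughly like this (the atlas texture and source rect are placeholders; in a real loop you'd Begin() once, draw every visible tile, then End(), so the sprite interface can batch):

```cpp
#include <d3d9.h>
#include <d3dx9.h>

// Draw one tile from a texture with ID3DXSprite.
void DrawTile(ID3DXSprite* sprite, IDirect3DTexture9* atlas,
              const RECT& srcRect, float x, float y)
{
    D3DXVECTOR3 pos(x, y, 0.0f);
    sprite->Begin(D3DXSPRITE_ALPHABLEND);
    sprite->Draw(atlas, &srcRect, NULL, &pos, 0xFFFFFFFF);
    sprite->End();
}
```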
Quote:Original post by Evil Steve
...
Using index buffers is pointless for 2D tile engines, since each vertex is used exactly once (Due to texture coordinates being different where vertices could be shared). So you're just as well to render as much as possible (often the whole screen) in one DrawPrimitive() call.

Actually, for each quad, you can cut the number of vertices down from 6 to 4, since the two triangles that make up the quad do indeed share two vertices, texture coordinates and all. (Thus Telamon's 33% less bandwidth figure.) I'm guessing you probably knew this, but just briefly forgot about it. But for anyone else's information...
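For example, a static index buffer for a batch of quads can be filled once like this (sketch; it assumes each quad's four vertices are written in the order top-left, top-right, bottom-left, bottom-right):

```cpp
// Write 6 indices per quad so each quad only needs 4 vertices instead of 6.
void FillQuadIndices(unsigned short* indices, int quadCount)
{
    for (int q = 0; q < quadCount; ++q)
    {
        unsigned short base = (unsigned short)(q * 4);
        indices[q * 6 + 0] = base + 0;   // top-left
        indices[q * 6 + 1] = base + 1;   // top-right
        indices[q * 6 + 2] = base + 2;   // bottom-left
        indices[q * 6 + 3] = base + 2;   // bottom-left  (shared)
        indices[q * 6 + 4] = base + 1;   // top-right    (shared)
        indices[q * 6 + 5] = base + 3;   // bottom-right
    }
}
```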
"We should have a great fewer disputes in the world if words were taken for what they are, the signs of our ideas only, and not for things themselves." - John Locke
Yeah, the D3DX sprite functionality makes 2D games an absolute breeze. DirectDraw is crap; they did well to abandon it.

An old game I did a few years ago in school was incredibly easy to write with the D3DX sprite.

Game
Quote:Original post by Telamon
DirectX is unwieldy for making simple 2D games because DirectDraw has been deprecated, so to do anything graphical you need to use Direct3D. Originally I just wanted to get something working, so I was rendering my tilemap in a very straightforward way: I had a vertex buffer holding a single quad that I would translate over the map, drawing one tile at a time. This gave me awful performance (1.5 FPS with a 100x100 map fully in view).


I'm getting really tired of this "bring back DirectDraw" crap. Direct3D isn't that hard; don't blame it because you couldn't research the proper way to do things. D3D can do ANYTHING DirectDraw can, some of it even FASTER than DirectDraw.

For my TileStudio loader and Snake.Net, I used the Managed Direct3D sprite class and the frame rate is fine even in debug mode.

I'm sorry if this post sounds mean, but I've gotten tired of having to defend D3D every few days. Don't bash an API you don't know how to use. I'm sure the OGL crowd wouldn't like it if I went in and said "OMG I can't draw 1000000000 quads at once, OGL SUCKS!!!!111!".

Edit: Thanks DrEvil.
Scet, trailing ? breaks your TileStudio link.
Telamon, there must be something else in your code that's straining DirectX: I can do 4 layers with tiles spanning two textures, using simple DrawPrimitiveUP (insert BOOs here), some of them alpha-blended, plus a background image, without any framerate problems on a P3 800.

Do you cache your render state sets? Do you use several tiles on a single texture? Any reason you can't use pretransformed vertices?
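For reference, a pretransformed vertex is just a screen-space position plus RHW (sketch):

```cpp
#include <d3d9.h>

// Pretransformed (screen-space) vertex: positions are already in pixels, so the
// world/view/projection pipeline is skipped entirely for these vertices.
struct PretransformedVertex
{
    float x, y, z, rhw;   // rhw is typically 1.0f for 2D
    float u, v;
};
#define PRETRANSFORMED_FVF (D3DFVF_XYZRHW | D3DFVF_TEX1)
```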

Fruny: Ftagn! Ia! Ia! std::time_put_byname! Mglui naflftagn std::codecvt eY'ha-nthlei!,char,mbstate_t>

My 2-pence for what it's worth...

Quote:DirectX is unwieldy for making simple 2D games because DirectDraw has been deprecated
I'm with Scet on this one. I will concede that the initial learning curve (or is it a hurdle?) is steeper, but the rewards are awesome. My experience in the good old days was that you could get something simple up and running in DD7 very quickly, but you would quite quickly hit its limits. Once you'd hit them, you either had to stop or invest a lot of time developing custom code and/or workarounds. In my experience, Direct3D doesn't have this problem in the long run.

Quote:I was rendering my tilemap in a very straightforward way: I had a vertex buffer holding a single quad that I would translate over the map, drawing one tile at a time. This gave me awful performance (1.5 FPS with a 100x100 map fully in view).
(emphasis mine) It doesn't surprise me in the slightest that you got poor performance this way [smile] 100x100 tiles would generate 10,000 Draw**() calls per frame, which is, optimistically, about 10x more than any Direct3D 9 application should be issuing [lol]

Quote:1. Batch all your textures. To minimize texture changes, you need to put all your tiles onto one (or possibly several) larger textures.
Agreed, this is a definite performance advantage.

Quote:2. Use dynamic vertex buffers to blast several thousand polys to the screen at once. For each tile you want to add, append the geometry to the buffer and set the texture coordinates to correspond to your batched tile. Using an index buffer is a good idea: you will use 33% less bandwidth piping geometry to the GPU.
Makes sense, but a far bigger gain in my experience is not modifying the geometry in the main loop at all. It's not easy to architect, but it's a complication that gets you a cookie [smile]

Two optimizations I made in one of my tile-based engines were to store a CPU-side cache of the visible set (kept in system RAM = fast(er) CPU manipulation) and to update the buffers only when the camera/player moved. Because of the CPU-side cache I could throw a single chunk of vertex data (note: the index data never changes) up in one lock, with no processing between Lock()/Unlock() [wink], and take full advantage of no-overwrite/discard dynamic buffers.
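In code it boiled down to something like this (simplified sketch; TileVertex here is just a placeholder for whatever tile vertex struct you use):

```cpp
#include <d3d9.h>
#include <cstring>

struct TileVertex { float x, y, z; float u, v; };   // placeholder vertex layout

// Upload the CPU-side cache of the visible set in a single lock. D3DLOCK_DISCARD
// hands back a fresh buffer region, so the GPU never stalls waiting for the copy,
// and there's no per-tile processing between Lock() and Unlock().
void UploadVisibleSet(IDirect3DVertexBuffer9* vb,
                      const TileVertex* cache, UINT vertexCount)
{
    void* dest = NULL;
    if (SUCCEEDED(vb->Lock(0, vertexCount * sizeof(TileVertex), &dest, D3DLOCK_DISCARD)))
    {
        memcpy(dest, cache, vertexCount * sizeof(TileVertex));
        vb->Unlock();
    }
}
```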

Quote:I do get really good performance: 70 FPS rendering a 100x100 tilemap
That's not bad, but I suspect you're actually clamped to a 70Hz refresh rate rather than hitting a real performance ceiling.
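An easy way to check: create the device with an immediate presentation interval and see whether the number changes (sketch; the other present parameters are omitted):

```cpp
#include <d3d9.h>

D3DPRESENT_PARAMETERS d3dpp;
ZeroMemory(&d3dpp, sizeof(d3dpp));
d3dpp.Windowed             = TRUE;
d3dpp.SwapEffect           = D3DSWAPEFFECT_DISCARD;
d3dpp.BackBufferFormat     = D3DFMT_UNKNOWN;
d3dpp.PresentationInterval = D3DPRESENT_INTERVAL_IMMEDIATE;   // don't wait for vblank
```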

Now, for an interesting idea... I had a "pure 3D" tilemap engine a while back that could render enough 2x2-pixel tiles to cover a 1024x768 screen (roughly 200,000 tiles) and still clock around 25-30Hz - which is just over 7x the triangle throughput of your example [smile]

Okay, I don't wish to sound cocky - I just thought it was an interesting example of what should be possible given a bit of effort.

By sticking to regular 3D geometry (that is, not POSITIONT or D3DFVF_XYZRHW) and an orthographic view, you can move pretty much every piece of graphics-related work to the GPU. No more resource manipulation is required, and you get hardware-accelerated vertex effects (e.g. lighting and animation). The only difficult part of implementing it this way is everyone's favourite texel-to-pixel mapping. In my example I didn't really need that, so I conveniently ignored it, but it should be solvable.
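The setup is only a few lines with the D3DX helpers (a sketch, not my original code; the half-pixel shift is the usual texel/pixel fix-up, so tweak it to taste):

```cpp
#include <d3d9.h>
#include <d3dx9.h>

// Orthographic projection mapping (0,0)-(width,height) to the viewport, y pointing
// down, plus a half-pixel translation so texel centres land on pixel centres.
void SetupOrtho2D(IDirect3DDevice9* dev, float width, float height)
{
    D3DXMATRIX proj, view;
    D3DXMatrixOrthoOffCenterLH(&proj, 0.0f, width, height, 0.0f, 0.0f, 1.0f);
    dev->SetTransform(D3DTS_PROJECTION, &proj);

    D3DXMatrixTranslation(&view, -0.5f, -0.5f, 0.0f);
    dev->SetTransform(D3DTS_VIEW, &view);
}
```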

Quote:I can do 4 layers with tiles spanning two textures, using simple DrawPrimitiveUP (insert BOOs here)
* boo... hiss... *

Cheers,
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Quote:Original post by Agony
Quote:Original post by Evil Steve
...
Using index buffers is pointless for 2D tile engines, since each vertex is used exactly once (Due to texture coordinates being different where vertices could be shared). So you're just as well to render as much as possible (often the whole screen) in one DrawPrimitive() call.

Actually, for each quad, you can cut the number of vertices down from 6 to 4, since the two triangles that make up the quad do indeed share two vertices, texture coordinates and all. (Thus Telamon's 33% less bandwidth figure.) I'm guessing you probably knew this, but just briefly forgot about it. But for anyone else's information...
Ah, I had a mental blank there [smile]

What I meant was, if you have a grid of tiles (e.g. 30x20), then you can't share the top-right vertex of the first tile with the top-left vertex of the second tile, for instance.
You can still render them as triangle strips, but you'd end up having to make one DrawPrimitive() call per tile, or use a bunch of degenerate triangles (actually, an index buffer might help there).
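For the curious, the degenerate-triangle version looks roughly like this (sketch; it assumes each quad's vertices are written in the order top-left, bottom-left, top-right, bottom-right, as a strip wants):

```cpp
// Build one indexed triangle strip over many quads, repeating indices between
// quads to produce degenerate (zero-area) triangles that stitch them together.
int FillStripIndices(unsigned short* indices, int quadCount)
{
    int n = 0;
    for (int q = 0; q < quadCount; ++q)
    {
        unsigned short base = (unsigned short)(q * 4);
        if (q > 0)
        {
            indices[n++] = base - 1;   // repeat last vertex of previous quad
            indices[n++] = base;       // repeat first vertex of this quad
        }
        indices[n++] = base + 0;
        indices[n++] = base + 1;
        indices[n++] = base + 2;
        indices[n++] = base + 3;
    }
    return n;   // pass (n - 2) as the primitive count to DrawIndexedPrimitive
}
```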

Has anyone done any profiling on this? It'll probably be pretty close performance-wise anyway...

