# Tile map optimization?

So I have this code set up to test rendering a tile map. This code works... but it puts a huge dent in the FPS. With one layer it goes from ~1000 FPS to 400 FPS, and with two layers it goes down to ~200 FPS. I'm planning on having more than two layers, along with players and other objects and effects. Is my OpenGL code bad? Is there a way to optimize the for loops? The only way I can think of to really fix the problem is to render the different layers onto textures, then just render the one big texture. I can't foresee any problems with that... then again, it's 1 AM.
for(int y = 0; y <= 14; y++) {
    for(int x = 0; x <= 18; x++) {
        _graphics->drawSubImage(tiles, x * 32, y * 32, 0.0, makeQuad(32, 0, 32, 32));
    }
}

void Graphics::drawSubImage(Image *image, int x, int y, float angle, Quad srcRect, Color color) {
    // Start from 0, 0
    glLoadIdentity();

    // Move and rotate
    glTranslatef(x, y, 0);

    if(angle != 0.0) {
        glRotatef(angle, 0, 0, 1);
    }

    glColor4f(color.r, color.g, color.b, color.a);

    GLuint vertices[] = {0, 0,
                         srcRect.w, 0,
                         srcRect.w, srcRect.h,
                         0, srcRect.h};

    // Calculate the texture coords
    // tex coords are between 0 and 1
    float srcX = (float)srcRect.x / image->getWidth();
    float srcY = (float)srcRect.y / image->getHeight();
    float srcW = (float)srcRect.w / image->getWidth();
    float srcH = (float)srcRect.h / image->getHeight();

    GLfloat texCoords[] = {srcX, srcY,
                           srcX + srcW, srcY,
                           srcX + srcW, srcY + srcH,
                           srcX, srcY + srcH};

    glBindTexture(GL_TEXTURE_2D, image->getTexture());

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glVertexPointer(2, GL_INT, 0, &vertices[0]);
    glTexCoordPointer(2, GL_FLOAT, 0, &texCoords[0]);

    glDrawArrays(GL_QUADS, 0, 4);

    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}


##### Share on other sites
I'd guess that your _graphics->drawSubImage routine makes a single draw call for a quad corresponding to that tile, which is probably where all the performance is going.

In 3D APIs like OpenGL or Direct3D, you want to minimize state changes (different materials/shaders/textures) and batch like states into a single draw call.

Rather than drawSubImage *actually* drawing right then and there, it should be replaced with a function that creates a context (a material and vertex buffer) for each material and places the quad vertices into that vertex buffer. Then, when all rendering for the frame has been submitted, another function should loop through the list of contexts, activating each material and drawing its whole vertex buffer in one go.

That's sort of a basic explanation/approach, but this or something similar is miles ahead of the naive draw-things-one-at-a-time strategy.
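A minimal sketch of that batching idea, kept free of actual GL calls so it can be read in isolation. The names `QuadBatcher`, `BatchVertex`, `addQuad`, and `flush` are made up for illustration; in a real renderer the `submit` callback would bind the texture and issue one draw call for the whole vertex array.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical vertex layout: position plus texture coordinate.
struct BatchVertex { float x, y, u, v; };

// Collects quads per texture so each texture needs one bind and one draw call.
class QuadBatcher {
public:
    // Queue one axis-aligned quad; nothing is drawn yet.
    void addQuad(unsigned texture, float x, float y, float w, float h,
                 float u0, float v0, float u1, float v1) {
        assert(w > 0.0f && h > 0.0f);
        std::vector<BatchVertex>& verts = batches_[texture];
        verts.push_back({x,     y,     u0, v0});
        verts.push_back({x + w, y,     u1, v0});
        verts.push_back({x + w, y + h, u1, v1});
        verts.push_back({x,     y + h, u0, v1});
    }

    // Calls submit(texture, vertices, count) once per texture, then clears
    // the queued quads. Returns how many submissions ("draw calls") were made.
    template <typename Submit>
    std::size_t flush(Submit submit) {
        std::size_t drawCalls = 0;
        for (auto& entry : batches_) {
            if (entry.second.empty()) continue;
            submit(entry.first, entry.second.data(), entry.second.size());
            ++drawCalls;
        }
        batches_.clear();
        return drawCalls;
    }

private:
    std::map<unsigned, std::vector<BatchVertex>> batches_;
};
```

With a 19x15 grid of tiles that all share one tile sheet, this turns 285 draw calls into one.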

##### Share on other sites
When you say it drops from 1000 to 400, what were you doing at 1000? Drawing nothing at all? Because, of course, actually drawing something is going to be orders of magnitude slower than drawing nothing at all. If I were you, I would get it working with all the layers you need to draw, plus your AI, and if you still manage visually acceptable rates, don't worry about it. Get it working first, then identify bottlenecks and try to optimize them.

##### Share on other sites
Yeah, really, at this point I'm nowhere near needing optimizations... so I can wait and see.

##### Share on other sites
You really should talk in milliseconds, not framerate. It will make it more obvious how much slower things are.

1000fps = 1ms
400fps = 2.5ms

So, drawing all that stuff takes 1.5ms. Not the end of the world, but really you should be sorting by material (texture) and ideally drawing all tiles with that material with a single draw call.
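The conversion is just 1000 divided by the frame rate. A small sketch (the function names `frameTimeMs` and `addedCostMs` are only for illustration):

```cpp
#include <cassert>
#include <cmath>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds per frame spent when the rate drops from
// fpsBefore to fpsAfter.
double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

By the same arithmetic, the second layer (400 FPS down to 200 FPS) costs 5 ms - 2.5 ms = 2.5 ms, which is a clearer picture than "lost another 200 FPS".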

And any time you're interested in GPU performance, you really should get familiar with a GPU profiler (like NVPerfHUD).

##### Share on other sites
Quote:
 Original post by CirdanValen
 Yeah, really, at this point I'm nowhere near needing optimizations... so I can wait and see.

But here we're discussing basic design, not sophisticated optimizations: you should take care of "broken windows" at this point, while you aren't doing anything very complex, before your code becomes too messy to change.
My observations below.
glLoadIdentity();
Redundant if you get rid of glTranslatef and glRotatef below.

glTranslatef(x, y, 0);
if(angle != 0.0) {
    glRotatef(angle, 0, 0, 1);
}
Redundant; compute the vertex coordinates on your own, even if it requires sine and cosine calls.

glColor4f(color.r, color.g, color.b, color.a);
You don't seem to use this color.

/* SNIP */
Texture coordinates should be precomputed; vertex coordinates should include rotation and translation.

glBindTexture(GL_TEXTURE_2D, image->getTexture());
The same texture is used for every tile: bind it once (or once per frame).

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
Call once per frame, before rendering tiles.

glVertexPointer(2, GL_INT, 0, &vertices[0]);
glTexCoordPointer(2, GL_FLOAT, 0, &texCoords[0]);
glDrawArrays(GL_QUADS, 0, 4);
You could batch all tiles together; this array has only 1 quad.

glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
Call once per frame, after rendering tiles.
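As a rough sketch of computing the vertex coordinates yourself: the function below produces the same four corners that glTranslatef(x, y, 0) followed by glRotatef(angle, 0, 0, 1) would give for the quad (0,0), (w,0), (w,h), (0,h). `Vec2` and `transformedQuad` are illustrative names, not part of the posted code.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Writes the four corners of a w-by-h quad, rotated by angleDegrees about
// its own (0, 0) corner and then moved to (x, y).
void transformedQuad(float x, float y, float w, float h,
                     float angleDegrees, Vec2 out[4]) {
    assert(out != nullptr);
    const float radians = angleDegrees * 3.14159265358979f / 180.0f;
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    const Vec2 local[4] = {{0, 0}, {w, 0}, {w, h}, {0, h}};
    for (int i = 0; i < 4; ++i) {
        // Standard 2D rotation, then translation.
        out[i].x = x + local[i].x * c - local[i].y * s;
        out[i].y = y + local[i].x * s + local[i].y * c;
    }
}
```

For tiles the angle is always zero, so the sine and cosine can be skipped entirely; the rotation only matters for sprites.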

##### Share on other sites
Over the weekend I implemented VBOs, so I draw an entire layer with one glDrawArrays call. Here is the map renderer in its current state. I'm getting ~1300 fps rendering one layer. Now I'm trying to figure out how to get smooth scrolling maps with this code.

void Graphics::drawMap(Texture *texture, Map *map) {
    int mapWidth = (_width / 32) + 1;
    int mapHeight = (_height / 32) + 1;
    int tileCount = mapHeight * mapWidth;
    int vertexCount = tileCount * 4;

    std::list<MapLayer> layers = map->getMapLayers();
    std::list<MapLayer>::iterator iter;

    glColor4f(1.0, 1.0, 1.0, 1.0);
    glBindTexture(GL_TEXTURE_2D, texture->getTexture());

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    for(iter = layers.begin(); iter != layers.end(); iter++) {
        Vertex *verts = new Vertex[vertexCount];
        int index = 0;
        int tindex = 0;

        for(int y = 0; y < 19; y++) {
            for(int x = 0; x < 25; x++) {
                float srcY = ((*iter)[tindex] / 16.0) / 256.0;
                float srcX = (((*iter)[tindex] % 16) * 16 ) / 256.0;
                float srcW = 16.0 / 256.0;
                float srcH = 16.0 / 256.0;

                tindex++;

                verts[index].x = x * 32;
                verts[index].y = y * 32;
                verts[index].tx = srcX;
                verts[index].ty = srcY;
                index++;

                verts[index].x = (x * 32) + 32;
                verts[index].y = y * 32;
                verts[index].tx = srcX + srcW;
                verts[index].ty = srcY;
                index++;

                verts[index].x = (x * 32) + 32;
                verts[index].y = (y * 32) + 32;
                verts[index].tx = srcX + srcW;
                verts[index].ty = srcY + srcH;
                index++;

                verts[index].x = x * 32;
                verts[index].y = (y * 32) + 32;
                verts[index].tx = srcX;
                verts[index].ty = srcY + srcH;
                index++;
            }
        }

        glBindBufferARB(GL_ARRAY_BUFFER_ARB, _vbo);
        glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(Vertex) * vertexCount, &verts[0].x, GL_STATIC_DRAW_ARB);

        delete[] verts;

        glVertexPointer(2, GL_FLOAT, sizeof(Vertex), 0);
        glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(8));

        glDrawArrays(GL_QUADS, 0, vertexCount);
    }

    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);

    glBindBufferARB(GL_ARRAY_BUFFER, 0);
}

##### Share on other sites
First, a suggestion... I don't know if you need to delete and re-allocate verts[] once per layer, but if you don't need to, you shouldn't. Check the semantics of glBufferDataARB. If you already have, my bad.

1. srcW, srcH are constant; move them out of the loop and declare them const.
2. Your srcY calculation doesn't look right; should it read (((*iter)[tindex]/16)*16.0)/256.0?
3. 16/256 is a constant.
4. y*32 need only be computed once per Y loop; x*32 only once per X loop.
5. (*iter) can be expensive for some containers; do it once per layer.
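Points 1 to 3 could look roughly like the sketch below. It assumes the numbers from the posted code (16-pixel tiles, 16 tiles per row, a 256-pixel square tile sheet); `TileUV` and `tileUV` are illustrative names.

```cpp
#include <cassert>

struct TileUV { float u0, v0, u1, v1; };

// Texture rectangle for tile `index` in a tile sheet laid out row by row.
TileUV tileUV(int index) {
    const int   tilesPerRow = 16;
    const float tileSize    = 16.0f / 256.0f;  // constant, same for every tile
    assert(index >= 0 && index < tilesPerRow * tilesPerRow);
    const int column = index % tilesPerRow;
    const int row    = index / tilesPerRow;    // integer division selects the row
    const float u0 = column * tileSize;
    const float v0 = row * tileSize;
    return {u0, v0, u0 + tileSize, v0 + tileSize};
}
```

The integer division is the important part: dividing by 16.0 instead would give a fractional row and therefore a texture coordinate that slides between rows of the tile sheet.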

Next, an outright contradiction. There's no point to using vertex buffers if you're going to destroy and re-create them every frame; have a generateMap method that does nearly everything you do here, and call it whenever the map changes; and have a drawMap that does nothing but call glDrawArrays on the most recently generated arrays.

Finally, a suggestion. For smooth scrolling, you seem to generate an entire map in this code. Simply do a glTranslate before calling drawMap to provide an offset to scroll. And second, so you don't draw too many unnecessary tiles, use this code for drawing fixed-size map sectors or something, instead of the entire map no matter how little of it is visible.
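One possible way to pick which fixed-size sectors to draw, sketched under the assumption that the camera position is a non-negative pixel offset into the map and that sectors are square. `SectorRange` and `visibleSectors` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Inclusive range of sector indices that overlap the viewport.
struct SectorRange { int firstX, firstY, lastX, lastY; };

// cameraX/cameraY: top-left of the viewport in map pixels.
// sectorPixels: width and height of one sector in pixels.
// sectorsX/sectorsY: size of the map in sectors.
SectorRange visibleSectors(int cameraX, int cameraY, int viewW, int viewH,
                           int sectorPixels, int sectorsX, int sectorsY) {
    assert(cameraX >= 0 && cameraY >= 0 && sectorPixels > 0);
    assert(viewW > 0 && viewH > 0 && sectorsX > 0 && sectorsY > 0);
    SectorRange r;
    r.firstX = std::min(cameraX / sectorPixels, sectorsX - 1);
    r.firstY = std::min(cameraY / sectorPixels, sectorsY - 1);
    // The last visible pixel is at camera + view - 1.
    r.lastX = std::min((cameraX + viewW - 1) / sectorPixels, sectorsX - 1);
    r.lastY = std::min((cameraY + viewH - 1) / sectorPixels, sectorsY - 1);
    return r;
}
```

Each sector in the returned range would then be drawn from its own pre-generated vertex array, with the glTranslate offset supplying the scroll position.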

##### Share on other sites
The code I posted above has changed because I found a problem with just outright rendering all the layers: I need to draw the players and the characters in between the layers. So I changed the method to render a layer instead of the entire map. I moved around the equations you mentioned and I did notice a speed increase, thanks! I appreciate your feedback. I'll start working on precalculating/caching the vertex calculations.

void Graphics::drawMapLayer(Texture *texture, MapLayer layer, int mapWidth, int xOffset, int yOffset) {
    // 25 tiles across
    // 19 tiles down
    int tileCount = 19 * 25;
    int vertexCount = tileCount * 4;

    // Calculate the first tile in a 1D array
    // based on our pixel offset
    int initialTile = ((yOffset / 32) * 25) + (xOffset / 32);

    // The increment value is how many spaces we
    // skip in our array. This value is incremented
    // after each row
    int increment = mapWidth - 25;

    // We need to shift our quads for smooth scrolling
    // instead of just scrolling tile by tile.
    // This calculates the offset for our quads
    int drawOffsetX = xOffset & 31;
    int drawOffsetY = yOffset & 31;

    Vertex *verts = new Vertex[vertexCount];
    int index = 0;
    int tindex = initialTile;
    float srcY;
    float srcX;
    float srcW = 16.0 / (float)texture->getWidth();
    float srcH = 16.0 / (float)texture->getHeight();

    for(int y = 0; y < 19; y++) {
        srcY = ((layer[tindex] / 16.0) * 16) / (float)texture->getHeight();
        for(int x = 0; x < 25; x++) {
            srcX = ((layer[tindex] % 16) * 16 ) / (float)texture->getWidth();
            tindex++;

            verts[index].x = (x * 32) - drawOffsetX;
            verts[index].y = (y * 32) - drawOffsetY;
            verts[index].tx = srcX;
            verts[index].ty = srcY;
            index++;

            verts[index].x = ((x * 32) + 32) - drawOffsetX;
            verts[index].y = (y * 32) - drawOffsetY;
            verts[index].tx = srcX + srcW;
            verts[index].ty = srcY;
            index++;

            verts[index].x = ((x * 32) + 32) - drawOffsetX;
            verts[index].y = ((y * 32) + 32) - drawOffsetY;
            verts[index].tx = srcX + srcW;
            verts[index].ty = srcY + srcH;
            index++;

            verts[index].x = (x * 32) - drawOffsetX;
            verts[index].y = ((y * 32) + 32) - drawOffsetY;
            verts[index].tx = srcX;
            verts[index].ty = srcY + srcH;
            index++;
        }
        tindex += increment;
    }

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, _vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(Vertex) * vertexCount, &verts[0].x, GL_STATIC_DRAW_ARB);

    delete[] verts;

    glColor4f(1.0, 1.0, 1.0, 1.0);
    glBindTexture(GL_TEXTURE_2D, texture->getTexture());

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glVertexPointer(2, GL_FLOAT, sizeof(Vertex), 0);
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(8));

    glDrawArrays(GL_QUADS, 0, vertexCount);

    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);

    glBindBufferARB(GL_ARRAY_BUFFER, 0);
}

##### Share on other sites
Okay, so I got the new system working... however, I see no speed increase at all. Very peculiar. The system looks a little like this:

class MapLayer {
public:
    MapLayer(std::vector<int> tiles, int width, int height);

    void generateVertexArray(int xOffset, int yOffset);
    Vertex *getVertexArray() const;
    int getVertexCount() const;

private:
    int _width, _height, _vertexCount;
    std::vector<int> _tiles;
    Vertex *_vertexArray;
};

Then drawMapLayer simply calls getVertexArray, binds the vertex array, binds the texture, and calls glDrawArrays. I only calculate the vertex array once, when I first create the layer. With and without VBO, I am getting the same FPS as before.

##### Share on other sites
I encountered the same problem for my project. The quickest way (according to me) is to use a vbo and draw the full layer in one batch and offset it on the screen. Don't bother to calculate the good offset, just draw it one time, but draw it as fast as the hardware can.

##### Share on other sites
You're experiencing no speedup because you are still not using vertex buffer objects (VBOs) correctly. Every time you call glBufferDataARB, a transfer from system memory to video memory takes place. You must do that in generateVertexArray, uploading the vertex data to the VBO, and then only binding the VBO, binding the texture, and issuing the actual draw command in drawLayer.

You are, essentially, still just drawing quads in immediate mode; in correct use of VBOs, you put all the geometry into the graphics card, and then each frame the CPU need only send the series of state changes necessary to do the rendering, instead of re-calculating the full set of data to put on-screen every frame.

Try this; create a MapLayer object (very similar to what you have now) which allocates its own VBO, and you feed it a tilemap to read from. It populates its VBO, and only when your map data changes you tell it to re-populate the VBO. When you ask the MapLayer to draw itself, it should just attach its VBO and issue the draw command. Move the selection of texture (glBindTexture) and scrolling (glTranslate) out of the MapLayer object and into a MapComposite object which understands how to organize your layers and sprites.
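A rough sketch of that "upload only when the map changes" structure. To keep it self-contained, the GPU buffer is replaced by a stand-in that just counts uploads; in real code `upload` is where the single glBufferDataARB call would go, and `draw` would bind the VBO and call glDrawArrays. All names here (`CachedMapLayer`, `FakeVertexBuffer`, `LayerVertex`) are hypothetical, and the tile sizes are the ones from the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct LayerVertex { float x, y, tx, ty; };

// Stand-in for a VBO: records how often data was sent to it.
struct FakeVertexBuffer {
    std::size_t uploads = 0;
    std::size_t vertexCount = 0;
    void upload(const std::vector<LayerVertex>& verts) {
        ++uploads;
        vertexCount = verts.size();
    }
};

class CachedMapLayer {
public:
    CachedMapLayer(std::vector<int> tiles, int width, int height)
        : tiles_(std::move(tiles)), width_(width), height_(height), dirty_(true) {
        assert(static_cast<int>(tiles_.size()) == width * height);
    }

    // Changing the map marks the buffer as stale.
    void setTile(int x, int y, int tile) {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        tiles_[y * width_ + x] = tile;
        dirty_ = true;
    }

    // Rebuilds and uploads only if the map changed since the last draw.
    // Returns the number of vertices that would be drawn.
    std::size_t draw() {
        if (dirty_) {
            rebuild();
            dirty_ = false;
        }
        return buffer_.vertexCount;
    }

    std::size_t uploadCount() const { return buffer_.uploads; }

private:
    void rebuild() {
        const float s = 16.0f / 256.0f;  // tile size in texture space
        std::vector<LayerVertex> verts;
        verts.reserve(tiles_.size() * 4);
        for (int y = 0; y < height_; ++y) {
            for (int x = 0; x < width_; ++x) {
                const int tile = tiles_[y * width_ + x];
                const float u = (tile % 16) * s;
                const float v = (tile / 16) * s;
                const float px = x * 32.0f;
                const float py = y * 32.0f;
                verts.push_back({px,         py,         u,     v});
                verts.push_back({px + 32.0f, py,         u + s, v});
                verts.push_back({px + 32.0f, py + 32.0f, u + s, v + s});
                verts.push_back({px,         py + 32.0f, u,     v + s});
            }
        }
        buffer_.upload(verts);
    }

    std::vector<int> tiles_;
    int width_, height_;
    bool dirty_;
    FakeVertexBuffer buffer_;
};
```

Drawing the same layer for a hundred frames then costs one upload, not a hundred.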

##### Share on other sites
Ah okay, I get it. Rendering a single layer at ~1425 fps. Thanks for your help! Rendering code now looks like this. I'll figure out a way to remove glColor and glBindTexture here soon.

void Graphics::drawMapLayer(Texture *texture, MapLayer *layer) {
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, layer->getVertexBuffer());

    glColor4f(1.0, 1.0, 1.0, 1.0);
    glBindTexture(GL_TEXTURE_2D, texture->getTexture());

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glVertexPointer(2, GL_FLOAT, sizeof(Vertex), 0);
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(8));

    glDrawArrays(GL_QUADS, 0, layer->getVertexCount());

    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);

    glBindBufferARB(GL_ARRAY_BUFFER, 0);
}

##### Share on other sites
Quote:
 Original post by CirdanValen
 The code I posted above has changed because I found a problem with just outright rendering all the layers: I need to draw the players and the characters in between the layers. So I changed the method to render a layer instead of the entire map. I moved around the equations you mentioned and I did notice a speed increase, thanks! I appreciate your feedback. I'll start working on precalculating/caching the vertex calculations.
 *** Source Snippet Removed ***

What is your hardware configuration? I'm in the process of writing my own tile engine and my code is almost exactly the same as in your post. I'm getting an average of 1500 FPS in Windows 7 and 3000+ in XP. I made the same mistake that you did by allocating and deleting the array in each frame. Once I removed that, my FPS jumped by more than 2x.

I also noticed a speed increase when I moved from quads to triangles. I know that GL_QUADS is removed under GL 3.2 and there are some cards out there that don't support it very well even in GL 2.x. I've also heard rumors (but cannot substantiate them) that some cards draw quads in software.

##### Share on other sites
MacBook Pro, 2.4 GHz Core 2 Duo with a GeForce 9600M GT. I get about ~1400 FPS rendering only one layer of 25x19 tiles. I've run Counter-Strike: Source through CrossOver (an emulator of sorts) and gotten a smooth framerate at 1920x1200, so hopefully adding more layers and sprites won't slow it down horribly. I'll have to try the triangle thing.

##### Share on other sites
Quote:
 Original post by Wyrframe
 3. 16/256 is a constant.
 4. y*32 need only be computed once per Y loop; x*32 only once per X loop.

You really should trust your compiler to do these optimizations (so leave these calculations as they are currently). Modern C++ compilers will do it much, much better than the average programmer.

##### Share on other sites
Quote:
Original post by bubu LV
Quote:
 Original post by Wyrframe
 3. 16/256 is a constant.
 4. y*32 need only be computed once per Y loop; x*32 only once per X loop.

You really should trust your compiler to do these optimizations (so leave these calculations as they are currently). Modern C++ compilers will do it much, much better than the average programmer.

16 / 256 is and will always be .0625. However, if the compiler doesn't optimize this for some reason (and that is unlikely), an unnecessary division will be added to the loop. It is always best for a programmer to do simple optimizations like this. It is just good programming practice. Unless the programmer wrote the compiler or is willing to step through the compiled machine code, it is a wild guess as to whether the compiler actually did the optimization. Why throw an unnecessary division into a loop and take a chance?
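One way to avoid guessing, sketched here on the assumption that a C++11 compiler is available: name the constant and mark it constexpr, so it is evaluated at compile time by definition. `kTileUVSize` and `tileLeftEdge` are illustrative names.

```cpp
#include <cassert>

// Evaluated at compile time; no division can remain in any loop that uses it.
constexpr float kTileUVSize = 16.0f / 256.0f;
static_assert(kTileUVSize == 0.0625f, "16/256 is exactly 0.0625");

// Left texture coordinate of the given tile column.
float tileLeftEdge(int column) {
    assert(column >= 0);
    return column * kTileUVSize;
}
```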

##### Share on other sites
You're running at 1400. Why are you worried about 'slowing down horribly'? As long as you do at least 30-60, you'll be fine.

##### Share on other sites
Quote:
 I also noticed a speed increase when I moved from quads to triangles. I know that GL_QUADS is removed under GL 3.2 and there are some cards out there that don't support it very well even in GL 2.x. I've also heard rumors (but cannot substantiate them) that some cards draw quads in software.

GL_QUADS has always been and always will be implemented by drawing two triangles. Why would anyone draw them in software? That would take an extra special kind of stupid.

##### Share on other sites
Quote:
 Original post by Deyja
 GL_QUADS has always been and always will be implemented by drawing two triangles. Why would anyone draw them in software? That would take an extra special kind of stupid.

Like I said, I've heard rumors. Nothing substantial. I agree, it would be stupid. However, I have read reports on the OpenGL.org forums stating that people have had issues with GL_QUADS on some cards. It is best to use triangles instead of quads, especially since GL_QUADS is deprecated in 3.x and removed in 3.2.

It's been awhile since I heard that "rumor", but I think I may be misstating it. I think the issue is that there is no guarantee that using a removed feature in 3.2 under compatibility mode will be hardware accelerated.

[Edited by - maspeir on January 2, 2010 9:28:33 AM]

##### Share on other sites
So I can't find any good example code for drawing a square sprite with GL_TRIANGLES through Google >.<. I moved to GL_TRIANGLE_STRIP, but the FPS dropped to ~1000 and I have a weird artifact coming off of one of my tiles. I'm guessing it's because I'm rendering vertices I really don't need to, and my loop wasn't really made to set up triangle strip verts. Can anyone point me to a good resource or example?

##### Share on other sites
Quote:
 Original post by CirdanValen
 So I can't find any good example code for drawing a square sprite with GL_TRIANGLES through Google >.<. I moved to GL_TRIANGLE_STRIP, but the FPS dropped to ~1000 and I have a weird artifact coming off of one of my tiles. I'm guessing it's because I'm rendering vertices I really don't need to, and my loop wasn't really made to set up triangle strip verts. Can anyone point me to a good resource or example?

I'm not drawing triangle strips, but individual triangles. I'm calculating the visible vertices each frame. Instead of 4 vertices per tile, I'm calculating 6. This is unfortunately necessary. If you use triangle strips, and thus have the tiles sharing vertices, the textures will blend between tiles. Each tile must be specified as separate geometry and as such, will require 6 vertices.

Here is my code. I'm about to completely rewrite this, but it is how I'm doing it now. The variable "verts" is a global array set up to the maximum number of tiles in the map. Since my tile map wraps, I need to calculate the tiles being drawn, so I cannot send the entire array to the card and be done with it. By specifying this as a global, I eliminated the need to allocate and delete the array each frame. A tile map value of -1 specifies an empty tile, where the lower layers can show through.

void DrawTileLayer(TileLayer *layer, TileMapRect *src_rect, long h_offset, long v_offset)
{
    long    i, j, value, num_tris;
    float   s1, s2;
    float   t1, t2;
    long    start_col, end_col;
    long    start_row, end_row;
    float   x, y;
    long    tile_width, tile_height;

    if(!layer->draw)
        return;

    tile_width = layer->image_data->width;
    tile_height = layer->image_data->height;

    start_col = src_rect->left / tile_width;
    end_col = src_rect->right / tile_width;
    start_row = src_rect->top / tile_height;
    end_row = src_rect->bottom / tile_height;

    x = h_offset;
    y = v_offset;

    // Despite the name, num_tris counts vertices (6 per tile).
    num_tris = 0;

    for(j = start_row; j <= end_row; j++)
    {
        for(i = start_col; i <= end_col; i++)
        {
            value = layer->map[j][i];

            if(value > -1)
            {
                s1 = layer->image_data->tile_coords[value].s1;
                s2 = layer->image_data->tile_coords[value].s2;
                t1 = layer->image_data->tile_coords[value].t1;
                t2 = layer->image_data->tile_coords[value].t2;

                // First triangle: top-left, bottom-left, top-right
                verts[num_tris].x = x;
                verts[num_tris].y = y;
                verts[num_tris].s = s1;
                verts[num_tris].t = t1;
                num_tris++;

                verts[num_tris].x = x;
                verts[num_tris].y = y + tile_height;
                verts[num_tris].s = s1;
                verts[num_tris].t = t2;
                num_tris++;

                verts[num_tris].x = x + tile_width;
                verts[num_tris].y = y;
                verts[num_tris].s = s2;
                verts[num_tris].t = t1;
                num_tris++;

                // Second triangle: top-right, bottom-left, bottom-right
                verts[num_tris].x = x + tile_width;
                verts[num_tris].y = y;
                verts[num_tris].s = s2;
                verts[num_tris].t = t1;
                num_tris++;

                verts[num_tris].x = x;
                verts[num_tris].y = y + tile_height;
                verts[num_tris].s = s1;
                verts[num_tris].t = t2;
                num_tris++;

                verts[num_tris].x = x + tile_width;
                verts[num_tris].y = y + tile_height;
                verts[num_tris].s = s2;
                verts[num_tris].t = t2;
                num_tris++;
            }

            x += tile_width;
        }

        y += tile_height;
        x = h_offset;
    }

    glBindTexture(GL_TEXTURE_2D, layer->image_data->data->texID);

    glBufferDataARB(GL_ARRAY_BUFFER, num_tris * 4 * sizeof(float), verts, GL_STREAM_DRAW);

    glVertexPointer(2, GL_FLOAT, sizeof(TileMapVertex), 0);
    glTexCoordPointer(2, GL_FLOAT, sizeof(TileMapVertex), ((char *)NULL + (8)));

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glDrawArrays(GL_TRIANGLES, 0, num_tris);

    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}

##### Share on other sites
If all you changed was GL_QUADS to GL_TRIANGLE_STRIP, then yes, you aren't doing it right at all. Consider: with GL_QUADS, vertices are consumed as follows, where vertices 2/4 and 3/5 are at the same position...

    0--24-6
    |A |B |
    |  |  |
    1--35-7

... where 4N vertices produce N quads. And with GL_TRIANGLES, like this...

    0-146-8
    |A/|C/
    |/B|/
    2357

... where 3N vertices produce N triangles (only 9 vertices and 3 triangles shown so I don't have to go into hexadecimal vertex indices).

Triangle strips and fans, however, are entirely different beasts. They allow you to optimize your vertex data for triangles that are assumed to have many superpositioned edges. The correct usage of GL_TRIANGLE_STRIP is as follows...

    0--2--4
    |A/|C/|
    |/B|/D|
    1--3--5

... where N+2 vertices are used to display N triangles. Each triangle is made of vertices (i), (i+1), and (i+2). If you fed your N-vertex GL_QUADS vertex data to GL_TRIANGLE_STRIP, you'd not only produce N-2 triangles instead of N/2 triangles (use an N like 1000 and you'll see the difference), but half those triangles will be wasted; 012 and 123 are fine, but 234 and 345 are zero-width lines.

Besides which, using tile slices you can't use GL_TRIANGLE_STRIP, as shared-position vertices don't have shared texture UVs. Unless you want to go the road of pixel shaders to select tiles from a texture map, you're best sticking with GL_QUADS.
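For anyone who does want GL_TRIANGLES instead, here is a small sketch of converting quad vertex data (four corners per tile, listed around the perimeter as GL_QUADS expects) into independent triangles, six vertices per tile. `TileVertex` and `quadsToTriangles` are illustrative names, not from any of the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TileVertex { float x, y, tx, ty; };

// Expands every quad (v0, v1, v2, v3) into the triangles (v0, v1, v2) and
// (v0, v2, v3). Each tile keeps its own texture coordinates because no
// vertex is shared between tiles.
std::vector<TileVertex> quadsToTriangles(const std::vector<TileVertex>& quads) {
    assert(quads.size() % 4 == 0);
    std::vector<TileVertex> tris;
    tris.reserve(quads.size() / 4 * 6);
    for (std::size_t i = 0; i < quads.size(); i += 4) {
        tris.push_back(quads[i]);
        tris.push_back(quads[i + 1]);
        tris.push_back(quads[i + 2]);
        tris.push_back(quads[i]);
        tris.push_back(quads[i + 2]);
        tris.push_back(quads[i + 3]);
    }
    return tris;
}
```

The result would be drawn with glDrawArrays(GL_TRIANGLES, 0, count), where count is the size of the returned vector.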

[Edited by - Wyrframe on January 4, 2010 12:09:18 AM]
