Tile map optimization?

Started by
21 comments, last by Wyrframe 14 years, 3 months ago
So I have this code set up to test rendering a tile map. The code works, but it puts a huge dent in the FPS. With one layer it goes from ~1000 FPS to 400 FPS, and with two layers it drops to ~200 FPS. I'm planning on having more than two layers, along with players and other objects and effects. Is my OpenGL code bad? Is there a way to optimize the for loops? The only fix I can think of is to render each layer onto a texture once, then just draw that one big texture. I can't foresee any problems with that... then again, it's 1 AM.
		for(int y = 0; y <= 14; y++) {
			for(int x = 0; x <= 18; x++) {
				_graphics->drawSubImage(tiles, x * 32, y * 32, 0.0,  makeQuad(32, 0, 32, 32));
			}
			
		}
void Graphics::drawSubImage(Image *image, int x, int y, float angle, Quad srcRect, Color color) {
	// Start from 0, 0
	glLoadIdentity();
	
	// Move and rotate
	glTranslatef(x, y, 0);
	
	if(angle != 0.0) {
		glRotatef(angle, 0, 0, 1);
	}
	
	glColor4f(color.r, color.g, color.b, color.a);
	
	// Signed ints, to match the GL_INT type passed to glVertexPointer below
	GLint vertices[] = {0, 0,
						srcRect.w, 0,
						srcRect.w, srcRect.h,
						0, srcRect.h};
	
	// Calculate the texture coords
	// tex coords are between 0 and 1
	float srcX = (float)srcRect.x / image->getWidth();
	float srcY = (float)srcRect.y / image->getHeight();
	float srcW = (float)srcRect.w / image->getWidth();
	float srcH = (float)srcRect.h / image->getHeight();
	
	GLfloat texCoords[] = { srcX, srcY,
							srcX + srcW, srcY,
							srcX + srcW, srcY + srcH,
							srcX, srcY + srcH};
	
	glBindTexture(GL_TEXTURE_2D, image->getTexture());
	
	glEnableClientState(GL_VERTEX_ARRAY);
	glEnableClientState(GL_TEXTURE_COORD_ARRAY);
	
	glVertexPointer(2, GL_INT, 0, &vertices[0]);
	glTexCoordPointer(2, GL_FLOAT, 0, &texCoords[0]);
	
	glDrawArrays(GL_QUADS, 0, 4);
	
	glDisableClientState(GL_TEXTURE_COORD_ARRAY);
	glDisableClientState(GL_VERTEX_ARRAY);
	
}
I'd guess that your _graphics->drawSubImage routine makes a single draw call for the quad corresponding to each tile, which is probably where all the performance is going.

In 3D APIs like OpenGL or Direct3D, you want to minimize state changes (different materials/shaders/textures) and batch like states into a single draw call.

Rather than drawSubImage *actually* drawing right then and there, replace it with a function that creates a context (a material and a vertex buffer) for each material and appends the quad's vertices to that vertex buffer. Then, when everything for the frame has been queued, another function loops through the list of contexts, activating each material and drawing its whole vertex buffer in one go.

That's a fairly basic explanation/approach, but this or something similar is miles ahead of the naive draw-things-one-at-a-time strategy.
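As a rough illustration of the queue-then-flush idea, here is a minimal CPU-side sketch. The names `QuadBatcher`, `BatchVertex`, `addQuad` and `flush` are made up for this example and are not from the original code; the `submit` callback stands in for wherever the real code would bind the texture and issue one draw call for the whole group.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical vertex layout: position plus texture coordinate.
struct BatchVertex { float x, y, u, v; };

// Collects quads per texture ID so each texture can be drawn in one call.
class QuadBatcher {
public:
    // Queue one axis-aligned quad; nothing is drawn here.
    void addQuad(unsigned textureId, float x, float y, float w, float h,
                 float u0, float v0, float u1, float v1) {
        std::vector<BatchVertex>& verts = batches_[textureId];
        verts.push_back({x,     y,     u0, v0});
        verts.push_back({x + w, y,     u1, v0});
        verts.push_back({x + w, y + h, u1, v1});
        verts.push_back({x,     y + h, u0, v1});
    }

    // Number of draw calls a flush would issue: one per distinct texture.
    std::size_t drawCallCount() const { return batches_.size(); }

    // Number of queued vertices for one texture (0 if none are queued).
    std::size_t vertexCount(unsigned textureId) const {
        auto it = batches_.find(textureId);
        return it == batches_.end() ? 0 : it->second.size();
    }

    // Calls submit(textureId, vertices, count) once per texture, then clears.
    template <typename Submit>
    void flush(Submit submit) {
        for (auto& entry : batches_)
            submit(entry.first, entry.second.data(), entry.second.size());
        batches_.clear();
    }

private:
    std::map<unsigned, std::vector<BatchVertex>> batches_;
};
```

With the 15 x 19 tile loop from the first post, all 285 tiles share one texture, so a flush would hand over a single group of 1140 vertices instead of making 285 separate draw calls.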


When you say it drops from 1000 to 400, what were you doing at 1000? Drawing nothing at all? Because, of course, actually drawing something is going to be orders of magnitude slower than drawing nothing at all. If I were you, I would get it working with all the layers you need to draw, plus your AI, and if you still manage visually acceptable rates, don't worry about it. Get it working first, then identify bottlenecks and try to optimize them.
Yeah, really, at this point I'm nowhere near needing optimizations... so I can wait and see.
You really should talk in milliseconds, not framerate. It makes it more obvious how much slower things are:

1000fps = 1ms
400fps = 2.5ms

So, drawing all that stuff takes 1.5ms. Not the end of the world, but really you should be sorting by material (texture) and ideally drawing all tiles with that material with a single draw call.
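The same conversion as a tiny sketch; the helper names `frameTimeMs` and `addedCostMs` are invented for this example. Plugging in the approximate readings from the first post, going from ~1000 to ~400 FPS costs 1.5 ms, while going from ~400 to ~200 FPS costs 2.5 ms, even though the second drop looks smaller when expressed in FPS.

```cpp
// Frame time in milliseconds for a given frames-per-second reading.
inline double frameTimeMs(double fps) { return 1000.0 / fps; }

// Cost in milliseconds of whatever was added between two FPS readings.
inline double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```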

And any time you're interested in GPU performance, you really should get familiar with a GPU profiler (like NVPerfHUD).
Quote: Original post by CirdanValen
Yea, really at this point I'm no where near the point of needing optimizations...so I can wait and see.

But here we're discussing basic design, not sophisticated optimizations: you should take care of "broken windows" at this point, while you aren't doing anything very complex, before your code becomes too messy to change.
My observations below.
glLoadIdentity();
Redundant if you get rid of the glTranslatef and glRotatef calls below.

glTranslatef(x, y, 0);
if(angle != 0.0) {
	glRotatef(angle, 0, 0, 1);
}
Redundant: compute the vertex coordinates yourself, even if it requires sine and cosine calls.

glColor4f(color.r, color.g, color.b, color.a);
You don't seem to use this color.

/* SNIP */
Texture coordinates should be precomputed; vertex coordinates should include the rotation and translation.

glBindTexture(GL_TEXTURE_2D, image->getTexture());
The same texture is used for every tile: bind it once (or once per frame).

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
Call once per frame, before rendering tiles.

glVertexPointer(2, GL_INT, 0, &vertices[0]);
glTexCoordPointer(2, GL_FLOAT, 0, &texCoords[0]);
glDrawArrays(GL_QUADS, 0, 4);
You could batch all tiles together; this array holds only 1 quad.

glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
Call once per frame, after rendering tiles.
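For the "compute vertex coordinates on your own" point, a minimal sketch of doing the rotation and translation on the CPU could look like this. `Vec2` and `transformedQuad` are hypothetical names, and the math assumes the same order as the posted code: rotate about the quad's local origin, then translate to (x, y).

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Writes the four corners of a w-by-h quad, rotated by angleDegrees about its
// local origin and then moved to (x, y). This mirrors what glTranslatef(x, y, 0)
// followed by glRotatef(angle, 0, 0, 1) does to (0,0), (w,0), (w,h), (0,h).
inline void transformedQuad(float x, float y, float w, float h,
                            float angleDegrees, Vec2 out[4]) {
    const float radians = angleDegrees * 3.14159265358979f / 180.0f;
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    const float local[4][2] = {{0, 0}, {w, 0}, {w, h}, {0, h}};
    for (int i = 0; i < 4; ++i) {
        out[i].x = x + local[i][0] * c - local[i][1] * s;
        out[i].y = y + local[i][0] * s + local[i][1] * c;
    }
}
```

The resulting corners can go straight into a shared vertex array, so no per-tile glLoadIdentity, glTranslatef or glRotatef call is needed. For unrotated tiles the sine and cosine could be skipped entirely.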


Over the weekend I implemented VBOs, so I now draw an entire layer with one glDrawArrays call. Here is the map renderer in its current state. I'm getting ~1300 FPS rendering one layer. Now I'm trying to figure out how to get smooth scrolling maps with this code.

void Graphics::drawMap(Texture *texture, Map *map) {
	int mapWidth = (_width / 32) + 1;
	int mapHeight = (_height / 32) + 1;
	int tileCount = mapHeight * mapWidth;
	int vertexCount = tileCount * 4;
	
	std::list<MapLayer> layers = map->getMapLayers();
	std::list<MapLayer>::iterator iter;
	
	glColor4f(1.0, 1.0, 1.0, 1.0);
	glBindTexture(GL_TEXTURE_2D, texture->getTexture());
	
	glEnableClientState(GL_VERTEX_ARRAY);
	glEnableClientState(GL_TEXTURE_COORD_ARRAY);
	
	for(iter = layers.begin(); iter != layers.end(); iter++) {
		Vertex *verts = new Vertex[vertexCount];
		int index = 0;
		int tindex = 0;
		
		for(int y = 0; y < 19; y++) {
			for(int x = 0; x < 25; x++) {
				float srcY = ((*iter)[tindex] / 16.0) / 256.0;
				float srcX = (((*iter)[tindex] % 16) * 16 ) / 256.0;
				float srcW = 16.0 / 256.0;
				float srcH = 16.0 / 256.0;
				
				tindex++;
				
				verts[index].x = x * 32;
				verts[index].y = y * 32;
				verts[index].tx = srcX;
				verts[index].ty = srcY;
				
				index++;
				
				verts[index].x = (x * 32) + 32;
				verts[index].y = y * 32;
				verts[index].tx = srcX + srcW;
				verts[index].ty = srcY;
				
				index++;
				
				verts[index].x = (x * 32) + 32;
				verts[index].y = (y * 32) + 32;
				verts[index].tx = srcX + srcW;
				verts[index].ty = srcY + srcH;
				
				index++;
				
				verts[index].x = x * 32;
				verts[index].y = (y * 32) + 32;
				verts[index].tx = srcX;
				verts[index].ty = srcY + srcH;
				
				index++;
			}
		}
		
		glBindBufferARB(GL_ARRAY_BUFFER_ARB, _vbo);
		glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(Vertex) * vertexCount, &verts[0].x, GL_STATIC_DRAW_ARB);
		
		// Array form of delete, since verts was allocated with new[]
		delete[] verts;
		
		glVertexPointer(2, GL_FLOAT, sizeof(Vertex), 0);
		glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(8));
		
		glDrawArrays(GL_QUADS, 0, vertexCount);
	}
	
	glDisableClientState(GL_TEXTURE_COORD_ARRAY);
	glDisableClientState(GL_VERTEX_ARRAY);
	
	glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
}
First, a suggestion... I don't know if you need to delete and re-allocate verts[] once per layer, but if you don't need to, you shouldn't. Check the semantics of glBufferDataARB. If you already have, my bad.

Second, small things that make me twitch about your code...

1. srcW, srcH are constant; move them out of the loop and declare them const.
2. Your srcY calculation doesn't look right; should it read `((*iter)[tindex]/16)*16.0)/256.0`?
3. 16/256 is a constant.
4. y*32 need only be computed once per Y loop; x*32 only once per X loop.
5. (*iter) can be expensive for some containers; do it once per layer.
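To make points 1 to 3 concrete, here is a small sketch of the tile-to-texture-coordinate math with the constants hoisted and the row picked by integer division. `TileUV` and `tileUV` are hypothetical names; the defaults (16 tiles per row, 16-pixel tiles, a 256-pixel square tile sheet) are taken from the numbers in the posted code and are otherwise assumptions.

```cpp
struct TileUV { float u, v, w, h; };

// Source rectangle, in 0..1 texture coordinates, of tile number tileIndex
// in a square tile sheet.
inline TileUV tileUV(int tileIndex, int tilesPerRow = 16,
                     int tilePixels = 16, int sheetPixels = 256) {
    // Constant for the whole sheet, so it is computed once per call here and
    // could be hoisted out of the tile loop entirely.
    const float size = static_cast<float>(tilePixels) / sheetPixels;
    const int column = tileIndex % tilesPerRow;
    const int row = tileIndex / tilesPerRow;  // integer division picks the row
    return {column * size, row * size, size, size};
}
```

For example, tile 17 sits in column 1, row 1, so both its u and v come out as 16/256 = 0.0625.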

Next, an outright contradiction. There's no point to using vertex buffers if you're going to destroy and re-create them every frame; have a generateMap method that does nearly everything you do here, and call it whenever the map changes; and have a drawMap that does nothing but call glDrawArrays on the most recently generated arrays.
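A minimal sketch of that generate/draw split, kept CPU-only so it can be tried without a GL context. `CachedLayer`, `CachedVertex`, `setTile` and `rebuildCount` are made-up names, and the 32-pixel tiles and 16 x 16 tile sheet are assumptions carried over from the posted code. In a real renderer the rebuild step is where the buffer upload (for instance glBufferDataARB) would go, and drawing would only bind and call glDrawArrays.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical vertex layout, mirroring the x/y/tx/ty fields used in the thread.
struct CachedVertex { float x, y, tx, ty; };

class CachedLayer {
public:
    CachedLayer(std::vector<int> tiles, int width, int height)
        : tiles_(std::move(tiles)), width_(width), height_(height) {}

    // Changing a tile only marks the cache stale; no geometry is built here.
    void setTile(int x, int y, int tile) {
        tiles_[y * width_ + x] = tile;
        dirty_ = true;
    }

    // Returns the cached vertices, rebuilding only if a tile changed.
    const std::vector<CachedVertex>& vertices() {
        if (dirty_) { rebuild(); dirty_ = false; }
        return verts_;
    }

    int rebuildCount() const { return rebuilds_; }

private:
    void rebuild() {
        const float uvSize = 16.0f / 256.0f;  // one tile in a 16x16 tile sheet
        verts_.clear();
        verts_.reserve(tiles_.size() * 4);
        for (int y = 0; y < height_; ++y) {
            for (int x = 0; x < width_; ++x) {
                const int tile = tiles_[y * width_ + x];
                const float tx = (tile % 16) * uvSize;
                const float ty = (tile / 16) * uvSize;
                const float px = x * 32.0f, py = y * 32.0f;
                verts_.push_back({px,         py,         tx,          ty});
                verts_.push_back({px + 32.0f, py,         tx + uvSize, ty});
                verts_.push_back({px + 32.0f, py + 32.0f, tx + uvSize, ty + uvSize});
                verts_.push_back({px,         py + 32.0f, tx,          ty + uvSize});
            }
        }
        ++rebuilds_;  // a real renderer would upload verts_ to the VBO here
    }

    std::vector<int> tiles_;
    int width_, height_;
    std::vector<CachedVertex> verts_;
    bool dirty_ = true;
    int rebuilds_ = 0;
};
```

Calling `vertices()` every frame then costs nothing beyond a flag check until `setTile` is called again.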

Finally, a suggestion. For smooth scrolling, you seem to generate an entire map in this code, so simply do a glTranslatef before calling drawMap to provide the scroll offset. Also, so you don't draw too many unnecessary tiles, use this code to draw fixed-size map sectors or something similar, instead of the entire map no matter how little of it is visible.
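One way the sector idea might be sketched: work out which fixed-size sectors overlap the view and draw only those, with the scroll offset applied as a translation. `SectorRange` and `visibleSectors` are hypothetical names, and the sector size and map dimensions in the usage note are arbitrary assumptions.

```cpp
struct SectorRange { int firstX, firstY, lastX, lastY; };

// Inclusive range of sectors overlapped by a viewW x viewH pixel view whose
// top-left corner is at (cameraX, cameraY) in map pixels, clamped to the map.
// Assumes the camera stays at or near the map bounds.
inline SectorRange visibleSectors(int cameraX, int cameraY, int viewW, int viewH,
                                  int sectorPixels, int sectorsAcross, int sectorsDown) {
    auto clamp = [](int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); };
    SectorRange r;
    r.firstX = clamp(cameraX / sectorPixels, 0, sectorsAcross - 1);
    r.firstY = clamp(cameraY / sectorPixels, 0, sectorsDown - 1);
    r.lastX = clamp((cameraX + viewW - 1) / sectorPixels, 0, sectorsAcross - 1);
    r.lastY = clamp((cameraY + viewH - 1) / sectorPixels, 0, sectorsDown - 1);
    return r;
}
```

With 512-pixel sectors (16 tiles of 32 pixels), a 4 x 4 sector map, and an 800 x 600 view at camera position (600, 0), this selects sector columns 1 to 2 and rows 0 to 1, so 4 of the 16 sectors get drawn.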
The code I posted above has changed because I found a problem with rendering all the layers outright: I need to draw the players and other characters in between the layers. So I changed the method to render a single layer instead of the entire map. I moved around the equations you mentioned and did notice a speed increase, thanks! I appreciate your feedback. I'll start working on precalculating/caching the vertex calculations.

void Graphics::drawMapLayer(Texture *texture, MapLayer layer, int mapWidth, int xOffset, int yOffset) {
	// 25 tiles across
	// 19 tiles down
	int tileCount = 19 * 25;
	int vertexCount = tileCount * 4;
	
	// Calculate the first tile in a 1D array
	// based on our pixel offset.
	// Rows in the array are mapWidth tiles wide (see increment below)
	int initialTile = ((yOffset / 32) * mapWidth) + (xOffset / 32);
	
	// The increment value is how many spaces we
	// skip in our array. This value is added
	// after each row
	int increment = mapWidth - 25;
	
	// We need to shift our quads for smooth scrolling
	// instead of just scrolling tile by tile.
	// This calculates the offset for our quads
	int drawOffsetX = xOffset & 31;
	int drawOffsetY = yOffset & 31;
	
	Vertex *verts = new Vertex[vertexCount];
	int index = 0;
	int tindex = initialTile;
	float srcY;
	float srcX;
	float srcW = 16.0 / (float)texture->getWidth();
	float srcH = 16.0 / (float)texture->getHeight();
	
	for(int y = 0; y < 19; y++) {
		for(int x = 0; x < 25; x++) {
			// Each tile has its own column and row in the tile sheet;
			// integer division picks the row
			srcX = ((layer[tindex] % 16) * 16) / (float)texture->getWidth();
			srcY = ((layer[tindex] / 16) * 16) / (float)texture->getHeight();
			tindex++;
			
			verts[index].x = (x * 32) - drawOffsetX;
			verts[index].y = (y * 32) - drawOffsetY;
			verts[index].tx = srcX;
			verts[index].ty = srcY;
			
			index++;
			
			verts[index].x = ((x * 32) + 32) - drawOffsetX;
			verts[index].y = (y * 32) - drawOffsetY;
			verts[index].tx = srcX + srcW;
			verts[index].ty = srcY;
			
			index++;
			
			verts[index].x = ((x * 32) + 32) - drawOffsetX;
			verts[index].y = ((y * 32) + 32) - drawOffsetY;
			verts[index].tx = srcX + srcW;
			verts[index].ty = srcY + srcH;
			
			index++;
			
			verts[index].x = (x * 32) - drawOffsetX;
			verts[index].y = ((y * 32) + 32) - drawOffsetY;
			verts[index].tx = srcX;
			verts[index].ty = srcY + srcH;
			
			index++;
		}
		tindex += increment;
	}
	
	glBindBufferARB(GL_ARRAY_BUFFER_ARB, _vbo);
	glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(Vertex) * vertexCount, &verts[0].x, GL_STATIC_DRAW_ARB);
	
	// Array form of delete, since verts was allocated with new[]
	delete[] verts;
	
	glColor4f(1.0, 1.0, 1.0, 1.0);
	glBindTexture(GL_TEXTURE_2D, texture->getTexture());
	
	glEnableClientState(GL_VERTEX_ARRAY);
	glEnableClientState(GL_TEXTURE_COORD_ARRAY);
	
	glVertexPointer(2, GL_FLOAT, sizeof(Vertex), 0);
	glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), BUFFER_OFFSET(8));
	
	glDrawArrays(GL_QUADS, 0, vertexCount);
	
	glDisableClientState(GL_TEXTURE_COORD_ARRAY);
	glDisableClientState(GL_VERTEX_ARRAY);
	
	glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
}
Okay, so I got the new system working... however, I see no speed increase at all. Very peculiar. The system looks a little like this:

class MapLayer {
public:
	MapLayer(std::vector<int> tiles, int width, int height);
	
	void generateVertexArray(int xOffset, int yOffset);
	Vertex *getVertexArray() const;
	int getVertexCount() const;
	
private:
	int _width, _height, _vertexCount;
	std::vector<int> _tiles;
	Vertex *_vertexArray;
};


Then drawMapLayer simply calls getVertexArray, binds the vertex array, binds the texture, and calls glDrawArrays. I only calculate the vertex array once, when I first create the layer. With and without the VBO, I am getting the same FPS as before.
