eriatarka

OpenGL Immediate mode faster than vertex arrays?


Hi there. First post, so greetings (though I have lurked on this site for a long time; it's a very good resource). I'm doing some 2D work in OpenGL and getting some "interesting" performance characteristics.

More precisely, I'm emulating an ASCII console: I divide the 1024x768 screen into 128 by 96 cells of 8x8 pixels each and render each cell as a quad textured with one 8x8 glyph from a 128x128 font bitmap. I use color modulation (GL_MODULATE) so that each cell can have its own color. I first implemented this in immediate mode, just to get something running. Then I converted it to vertex arrays (VAs), and surprisingly the framerate dropped by a wide margin: about 273 fps in immediate mode versus 177 fps with VAs. Here's the rendering code for immediate mode (in D):
	void render_im()
	{
		ConsoleChar *ch;
		const dx = 8f / 128f; // one 8x8 glyph in the 128x128 font texture (16x16 glyph grid)
		
		glEnable(GL_TEXTURE_2D);
		fontTex.bind();
		glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
		
		glBegin(GL_QUADS);
		ch = &chars[0];
		for (int y = 0; y < height; ++y) {
			for (int x = 0; x < width; ++x) {
				glColor3ubv(ch.fore.ptr);
				// glyph cell in the font grid: column = ch / 16, row = ch % 16
				int xidx = ch.ch / 16;
				int yidx = ch.ch % 16;
				
				glTexCoord2f(xidx*dx, (yidx+1)*dx); glVertex2i(x, y);
				glTexCoord2f((xidx+1)*dx,(yidx+1)*dx); glVertex2i(x+1, y);
				glTexCoord2f((xidx+1)*dx,yidx*dx); glVertex2i(x+1, y+1);
				glTexCoord2f(xidx*dx,yidx*dx); glVertex2i(x, y+1);
				ch++;
			}
		}
		glEnd();
	}
And here for VAs:
	void render_va()
	{
		glEnable(GL_TEXTURE_2D);
		fontTex.bind();
		glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
		
		glEnableClientState(GL_VERTEX_ARRAY);
		glEnableClientState(GL_COLOR_ARRAY);
		glEnableClientState(GL_TEXTURE_COORD_ARRAY);
		
		glVertexPointer(2, GL_FLOAT, 0, vertices.ptr);
		glColorPointer(4, GL_UNSIGNED_BYTE, 0, foreColors.ptr); // RGBA, repeated for each of a quad's 4 vertices
		glTexCoordPointer(2, GL_FLOAT, 0, texCoords.ptr);
		
		glDrawArrays(GL_QUADS, 0, 4*width*height);
		
		glDisableClientState(GL_VERTEX_ARRAY);
		glDisableClientState(GL_COLOR_ARRAY);
		glDisableClientState(GL_TEXTURE_COORD_ARRAY);
		
		glDisable(GL_TEXTURE_2D);
	}
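The post doesn't show how `vertices`, `texCoords` and `foreColors` are filled. A minimal sketch in D follows, assuming a `ConsoleChar` with a `ubyte ch` glyph index and an RGBA `ubyte[4] fore` color; the real struct isn't shown, so those fields and the helper names `fillVertices` and `fillDynamic` are hypothetical. It uses the same corner order and glyph lookup as render_im(), computes positions once (they never change), and rewrites only the texture coordinates and the four-times-repeated color per frame.

```d
// Sketch: CPU-side filling of the arrays drawn by render_va().
// ConsoleChar's layout and the function names are assumptions for illustration.
struct ConsoleChar
{
    ubyte ch;       // glyph index into the 16x16 grid of the font texture
    ubyte[4] fore;  // RGBA foreground color
}

enum float dx = 8f / 128f; // size of one 8x8 glyph in texture coordinates

/// Quad corners in cell units, same order as render_im():
/// (x,y), (x+1,y), (x+1,y+1), (x,y+1). Positions never change,
/// so this only needs to run when the console is resized.
void fillVertices(float[] vertices, int width, int height)
{
    assert(vertices.length == cast(size_t)(8 * width * height));
    size_t i = 0;
    foreach (y; 0 .. height)
    {
        foreach (x; 0 .. width)
        {
            float fx = x, fy = y;
            float[8] quad = [fx, fy, fx + 1, fy, fx + 1, fy + 1, fx, fy + 1];
            vertices[i .. i + 8] = quad[];
            i += 8;
        }
    }
}

/// Per-frame update: texture coordinates (2 floats per vertex) and
/// RGBA colors (4 bytes per vertex, so each cell's color is stored 4 times).
void fillDynamic(float[] texCoords, ubyte[] colors, const(ConsoleChar)[] chars)
{
    assert(texCoords.length == 8 * chars.length);
    assert(colors.length == 16 * chars.length);
    foreach (n, ref c; chars)
    {
        // Same glyph lookup as render_im(): column = ch / 16, row = ch % 16.
        int xidx = c.ch / 16;
        int yidx = c.ch % 16;
        float u0 = xidx * dx, u1 = (xidx + 1) * dx;
        float v0 = yidx * dx, v1 = (yidx + 1) * dx;
        float[8] uv = [u0, v1, u1, v1, u1, v0, u0, v0];
        texCoords[n * 8 .. n * 8 + 8] = uv[];
        foreach (k; 0 .. 4)
            colors[n * 16 + k * 4 .. n * 16 + k * 4 + 4] = c.fore[];
    }
}
```

For the full 128x96 console this means 8 position floats, 8 texture-coordinate floats and 16 color bytes per cell, all sized for `4*width*height` vertices as passed to glDrawArrays.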
Basically, my main loop just calls glClear, render and flip. (I realize glClear isn't strictly needed here; removing it gives some relative speedup, but nothing that changes the basic problem.) To find out where the bottleneck lies, I scaled the window down by several factors and repeated the test. Here are the framerates I got:
	Window size    Immediate mode (fps)    Vertex arrays (fps)
	screen / 8     400                     521
	screen / 4     400                     521
	screen / 2     400                     252
	screen         273                     177
To clarify, the last row is at 1024x768, the one before at 512x384, and so on. I find this very interesting. The big speedup with VAs at smaller resolutions seems to indicate that I'm fillrate limited, but is that realistic? This is on a Radeon 9500; definitely not a bleeding-edge card, but I'm only rendering each pixel on the screen exactly once, so I don't see how that could be a problem. Also, if fillrate is the problem, why does IM actually perform better at high resolutions? My only theory right now is that, because I have to store the color four times per rectangle (once per vertex) in VA mode, I get additional bandwidth problems. But then why does it depend on the screen resolution? It's crazy. If anyone has any idea what I might be doing wrong or why this happens, I'd be very grateful.
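For scale, here is a rough count for the full-screen case (1024x768, 128x96 cells). It counts only the textured quad pass (glClear is one more full-screen write) and assumes the per-vertex layout used in render_va(): 2 floats of position, 2 floats of texture coordinates and 4 unsigned bytes of color.

```latex
\begin{aligned}
\text{pixels per frame} &= 1024 \times 768 = 786{,}432 \\
\text{IM at } 273\,\text{fps} &: 786{,}432 \times 273 \approx 2.15 \times 10^{8}\ \text{pixels/s} \\
\text{VA at } 177\,\text{fps} &: 786{,}432 \times 177 \approx 1.39 \times 10^{8}\ \text{pixels/s} \\
\text{vertices per frame} &= 128 \times 96 \times 4 = 49{,}152 \\
\text{bytes per vertex} &= 2 \cdot 4 + 2 \cdot 4 + 4 \cdot 1 = 20 \\
\text{VA data at } 177\,\text{fps} &: 49{,}152 \times 20 \times 177 \approx 1.74 \times 10^{8}\ \text{bytes/s}
\end{aligned}
```

These are orders of magnitude only; on their own they don't show which stage is the bottleneck.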

Hi and welcome.

That's pretty interesting. Have you tried the same code on another graphics card?
I can't tell whether you actually need to modify the colors from frame to frame.
If not, then I don't think a vertex array is the way to go for this kind of static data; a display list would probably run significantly faster.

Yes, I want to do real-time animation with this, so I do need to change the colors and texture coordinates from frame to frame. So, unfortunately, display lists are not an option.

I've done some more testing. The bare render/flip main loop isn't realistic anyway, so I added some CPU-heavy work, and in that case the VA method actually seems to perform better relative to IM. So I guess immediate mode only performs better when you throw all available CPU cycles at it, whereas VAs offload more work to the GPU. It's still not quite the speedup I'd expect, and I still don't understand the numbers from my benchmarks, but I can live with it for the moment.


Another, only tangentially related question: I want to draw my rectangles with separate background and foreground colors, where the texture's alpha channel determines which pixels are foreground and which are background. So currently I:

1. Render all quads with their background colors, without texturing.
2. Render all quads again with their foreground colors, with texturing and alpha test enabled.
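For reference, the per-pixel result of those two passes can be modeled on the CPU. This is only a sketch of the compositing rule, not GL code; the `RGB` type, the `twoPassPixel` name and the GL_GREATER alpha function with a reference value are assumptions, since the post just says "alpha test".

```d
// CPU model of what the two passes produce for a single pixel.
// The alpha-test function (GL_GREATER with a reference value) is an
// assumption; the post only says "alpha test".
struct RGB
{
    ubyte r, g, b;
}

/// Pass 1 draws the untextured background color. Pass 2 draws
/// fore * texel (GL_MODULATE), but the alpha test discards the fragment
/// unless texel alpha > alphaRef, so the background shows through there.
RGB twoPassPixel(RGB back, RGB fore, ubyte[4] texel, ubyte alphaRef)
{
    if (texel[3] > alphaRef)
    {
        // GL_MODULATE: each channel is fore * texel, in 0..255 fixed point
        return RGB(cast(ubyte)(fore.r * texel[0] / 255),
                   cast(ubyte)(fore.g * texel[1] / 255),
                   cast(ubyte)(fore.b * texel[2] / 255));
    }
    return back; // fragment discarded: the pass-1 color remains
}
```

Per pixel, the result is a hard select between the modulated foreground color and the background color, keyed on the texture's alpha; a single-pass replacement would have to reproduce exactly that.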

Can anybody think of a more clever way, possibly doing this without two passes (and without shaders)?
