SDL + OpenGL Speed Question

Started by
4 comments, last by clb 15 years, 3 months ago
Hey everybody. I'm developing a 2D game engine, which is supposed to be very simple. The engine uses SDL for window management and OpenGL for hardware acceleration. First, I want to list the classes my renderer depends on:

* cSceneManager: Holds pointers to all nodes in a std::vector<cSceneNode *> and uses cGraphicsProvider to render them.
* cSceneNode: Base class for scene nodes. It has a SetLocation function which checks if the node moved to a location outside the FOV of cSceneManager's active camera. If so, it sets its bVisible to false.
* cCamera: Simply a rectangle describing a field of view. It has x, y, w, h properties as well as some methods to move it around.
* cGraphicsProvider: The only class that sends vertices to the graphics card. Any class that wants to render something must use this class's DrawTextureRegion (to draw a portion of a texture) or DrawTexture (to draw a whole texture).

Any suggestions about the implementation or design of the engine will be highly appreciated. Currently, I'm benchmarking the engine. My test application runs at 30 FPS with 10000 scene nodes attached and set to visible.
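The visibility check described for cSceneNode could be a simple rectangle-overlap test against the camera. This is a hypothetical sketch, not the engine's actual code; only cCamera's x, y, w, h fields come from the description above, everything else is an assumption:

```cpp
#include <cassert>

// Assumed camera rectangle, per the cCamera description above.
struct cCamera { float x, y, w, h; };

// Minimal stand-in for cSceneNode; member names besides bVisible are guesses.
struct cSceneNode {
    float x = 0, y = 0, w = 0, h = 0;
    bool bVisible = true;

    // Flag the node invisible when its rectangle falls entirely
    // outside the active camera's field of view.
    void SetLocation(float nx, float ny, const cCamera& cam) {
        x = nx;
        y = ny;
        bVisible = !(x + w < cam.x || x > cam.x + cam.w ||
                     y + h < cam.y || y > cam.y + cam.h);
    }
};
```

With an 800x600 camera at the origin, a 100x100 node moved to (2000, 0) would be flagged invisible, while one at (100, 100) stays visible.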

	for (int i = 0; i < 10000; i++)
	{
		for (int k = 0; k < 1; k++)
		{
			car * myCar = new car();
			myCar->LoadAnimation(&ex, 100, 100);
			myCar->SetPosition((i * 100) + (i * 5), (k * 100) + (k * 100));
			engine->smgr->AddNode(myCar);
		}
	}


I wonder if 30 FPS is really OK for 10000 nodes. I'm not sure, since I'm not using complex shader operations or other fancy, computation-heavy features. Every node is an animation, and in every loop its Update() method is called. My node rendering code looks like this:

		glPushMatrix();
			glBindTexture(GL_TEXTURE_2D, texture->uiTexId);
			glLoadIdentity();
			glTranslatef(x, y, 0.0f);
			glEnable(GL_BLEND);
			glColor4ub(255, 255, 255, alpha);
			glScalef(scale, scale, 1.0f);

			glBegin(GL_TRIANGLE_STRIP);
				glTexCoord2d(fClippingX, fClippingY);    glVertex2f(0, 0);
				glTexCoord2d(fClippingX2, fClippingY);   glVertex2f(fDrawingW, 0);
				glTexCoord2d(fClippingX, fClippingY2);   glVertex2f(0, fDrawingH);
				glTexCoord2d(fClippingX2, fClippingY2);  glVertex2f(fDrawingW, fDrawingH);
			glEnd();
		glPopMatrix();


Any comments will be great. Thanks in advance. [Edited by - by on December 27, 2008 12:58:58 PM]
Well, you have to draw a quad for every sprite, so you're effectively rendering 20,000 triangles. 30 FPS seems like a reasonable framerate for that. I'm a rookie with OpenGL myself, but I don't think there's a lot you can do to optimize that drawing code. I believe you could gain a huge performance boost in your benchmark by using PBOs, but in a practical application where there are lots of different types of sprites being displayed it probably wouldn't make much difference.

There are also VBOs, but -- and someone please correct me if I'm wrong -- I don't think four vertices are enough to gain a significant performance boost by using them.
First of all: if you're worried about performance, DO NOT use immediate mode (which you are currently using); it's slow.

Also, make sure you make a release build (Visual Studio) or compile with at least -O2 (gcc) when testing performance.

Since you are effectively only drawing quads, you should probably use either a display list or a VBO, and use glTranslate*/glScale* to get it to the right size and position.
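As a sketch of what batching the quads might look like (assumed data layout, not the poster's actual code): pack every visible sprite into one interleaved client-side vertex array and submit the whole batch with a single glDrawArrays call. The GL submission is shown in comments because it needs a live context:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-sprite data; field names are assumptions.
struct Sprite { float x, y, w, h, u0, v0, u1, v1; };

// Pack all sprites into one interleaved array (x, y, u, v per vertex),
// two triangles per quad, so the batch needs only one draw call.
std::vector<float> BuildQuadBatch(const std::vector<Sprite>& sprites) {
    std::vector<float> verts;
    verts.reserve(sprites.size() * 6 * 4);
    for (const Sprite& s : sprites) {
        const float q[6][4] = {
            {s.x,       s.y,       s.u0, s.v0},
            {s.x + s.w, s.y,       s.u1, s.v0},
            {s.x,       s.y + s.h, s.u0, s.v1},
            {s.x + s.w, s.y,       s.u1, s.v0},
            {s.x + s.w, s.y + s.h, s.u1, s.v1},
            {s.x,       s.y + s.h, s.u0, s.v1},
        };
        for (const float* v : {q[0], q[1], q[2], q[3], q[4], q[5]})
            verts.insert(verts.end(), v, v + 4);
    }
    return verts;
}

// Submitting the batch would then look roughly like this
// (requires an active GL context and <GL/gl.h>):
//   glEnableClientState(GL_VERTEX_ARRAY);
//   glEnableClientState(GL_TEXTURE_COORD_ARRAY);
//   glVertexPointer(2, GL_FLOAT, 4 * sizeof(float), verts.data());
//   glTexCoordPointer(2, GL_FLOAT, 4 * sizeof(float), verts.data() + 2);
//   glDrawArrays(GL_TRIANGLES, 0, (GLsizei)(verts.size() / 4));
```

The same array could equally be uploaded into a VBO once per frame instead of being kept client-side; the packing step is identical either way.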
[size="1"]I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!
Thanks! I will try to modify my renderer to use display lists.

P.S. I'm already making release build. (MSVC++ 2008)
Recent news :)

Getting rid of unnecessary texture binding doubled the rendering speed!

if (uiCurrentTexture != texture->uiTexId)
{
	glBindTexture(GL_TEXTURE_2D, texture->uiTexId);
	uiCurrentTexture = texture->uiTexId;
}
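Taking that further, sorting the visible nodes by texture ID before rendering makes the bind-only-on-change check fire as rarely as possible. A hypothetical sketch (the Node type and the bind-counting are illustrative assumptions, not the engine's code):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical node carrying only its texture id, for illustration.
struct Node { unsigned uiTexId; };

// Sort by texture so equal textures become adjacent, then count how many
// glBindTexture calls the bind-on-change check would actually make.
unsigned CountBinds(std::vector<Node>& nodes) {
    std::sort(nodes.begin(), nodes.end(),
              [](const Node& a, const Node& b) { return a.uiTexId < b.uiTexId; });
    unsigned current = 0, binds = 0;  // 0 = "no texture bound yet"
    for (const Node& n : nodes) {
        if (n.uiTexId != current) {   // same check as the snippet above
            // glBindTexture(GL_TEXTURE_2D, n.uiTexId);  // real bind goes here
            current = n.uiTexId;
            ++binds;
        }
    }
    return binds;
}
```

For example, five nodes alternating between two textures would cost five binds drawn in arrival order, but only two once sorted.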
Quote:Original post by by
I wonder if 30 FPS is really OK for 10000 nodes.


You're the only person who can answer that. Optimization is not something you do until you get the fastest code possible; it is something you do to reach a level of performance you are content with. Now, as other posters said, there are indeed several ways to optimize your code (most of them being some form of batching to reduce CPU overhead).

Did you use 10000 nodes because you need the 10k nodes, or was it just for stress testing? If you only ever need something like 100-1000 nodes, there might be little point in optimizing that piece of code.

Consider the pros/cons for working on optimizing the rendering:
pros:
- Better performance. You'll have less CPU overhead with deferred-mode rendering and OpenGL call batching.
- Learning experience. If you don't have the faintest clue how to optimize the code right now, it may be worth investigating, since *in this case* the optimization involves using genuinely different techniques (contrast this with, e.g., the dozen 'i++' vs '++i' micro-optimization discussions that often pop up on the forums).

cons:
- If you optimize, there will be more complexity (roughly, more lines of code) and less flexibility. Notice that with your current immediate-mode rendering you can easily fine-tune vertex positions and other attributes without writing several lines of lock/unlock code to manage the data. Instead, you can tightly integrate animation/adjustment parameters into the rendering loop, which stays pleasantly simple in a case like this (e.g. a wobble effect on vertices, or animating vertex UVs or colors with some complicated function).
- Optimizing takes time, which, if you never end up needing the extra performance, is simply wasted. The smart thing to do, if you don't know right now whether to optimize or not, is to prepare for the possibility that you will need to. That is, contain your I-Know-This-Is-Not-The-Most-Optimal-Code lines so that if you do need to optimize, you can rewrite that small part of the code instead of it escalating into a full system-wide refactoring. If you don't have the faintest idea how to do that now, perhaps the 'learning experience' point above gains a bit more weight.

While time spent on unnecessary or unsuccessful optimizations is a waste, time spent profiling unknown parts of code most often is not (even when there is no performance problem at all). Now, profiling doesn't mean only that you enable/disable some parts of the code and watch the FPS counter go up and down, but needs a more thorough examination:

- Do I know how many times (min/avg/max) different code paths are executed?
- Do I know what triggers the min/max cases?
- How much CPU time do these different cases take?
- Which part of the code takes proportionally the most time?
- If I were to optimize that part, what is the best speedup I could get?
- Is it a best-/worst-/average-case speedup?
- Am I CPU- or GPU-bound?
- Which parts of the GPU are busiest, and which are most idle?

Tools like Intel VTune/AMD CodeAnalyst, Microsoft PIX/Graphic Remedy gDEBugger, and NVIDIA PerfSDK/AMD GPU PerfStudio are a programmer's salvation. (So much so that I recently bought an NVIDIA GPU over an ATI one just because I couldn't take it any more that AMD's GPU PerfStudio is so crappy compared to PerfSDK.)
