roos

Performance questions for a 2D game


Hi, I'm working on a tile-based 2D RPG with some other folks. When I joined the project it was running at 200 fps, which is pretty horrid since my machine should be able to handle a 2D RPG without flinching...

Previously we had every image in its own file, and each file was loaded into its own texture. I wrote a system that still lets you keep images in their own files (for convenience) but packs multiple images into one texture at run time. Now the game runs at 650 fps. Actually I was surprised that it ran so fast, since I hadn't even written a system to sort images by texture when rendering. In practice it works quite well because I load all tiles into one texture and all characters into another, so batching is kind of automatic.

Now, the other optimization I'm considering is batching vertex submission. Currently I'm just sending vertices one at a time with glVertex2f() and friends. Of course in a 3D game this would be the kiss of death, but for a 2D game I'm not sure, since there aren't as many vertices to push through. Also, every frame I'd basically have to create a new vertex array and fill it up, so that probably has some performance impact.

What do you guys think, should I spend some time on improving the vertex submission? If so, what would be a good way to do it: filling up a vertex array every frame?

Thanks very much,
roos
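
For anyone wondering what packing separate image files into one texture at run time can look like, here is a minimal sketch of a "shelf" packer. It is not roos's actual system (the thread never shows it); the names ShelfPacker, Rect and insert are made up for illustration, and the atlas size is whatever you pass in. It only decides where each image goes. Copying the pixels into the texture (for example with glTexSubImage2D) is left out.

```cpp
#include <cassert>

// Position and size of one image inside the atlas, in pixels.
struct Rect {
    int x, y, w, h;
};

// Hypothetical shelf packer: images are placed left to right on a "shelf";
// when a row is full, a new shelf starts below the tallest image of that row.
class ShelfPacker {
public:
    ShelfPacker(int atlasWidth, int atlasHeight)
        : width_(atlasWidth), height_(atlasHeight),
          cursorX_(0), shelfY_(0), shelfHeight_(0) {
        assert(atlasWidth > 0 && atlasHeight > 0);
    }

    // Tries to reserve a w x h region. Returns true and fills 'out' on
    // success, false (leaving the packer unchanged) if the image does not fit.
    bool insert(int w, int h, Rect& out) {
        if (w <= 0 || h <= 0 || w > width_) return false;
        int x = cursorX_;
        int y = shelfY_;
        int shelfH = shelfHeight_;
        if (x + w > width_) {              // row is full: open a new shelf
            x = 0;
            y = shelfY_ + shelfHeight_;
            shelfH = 0;
        }
        if (y + h > height_) return false; // no vertical space left
        // Commit the placement.
        cursorX_ = x + w;
        shelfY_ = y;
        shelfHeight_ = (h > shelfH) ? h : shelfH;
        out.x = x; out.y = y; out.w = w; out.h = h;
        return true;
    }

private:
    int width_, height_;
    int cursorX_, shelfY_, shelfHeight_;
};
```

Dividing the returned rectangle by the atlas width and height gives the texture coordinates for that image's quad. Sorting images by height before inserting them usually wastes less space, but that is an optional refinement.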

You don't have to make any complicated batching stuff with VBOs or vertex arrays.
One optimization you could do (assuming you haven't done it already) is to render as many quads as possible between glBegin()...glEnd(). I'm not sure how much that would help, because I mainly work with DirectX.
But anyway, vertex submission isn't the worst bottleneck, especially in a 2D game; fill rate is. Seems like your game is running pretty fast already, so I don't think you should spend any more time optimizing vertex submission.

650 FPS isn't anything to laugh at. (Neither is 200 FPS, but...) If you're getting speeds like this, optimizing the rendering should probably be the last of your worries. The only time I ever worry about my frame rate is when it starts dropping around that magic 60 number ^_^

In any case, if you DO want to squeeze every last frame out of it, then I don't know that vertex batching is going to help a whole lot. As pointed out earlier, at this point you're probably 100% fill-rate limited, so efficient vertex transfer won't make a lick of visible difference. I would look into a simple sort-by-material/texture routine, like you mentioned, but I think that the biggest thing to look into may be finding a way to "cache" your scene as much as possible.
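
A sort-by-texture routine can be sketched without any OpenGL at all. In the snippet below, Sprite, sortByTexture and countTextureSwitches are illustrative names, not anything from the game; a real draw loop would call glBindTexture at each point where the counter goes up.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw request: which texture to use and where to draw.
struct Sprite {
    unsigned textureId;
    float x, y;
};

// Orders sprites so that all sprites sharing a texture are adjacent.
// stable_sort keeps the original order within one texture, which matters
// when overlapping sprites rely on submission order.
inline void sortByTexture(std::vector<Sprite>& sprites) {
    std::stable_sort(sprites.begin(), sprites.end(),
        [](const Sprite& a, const Sprite& b) {
            return a.textureId < b.textureId;
        });
}

// Counts how many texture binds a straightforward draw loop would issue
// if it rebinds whenever the texture differs from the previous sprite's.
inline std::size_t countTextureSwitches(const std::vector<Sprite>& sprites) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        if (i == 0 || sprites[i].textureId != sprites[i - 1].textureId)
            ++switches;
    }
    return switches;
}
```

Sorting changes the draw order between textures, so in a layered 2D scene it is probably safest to sort within one layer at a time rather than across the whole frame.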

While I'm not familiar with this game's actual setup, most of the time a 2D RPG involves very few moving characters and a lot of static background. If you could render the static areas to a texture at certain view-change intervals, you would be able to quickly throw the background up on screen and only draw the dynamic characters. This may or may not work for your setup, but it's worth looking into.
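
One way to decide when those "view-change intervals" occur is to cache a region somewhat larger than the screen and re-render it only when the camera leaves that region. The sketch below covers just that bookkeeping; BackgroundCache, its margin parameter and the pixel units are assumptions made for illustration, and the actual render-to-texture step is not shown.

```cpp
#include <cassert>

// Hypothetical bookkeeping for a pre-rendered background. The cached image
// covers the viewport plus 'margin' pixels on every side, in world pixels.
class BackgroundCache {
public:
    BackgroundCache(int viewW, int viewH, int margin)
        : viewW_(viewW), viewH_(viewH), margin_(margin),
          originX_(0), originY_(0), valid_(false) {
        assert(viewW > 0 && viewH > 0 && margin >= 0);
    }

    // Call once per frame with the camera's top-left corner. Returns true
    // when the static tiles must be re-rendered into the cache texture.
    bool update(int camX, int camY) {
        const int dx = camX - originX_;
        const int dy = camY - originY_;
        const bool inside = valid_ &&
            dx >= 0 && dx <= 2 * margin_ &&
            dy >= 0 && dy <= 2 * margin_;
        if (inside) return false;
        // Re-center the cached region around the current view.
        originX_ = camX - margin_;
        originY_ = camY - margin_;
        valid_ = true;
        return true;
    }

    // Where the visible viewport starts inside the cached image.
    int offsetX(int camX) const { return camX - originX_; }
    int offsetY(int camY) const { return camY - originY_; }

    // Size the cache texture would need (before any power-of-two rounding).
    int cacheWidth() const { return viewW_ + 2 * margin_; }
    int cacheHeight() const { return viewH_ + 2 * margin_; }

private:
    int viewW_, viewH_, margin_;
    int originX_, originY_;
    bool valid_;
};
```

With a margin of 32 pixels, the camera can move up to 32 pixels in any direction from where the cache was built before a re-render is needed. Layers that must appear on top of characters would need a second cache or would have to be drawn directly.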

Quote:
Original post by centipede
You don't have to make any complicated batching stuff with VBOs or vertex arrays.
One optimization you could do (assuming you haven't done it already) is to render as many quads as possible between glBegin()...glEnd(). I'm not sure how much that would help, because I mainly work with DirectX.
But anyway, vertex submission isn't the worst bottleneck, especially in a 2D game; fill rate is. Seems like your game is running pretty fast already, so I don't think you should spend any more time optimizing vertex submission.


Thanks, that info really helps! Hmm, I'm a bit rusty with OpenGL myself since I've been using DirectX a lot over the last year. So... putting all the draws between a glBegin() and glEnd() sounds like a good idea. I'll have to look into one thing though: I know there are certain OpenGL commands that you're NOT allowed to call within a glBegin/glEnd block, but I can't remember which. That might be a problem, since for each image that's drawn there's a little bit of setup code that needs to run, like binding the texture and setting up a transform using mostly glTranslatef.
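
Both of those setup calls are in fact off limits: between glBegin and glEnd only per-vertex commands such as glVertex, glTexCoord and glColor are allowed, and glBindTexture or glTranslatef there produce GL_INVALID_OPERATION. If the per-image transform is only a translation, one possible workaround is to add the offset to the vertex positions on the CPU. The helper below is a sketch of that idea; Vec2 and translatedQuad are made-up names.

```cpp
#include <array>
#include <cassert>

struct Vec2 {
    float x, y;
};

// Corners of an axis-aligned sprite quad with the translation already
// applied, in the order (x, y), (x+w, y), (x+w, y+h), (x, y+h).
// This replaces a glTranslatef(posX, posY, 0) followed by a quad at the origin.
inline std::array<Vec2, 4> translatedQuad(float posX, float posY,
                                          float w, float h) {
    assert(w >= 0.0f && h >= 0.0f);
    std::array<Vec2, 4> q = {{
        {posX,     posY},
        {posX + w, posY},
        {posX + w, posY + h},
        {posX,     posY + h}
    }};
    return q;
}
```

Each corner can then be sent with glTexCoord2f and glVertex2f inside a single glBegin(GL_QUADS)/glEnd() pair covering many sprites. A texture change still has to happen outside the pair, so the block needs to be closed and reopened whenever the texture differs.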

I have another question while I'm at it. Say you call glBindTexture() 500 times for drawing 500 images, but each time you bind the same texture. Should I implement a check that keeps track of the last bound texture and skips the bind if it's the same one again? My guess is that it's not necessary: OpenGL already seems to eliminate the 499 redundant calls, judging from how my frame rate skyrocketed when I put all my images in 1 texture.

Quote:
Original post by skittleo
200fps is not slow. Did you mean 20? Or could your problem lie in your timing system?


Yeah, that's a good point :) There are two reasons I said 200 fps is slow... Firstly, this was a release build on a very, very fast machine, so on lower-end machines it'd be likely to run much slower. Secondly, our game isn't done yet: right now it gives 200 fps on just a simple tile map. In the near future we're planning to add loads of eye candy, like realistic water, multitexturing for smooth transitions between tiles, simple 2D lighting effects, particle systems, etc... So I wanted to make sure that at least the core of the rendering was decently optimized before going forward.

Quote:
Original post by Toji
In any case, if you DO want to squeeze every last frame out of it, then I don't know that vertex batching is going to help a whole lot. As pointed out earlier, at this point you're probably 100% fill-rate limited, so efficient vertex transfer won't make a lick of visible difference.


Hmm, yeah I guess you're right :) I'll probably just leave the tile rendering as it is and only worry about vertex batching for the particle systems.
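
For the particle case, filling a vertex array every frame could look roughly like the sketch below. QuadBatch and its interleaved x, y, u, v layout are assumptions chosen for illustration, not something taken from the game.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side batch: interleaved x, y, u, v per vertex,
// four vertices per quad.
class QuadBatch {
public:
    // Forgets last frame's quads. std::vector keeps its allocated storage,
    // so refilling each frame does not reallocate once the batch has grown.
    void clear() { data_.clear(); }

    // Appends one textured quad: position (x, y), size (w, h) and the
    // texture-coordinate rectangle (u0, v0)-(u1, v1).
    void addQuad(float x, float y, float w, float h,
                 float u0, float v0, float u1, float v1) {
        const float verts[16] = {
            x,     y,     u0, v0,
            x + w, y,     u1, v0,
            x + w, y + h, u1, v1,
            x,     y + h, u0, v1
        };
        data_.insert(data_.end(), verts, verts + 16);
    }

    std::size_t quadCount() const { return data_.size() / 16; }
    std::size_t vertexCount() const { return data_.size() / 4; }
    const float* data() const { return data_.data(); }

private:
    std::vector<float> data_;
};
```

With client-side vertex arrays enabled (glEnableClientState for GL_VERTEX_ARRAY and GL_TEXTURE_COORD_ARRAY), the batch could be drawn with glVertexPointer(2, GL_FLOAT, 4 * sizeof(float), batch.data()), glTexCoordPointer(2, GL_FLOAT, 4 * sizeof(float), batch.data() + 2) and one glDrawArrays(GL_QUADS, 0, vertexCount) call per texture.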

Quote:
Original post by Toji
While I'm not familiar with this game's actual setup, most of the time a 2D RPG involves very few moving characters and a lot of static background. If you could render the static areas to a texture at certain view-change intervals, you would be able to quickly throw the background up on screen and only draw the dynamic characters. This may or may not work for your setup, but it's worth looking into.


Thanks, I'll look into it, although I'm not quite sure how that'd work for this setup. The two thorns in my side are that the background is constantly scrolling, which makes it a bit tougher, and that some of the map tile layers aren't 100% "background" since they can be displayed on top of the character.

Anyway, again, thanks very much guys. I guess I'll ditch the idea of optimizing the vertex submission, and I might not even bother sorting images when rendering, since grouping them into the same texture largely solves that problem.

Thanks,
roos

Quote:
Original post by roos

I have another question while I'm at it...

Usually drivers notice redundant render state settings and won't execute them. So, you don't need to add any checking for them in your code (although it wouldn't hurt).

Quote:
Original post by centipede
Quote:
Original post by roos

I have another question while I'm at it...

Usually drivers notice redundant render state settings and won't execute them. So, you don't need to add any checking for them in your code (although it wouldn't hurt).


not true, e.g.

for (int i = 0; i < 10000; i++)
    glBindTexture(GL_TEXTURE_2D, grass_textureID);  // first argument is the texture target

is 10x slower (not a few percent but 1000%) than

for (int i = 0; i < 10000; i++)
{
    // only call into the driver when the texture actually changes
    if (currenttextureID != grass_textureID)
    {
        glBindTexture(GL_TEXTURE_2D, grass_textureID);
        currenttextureID = grass_textureID;
    }
}

Quote:
Original post by zedzeek
not true
eg
....

I'd disagree with you there, zedzeek. I've performed similar tests on multiple cards and MOST of the time the speed comes out just about even.

I think that the exact behavior in this case is entirely driver dependent. A driver maker is allowed a lot of interpretation of the OpenGL specs when it comes to optimizations like this, and thus while some drivers may contain a simple redundancy check, others may not. I think the best path for a game programmer is to implement one yourself regardless, as a double redundancy check (yours and the driver's) is certainly going to be less painful than none at all.
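
An application-side redundancy check of that kind can be as small as the sketch below. TextureBinder is a hypothetical name, and the real bind is passed in as a callback so the example runs without a GL context; in a game the callback would simply call glBindTexture(GL_TEXTURE_2D, id).

```cpp
#include <functional>
#include <utility>

// Hypothetical wrapper that remembers the last texture bound through it
// and skips the driver call when the same texture is requested again.
class TextureBinder {
public:
    explicit TextureBinder(std::function<void(unsigned)> bindFn)
        : bind_(std::move(bindFn)), current_(0), hasCurrent_(false) {}

    // Forwards to the real bind only if 'id' differs from the last one.
    void bind(unsigned id) {
        if (hasCurrent_ && id == current_) return;
        bind_(id);
        current_ = id;
        hasCurrent_ = true;
    }

    // Call this if some other code changed the binding behind the cache's
    // back (for example a library that calls glBindTexture directly).
    void invalidate() { hasCurrent_ = false; }

private:
    std::function<void(unsigned)> bind_;
    unsigned current_;
    bool hasCurrent_;
};
```

The cache is only correct if every bind goes through it, which is why the invalidate hook is there. It also tracks a single binding point; code using several texture units or targets would need one remembered id per unit and target.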

The best way, in my opinion, to do what you're trying to do is to batch as many sprites as you can into one glDrawElements call. What you need is an index buffer (in GL terms, a GLuint* buffer) that holds the indices into the vertex buffer for the sprites to draw. For example, if you have 3 textures then you need 3 index buffers, one per texture. At load time you sort the sprites by texture and put each sprite's vertex indices into the matching buffer; then all you have to do when rendering is draw the buffers one by one. So if we had 3 textures we would just do
for (int i = 0; i < 3; i++) {
    glBindTexture(GL_TEXTURE_2D, tex[i]);
    // indexCount[i] (an assumed name) is the number of indices in buffer[i];
    // sizeof(buffer[i]) would only give the size of the pointer.
    // buffer[i] is a GLuint*, so the index type is GL_UNSIGNED_INT.
    glDrawElements(GL_QUADS, indexCount[i], GL_UNSIGNED_INT, buffer[i]);
}

So as you can see, the texture gets set only 3 times and you have all your sprites on screen. I hope you understand what I am trying to say. I did something like that last year, and when I rendered around 2000 sprites I was still getting 600 fps.
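
Building those per-texture index buffers at load time might look like the following sketch. The function name buildIndexBuffers and the layout (sprite i owning vertices 4*i to 4*i+3 in one shared vertex array) are assumptions made for the example.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Builds one index list per texture. spriteTextures[i] is the texture id of
// sprite i, whose four vertices are assumed to sit at positions
// 4*i .. 4*i+3 of a shared vertex array.
inline std::map<unsigned, std::vector<unsigned> >
buildIndexBuffers(const std::vector<unsigned>& spriteTextures) {
    std::map<unsigned, std::vector<unsigned> > buffers;
    for (std::size_t i = 0; i < spriteTextures.size(); ++i) {
        // operator[] creates an empty list the first time a texture is seen.
        std::vector<unsigned>& indices = buffers[spriteTextures[i]];
        const unsigned base = static_cast<unsigned>(4 * i);
        for (unsigned corner = 0; corner < 4; ++corner)
            indices.push_back(base + corner);
    }
    return buffers;
}
```

At draw time each map entry corresponds to one texture bind followed by one glDrawElements call with GL_UNSIGNED_INT indices, using the list's size as the index count.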

Quote:
Original post by Toji
Quote:
Original post by zedzeek
not true
eg
....

I'd disagree with you there, zedzeek. I've performed similar tests on multiple cards and MOST of the time the speed comes out just about even.


Did you try on an NVIDIA card? I tested the above bit of code about a month ago and there's a huge difference.

Yes, I've tried it on Nvidia, ATI, and some onboard Intel chips. On my 5500fx it does slow things down a little to leave out the redundancy check, but overall it's not a big change. Other Nvidia boards seem to not care one way or the other. I've only been able to test it on two ATI cards, but neither of them showed any difference between the two methods. The Intel chip, predictably, did show a good chunk of slowdown though. (Not quite as much as you'd expect, mind you.)

Really it should be a non issue, though. Like I said: Just stay on the safe side and implement your own anyways.

Just tested again, and yes, there is a huge difference. I assume that in your testing you didn't do as many loops (I'm doing 10 million below), so the differences were so small that the results looked the same. Anyway, try this if there's still doubt (doing the checking yourself is >20x quicker), which does make sense: for one thing, it's a comparison vs. a function call.

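//#define ALWAYSBIND

// Needs <windows.h> (link with winmm) for timeGetTime and <iostream> for cout.
DWORD start = timeGetTime();
int current_tex = 0;  // initialized so the first comparison is well defined
for ( int i=0; i<10000000; i++ )
{
#ifdef ALWAYSBIND
    glBindTexture( GL_TEXTURE_2D, 1 );
#else
    // bind only when the tracked texture differs
    if ( current_tex != 1 )
    {
        glBindTexture( GL_TEXTURE_2D, 1 );
        current_tex = 1;
    }
#endif
}
DWORD end = timeGetTime();
cout << (end - start) / 1000.0 << endl;  // elapsed time in seconds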

times (in seconds) with
glBindTexture( GL_TEXTURE_2D, 1 );

0.562 0.562 0.563 0.563 0.561 0.564 0.563 0.562 0.563 0.562 0.581 0.562 0.565 0.561 0.563 0.564 0.563 0.563 0.561 0.565
0.562 0.564 0.563 0.562 0.562 0.562 0.562 0.562 0.562 0.564 0.563 0.564 0.561 0.562 0.563 0.561 0.563 0.562 0.562 0.561

times (in seconds) with
if ( current_tex != 1 )
{
glBindTexture( GL_TEXTURE_2D, 1 );
current_tex = 1;
}

0.021 0.019 0.021 0.021 0.02 0.022 0.02 0.021 0.02 0.02 0.019 0.021 0.02 0.02 0.02 0.019 0.021 0.021 0.02 0.02
0.02 0.021 0.02 0.021 0.02 0.021 0.02 0.02 0.021 0.02 0.021 0.02 0.021 0.019 0.021 0.019 0.02 0.021 0.021 0.02
0.019 0.021 0.02 0.02 0.021 0.02 0.021 0.021 0.02 0.02 0.02 0.02 0.021 0.02 0.021 0.019 0.021 0.021 0.02 0.021
0.021 0.02 0.021 0.02 0.022 0.02 0.021 0.02 0.021 0.02 0.02 0.02 0.019 0.021 0.02 0.021 0.021 0.021 0.019 0.02
0.021 0.02 0.021 0.02 0.022 0.02 0.02 0.021 0.021 0.021 0.021 0.021 0.019 0.02 0.02 0.021 0.02 0.021 0.019 0.021

zedzeek... your example is terribly flawed. You are only checking the overhead of that many function calls. Not the binding overhead. A function call is going to be much more expensive than a simple conditional statement. The compiler may even optimize that conditional out since it's very simple. You would need to have an effectively empty function which is called (and must make sure it's not optimized out)... and then compare the runtime to glBindTexture... then you can see what kind of overhead the body of glBindTexture really has.
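
The baseline described here could be measured with something like the sketch below. Nothing in it comes from the thread: emptyBind is a stand-in for a driver call that does almost no work, SKETCH_NOINLINE is a helper macro for the usual compiler-specific no-inline attributes, and the volatile write is there so the call cannot be optimized away.

```cpp
#include <chrono>

#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Written by the stand-in function so each call has an observable effect.
static volatile unsigned g_sink = 0;

// Hypothetical "effectively empty" function that must not be inlined.
SKETCH_NOINLINE void emptyBind(unsigned id) { g_sink = id; }

// Calls emptyBind unconditionally and returns the elapsed time in seconds.
inline double timeUnconditionalCalls(int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        emptyBind(1);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Same loop with an application-side check. The compiler is free to reduce
// this to a single call, which is part of what is being pointed out above.
inline double timeCheckedCalls(int iterations) {
    unsigned current = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        if (current != 1) {
            emptyBind(1);
            current = 1;
        }
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Running timeUnconditionalCalls with the same iteration count as the glBindTexture loop gives a rough figure for pure call overhead; whatever remains of the glBindTexture time beyond that would be work done inside the driver.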

Quote:
Original post by Toji
I think the best path for a game programmer is to implement one yourself regardless, as a double redundancy check (yours and the driver's) is certainly going to be less painful than none at all.


Hehe, anyway I think Toji's idea is right. I implemented this in my game, took like 5 minutes and now I don't have to worry about it :D

roos

Quote:
Original post by iambile
zedzeek... your example is terribly flawed. You are only checking the overhead of that many function calls. Not the binding overhead. A function call is going to be much more expensive than a simple conditional statement. The compiler may even optimize that conditional out since it's very simple. You would need to have an effectively empty function which is called (and must make sure it's not optimized out)... and then compare the runtime to glBindTexture... then you can see what kind of overhead the body of glBindTexture really has.

Yes, I mentioned that in my post,
but even making the test + bind a separate function, instead of using a straight glBindTexture(...) call, is orders of magnitude faster.
