Drastic performance loss

6 comments, last by fread 9 years, 10 months ago

When using immediate mode drawing I get roughly 500 fps (60 rendered frames per second, so roughly 9 frames of logic per rendered frame). When I change absolutely nothing other than switching to vertex arrays and glDrawArrays, I drop down to roughly 120 fps.

http://i.imgur.com/ToaW9Ur.png (linked because it's a pretty big image, also "frames displayed last second" is a mistake, it's not the displayed frames any more -used to be- it's the number of logic loops, actual rendered fps is a solid 60)

The reason I tried vertex arrays was because I thought it was supposed to be faster than immediate mode (single operation stream instead of multiple). Am I doing something drastically wrong?

Would vertex buffer objects be faster? How would I implement that in the current code?

Note: I don't show any other code because they are otherwise identical, and I made sure of that.


Are you doing an optimized build?

You are creating new arrays on the stack on each function call (create them outside the function and pass them by reference).

It is also a good idea to move the client state calls outside the function. Enabling and disabling things isn't free, and you seem to be doing so for each tile you draw (even though all tiles seem to have the vertex array and texture_coord array enabled).

If you are drawing rectangles only, you can use a fixed square vertex array and only re-bind the texture coordinate array for each tile to further speed things up (you can change the shape and size of the rectangle by scaling it).
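A minimal sketch of that suggestion, assuming the tiles are axis-aligned sub-rectangles of a single atlas texture. The names `SubRect`, `kUnitQuad` and `fillTexCoords` are hypothetical; the original draw code is not shown in the thread. The vertex data is one shared unit quad, and the texture coordinates are written into an array the caller owns, so nothing is created per call.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper type: a sub-bitmap position inside the atlas, in pixels.
struct SubRect { int x, y, w, h; };

// One unit quad in triangle-strip order, shared by every tile. Position and
// size would come from the modelview matrix (translate + scale), not from
// new vertex data.
static const std::array<float, 8> kUnitQuad = {
    0.0f, 0.0f,   1.0f, 0.0f,   0.0f, 1.0f,   1.0f, 1.0f
};

// Writes normalized texture coordinates for one sub-bitmap into a
// caller-owned array, in the same vertex order as kUnitQuad.
inline void fillTexCoords(const SubRect& r, int texW, int texH,
                          std::array<float, 8>& out)
{
    assert(texW > 0 && texH > 0);
    const float u0 = static_cast<float>(r.x) / texW;
    const float v0 = static_cast<float>(r.y) / texH;
    const float u1 = static_cast<float>(r.x + r.w) / texW;
    const float v1 = static_cast<float>(r.y + r.h) / texH;
    out = { u0, v0,   u1, v0,   u0, v1,   u1, v1 };
}
```

In a fixed-function draw loop this would presumably be used by enabling the client states and calling glVertexPointer(2, GL_FLOAT, 0, kUnitQuad.data()) once, then per tile refilling the same array, calling glTexCoordPointer(2, GL_FLOAT, 0, uv.data()) and glDrawArrays(GL_TRIANGLE_STRIP, 0, 4). It still issues one draw call per tile, so it only trims setup cost.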

I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

Ah yes, taking out the Enable/disable, should have thought of that. However, it made little if any difference at all.

And the builds are exactly the same, same compiler settings, same code except for that draw code.

The array thing can't be avoided to an extent, the tiles are called from the same texture but with differing sub-bitmap positions, and are not pre-set, so regardless that array will have to be set to differing values every tile, unless I set hundreds of bytes of data aside... Which I'd rather not do when immediate mode is so much faster, at around 300fps when I put the array creation into it for no reason at all other than to see how much it would bog it down. And the draw code is generic, not just for drawing tilemaps; any further optimization in that direction, such as preset multi-array grids, would make it less useful for freely positioned sprites.

Seeing as how immediate mode is so much faster, and making this work with optimized array creation etc. would take more work than it's worth, I think it would be pointless.

So is there a better way of doing it? A more OpenGL ES way? FBOs (I'm too tired to look into that right now)

I was writing a little rendering engine for multiple platforms. One of the target devices was a Broadcom digital set top box.

In testing I was getting 200fps on my very old MacBook, 210fps on my Samsung S3, then I tested it on the Broadcom box and got 20fps.

This confused me a lot as on paper it should be more powerful than the S3.

After a few iterations I tried using VBOs.

MacBook 215fps (faster, but not by much)

S3 235fps (faster, but not by much)

Broadcom 325fps (WTF!!!!!)

You are working on top of other people's code. Your graphics drivers can be your friend or your enemy.

If you have the performance you need on the platform you are targeting, LEAVE IT ALONE :)

Spend your coding time on the bits of the code that aren't working yet. You can always come back and change something later if you find a need.

The reason I tried vertex arrays was because I thought it was supposed to be faster than immediate mode (single operation stream instead of multiple). Am I doing something drastically wrong?

Vertex arrays and buffers work well for a small vertex count per object (and a quad is very small in this sense) if and only if objects are batched. When not batching, many small draw calls are executed, so the cost-benefit ratio is high and the overhead per draw call is killing your performance. The "immediate" in "immediate mode" is nowadays just a relic. In fact, the driver does batching internally, and you never really know when rendering actually happens.

The array thing can't be avoided to an extent, the tiles are called from the same texture but with differing sub-bitmap positions, and are not pre-set, so regardless that array will have to be set to differing values every tile, unless I set hundreds of bytes of data aside...

The usual way is to allocate arrays big enough to hold, say, a few hundred sprites, invoke a rendering routine that actually batches by writing the quads to the arrays, and issue an OpenGL level draw call only if the arrays are full, the scene is finished, or a change of settings (texture, blending, shader, …) forces a draw call.
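The batching scheme just described could be sketched roughly as follows. This is not the poster's code: `SpriteBatch`, `FlushFn` and the interleaved x,y,u,v layout are assumptions made for illustration, and the actual GL submission is left to a callback so the accumulation logic stands on its own. Only a texture change is tracked here; blending or shader changes would trigger a flush in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// 6 vertices per quad (two triangles), 4 floats per vertex (x, y, u, v).
constexpr std::size_t kFloatsPerQuad = 6 * 4;

// Hypothetical batcher: collects quads and submits them in one go.
class SpriteBatch {
public:
    // Receives the texture id, the interleaved vertex data and the vertex count.
    using FlushFn = std::function<void(unsigned, const float*, std::size_t)>;

    SpriteBatch(std::size_t maxQuads, FlushFn submit)
        : maxQuads_(maxQuads), submit_(std::move(submit))
    {
        assert(maxQuads_ > 0);
        data_.reserve(maxQuads_ * kFloatsPerQuad);  // allocated once, reused
    }

    void draw(unsigned texture, float x, float y, float w, float h,
              float u0, float v0, float u1, float v1)
    {
        if (quads_ > 0 && texture != texture_) flush();  // state change
        if (quads_ == maxQuads_) flush();                // arrays are full
        texture_ = texture;
        const float quad[kFloatsPerQuad] = {
            x,     y,     u0, v0,   x + w, y,     u1, v0,   x, y + h, u0, v1,
            x + w, y,     u1, v0,   x + w, y + h, u1, v1,   x, y + h, u0, v1
        };
        data_.insert(data_.end(), quad, quad + kFloatsPerQuad);
        ++quads_;
    }

    // Call once at the end of the scene; does nothing if the batch is empty.
    void flush()
    {
        if (quads_ == 0) return;
        submit_(texture_, data_.data(), quads_ * 6);
        data_.clear();
        quads_ = 0;
    }

private:
    std::size_t maxQuads_;
    FlushFn submit_;
    std::vector<float> data_;
    std::size_t quads_ = 0;
    unsigned texture_ = 0;
};
```

With client-side vertex arrays, the callback would presumably bind the texture, call glVertexPointer(2, GL_FLOAT, 4 * sizeof(float), data) and glTexCoordPointer(2, GL_FLOAT, 4 * sizeof(float), data + 2), then glDrawArrays(GL_TRIANGLES, 0, vertexCount). Triangles are used instead of GL_QUADS because OpenGL ES has no quad primitive.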

… Which I'd rather not do when immediate mode is so much faster, at around 300fps when I put the array creation into it for no reason at all other than to see how much it would bog it down.

The reason to get rid of immediate mode, as well as the matrix stack and some other stuff, if possible, is that it is deprecated.

For drawing a single quad at a time, the way you are, I'd say that vertex arrays versus VBOs versus immediate mode shouldn't exhibit any major performance difference. It's surprising to see that you lost 6 milliseconds per frame from this change (even with the per-quad enable/disable removed), but we can put that down to hitting what looks like a suboptimal driver path.
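The 6 millisecond figure follows from the two rates quoted at the top of the thread (roughly 500 and 120 loops per second). A small sketch of the arithmetic, with hypothetical helper names:

```cpp
#include <cassert>

// Converts a loop rate (iterations per second) to milliseconds per iteration.
inline double msPerFrame(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per iteration after a change that drops the rate
// from fpsBefore to fpsAfter.
inline double addedCostMs(double fpsBefore, double fpsAfter)
{
    return msPerFrame(fpsAfter) - msPerFrame(fpsBefore);
}
```

At 500 fps one loop takes 2 ms; at 120 fps it takes about 8.3 ms, so the change costs a little over 6 ms per loop. Comparing frame times rather than fps values makes the size of a regression easier to judge.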

Since you potentially need to change state for each quad, batching and sorting may become expensive and it looks as though your immediate mode results are close to the best you're going to get.

The truth is that switching to vertex arrays or VBOs is not some kind of magic button that will suddenly make all programs go faster. If you have truly dynamic data then VBOs may very well even be slower unless you manage updates carefully. If your bottlenecks are elsewhere (e.g. fragment shading or ROP) then you're going to see no improvement whatsoever.

For your current drawing requirements immediate mode is just fine. I second Stainless's recommendation above - the code is doing what you want and you have decent enough performance, so leave it alone.

Deprecation is only going to be an issue if you ever want to move to GL3.x+ core contexts, but I can see that you're also using the matrix stack, which is more deprecated functionality, so you've quite a bit of work above and beyond just moving away from immediate mode before you can achieve that (and you may not even want to, which is a perfectly reasonable decision for certain classes of program).

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


Deprecation is only going to be an issue if you ever want to move to GL3.x+ core contexts

Or if you ever want to deploy to WebGL or an OpenGL ES device (i.e. Android, iOS or most embedded devices), as those don't support immediate mode at all.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

I guess that integer image coordinates are not supported by the GPU and they are converted to floats on the way.

This topic is closed to new replies.
