Custom view matrices reduce FPS phenomenally

16 comments, last by JohnnyCode 10 years ago

So I've been upgrading my engine to use custom view matrices instead of the OpenGL built-ins gl_ModelViewMatrix and gl_ProjectionMatrix, which are deprecated in newer versions.

Now, as I'm using Shadow mapping, Skeletal Animation, and various other shader techniques, I've split my matrices into:

modelMatrix

viewMatrix

modelViewMatrix

projectionMatrix

modelViewProjectionMatrix

So each time I translate() an object, or manipulate any of these matrices, all 5 are uploaded to my Uniform Buffer Object on the GPU.
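
For illustration, a minimal sketch of the kind of update being described, assuming an LWJGL 3 style binding and a tightly packed std140 block of five consecutive mat4s (all names are placeholders, not the engine's actual code):

```java
import java.nio.FloatBuffer;
import org.lwjgl.BufferUtils;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL31.GL_UNIFORM_BUFFER;

/** Sketch of the described setup: one UBO holding five column-major mat4s.
 *  A tightly packed std140 block of five mat4s is 5 * 16 floats = 320 bytes. */
final class MatrixBlock {
    static final int MAT4_FLOATS = 16;
    static final int MATRIX_COUNT = 5;

    final int ubo = glGenBuffers();
    final FloatBuffer scratch = BufferUtils.createFloatBuffer(MATRIX_COUNT * MAT4_FLOATS);

    MatrixBlock() {
        glBindBuffer(GL_UNIFORM_BUFFER, ubo);
        glBufferData(GL_UNIFORM_BUFFER, (long) MATRIX_COUNT * MAT4_FLOATS * Float.BYTES, GL_DYNAMIC_DRAW);
    }

    /** What "all 5 are uploaded" amounts to: every translate()/rotate() call
     *  repacks and re-sends the whole 320-byte block. */
    void uploadAll(float[] model, float[] view, float[] modelView,
                   float[] projection, float[] modelViewProjection) {
        scratch.clear();
        scratch.put(model).put(view).put(modelView)
               .put(projection).put(modelViewProjection).flip();
        glBindBuffer(GL_UNIFORM_BUFFER, ubo);
        glBufferSubData(GL_UNIFORM_BUFFER, 0, scratch);
    }
}
```

Note that the whole block is only 320 bytes, which is relevant to the bandwidth question further down.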

And my FPS has dropped from 1200 to 170, which is unacceptable for me considering all I've done is change how the matrices are handled behind the scenes. Nothing else in the engine has changed.

Can someone tell me what has caused the drop in performance? I'm guessing it's something along the lines of:

- My matrix operations in Java are slow

- Uploading 5 matrices regularly is using up my bandwidth?


- My matrix operations in Java are slow

- Uploading 5 matrices regularly is using up my bandwidth?

Probably neither. Unless you've got a really, really bad matrix library or an absolutely huge number of matrices to upload (both extreme and unlikely scenarios), you'll need to look elsewhere.

That UBO update - that's what I'd point my finger at. There are threads about UBO performance and how slow they are to update, as well as the hoops you need to jump through in order to make them fast again.
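
For reference, one of the hoops those threads usually mean is orphaning the buffer before each update, so the driver doesn't have to synchronise with draws that are still reading the old contents. A minimal sketch, assuming LWJGL 3; this is just an example of the kind of technique discussed there, not necessarily the fix here:

```java
import java.nio.FloatBuffer;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL31.GL_UNIFORM_BUFFER;

final class UboOrphaningUpdate {
    /** Re-specify the buffer's storage before writing; the driver can then hand back
     *  fresh memory instead of stalling on a copy the GPU may still be reading. */
    static void update(int ubo, long sizeInBytes, FloatBuffer newContents) {
        glBindBuffer(GL_UNIFORM_BUFFER, ubo);
        glBufferData(GL_UNIFORM_BUFFER, sizeInBytes, GL_DYNAMIC_DRAW); // orphan the old storage
        glBufferSubData(GL_UNIFORM_BUFFER, 0, newContents);            // fill the fresh storage
    }
}
```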

Before we go any further, this is worth testing, and fortunately it's an extremely simple and minimally intrusive test. Just convert from a UBO to standalone uniforms and see if things improve. If they do, then we've established that yes, it's the UBO that's causing your performance problems.
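
A minimal sketch of what that test looks like, assuming LWJGL 3; the uniform names here are hypothetical and need to match whatever the shaders actually declare:

```java
import java.nio.FloatBuffer;
import org.lwjgl.BufferUtils;
import static org.lwjgl.opengl.GL20.*;

final class PlainUniformMatrices {
    final int program;
    final int modelLoc, viewLoc, modelViewLoc, projLoc, mvpLoc;
    final FloatBuffer scratch = BufferUtils.createFloatBuffer(16);

    PlainUniformMatrices(int program) {
        this.program = program;
        // Hypothetical names: declare these as plain "uniform mat4" in the shader
        // instead of as members of a uniform block.
        modelLoc     = glGetUniformLocation(program, "modelMatrix");
        viewLoc      = glGetUniformLocation(program, "viewMatrix");
        modelViewLoc = glGetUniformLocation(program, "modelViewMatrix");
        projLoc      = glGetUniformLocation(program, "projectionMatrix");
        mvpLoc       = glGetUniformLocation(program, "modelViewProjectionMatrix");
    }

    /** Call with the program bound (glUseProgram) and only for matrices that changed. */
    void set(int location, float[] columnMajorMat4) {
        scratch.clear();
        scratch.put(columnMajorMat4).flip();
        glUniformMatrix4fv(location, false, scratch);
    }
}
```

If frame times recover on this path, the UBO update strategy is the culprit; if they don't, look at the CPU side instead.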

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

So each time I translate() an object, or manipulate any of these matrices, all 5 are uploaded to my Uniform Buffer Object on the GPU.


You can't be doing this; it's too expensive.

If you modify a model matrix, don't update your projection matrix just for shits and giggles, that's inefficient and a huge performance drop. Imagine how many times per second you are doing that!

Instead, figure out the offset of each matrix within the buffer and store those offsets. Then, whenever you NEED to update one of them, map the buffer and modify just that matrix at its stored offset.
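
A sketch of that, assuming LWJGL 3 and the five mat4s laid out consecutively in a std140 block (a mat4 needs no extra std140 padding, so the offsets are simply multiples of 64 bytes); for other layouts, query the real offsets with glGetActiveUniformsiv and GL_UNIFORM_OFFSET rather than assuming them. glBufferSubData is used here for brevity; glMapBufferRange at the same offset works as well:

```java
import java.nio.FloatBuffer;
import org.lwjgl.BufferUtils;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL31.GL_UNIFORM_BUFFER;

final class MatrixOffsets {
    // Byte offsets of each matrix within the block (64 bytes per column-major mat4).
    static final long MODEL      = 0 * 64;
    static final long VIEW       = 1 * 64;
    static final long MODEL_VIEW = 2 * 64;
    static final long PROJECTION = 3 * 64;
    static final long MVP        = 4 * 64;

    static final FloatBuffer scratch = BufferUtils.createFloatBuffer(16);

    /** Update only the matrix that actually changed, at its stored offset. */
    static void updateOne(int ubo, long byteOffset, float[] columnMajorMat4) {
        scratch.clear();
        scratch.put(columnMajorMat4).flip();
        glBindBuffer(GL_UNIFORM_BUFFER, ubo);
        glBufferSubData(GL_UNIFORM_BUFFER, byteOffset, scratch);
    }
}
```

So translating an object would touch only the model-dependent matrices (model, modelView, modelViewProjection) and leave the projection matrix alone.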

Also, instead of working out the model view proj matrix on the CPU, do the multiplication in your shader. GPUs are far better at matrix multiplication in almost any situation.
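
For instance, upload only model, view, and projection and form the product in the vertex shader. A sketch, with the shader kept as a Java source string as it would be fed to glShaderSource; note that the per-vertex cost pointed out in a later reply applies:

```java
final class MvpInShader {
    /** Hypothetical vertex shader: projection * view * model is multiplied per vertex
     *  on the GPU instead of being precomputed once on the CPU. */
    static final String VERTEX_SHADER =
        "#version 330 core\n" +
        "layout(location = 0) in vec3 position;\n" +
        "uniform mat4 modelMatrix;\n" +
        "uniform mat4 viewMatrix;\n" +
        "uniform mat4 projectionMatrix;\n" +
        "void main() {\n" +
        "    gl_Position = projectionMatrix * viewMatrix * modelMatrix * vec4(position, 1.0);\n" +
        "}\n";
}
```
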
Instead of looking at it as an FPS drop, look at it as an increase of 5ms of work per frame. Put some timers in your code and try and find where this work has been added.

Upload them only once just before rendering - though the driver will probably already do this for you behind the scenes.

The more important point you might want to consider is how you measure your performance. If you get >1000 fps with those various effects, your scene is probably too small to test on. If your rendering is not the bottleneck, then your memory lanes are, so you could probably throw a far more complex scene at the program and it'd run at the same speed. Another point is that measuring with fps can be misleading - a drop from 1200 fps to 170 fps is not that massive, since the render time only went from about 1 ms to 6 ms. You should measure which parts take how much time; OpenGL has timer query objects for this.
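
A minimal sketch of such a timer query, assuming LWJGL 3 and an OpenGL 3.3 context:

```java
import org.lwjgl.opengl.GL33;
import static org.lwjgl.opengl.GL15.*;

final class GpuTimer {
    final int query = glGenQueries();

    /** Wrap the GL calls you want to measure between begin() and end(). */
    void begin() { glBeginQuery(GL33.GL_TIME_ELAPSED, query); }
    void end()   { glEndQuery(GL33.GL_TIME_ELAPSED); }

    /** Nanoseconds the wrapped GL work took on the GPU. Reading the result blocks
     *  until it is available, so in a real frame loop you would poll
     *  GL_QUERY_RESULT_AVAILABLE or rotate several query objects instead. */
    long elapsedNanos() {
        return GL33.glGetQueryObjecti64(query, GL_QUERY_RESULT);
    }
}
```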

Also, instead of working out the model view proj matrix on the CPU, do the multiplication in your shader. GPUs are far better at matrix multiplication in almost any situation.

...but the CPU typically only has to do this particular multiplication once, whereas the GPU will need to do it per vertex. Yes, the GPU is faster, but tens of thousands of times per frame versus once? It's not that much faster.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


If you modify a model matrix, don't update your projection matrix just for shits and giggles, that's inefficient and a huge performance drop. Imagine how many times per second you are doing that!
This.

You don't update the uniforms each time you modify a matrix.

First you do all your computations (rotations, translations, scaling, model view projection, whatever) for all your objects. Then, when you're about to draw the mesh, you update the uniforms for that object.
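
As a sketch of that ordering, assuming LWJGL 3; Drawable, computeTransforms(), and uploadMatrices() are hypothetical stand-ins for whatever the engine already has:

```java
import java.util.List;
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL30.glBindVertexArray;

final class RenderLoop {
    /** Hypothetical per-object handle. */
    interface Drawable {
        void computeTransforms();  // rotations, translations, MVP, etc. - CPU-side only
        void uploadMatrices();     // one uniform/UBO update, immediately before the draw
        int vao();
        int indexCount();
    }

    void drawFrame(List<Drawable> objects) {
        // 1. Pure CPU work: update every object's matrices without touching GL state.
        for (Drawable obj : objects) {
            obj.computeTransforms();
        }
        // 2. GL work: for each object, one uniform update followed by its draw call.
        for (Drawable obj : objects) {
            obj.uploadMatrices();
            glBindVertexArray(obj.vao());
            glDrawElements(GL_TRIANGLES, obj.indexCount(), GL_UNSIGNED_INT, 0L);
        }
    }
}
```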

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

Upload them only once just before rendering - though the driver will probably already do this for you behind the scenes.

The more important point you might want to consider is how you measure your performance. If you get >1000 fps with those various effects, your scene is probably too small to test on. If your rendering is not the bottleneck, then your memory lanes are, so you could probably throw a far more complex scene at the program and it'd run at the same speed. Another point is that measuring with fps can be misleading - a drop from 1200 fps to 170 fps is not that massive, since the render time only went from about 1 ms to 6 ms. You should measure which parts take how much time; OpenGL has timer query objects for this.

I'd counter-argue that going from 1ms to 6ms is extremely significant, particularly if all other factors are equal between the two tests. You've just blown one-third of your frametime budget on ... nothing. Yes, that's significant.

Now, if it was going from - say - 8ms to 13 ms, you'd have a point, particularly if there was a nice new effect, higher LOD, or whatever to look at in return for it. Blowing one-third of your frametime budget just on account of using a different way of doing the same thing? Nope, you don't have a point, sorry.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

whereas the GPU will need to do it per vertex.


If your graphics driver is any good, it will only do the multiplication once. If your driver can't perform this optimization, switch GPU vendor.

whereas the GPU will need to do it per vertex.

If your graphics driver is any good, it will only do the multiplication once. If your driver can't perform this optimization, switch GPU vendor.
I'd love to see proof of this. In my experience, if you ask the GPU to perform operations on uniforms per vertex/pixel, then the GPU will do so. The only "preshaders" that I've seen that are reliable are ones that modify your shader code ahead of time and generate x86 routines for patching your uniforms...
Anyway, even if this does work on 1 vendor, you're just playing into the hands of their marketing department by deliberately writing bad code that's going to (rightfully) run slow for two thirds of your users, and act as a marketing tool for one vendor :(

This topic is closed to new replies.
