In theory UBOs should be faster than glUniform calls, for all the reasons given in the previous post. They also give the advantage that uniforms can be shared by different program objects.
In practice UBOs can be considerably slower.
This reduction in speed has nothing to do with keeping uniforms in buffers, and nothing to do with the number of function calls.
It's everything to do with OpenGL's buffer object API and how you use it.
You can't treat a buffer object like a block of system memory that you grab a pointer to and then read from and write to as required, at least not without suffering serious performance overhead. You have to manage buffer objects carefully and you have to know what kind of performance characteristics you can expect.
I'm going to assume that you have 1000 objects and that you don't have persistent buffer mapping. There are a number of ways you can manage this.
(1) Each object can have its own UBO. To draw an object you update the UBO, bind it, then draw. You have 1000 UBO binds and 1000 UBO updates per frame. This is going to run slow.
(2) You have a single small UBO that all objects share. It's bound once during startup. To draw an object you update the UBO, then draw. You have a single UBO bind at startup, but 1000 UBO updates per frame. This is going to run slow.
(3) You have a single large UBO sized for 1000 objects. At the start of each frame you make a pass through your objects. You update the data they're going to use and copy it off to a system memory buffer. Then you make a single glBufferSubData call. Each object stores an object id from which you can reconstruct the offset its data is at in the UBO. To draw an object you make a glBindBufferRange call, then draw. You have 1000 glBindBufferRange calls but one UBO update per frame. This is going to run fast.
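For what it's worth, here is a minimal sketch of the CPU side of (3) in C. The ObjectUniforms layout and the helper names are made up for illustration. The stride assumes you have queried GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, since the offset passed to glBindBufferRange for a uniform buffer has to be a multiple of that value. The GL calls appear only in comments so the snippet compiles without a context; ubo and binding are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-object uniform block: a 4x4 matrix plus a colour. */
typedef struct {
    float model[16];
    float color[4];
} ObjectUniforms;

/* Round size up to the next multiple of align. */
static size_t align_up(size_t size, size_t align)
{
    assert(align > 0);
    return (size + align - 1) / align * align;
}

/* Byte offset of object `id` inside the large UBO.
 * `stride` is align_up(sizeof(ObjectUniforms), offset_alignment). */
static size_t object_offset(size_t id, size_t stride)
{
    return id * stride;
}

/* Copy one object's data into the system memory staging buffer. */
static void stage_object(unsigned char *staging, size_t id, size_t stride,
                         const ObjectUniforms *u)
{
    memcpy(staging + object_offset(id, stride), u, sizeof *u);
}

/* Per-frame flow (GL calls shown as comments only):
 *
 *   for each object id:
 *       stage_object(staging, id, stride, &uniforms[id]);
 *
 *   glBindBuffer(GL_UNIFORM_BUFFER, ubo);
 *   glBufferSubData(GL_UNIFORM_BUFFER, 0, count * stride, staging);
 *
 *   for each object id:
 *       glBindBufferRange(GL_UNIFORM_BUFFER, binding, ubo,
 *                         object_offset(id, stride), sizeof(ObjectUniforms));
 *       ...draw...
 */
```

As an example of the sizing, if the alignment query came back as 256, each 80-byte block would occupy a 256-byte slot and the UBO for 1000 objects would need 256,000 bytes.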
The conclusion is that using UBOs involves some re-architecting. You can't just take a bunch of code using standalone uniforms, port it over to UBOs without changing anything else, and expect to get the same performance from it. You need to think about your updates and group them all together. That doesn't mean doing 1000 updates at the same time; it means doing one update that covers all 1000 objects.
How can I so confidently lay the blame on the API here? Because you can do (1) and (2) in D3D, and with all other things being equal they run fast (with (2) being faster than (1) owing to D3D's explicit "discard" semantics). UBOs in GL, however, do not run fast under those circumstances.
Edited by mhagain, 27 April 2014 - 01:28 PM.