I've just implemented a pretty nice batch renderer, but I'm struggling with uniform buffers. I have a great system which only maps a single buffer (built on a lot of experience with doing fast buffer mapping), and I place all my uniform block data in that single uniform buffer. I was assuming that this would be the fastest way to handle uniform changes, but it turns out that uniform buffers have some serious drawbacks. For example, changing a single variable in a block forces me to allocate a whole new block, and since each block's offset has to be aligned to 256 bytes, every block effectively takes up at least 256 bytes, which is HUGE. My blocks barely get over 64 bytes right now, so there's a lot of wasted space, which seems to hurt performance a lot, especially for simpler shaders with few uniforms. In almost all cases I change some kind of uniform value between draw calls, and in many cases I end up
I had this idea that I would split up uniforms into different blocks so that I only had to update the ones that change (view+projection matrices in one block, materials in one block, etc.), but as it is now the winning move is to just pack everything into one block so that I don't waste that much space, and to reupload stuff that hasn't changed rather than risk having to update two smaller blocks with even more padding. It's getting to a point where I think it would be faster to just build a list of glUniform*() calls to do instead of bothering with uniform buffers.
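To put rough numbers on that trade-off, here is a minimal sketch of the padding arithmetic. It assumes the 256-byte offset alignment mentioned above (the real value should be queried with glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...)), and the block sizes of 64, 16 and 80 bytes are made up for illustration. The helper names align_up and padded_total are hypothetical, not part of OpenGL.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment`.
 * `alignment` stands in for GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT
 * (assumed to be 256 here; query the real value at runtime). */
static size_t align_up(size_t size, size_t alignment) {
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

/* Total bytes consumed in the uniform buffer when every block in
 * `sizes` must start at an aligned offset, i.e. each block is
 * padded up to a multiple of `alignment`. */
static size_t padded_total(const size_t *sizes, size_t count,
                           size_t alignment) {
    size_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += align_up(sizes[i], alignment);
    }
    return total;
}
```

With these made-up sizes, one merged 80-byte block costs 256 bytes per update, while a 64-byte matrix block plus a 16-byte material block cost 512 bytes if both change, so splitting only pays off when one of the two blocks can usually be reused between draw calls.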
Are uniform buffers just nonviable for real-life usage? Can I work around the offset alignment problem to reduce the padding? Is glUniform() simply superior in most cases and on most drivers?
EDIT: After googling a bit, I want to clarify that my buffer handling is very efficient. I place all my uniform data in a single mapped buffer (persistently and coherently mapped if possible, otherwise cycling unsynchronized), so there's only a single map operation done per frame. The problem is that uploading the data is simply really slow once the padding is added (I can't batch the upload), and the buffers get really big. I think I'm gonna implement some hacky glUniform*() calls to compare performance.
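For reference, the kind of per-draw sub-allocation described here can be sketched as below. This is only an illustration under stated assumptions, not my actual code: UniformRing and uniform_ring_alloc are hypothetical names, `base` is assumed to be the pointer returned by mapping the buffer once, and `alignment` is assumed to hold the queried GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT. The returned offset is what would be passed to glBindBufferRange for the draw call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator over one mapped uniform buffer. */
typedef struct {
    unsigned char *base;  /* assumed: pointer from the single map call */
    size_t capacity;      /* size of the mapped range in bytes */
    size_t head;          /* first unused byte */
    size_t alignment;     /* assumed: GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT */
} UniformRing;

/* Round `size` up to the next multiple of `alignment`. */
static size_t align_up(size_t size, size_t alignment) {
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

/* Reserve `size` bytes at the next aligned offset. Returns a pointer
 * to write the block data into and stores the buffer offset in
 * `*out_offset`, or returns NULL if the buffer is full for this frame. */
static void *uniform_ring_alloc(UniformRing *ring, size_t size,
                                size_t *out_offset) {
    size_t offset = align_up(ring->head, ring->alignment);
    if (offset > ring->capacity || size > ring->capacity - offset) {
        return NULL;
    }
    ring->head = offset + size;
    *out_offset = offset;
    return ring->base + offset;
}
```

Every allocation jumps to the next aligned offset, so an 80-byte block followed by a 16-byte block already uses up the first 272 bytes and pushes the third block to offset 512. That is the growth in buffer size I'm complaining about.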
Also, this is OpenGL for PC, tested on an Nvidia card.