I'm trying to implement a stateless, multithreaded renderer as described in the "Firaxis LORE" presentation on the Civilization 5 renderer from GDC 2011. Their system seems very elegant, clean and simple to use.
Their rendering commands are self-contained and can be submitted in any order, which, among other nice things, makes parallelization easy. For example, a 'COMMAND_RENDER_BATCHES' command contains a list of surfaces to render, each with its own shader constant payload.
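To make that concrete, here is a minimal sketch of how I picture such a command. The struct names, fields and sort-key layout are my own assumptions, not taken from the presentation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-surface record: everything the draw needs travels with it.
struct SurfaceBatch {
    std::uint32_t shaderId;        // which program to use
    std::uint32_t meshId;          // which geometry to draw
    std::vector<float> constants;  // shader constant payload, copied into the command
};

// Hypothetical self-contained command, loosely modelled on 'COMMAND_RENDER_BATCHES'.
struct RenderBatchesCommand {
    std::uint64_t sortKey;               // shader id in the high bits, depth in the low bits
    std::vector<SurfaceBatch> surfaces;  // surfaces rendered by this command
};

// Packs shader id and depth so that sorting groups by shader first, then by depth.
inline std::uint64_t makeSortKey(std::uint32_t shaderId, std::uint32_t depth) {
    return (static_cast<std::uint64_t>(shaderId) << 32) | depth;
}

inline bool keyLess(const RenderBatchesCommand& a, const RenderBatchesCommand& b) {
    return a.sortKey < b.sortKey;
}

// Because no command depends on state set by another one, reordering is safe.
inline void sortCommands(std::vector<RenderBatchesCommand>& commands) {
    std::stable_sort(commands.begin(), commands.end(), keyLess);
    assert(std::is_sorted(commands.begin(), commands.end(), keyLess));
}
```

Worker threads can each fill their own list of such commands; the lists are then merged and sorted once before submission.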
The recommended way to update shader uniforms is to put them into uniform blocks (constant buffers in Direct3D parlance), grouped by update frequency to minimize memory transfers. In addition, uniform blocks can be shared between different programs, which saves memory.
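A rough sketch of what I mean by grouping by update frequency. The block names, members and binding indices are illustrative assumptions, and I only use vec4- and mat4-sized members so that the C++ structs line up with a std140 block without extra padding:

```cpp
#include <cstddef>

// Hypothetical CPU-side mirrors of four uniform blocks, one per update frequency.
struct PerFrame    { float timeAndDelta[4]; };                    // vec4, written once per frame
struct PerView     { float viewProj[16]; float cameraPos[4]; };   // mat4 + vec4, once per camera
struct PerLight    { float positionRadius[4]; float color[4]; };  // 2 x vec4, once per light
struct PerInstance { float world[16]; };                          // mat4, once per draw

// Assumed fixed binding points, so every program that declares a block shares one buffer.
enum BindingPoint : unsigned {
    kPerFrameBinding    = 0,
    kPerViewBinding     = 1,
    kPerLightBinding    = 2,
    kPerInstanceBinding = 3
};

// Every struct should be a whole number of 16-byte vec4 slots.
constexpr bool isVec4Multiple(std::size_t bytes) { return bytes % 16 == 0; }

static_assert(isVec4Multiple(sizeof(PerFrame)),    "PerFrame is not vec4-aligned in size");
static_assert(isVec4Multiple(sizeof(PerView)),     "PerView is not vec4-aligned in size");
static_assert(isVec4Multiple(sizeof(PerLight)),    "PerLight is not vec4-aligned in size");
static_assert(isVec4Multiple(sizeof(PerInstance)), "PerInstance is not vec4-aligned in size");
```

With this split, only the PerInstance data has to change between draws.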
Let's say I have several global uniform buffers (e.g. PerFrame, PerView, PerInstance and PerLight), which can be updated independently of any shader program (no need to bind a program). But shader dependencies on this global data ruin the whole 'statelessness' idea: I can no longer sort the draw calls, because I need to preserve the original update-set-draw order for correct rendering!
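Here is a toy model of the problem, with no real OpenGL calls. `GlobalBuffers`, `Op` and `replay` are made-up names, and a single float stands in for the contents of a shared PerInstance buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a shared uniform buffer that every draw reads from.
struct GlobalBuffers {
    float perInstanceValue = 0.0f;
};

// One step of the submission stream: either overwrite the buffer or issue a draw.
struct Op {
    enum Kind { UpdatePerInstance, Draw };
    Kind kind;
    float value;  // new buffer contents (used by UpdatePerInstance only)
    int drawId;   // index of the draw (used by Draw only)
};

// Replays the stream in order and records which buffer value each draw observes.
inline std::vector<float> replay(const std::vector<Op>& ops, int drawCount) {
    GlobalBuffers globals;
    std::vector<float> seen(static_cast<std::size_t>(drawCount), -1.0f);
    for (const Op& op : ops) {
        if (op.kind == Op::UpdatePerInstance) {
            globals.perInstanceValue = op.value;
        } else {
            assert(op.drawId >= 0 && op.drawId < drawCount);
            seen[static_cast<std::size_t>(op.drawId)] = globals.perInstanceValue;
        }
    }
    return seen;
}
```

Replaying `update(1), draw 0, update(2), draw 1` gives draw 0 the value 1 and draw 1 the value 2. If sorting moves draw 0 behind the second update, it silently picks up the value 2 instead.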
Should I abandon the idea of using global uniform buffers and resort to the old OpenGL 2.* way of setting all shader uniforms on each batch submission (which is said to be the least efficient approach)?