I am currently reworking my game's engine, refining the architecture further. One of the primary principles behind the architecture so far is parallelism and scalability.
The basic structure has two asynchronous tasks running concurrently: game processing and rendering. I've already implemented and tested the link between them, and it exhibits good latency and performance. The render thread can lag at most one frame behind, due to how the game thread synchronizes data. Both threads are of course able to spawn their own dispatched tasks that are local to them.
My consideration is splitting the rendering thread further into one or more game rendering threads and a UI thread. The game rendering thread(s) would draw to a render buffer, which the UI thread would lock and draw beneath the game UI objects (shared via display-list sharing across contexts). My thinking is that this would let the UI keep working properly (menus, for instance) even if the game is slow or lagging. The downside is that it increases the maximum frame-behind state from one frame to two (due to the extra step). I am also unsure how well OpenGL implementations would take to this; technically, this sort of multi-threading is safe, but I somehow doubt that it will be efficient -- I am not confident that the drivers don't just wrap each function in a common critical section.
The typical solution to this is to share a "draw command list" which is populated by both your UI and your game rendering logic. Each render tick, you draw whatever is in the command list, period. If this is just UI updates, so be it; if your game has new stuff to draw, it replaces its previous entries in the command list with the new stuff. The trick is to allow the command list to retain state across render frames: you don't have to remove all your draw commands every tick. Instead, only depopulate the old render commands when new ones should replace them.
This minimizes latency between your logic and your rendering, and ensures that all subsystems which need to draw can do so at peak throughput.
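To make the retained-state idea concrete, here's a minimal sketch (all names hypothetical, not from the thread) of a command list keyed by submitter: a new submission replaces only that submitter's old entries, while everyone else's commands persist across render ticks.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Placeholder payload; a real engine would carry mesh, material,
// transform, etc.
struct DrawCommand {
    std::string name;
};

class CommandList {
public:
    // Replace this submitter's previous commands with a fresh batch.
    // Other submitters' entries are left untouched.
    void submit(const std::string& submitter, std::vector<DrawCommand> cmds) {
        commands_[submitter] = std::move(cmds);
    }

    // Each render tick draws everything currently retained, whether or
    // not it was refreshed this tick. Returns the number of commands
    // issued (stand-in for actual GPU draw calls).
    size_t drawAll() const {
        size_t drawn = 0;
        for (const auto& [who, cmds] : commands_) {
            for (const auto& cmd : cmds) {
                (void)cmd; // issue the real draw call here
                ++drawn;
            }
        }
        return drawn;
    }

private:
    std::map<std::string, std::vector<DrawCommand>> commands_;
};
```

So if the game thread stalls, the renderer simply keeps drawing the game's last batch while the UI submitter continues to refresh its own entries.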
This is what I'm already doing. My question is more precisely whether I should split the UI rendering into its own task and treat it as a pseudo-asynchronous renderer that just captures the game rendering from a renderbuffer and draws it into a UI object.
Basically, one rendering thread for graphics + UI, or two rendering threads, one for graphics and one for UI?
One thread for game logic, one thread for UI logic, one thread for rendering the output of both to the actual hardware.
I wouldn't recommend drawing the game stuff into a render target and then displaying that under the UI; full-screen-size render targets had some pitfalls last time I worked with them (although granted it's been a few years). Besides, it's needlessly complicated.
Just have a thread that does the game stuff as fast as it can, and a thread for keeping the UI responsive; when either one wants to update its render state, update the draw list. If a thread doesn't update its draw list by the time you come around for another render tick, you should just draw whatever it asked for last tick.
This way you have maximum decoupling of all three tick rates: game ticks, UI ticks, and render ticks. All three aspects will be maximally responsive.
As for the full-screen buffers, they are needed for both deferred rendering (as that is done using the G-buffer anyhow) and any sort of full-screen effects.
As for the draw list, if I understand you correctly, my current methodology seems equivalent to yours in practice: I feed pointers to objects that need to be drawn through a DrawList of sorts, but feed matrices separately in vectorized form (for SSE2) for objects that need them. Matrices are obviously updated more often than the objects themselves. This all happens at a single sequence point. If no new update arrives, the render thread uses its cached data (rudimentary velocity/rotation information) to extrapolate, so it at least tries to keep the frames "moving".
But yes, your method does seem better in this form in that it avoids a fully dedicated UI render task -- it also simplifies things for systems that don't support multithreaded GL.
One thing to note, though, is that there is seemingly a three-way dependency here...
The game thread is authoritative over both the UI thread and the render thread, in terms of what is being displayed / is active. The UI thread is authoritative over both the game thread, in that it can control behavior via callback lists, and the render thread, in that it can request UI elements to be rendered. The render thread, however, is a pure consumer.
I see some possible contention between the game and UI threads, particularly since the UI thread should not be able to directly access any game-specific functionality (it should have to rely on callback lists).
Traditionally in a three-thread setup your UI communicates with the game logic using some kind of signal or event mechanism. A common way to do this is to have a queue of inputs from the UI logic, which the game checks every game tick to see what it needs to do in response; the UI simply adds to this queue as it goes. Double buffering is a great way to make the contention issue here all but vanish: you have one list of inputs that the UI is continually appending to, and another list that the game is reading from. When the game tick begins, you atomically swap the two buffers, so that the game is now reading from the last-updated list of inputs, and the UI proceeds on its merry way adding to the other buffer. When the game tick ends, it clears its buffer so that at the next swap the UI is handed a blank slate.
Bidirectional communication can be done this way as well, so that the UI can display most-recently-updated data from the game logic (hit points, object positions for ray picking, whatever else). Again since you are only doing a single pointer swap you have minimal contention between threads.
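A minimal sketch of the double-buffered queue idea (hypothetical names; for brevity this uses a brief lock around the swap rather than a lock-free atomic pointer exchange, and plain strings as stand-ins for real input events):

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class InputQueue {
public:
    // Called from the UI thread whenever the user does something.
    void push(std::string event) {
        std::lock_guard<std::mutex> lock(swapMutex_);
        front_.push_back(std::move(event));
    }

    // Called once at the head of each game tick: swap out the
    // accumulated events and hand the UI an empty buffer. The game
    // then iterates the returned batch with no further locking.
    std::vector<std::string> drain() {
        std::vector<std::string> out;
        {
            std::lock_guard<std::mutex> lock(swapMutex_);
            out.swap(front_); // front_ is now empty for the UI
        }
        return out;
    }

private:
    std::mutex swapMutex_;
    std::vector<std::string> front_;
};
```

The lock is held only for the push or the swap itself, never while the game processes events, which is what keeps contention between the two threads negligible.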
The trick to good concurrency is to minimize the cross-talk between threads. If you think in terms of relaying data between threads as they execute, you're in for a world of pain - deadlock, livelock, starvation, priority inversion, race conditions, etc. etc. Instead, think in terms of handing off chunks of data between threads at the head of each tick.
In some multithreaded setups there is an explicit synchronization step where all threads temporarily serialize in order to share their current work; this is a common idiom in job- or task-based systems, for instance. It is possible to break this up so that you only have explicit synchronizations where needed, e.g. one sync between game and UI and one sync between game and render, with a final sync between UI and render. This trades code simplicity for raw speed: you risk a lot of nasty concurrency bugs, but you can eke out a nontrivial fraction more performance by doing it this way.