I think conditional branching in shaders only costs performance when the instances in a SIMD group can take different branches. When they do, you get a SIMD 'stall'. The instances that share one branch execute in the current active working set, then the instances on the other branch execute, and once all branch sets have finished they 'sync' up and once again grind away efficiently as SIMD across the entire working set.
But just because you take a hit doesn't mean you can't do it. Just be aware there is a hit. Instrument and measure the hit, and compare normal cases with extremes. It might not be so bad; that depends entirely on the logic.
This issue becomes more front and center with OpenCL, but it also applies to shaders.
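To get a feel for the size of the hit before measuring on real hardware, here is a toy CPU-side model of the behavior described above. It is only a sketch, not GPU code: the function name `countSerializedPasses`, the integer branch ids, and the group width are all made up for illustration, and real hardware has costs this ignores.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Toy model of lock-step (SIMD) execution. NOT real GPU code.
// branchTaken[i] is the id of the branch that instance i wants to run.
// Instances are processed in groups of groupWidth. Within one group, every
// distinct branch needs its own pass while the other instances sit idle.
// Returns the total number of passes summed over all groups.
std::size_t countSerializedPasses(const std::vector<int>& branchTaken,
                                  std::size_t groupWidth) {
    if (groupWidth == 0) return 0;
    std::size_t passes = 0;
    for (std::size_t start = 0; start < branchTaken.size(); start += groupWidth) {
        const std::size_t end = std::min(start + groupWidth, branchTaken.size());
        // Collect the distinct branch ids requested inside this group.
        std::set<int> distinct(
            branchTaken.begin() + static_cast<std::ptrdiff_t>(start),
            branchTaken.begin() + static_cast<std::ptrdiff_t>(end));
        passes += distinct.size();
    }
    return passes;
}
```

In this model a group where every instance agrees costs one pass, and a group where every instance goes its own way costs one pass per instance, which is the extreme case worth comparing against when you measure.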
OpenGL is perfectly happy with multiple contexts sharing the same handle space via sharelisting. (However, not all flavors of OpenGL currently support sharelisting. OpenGL ES, for example, does not, though according to the folks at Khronos it might in the future.)
For 'desktop' OpenGL, it is often beneficial to have 'prep' threads and a main render thread that consumes resources prepared by those prep threads. But for that to happen, the contexts in each thread must share the same handle space. Prep threads keep disk and other latency involved in preparing resources away from the main render thread, which should only ever deal with prepared resources.
A resource pool manager (that delivers available handles and accepts freed handles), plus sharelisted threads isolated by threadsafe FIFOs, is more than adequate to guarantee collision-free operation without expensive locks. (The only locks required are in the low duty cycle updates to FIFO state and pool manager state; the prep threads and main render thread spend most of their duty cycle prepping and rendering, and little time changing FIFO state, which is simply a matter of updating a couple of integers for head and tail.)
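As a rough sketch of what such a pool manager could look like, the class below hands out handles and takes them back under a short lock. The name `HandlePool` and its methods are hypothetical, and the handles are plain integers standing in for GL object names; nothing here calls OpenGL.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical handle pool: delivers available handles and accepts freed ones.
// The integers stand in for resource handles allocated up front elsewhere.
class HandlePool {
public:
    explicit HandlePool(std::vector<std::uint32_t> handles)
        : free_(std::move(handles)) {}

    // Writes an available handle to 'out' and returns true,
    // or returns false if the pool is currently exhausted.
    bool acquire(std::uint32_t& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return false;
        out = free_.back();
        free_.pop_back();
        return true;
    }

    // Accepts a handle that is no longer in use.
    void release(std::uint32_t handle) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(handle);
    }

    // Number of handles currently available.
    std::size_t available() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    mutable std::mutex mutex_;         // held only for the brief list update
    std::vector<std::uint32_t> free_;  // handles not currently in use
};
```

The lock is held only while the free list is touched, so the time spent under it is tiny compared to the time a thread spends actually prepping or rendering with the handle.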
Heads-up with sharelisting: make sure all contexts that are going to be sharelisted are requested before any of them is selected as the current OpenGL context (and, for sure, before any resource handles are allocated among the contexts that will be sharelisted). Sharelisted means 'share the same resource handle space', which is required for multithreaded OpenGL.
Heads-up with the design of the threadsafe FIFO: it must use a two-step allocate and release model, because there is finite execution time between when a handle is pulled and when it is prepped or consumed. But that is easily done. A FIFO object is basically tracking a head and a tail in a circular fashion, with some maximum FIFO size. The FIFO should provide booleans for IsFull, IsEmpty, etc.
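Here is a minimal sketch of that two-step idea, assuming one producer thread and one consumer thread per FIFO. The class name `TwoStepFifo` and the method names are my own for illustration. A slot is reserved in step one, worked on outside the lock, and only published or freed in step two.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Sketch of a circular FIFO with two-step write and two-step read.
// Assumes a single producer and a single consumer.
template <typename T>
class TwoStepFifo {
public:
    explicit TwoStepFifo(std::size_t capacity) : slots_(capacity) {}

    bool isEmpty() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_ == 0;
    }

    bool isFull() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_ == slots_.size();
    }

    // Producer step 1: reserve the tail slot. Returns nullptr if the FIFO
    // is full or a previous reservation has not been committed yet.
    T* beginWrite() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (writing_ || count_ == slots_.size()) return nullptr;
        writing_ = true;
        return &slots_[tail_];
    }

    // Producer step 2: publish the slot so the consumer can see it.
    void commitWrite() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!writing_) return;
        writing_ = false;
        tail_ = (tail_ + 1) % slots_.size();
        ++count_;
    }

    // Consumer step 1: borrow the head slot. Returns nullptr if nothing is
    // published or a previous read has not been finished yet.
    T* beginRead() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (reading_ || count_ == 0) return nullptr;
        reading_ = true;
        return &slots_[head_];
    }

    // Consumer step 2: hand the slot back so the producer can reuse it.
    void endRead() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!reading_) return;
        reading_ = false;
        head_ = (head_ + 1) % slots_.size();
        --count_;
    }

private:
    mutable std::mutex mutex_;  // guards only the few integers below
    std::vector<T> slots_;
    std::size_t head_ = 0;      // next slot to read
    std::size_t tail_ = 0;      // next slot to write
    std::size_t count_ = 0;     // published slots, including one being read
    bool writing_ = false;
    bool reading_ = false;
};
```

Because a slot being read still counts as occupied until `endRead`, the producer can never reserve it, and because a reserved slot is not counted until `commitWrite`, the consumer never sees a half-prepared entry. The slow work happens between the two steps with no lock held.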
You don't have to do any of that when you write a game. It adds complexity. But it provides performance and behavior you can't achieve in a single-threaded model, where you end up with things like
--------Please wait....scene loading---------...