I'm treating the GPU as a big train which must not be stopped for a small number of passengers. Option 1 helps to reduce the number of shader variants, which helps with batching (and can always be divided into sub-shaders later on).
...
I remember the OpenGL GDC lecture about zero driver overhead: swapping shaders was very expensive.
There's CPU expense and GPU expense. Different games will have a different workload balance. Ideally, you would be as balanced as possible, having a similar frame-time on both processors.
e.g.
- If your game takes 30ms per frame on the GPU, but only 5ms per frame on the CPU, then CPU expense is irrelevant to you -- if you can save 0.5ms of GPU time by spending 8ms of CPU time, then it makes sense for that situation.
- If it's reversed, and your game takes 5ms per frame on the GPU, but 30ms per frame on the CPU, then GPU expense is irrelevant to you -- you should optimize in a way that reduces CPU time at all costs, even if that means increasing the GPU workload.
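To put rough numbers on the two cases above, here's a minimal sketch. It assumes the CPU and GPU run in parallel on different frames, so the slower of the two sets the frame time; that's a simplification, and `FrameCost`, `frame_time_ms` and `trade_is_worthwhile` are names made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame timings, in milliseconds.
struct FrameCost {
    double cpu_ms;
    double gpu_ms;
};

// Assumption: CPU and GPU overlap, so the slower processor sets the frame time.
inline double frame_time_ms(const FrameCost& f) {
    assert(f.cpu_ms >= 0.0 && f.gpu_ms >= 0.0);
    return std::max(f.cpu_ms, f.gpu_ms);
}

// True if shifting work between the processors shortens the frame.
// Deltas are signed: positive adds time, negative removes it.
inline bool trade_is_worthwhile(const FrameCost& before,
                                double cpu_delta_ms, double gpu_delta_ms) {
    const FrameCost after{before.cpu_ms + cpu_delta_ms,
                          before.gpu_ms + gpu_delta_ms};
    return frame_time_ms(after) < frame_time_ms(before);
}
```

With the GPU-bound numbers (5ms CPU, 30ms GPU), `trade_is_worthwhile({5.0, 30.0}, 8.0, -0.5)` comes out true, because the frame goes from 30ms to 29.5ms. With the timings reversed, the same trade comes out false, because the CPU is already the bottleneck and you've just made it slower.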
Switching shaders is primarily a CPU cost, but it allows you to save GPU time by reducing per-pixel waste (each specialized shader only does the work its material actually needs).
The exception is when you switch shaders too often (e.g. with batches that only draw 10 pixels each); then switching shaders becomes a massive GPU cost as well... so you've got to be sensible.
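One common way to stay sensible here is to sort your draw list by shader before submitting it, so each shader gets bound once per group of draws. The sketch below only applies where draw order doesn't otherwise matter (e.g. opaque geometry), and `DrawItem`, `count_shader_switches` and `sort_by_shader` are hypothetical names rather than anything from D3D or GL.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record; a real renderer would also carry buffers, constants, etc.
struct DrawItem {
    std::uint32_t shader_id;
    std::uint32_t mesh_id;
};

// Counts how many times the bound shader changes when the list is submitted in order.
// The first draw counts as a switch, since a shader has to be bound for it.
inline std::size_t count_shader_switches(const std::vector<DrawItem>& draws) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].shader_id != draws[i - 1].shader_id) {
            ++switches;
        }
    }
    return switches;
}

// Groups draws that share a shader. stable_sort keeps the original order
// of the draws within each shader group.
inline void sort_by_shader(std::vector<DrawItem>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.shader_id < b.shader_id;
                     });
}
```

Calling `count_shader_switches` before and after `sort_by_shader` gives you a quick draws-per-switch figure for your own scene, so you can tell whether your batches are getting too small.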
FWIW, I just loaded up my test scene in D3D11 and profiled a frame -- it has 646 draw calls using 45 shaders -- 14 draws per shader switch on average. The CPU cost of all my D3D11 function calls is ~300µs (0.3ms!)... Sure, I could use 5 shaders instead of 45... but why bother when doing so is going to be optimizing a section of the code that's already only taking 0.3ms? :)
For comparison, my GPU takes 5.3ms to actually execute these commands that the CPU has prepared, which is how it should be :D
In my situation, I can afford to waste lots of CPU time if it means reducing the GPU frametime. However, I cannot afford to waste any GPU time whatsoever.
Note that the situation is a bit different for D3D9/OpenGL, as they've got much larger CPU overheads than D3D11.
The situation is different again for D3D12/Vulkan, as they have extremely small CPU overheads when compared to D3D11/GL.