Honestly, I would just build the shaders using #ifdefs, as it's the safest bet. You don't have to do a lot of extra work just to see if it works. I combine shaders by implementing my own #include preprocessor statement, and since I resolve the includes before sending the shader to OpenGL, the driver only ever sees one flat source, so I can be sure it will be the fastest. I also use #ifdefs liberally, mostly driven by configuration settings. If a setting changes I have to recompile the shader, but that's trivial.
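To make the #include plus #ifdef approach concrete, here is a rough sketch of what such a preprocessing step could look like. It is not my actual code: the function names (resolve_includes, add_defines, build_shader) and the in-memory SOURCES dict standing in for shader files are all made up for illustration, and it assumes includes are written as `#include "name"` on their own line.

```python
import re

# Matches a line of the form: #include "some/file.glsl"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def resolve_includes(name, sources, _stack=()):
    """Return sources[name] with each #include line replaced by the included text."""
    if name in _stack:
        raise ValueError("circular #include: " + " -> ".join(_stack + (name,)))
    out = []
    for line in sources[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            # Recurse so that included files can include other files.
            out.append(resolve_includes(match.group(1), sources, _stack + (name,)))
        else:
            out.append(line)
    return "\n".join(out)


def add_defines(source, defines):
    """Insert '#define KEY VALUE' lines right after the #version line.

    GLSL requires #version to come before any other code, so the
    defines cannot simply be prepended to the source.
    """
    lines = source.splitlines()
    define_lines = [f"#define {key} {value}" for key, value in defines.items()]
    insert_at = 0  # no #version line found: put the defines at the very top
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#version"):
            insert_at = i + 1
            break
    return "\n".join(lines[:insert_at] + define_lines + lines[insert_at:])


def build_shader(name, sources, defines):
    """Flatten the includes, then bake the configuration in as #defines."""
    return add_defines(resolve_includes(name, sources), defines)


# Hypothetical shader "files", kept in a dict so the sketch is self-contained.
SOURCES = {
    "lighting.glsl": "vec3 shade(vec3 n) { return max(n.z, 0.0) * vec3(1.0); }",
    "main.frag": "\n".join([
        "#version 330 core",
        '#include "lighting.glsl"',
        "out vec4 color;",
        "void main() {",
        "#ifdef USE_FOG",
        "    color = vec4(0.5);",
        "#else",
        "    color = vec4(shade(vec3(0.0, 0.0, 1.0)), 1.0);",
        "#endif",
        "}",
    ]),
}
```

The string returned by build_shader("main.frag", SOURCES, {"USE_FOG": 1}) is what would go to glShaderSource and glCompileShader. When a setting changes, you rebuild the string with a different defines dict and compile again.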
I also have to wonder if this is something that could get better in the future. The GPU is lots of weaker cores running in parallel, so with that in mind the fastest solution will always be the one where every core can run the same straight-line code in parallel. If the driver makers have abstracted this functionality into shader --> subshader, subshader, all it would do is introduce another (albeit small) step when binding a shader. It's what I would do anyway, as it sounds like a ridiculous feature.