I'm not an expert on this, so take what I say with a grain of salt, but I do believe switching constant buffers is much faster than switching shaders (which is what will happen when using multiple techniques) and that branching on a uniform isn't a big deal (see MJP's post here). So I would say stick with 1 technique and use branching.
It is not that simple.
GPUs shade pixels in blocks, typically around 8×8 pixels. If every pixel in a block takes the same side of every branch, the branches will indeed be virtually free. If not, the block can only execute one side of a branch at a time while the pixels going through the other side must wait, and parallelism is lost.
A typical case in which this happens is a translucent object whose shader has a branch for discarding pixels below a certain threshold.
On the other hand, in most basic renders of solid objects, all of the pixels will take the same path.
Still, the fastest code is code that is never executed.
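To make the cost of divergence concrete, here is a rough toy model in C++. It is only a sketch under a simplifying assumption (a block pays the full cost of every branch side that at least one of its pixels takes) and does not describe how any particular GPU schedules work. The function name blockCost and the cost numbers are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy model only (an assumption, not real hardware behaviour):
// pixels in a block run in lockstep, so the block pays for side A if any
// pixel takes A, and for side B if any pixel takes B.
inline int blockCost(const std::vector<bool>& takesSideA, int costA, int costB) {
    assert(costA >= 0 && costB >= 0);  // costs are abstract, non-negative units

    bool anyA = false;
    bool anyB = false;
    for (bool a : takesSideA) {
        if (a) {
            anyA = true;
        } else {
            anyB = true;
        }
    }
    // A coherent block pays for one side; a divergent block pays for both.
    return (anyA ? costA : 0) + (anyB ? costB : 0);
}
```

In this model, a 64-pixel block where every pixel takes side A costs only costA, but a single pixel going the other way makes the whole block pay costA + costB.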
Swapping shaders is more costly than updating buffers, but this can be misleading at face value. If you set the shader on every render call (even if it is the same shader set over and over) it will certainly be the slowest option, but via a simple sort on the renderables and a manual record of the last shader set, renderables using the same shaders can be grouped together and the swapping between shaders can be reduced to its bare minimum.
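A minimal sketch of the sort-and-track idea in C++ might look like the following. The Renderable and DrawStats types and the renderSorted function are hypothetical names for illustration, not part of any graphics API, and the actual bind and draw calls are left as comments because they depend on the API you use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not from any particular engine or API.
struct Renderable {
    std::uint32_t shaderId;  // which shader this object is drawn with
    std::uint32_t meshId;    // stand-in for everything else needed to draw it
};

struct DrawStats {
    int shaderBinds = 0;  // how many times a shader was actually set
    int drawCalls = 0;    // how many objects were drawn
};

// Sorts the renderables by shader, then sets a shader only when it differs
// from the last one that was set.
inline DrawStats renderSorted(std::vector<Renderable> items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const Renderable& a, const Renderable& b) {
                         return a.shaderId < b.shaderId;
                     });

    DrawStats stats;
    bool haveLast = false;         // nothing has been set yet
    std::uint32_t lastShader = 0;  // manual record of the last shader set

    for (const Renderable& r : items) {
        if (!haveLast || r.shaderId != lastShader) {
            // The real "set shader" call for your API would go here.
            lastShader = r.shaderId;
            haveLast = true;
            ++stats.shaderBinds;
        }
        // The real draw call for r.meshId would go here.
        ++stats.drawCalls;
    }

    assert(stats.shaderBinds <= stats.drawCalls);  // never more binds than draws
    return stats;
}
```

With six renderables spread over three shaders, this sets a shader three times instead of six. A real renderer would usually sort on more than the shader (textures, depth, and so on), which is left out here.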
Using a single uber shader with no permutations requires heavier buffer updates, and the more data that has to be updated, the closer the cost gets to the overhead of a shader swap. Updating a part of a buffer that won’t even be used by the shader is something you want to avoid.
Creating multiple shaders with a reasonable amount of branching, then sorting by shader and skipping redundant shader sets, is the best way to reach a reasonable middle ground. From there you can lean towards more shaders or more branches later, as your own benchmarks properly inform you.