vertex and pixel shader binding costs are similar?
Yes, in the sense that changing either one has a cost.
The GPU is not a single-program state machine. When work is dispatched, it goes as a bundle of commands sent together, and that bundle will likely reference both the vertex and the pixel shader used for the draw. There is a reason that DX12 and Vulkan bake so much state into a single object (the pipeline state object in DX12, VkPipeline in Vulkan), after all.
On the driver side, changing state is initially cheap; the cost comes when the draw is kicked off and the driver has to work out what has changed (via hashing etc.) and then build the commands to send. In that case, changing both shaders probably costs about the same, there or thereabouts, as changing one.
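To make the "cheap to set, expensive at draw time" point concrete, here is a toy model. Everything in it (DriverModel, PipelineKey, the build/lookup counters) is hypothetical and not how any particular driver is written; it just assumes setters record state, and Draw() hashes the combined vertex+pixel state once and builds a pipeline object on a cache miss.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical model of a DX9/DX11-style driver. Binding a shader only records
// an id; the expensive work is deferred to Draw(), which hashes the combined
// state and builds a pipeline object on a cache miss. Id 0 means "unbound".
struct PipelineKey {
    std::uint32_t vs = 0;
    std::uint32_t ps = 0;
    bool operator==(const PipelineKey& o) const { return vs == o.vs && ps == o.ps; }
};

struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& k) const {
        // Pack both ids into one 64-bit value; a real driver hashes far more state.
        return std::hash<std::uint64_t>{}((std::uint64_t{k.vs} << 32) | k.ps);
    }
};

class DriverModel {
public:
    void SetVertexShader(std::uint32_t id) { pending_.vs = id; dirty_ = true; }  // cheap
    void SetPixelShader(std::uint32_t id)  { pending_.ps = id; dirty_ = true; }  // cheap

    void Draw() {
        assert(pending_.vs != 0 && pending_.ps != 0 && "bind both shaders first");
        if (!dirty_) return;  // nothing changed since the last draw
        ++lookups_;           // one hash + lookup per draw, however many Set* calls came before
        if (cache_.find(pending_) == cache_.end()) {
            ++builds_;        // stands in for compiling/encoding hardware commands
            cache_.emplace(pending_, builds_);
        }
        dirty_ = false;
    }

    int builds() const { return builds_; }
    int lookups() const { return lookups_; }

private:
    PipelineKey pending_;
    bool dirty_ = true;
    int builds_ = 0;
    int lookups_ = 0;
    std::unordered_map<PipelineKey, int, PipelineKeyHash> cache_;
};
```

In this model, changing only the pixel shader and changing both shaders each cost exactly one lookup and (on a miss) one build at the next draw, which is the "there or thereabouts" point above.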
Systems I've worked on bundle vertex and pixel shaders together, using hashing/names to reduce the API object count at runtime (for example, if two materials use the same vertex shader then only one is created, and both reference it). However, a material is considered a discrete thing, so if you have one object which uses material Foo and another which uses material Bar, they are different even if they share a shader. The graphics API itself (talking DX9/DX11 here) maintains a copy of the bound state so that shaders are not redundantly rebound (if two materials share any shaders, those won't be bound again), but a material is still a combination of shaders, textures and constants arranged into passes etc., as previously mentioned.
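A minimal sketch of that arrangement, assuming shaders are identified by a name string. ShaderCache, Material and ShaderBinder are illustrative names, not from any real engine or API; ShaderBinder imitates the redundant-bind filtering the runtime does rather than calling DX9/DX11.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical engine-side shader object.
struct Shader {
    std::string name;  // e.g. source file + defines used to build it
};

// Deduplicates shaders by name so materials that share one reference a single object.
class ShaderCache {
public:
    std::shared_ptr<Shader> Get(const std::string& name) {
        auto& slot = shaders_[name];
        if (!slot) slot = std::make_shared<Shader>(Shader{name});  // created only once
        return slot;
    }
    std::size_t size() const { return shaders_.size(); }
private:
    std::unordered_map<std::string, std::shared_ptr<Shader>> shaders_;
};

// A material is its own object even when it shares shaders with another one.
struct Material {
    std::shared_ptr<Shader> vs;
    std::shared_ptr<Shader> ps;
    // ... textures, constants, passes ...
};

// Imitates runtime state shadowing: a bind is skipped if that shader is already bound.
class ShaderBinder {
public:
    void Bind(const Material& m) {
        assert(m.vs && m.ps);
        if (m.vs.get() != boundVs_) { boundVs_ = m.vs.get(); ++actualBinds_; }
        if (m.ps.get() != boundPs_) { boundPs_ = m.ps.get(); ++actualBinds_; }
    }
    int actualBinds() const { return actualBinds_; }
private:
    const Shader* boundVs_ = nullptr;
    const Shader* boundPs_ = nullptr;
    int actualBinds_ = 0;
};
```

For example, Foo and Bar built from a shared "common.vs" are two materials but only three shader objects, and binding Foo then Bar only rebinds the pixel shader.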
and conditional branching in uber shaders can be less costly than two smaller single effect shaders?
Maybe.
It depends partly on the hardware, partly on the driver and partly on how it is used; if you have little or no divergent flow (where some pixels in a wavefront/warp go one way and some the other), then it can be a win.
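Why divergence matters can be shown with a deliberately simplified SIMT cost model (an assumption for illustration, not any vendor's documented behaviour): a wave only pays for both sides of an if/else when its lanes disagree; otherwise it pays for just the side everyone took. BranchCostForWave and the cycle costs are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Simplified model: returns the cycles one wave spends on an if/else branch.
// laneTakesIf[i] says whether lane i takes the "if" side.
int BranchCostForWave(const std::vector<bool>& laneTakesIf, int ifCost, int elseCost) {
    assert(!laneTakesIf.empty());
    bool anyIf = false, anyElse = false;
    for (bool takesIf : laneTakesIf) {
        if (takesIf) anyIf = true; else anyElse = true;
    }
    // Uniform waves execute one side; divergent waves execute both,
    // with the lanes that didn't take a side masked off while it runs.
    return (anyIf ? ifCost : 0) + (anyElse ? elseCost : 0);
}
```

With a 10-cycle "if" and a 20-cycle "else", a uniform wave costs 10 or 20, but a single disagreeing lane pushes the whole wave to 30.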
However, the flip side is that an ubershader has to assume you will take both paths, which raises the demon of 'register pressure': a GPU only has so many registers it can allocate to running threads, so the more registers your shader uses, the fewer wavefronts/warps the hardware can keep in flight and the worse your performance can be.
Branching causes a problem here. Let's say the GPU has 40 registers for keeping work in flight (real hardware has many more, but run with it). If your shader takes 4 registers to execute then we can run 10 instances at once; if it takes 5 we are down to 8, 6 leads to 6, and so on (always round down). Now, let's say your shader has a branch with two paths: one requires 4 registers, the other requires 3. Registers are allocated statically by the GPU when execution of that shader begins, so the shader has to be given enough for every path it might take. In the pessimistic case, where the compiler cannot reuse registers between the two paths, it will produce code which requires 7 registers, and now you have at most 5 instances running (40/7 = 5.7, and we round down) for any draw call using that shader. (A compiler can often share registers between the paths, but the combined shader still needs at least as many as its most expensive path, plus anything live across the branch.) If, however, you had two separate shaders then you would get 10 or 13 instances (40/4 and 40/3, rounded down), depending on which shader you used.
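The arithmetic in that example boils down to one integer division. This is a toy occupancy model only: MaxInstances is a hypothetical helper, and real GPUs allocate registers in granules and are also limited by shared memory/LDS and per-SIMD wave caps, none of which is modelled here.

```cpp
#include <cassert>

// Toy occupancy model: a fixed register budget shared by all in-flight
// instances. Integer division does the "always round down".
int MaxInstances(int registerBudget, int registersPerInstance) {
    assert(registersPerInstance > 0);
    return registerBudget / registersPerInstance;
}
```

With a budget of 40, MaxInstances gives 10, 8 and 6 for 4, 5 and 6 registers, 5 for the 7-register ubershader, and 10 and 13 for the two split shaders.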
So while you might write an ubershader, it is often much better to compile two versions of it and select between them at runtime to get maximal throughput on the device.
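One common way to do that is a permutation table: compile the same source once per combination of feature flags (typically by passing different #defines to the shader compiler) and pick the matching variant at runtime. The sketch below is hypothetical: the flag names, the USE_* defines and VariantTable are made up, and "compiling" is stood in for by storing the define string each variant would be built with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical feature flags an ubershader might branch on.
enum Feature : std::uint32_t {
    kNormalMap = 1u << 0,
    kAlphaTest = 1u << 1,
};

// Define list a compiler would receive for a given variant (names are illustrative).
std::string DefinesFor(std::uint32_t features) {
    std::string defines;
    if (features & kNormalMap) defines += "USE_NORMAL_MAP=1;";
    if (features & kAlphaTest) defines += "USE_ALPHA_TEST=1;";
    return defines;
}

class VariantTable {
public:
    // Load/build time: "compile" every permutation of the given flags.
    // Assumes a small flag set; the count doubles with each flag.
    explicit VariantTable(std::uint32_t allFeatures) {
        for (std::uint32_t mask = 0; mask <= allFeatures; ++mask) {
            if ((mask & ~allFeatures) == 0) variants_[mask] = DefinesFor(mask);
        }
    }
    // Runtime: select the specialised variant instead of branching in the shader.
    const std::string& Select(std::uint32_t features) const {
        assert(variants_.count(features) == 1 && "permutation was not compiled");
        return variants_.at(features);  // at() also throws std::out_of_range if missing
    }
    std::size_t count() const { return variants_.size(); }
private:
    std::unordered_map<std::uint32_t, std::string> variants_;
};
```

The trade-off is the usual one: each variant gets its own (smaller) register allocation, at the cost of more shader objects and more pipeline/state changes between draws.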