Most GPUs are really just SIMD machines. What this means for you is that the GPU will effectively execute the same instruction for multiple "lanes" simultaneously (usually referred to as "threads"), with each lane having different data. So if you write code that does "y = x + z", the GPU will execute an add instruction for multiple lanes simultaneously, where x and z might have different values in each lane.
This makes branching complicated. Take a simple branch like this:
if(x > 1.0f) x = x * 10.0f;
On a GPU, multiple nearby SIMD lanes will be executing these instructions with different values of x. So for some lanes x might be greater than 1.0, and for some it might not. Since all of the lanes execute the same instructions, the GPU can't just multiply the value by 10.0 in some lanes and skip the multiply in others. Instead, it evaluates the condition for all lanes and uses the result to set a per-lane mask. This mask then controls whether any executed instructions actually have any effect. So the GPU will execute the multiply instruction, but it won't actually do anything unless the mask bit is set for that lane. This ends up giving you the same result as if you had actually executed the branching behavior for each lane individually. The only time the GPU doesn't have to mess with this masking business is if all lanes take the same path, resulting in a mask of all 0's or all 1's. In this case, if the mask was all 0's, then the GPU could actually skip the multiply instruction.
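To make the masking idea concrete, here is a minimal CPU-side sketch in C++ of how a group of lanes might run the branch above in lockstep. It is only an illustration: the lane count `kLanes` and the function name `runMaskedBranch` are made up for this example, and real hardware does this in the instruction scheduler rather than in a loop.

```cpp
#include <array>
#include <cstddef>

// Hypothetical SIMD width, chosen small for readability.
constexpr std::size_t kLanes = 4;

// Simulates `if (x > 1.0f) x = x * 10.0f;` executed by all lanes together.
std::array<float, kLanes> runMaskedBranch(std::array<float, kLanes> x) {
    std::array<bool, kLanes> mask{};
    bool anyActive = false;

    // Step 1: every lane evaluates the condition; the results form the mask.
    for (std::size_t i = 0; i < kLanes; ++i) {
        mask[i] = x[i] > 1.0f;
        anyActive = anyActive || mask[i];
    }

    // Step 2: if the mask is all 0's, the body can be skipped entirely.
    if (!anyActive) {
        return x;
    }

    // Step 3: every lane executes the multiply, but the result is only
    // written back in lanes whose mask bit is set.
    for (std::size_t i = 0; i < kLanes; ++i) {
        const float result = x[i] * 10.0f;
        x[i] = mask[i] ? result : x[i];
    }
    return x;
}
```

With the inputs 0.5, 2.0, 1.0 and 3.0, this returns 0.5, 20, 1 and 30: the same result you'd get from branching per lane, even though every lane went through the multiply.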
So now we get to sampling and derivatives. In order to pick which mip level to use, GPUs look at nearby pixels in a 2x2 quad and calculate the difference in texture coordinates between them. This difference gives you an approximation of the partial screen-space derivatives. If the difference is very small relative to the texture size, then the GPU can use a higher-resolution mip level. If the difference is large, it uses a lower-resolution mip level. In practice, this "looking at nearby pixels" is done by packing these nearby pixels into nearby SIMD lanes, and executing special instructions that allow one lane to get the value of something from another lane. Most of the time this is perfectly okay, since the GPU always executes the same instructions for all lanes. Where it goes awry is with the conditional masking that I explained above. If one lane is active but its neighbor is masked off, the masked-off lane won't be able to give its neighbor the value of its texture coordinate. Without that value the active lane can't compute the partial derivatives, and so it can't sample with mipmaps.
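As a rough sketch of that derivative step, the C++ below takes the texture coordinates of the four pixels in a quad, differences them, and turns the result into a mip level. The quad layout, the names `Float2`, `Quad`, `ddxCoarse`, `ddyCoarse` and `mipLevel`, and the simple isotropic log2 formula are all assumptions made for illustration; actual hardware is free to use a more elaborate calculation.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

struct Float2 {
    float u;
    float v;
};

// Assumed quad layout: 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right.
using Quad = std::array<Float2, 4>;

// Horizontal derivative: difference against the lane one pixel to the right.
Float2 ddxCoarse(const Quad& uv) {
    return {uv[1].u - uv[0].u, uv[1].v - uv[0].v};
}

// Vertical derivative: difference against the lane one pixel below.
Float2 ddyCoarse(const Quad& uv) {
    return {uv[2].u - uv[0].u, uv[2].v - uv[0].v};
}

// Simplified mip selection: scale the derivatives into texels, take the
// longer of the two footprints, and use log2 of it as the mip level.
float mipLevel(Float2 dx, Float2 dy, float texSize, float mipCount) {
    const float lenX = std::hypot(dx.u * texSize, dx.v * texSize);
    const float lenY = std::hypot(dy.u * texSize, dy.v * texSize);
    const float rho = std::max(lenX, lenY);
    const float lod = std::log2(std::max(rho, 1e-8f));
    return std::min(std::max(lod, 0.0f), mipCount - 1.0f);
}
```

Note that both derivative functions read coordinates from more than one lane. That cross-lane read is exactly what stops working when a neighbor has been masked off by a branch.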
The typical workaround is to compute the derivatives manually outside of the branch, where all lanes are still active, and then pass them into a version of sample() that takes explicit derivatives (for example SampleGrad in HLSL or textureGrad in GLSL). Or if you don't need automatic mipmap selection, you can use a version of sample() that lets you pick the mip level yourself (SampleLevel in HLSL, textureLod in GLSL).
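The ordering is the important part of that workaround, so here is a small C++ simulation of it for one quad. Everything in it is a stand-in: `Lane`, `mipFromGradient` and `sampleInsideBranch` are hypothetical names, the texture coordinate is one-dimensional for brevity, and the function returns the mip level that would be chosen rather than a color.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

struct Lane {
    float u;      // texture coordinate (1D to keep the sketch short)
    bool active;  // true if this lane takes the branch
};

// Hypothetical stand-in for an explicit-derivative sample call. It returns
// the mip level implied by the gradients instead of a texel color.
float mipFromGradient(float dudx, float dudy, float texSize) {
    const float rho = std::max(std::fabs(dudx), std::fabs(dudy)) * texSize;
    return std::max(0.0f, std::log2(std::max(rho, 1e-8f)));
}

// Assumed quad layout: 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right.
std::array<float, 4> sampleInsideBranch(const std::array<Lane, 4>& quad,
                                        float texSize) {
    // Outside the branch: all four lanes are still running, so reading
    // a neighbor's coordinate is safe.
    const float dudx = quad[1].u - quad[0].u;
    const float dudy = quad[2].u - quad[0].u;

    std::array<float, 4> mip{};
    for (std::size_t i = 0; i < 4; ++i) {
        // Inside the branch: only active lanes sample, and they reuse the
        // precomputed gradients instead of asking a neighbor.
        mip[i] = quad[i].active ? mipFromGradient(dudx, dudy, texSize) : -1.0f;
    }
    return mip;
}
```

Inactive lanes are marked with -1 here just to show they never sampled. The active lanes still get a sensible mip level even when their neighbors skipped the branch.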