There is no comprehensible reason why it is so excessively slow other than the reason given by mhagain: the specification makes no promises about speed.
Typically, the implementation will have around 150-200 formats available, so in order to take 2-3 seconds, it would have to spend an entire 10-15 milliseconds matching a single format. For comparing a dozen or so values against a list of capabilities, that is... a... lot. Even if you add half a millisecond on top to sort the list of 10-12 formats that match the minimum required specs.
My suspicion is that it's slow not only because the task is complicated, but also because it loads DLLs, fires up the shader compiler, and so on. At least on my system, the second call is significantly faster than the first. Also, merely enumerating the formats (without making the driver choose one) is already slow. On my system, the driver always chooses the first hit, too. Of course, it may look different on a different system.
You can make it slightly faster if you use DescribePixelFormat instead of ChoosePixelFormat to enumerate the formats one by one, look at the caps, and stop as soon as you've found one that you're happy with. It's still quite slow, however.
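For illustration, here is a rough sketch of that "enumerate and stop at the first acceptable hit" loop. To keep the logic visible (and compilable without windows.h), the loop works on a trimmed-down `FormatCaps` struct and a `DescribeFn` callback; both are names I made up for this sketch, as are `find_first_format`, `describe_from_table`, and `describe_win32`. Only the part inside `#ifdef _WIN32` touches the real API, where DescribePixelFormat fills in a PIXELFORMATDESCRIPTOR for a 1-based index and returns the number of available formats (0 on failure). Which fields you check is up to you; the ones below are just an assumption about typical requirements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down view of the PIXELFORMATDESCRIPTOR fields
   this sketch cares about. */
typedef struct {
    int color_bits;
    int depth_bits;
    int stencil_bits;
    int double_buffered;  /* nonzero if PFD_DOUBLEBUFFER is set */
    int opengl;           /* nonzero if PFD_SUPPORT_OPENGL is set */
} FormatCaps;

/* Fills *out for a 1-based index; returns the total number of formats,
   or 0 on failure (same convention as DescribePixelFormat). */
typedef int (*DescribeFn)(void *ctx, int index, FormatCaps *out);

static int is_acceptable(const FormatCaps *c, const FormatCaps *min)
{
    return c->color_bits   >= min->color_bits
        && c->depth_bits   >= min->depth_bits
        && c->stencil_bits >= min->stencil_bits
        && (!min->double_buffered || c->double_buffered)
        && (!min->opengl || c->opengl);
}

/* Returns the 1-based index of the first acceptable format, or 0 if none. */
int find_first_format(DescribeFn describe, void *ctx, const FormatCaps *min)
{
    FormatCaps caps;
    int count = 1;  /* corrected by the first successful describe call */

    assert(describe != NULL && min != NULL);
    for (int i = 1; i <= count; ++i) {
        int n = describe(ctx, i, &caps);
        if (n == 0)
            break;              /* enumeration failed, give up */
        count = n;
        if (is_acceptable(&caps, min))
            return i;           /* stop early, skip the remaining formats */
    }
    return 0;
}

/* Table-backed describe function, only here to try the loop out. */
typedef struct {
    const FormatCaps *formats;
    int count;
    int calls;  /* how often the table was queried */
} FormatTable;

int describe_from_table(void *ctx, int index, FormatCaps *out)
{
    FormatTable *t = ctx;
    t->calls++;
    if (index < 1 || index > t->count)
        return 0;
    *out = t->formats[index - 1];
    return t->count;
}

#ifdef _WIN32
#include <windows.h>

/* Real adapter: pass the HDC as ctx. Link against gdi32. */
int describe_win32(void *ctx, int index, FormatCaps *out)
{
    PIXELFORMATDESCRIPTOR pfd;
    int count = DescribePixelFormat((HDC)ctx, index, sizeof pfd, &pfd);
    if (count == 0)
        return 0;
    out->color_bits      = pfd.cColorBits;
    out->depth_bits      = pfd.cDepthBits;
    out->stencil_bits    = pfd.cStencilBits;
    out->double_buffered = (pfd.dwFlags & PFD_DOUBLEBUFFER) != 0;
    out->opengl          = (pfd.dwFlags & PFD_SUPPORT_OPENGL) != 0;
    return count;
}
#endif
```

On Windows you would call something like `find_first_format(describe_win32, hdc, &min)` and, if the result is nonzero, describe that index once more into a full PIXELFORMATDESCRIPTOR and hand it to SetPixelFormat. Note that this takes the first acceptable format rather than the best one, which is the whole point of stopping early.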
Caching the format's integer identifier is something I haven't tried, but it might actually work. You would have to verify the cached value at the next run, though, in case the hardware or driver has been updated. Other than that, I'm afraid you'll simply have to live with the fact that it takes a moment to come up.
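Again, I haven't tried this, so take the following as an untested idea rather than a recipe. One way the caching could look: store the chosen index together with a key string that changes whenever the hardware or driver changes, and ignore the cache when the key no longer matches. The function names, the two-line file layout, and the key itself (for example adapter name plus driver version) are all assumptions of mine. Even with a matching key, you would still want to describe the cached index once with DescribePixelFormat and check its caps before passing it to SetPixelFormat.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes "index\nkey\n" to path. Returns 1 on success, 0 on failure.
   key is a caller-built, non-empty string identifying hardware + driver. */
int save_cached_format(const char *path, const char *key, int index)
{
    assert(path != NULL && key != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return 0;
    int ok = fprintf(f, "%d\n%s\n", index, key) > 0;
    return (fclose(f) == 0) && ok;
}

/* Returns the cached 1-based format index if the file exists and its
   stored key equals key; returns 0 otherwise (caller then enumerates). */
int load_cached_format(const char *path, const char *key)
{
    char index_line[32];
    char stored_key[256];

    assert(path != NULL && key != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    int ok = fgets(index_line, sizeof index_line, f) != NULL
          && fgets(stored_key, sizeof stored_key, f) != NULL;
    fclose(f);
    if (!ok)
        return 0;

    stored_key[strcspn(stored_key, "\n")] = '\0';  /* strip newline */
    long index = strtol(index_line, NULL, 10);
    if (index <= 0 || index > 100000)
        return 0;                                   /* implausible value */
    return strcmp(stored_key, key) == 0 ? (int)index : 0;
}
```

A return value of 0 simply means "no usable cache", in which case you fall back to the slow path and save the result for next time.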