When you call the graphics API, your command is usually just enqueued in a command queue, and the GPU processes that queue independently of the API call. We talk about a stall when you enqueue a command that requires immediate feedback: in that case the CPU has to wait until the GPU has processed all commands up to that API call. This is more or less a kind of busy waiting (aka flushing the command queue; the CPU is not actually busy waiting, but your processing thread is paused at least).
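To make the stall a bit more concrete, here is a minimal toy model of the queue described above. It is only a sketch: `ToyCommandQueue`, `enqueue`, `gpuAdvance` and `readBack` are made-up names and not part of any real graphics API, and commands are simply counted instead of executed.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a driver command queue (hypothetical, not a real graphics API).
// 'submitted' counts commands the CPU has enqueued,
// 'completed' counts commands the GPU has finished.
struct ToyCommandQueue {
    std::size_t submitted = 0;
    std::size_t completed = 0;

    // Ordinary API call: enqueue and return immediately.
    // Returns the index of the enqueued command.
    std::size_t enqueue() { return submitted++; }

    // The GPU works through the queue on its own, independent of the CPU.
    // It can never get ahead of what has been submitted.
    void gpuAdvance(std::size_t n) {
        completed = (completed + n > submitted) ? submitted : completed + n;
    }

    // Feedback call: the CPU needs the result of command 'index' right now.
    // Returns how many commands the CPU had to wait for (0 means no stall).
    std::size_t readBack(std::size_t index) {
        assert(index < submitted);              // can only read back issued commands
        if (index < completed) return 0;        // GPU is already past it
        std::size_t waited = index + 1 - completed;
        completed = index + 1;                  // queue drained up to 'index'
        return waited;
    }
};
```

In this model, reading back a command right after enqueuing it makes the CPU wait for everything still ahead of it in the queue, while reading it back after the GPU has already moved past it costs nothing.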
What I don't get is how occlusion queries are different from, for example, multiple render passes. In deferred shading, you render to a G-buffer, and afterwards you get the buffer back and use it in the lighting shader pass instead. Wouldn't that cause some kind of stall, just like getting back occlusion query results?
It is therefore very important to avoid such feedback API calls almost entirely, so that both CPU and GPU stay busy. Even a simple OpenGL error check can flush the command queue, and requesting the result of an occlusion query will flush the command queue at least up to the occlusion query command. The safest time to check the result is therefore once the framebuffer has been displayed (most likely in the next frame, after the framebuffer has been swapped).
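A rough sketch of that "check it next frame" idea, again as a toy model rather than real API code. `PendingQuery`, `DeferredQueryReader`, `issue` and `poll` are hypothetical names, and the fixed `latencyFrames` is an assumption standing in for however long the GPU actually takes. In real OpenGL the non-blocking check would be `glGetQueryObjectuiv` with `GL_QUERY_RESULT_AVAILABLE`, followed by `GL_QUERY_RESULT` only once that reports true.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for one in-flight occlusion query.
struct PendingQuery {
    std::size_t frameIssued;   // frame in which the query was enqueued
    unsigned samplesPassed;    // result the GPU will eventually produce
};

// Keeps issued queries around and only hands out a result once it is
// old enough to be finished, so the CPU never has to wait for it.
class DeferredQueryReader {
public:
    // latencyFrames: assumed number of frames until a result is ready.
    explicit DeferredQueryReader(std::size_t latencyFrames)
        : latency_(latencyFrames) {}

    // Enqueue a query in the given frame; returns immediately.
    void issue(std::size_t frame, unsigned samplesPassed) {
        pending_.push_back({frame, samplesPassed});
    }

    // Non-blocking poll of the oldest query. Returns true and writes the
    // result to 'out' if it is ready; returns false (no stall) otherwise.
    bool poll(std::size_t currentFrame, unsigned& out) {
        if (pending_.empty()) return false;
        const PendingQuery& oldest = pending_.front();
        if (currentFrame < oldest.frameIssued + latency_) return false;
        out = oldest.samplesPassed;   // copy before removing the entry
        pending_.pop_front();
        return true;
    }

private:
    std::size_t latency_;
    std::deque<PendingQuery> pending_;
};
```

The price of this pattern is that visibility decisions are based on results that are at least one frame old, so the caller has to decide what to do while `poll` still returns false (for example, keep drawing the object).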
Occlusion queries, much like geometry shaders, sound awesome at first, but fall short of your expectations once you understand their limitations.
So I'm wondering if my first pass is a bad idea. I read that Battlefield 3 and CryEngine 3 do this kind of thing with software occlusion queries instead, to avoid the CPU stall and to use the results in the same frame rather than the next one. Am I effectively halving my frame rate by doing this with hardware queries?