In the case of a deferred engine with loads of lights, it takes a lot more to generate a correctly lit cubemap than just rendering geometry. You would need some way to get lighting information roughly consistent with the main scene.
I don't see exactly why this matters: if we assume each pixel takes roughly the same time to render whether or not it's part of the cubemap (not a terrible assumption because, as you said, the cubemap renders roughly the same scene, just with a 360-degree field of view), then at 1920x1080 the main scene is 2,073,600 pixels, while a 256x256x6 cubemap is only 393,216 pixels, smaller by a substantial constant factor. And we can probably save even more time by lowering the quality of the cubemap rendering in other ways.
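The arithmetic above is easy to sanity-check; the numbers are the ones from the paragraph, and the resolutions are of course tunable:

```python
# Rough per-frame pixel budget comparison: full-resolution main pass
# versus six 256x256 cubemap faces (the figures cited above).
main_scene_pixels = 1920 * 1080   # 2,073,600
cubemap_pixels = 256 * 256 * 6    # 393,216

print(main_scene_pixels)
print(cubemap_pixels)
print(main_scene_pixels / cubemap_pixels)  # the cubemap costs ~1/5 of the main pass
```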
Beyond that, the issue with a single camera-centered cubemap is that the reflections can end up wildly off for some objects. For example, say you have a bunch of moving, reflective objects: it's very hard to get them to reflect each other plausibly without generating a cubemap per entity (at least for the ones close to the camera). SSR seems like it would work much better in those cases, though admittedly I've never implemented it myself. From watching a colleague implement it, it takes a lot of tuning to make it really usable once the base technique is working, but at this point I can't think of achievable alternatives short of generating loads of cubemaps. I guess if you have voxel cone tracing up and running for lighting, then you can use that, but it's not exactly cheap to set up and run.
This is the part that doesn't actually make sense. I certainly wouldn't dispute that the cube-mapped reflection makes all of these assumptions (not that there aren't ways to mitigate them); the problem is that screen space reflections don't actually give any additional information. On a basic level, consider this: we could always replace the screen buffer with the section of the cubemap that is visible from the screen (projected into 2D, of course) and do our "screen space" reflections this way instead (admittedly with a hit to resolution). This was what I meant originally when I said "it seems that any of the information needed for screen space reflections could also be included in a cubemap."
I am arguing that, in a sense, we can actually treat screen space reflections as a "special case" of cube-mapped reflections, since the data that's present on screen is also present in our cube map. And so, I'm simply saying that any sense in which screen-space reflections actually improve on the assumptions that are problematic to cubemap reflections could, at least in theory, be incorporated into the cubemap reflection algorithm itself.
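To make the "screen buffer is a subset of the cubemap" claim concrete: every pixel on screen corresponds to a view ray from the camera, and that ray indexes straight into a camera-centered cubemap. A minimal sketch, assuming a pinhole camera looking down -Z and the usual OpenGL cubemap face-selection convention (both are my assumptions, not something from the thread):

```python
import math

def pixel_view_dir(x, y, width, height, fov_y_deg):
    """Unit view-space ray through pixel (x, y), camera looking down -Z."""
    aspect = width / height
    tan_half = math.tan(math.radians(fov_y_deg) / 2.0)
    # Map the pixel center to [-1, 1] NDC, with +y up on screen.
    ndc_x = (2.0 * (x + 0.5) / width - 1.0) * aspect * tan_half
    ndc_y = (1.0 - 2.0 * (y + 0.5) / height) * tan_half
    d = (ndc_x, ndc_y, -1.0)
    n = math.sqrt(sum(c * c for c in d))
    return tuple(c / n for c in d)

def cubemap_face_and_uv(d):
    """Which cubemap face a direction hits, plus (u, v) in [0, 1] on that face."""
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:       # dominant X axis
        face, sc, tc, ma = ("+X" if x > 0 else "-X"), (-z if x > 0 else z), -y, ax
    elif ay >= az:                  # dominant Y axis
        face, sc, tc, ma = ("+Y" if y > 0 else "-Y"), x, (z if y > 0 else -z), ay
    else:                           # dominant Z axis
        face, sc, tc, ma = ("+Z" if z > 0 else "-Z"), (x if z > 0 else -x), -y, az
    return face, (sc / ma + 1) / 2, (tc / ma + 1) / 2
```

Looking straight ahead, `cubemap_face_and_uv((0, 0, -1))` lands at the center of the -Z face, so a "virtual screen buffer" is just this lookup run over every screen pixel.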
EDIT: One more thing. There are actually perfectly valid reasons that, if we only use a single camera-centered cubemap, we don't actually have to worry about interreflection (multiple reflective objects reflecting each other):
A) if we calculate the cube map for the subsequent frame after we do our main rendering (using, say, a black cubemap for the very first frame), each successive "bounce" of our reflections will lag behind by one additional frame, but otherwise not have a problem
B) because our cubemap is camera-centered, we basically never even have to worry about objects reflecting themselves (except when they should), because, on our cubemap, objects will always be "behind themselves." Unfortunately, this isn't the case with refraction (because objects do refract light from behind themselves), but for ordinary shiny reflections it always holds.
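The bookkeeping for (A) is trivial; here's a minimal sketch of the frame loop, where `render_main` and `render_cubemap` are placeholders for the engine's actual passes (my names, not anything from the thread):

```python
def run(frames, render_main, render_cubemap):
    """Option (A): the main pass consumes the cubemap rendered last frame,
    so each reflection bounce lags one frame further behind."""
    cubemap = "black"  # frame 0 has nothing to reflect yet
    outputs = []
    for frame in range(frames):
        image = render_main(frame, cubemap)   # uses the previous frame's cubemap
        cubemap = render_cubemap(frame)       # refreshed for the next frame
        outputs.append(image)
    return outputs
```

With stub passes you can see the one-frame lag directly: frame 1's main pass sees the cubemap rendered during frame 0, and so on.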