From a theoretical point of view, it does not really matter in which space, or coordinate system, you evaluate the rendering equation. Without introducing artificial spaces, you can choose among object, world, camera and light space. Since typical scenes contain far more objects and lights than cameras, object and light space can be ruled out quickly. With object space, all lights (whose number is dynamic) would need to be transformed to each object's space in the pixel shader; with light space, all surface points would need to be transformed to each light's space in the pixel shader. Both cases clearly waste precious resources on "useless" per-fragment transformations.
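A back-of-envelope sketch of the argument, using hypothetical scene figures (resolution, light, object and vertex counts are made up for illustration): per-fragment light transforms dwarf the once-per-vertex work that world space needs.

```python
# Back-of-envelope transform counts per frame (hypothetical scene figures).
num_fragments = 1920 * 1080      # shaded fragments, assuming no overdraw
num_lights = 8                   # dynamic light count
num_vertices = 100_000           # total vertices across all objects

# Object space: every light must be re-expressed per fragment,
# in the pixel shader, for the current object's space.
object_space_transforms = num_fragments * num_lights

# World space: each vertex is transformed once in the vertex shader;
# lights are already expressed in world space.
world_space_transforms = num_vertices

ratio = object_space_transforms / world_space_transforms
```

Even with these modest numbers the per-fragment variant does two orders of magnitude more transform work.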
That leaves world and camera space. Starting with a single camera in the scene, camera space seemed the most natural choice. Lights can be transformed just once in advance, and objects can be transformed in the vertex shader (per vertex). Furthermore, since the camera is located at the origin of its own space, the camera-space surface position directly yields the (negated) view direction used in BRDFs, so no eye-position offset needs to be subtracted.
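A minimal sketch of that difference, with made-up positions (all names here are illustrative, not from the original): in camera space the view direction is just the normalized negated position, while in world space the eye position must be subtracted first.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

# Camera space: the camera sits at the origin, so the view direction
# at a surface point p_view is simply the normalized negated position.
p_view = (1.0, 2.0, 5.0)                       # hypothetical camera-space point
v_camera = normalize(tuple(-c for c in p_view))

# World space: the same vector needs the eye position subtracted first.
eye_world = (3.0, 0.0, -1.0)                   # hypothetical eye position
p_world = (4.0, 2.0, 4.0)                      # hypothetical world-space point
v_world = normalize(tuple(e - p for e, p in zip(eye_world, p_world)))
```

The extra subtraction in world space is exactly the "offsetting calculation" that camera space avoids; it is cheap, but it does run per fragment.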
Given that multiple cameras, each with its own viewport, can be used, the process repeats for each camera space, and the data used in the shaders must be updated accordingly: the object-to-camera transformation matrix, for example, must reflect the current camera. This implies many map/unmap invocations for object data. If, however, lighting is performed in world instead of camera space, I can allocate one constant buffer per object, update all object data once per frame, and bind it multiple times in case of multiple passes. Finally, given my current and possible future voxelization problems, world space seems more efficient than camera space. A hybrid of camera and world space is possible, but it would involve many "useless" transforms back and forth, so I would rather stick to one space.
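The update pattern can be sketched as follows, with a hypothetical `ConstantBuffer` class standing in for the real map/memcpy/unmap machinery (none of these names come from the original): world-space per-object data is written once per frame and merely re-bound for each camera pass.

```python
# Sketch only: ConstantBuffer stands in for a GPU buffer with map/unmap.
class ConstantBuffer:
    def __init__(self):
        self.data = None
        self.uploads = 0

    def upload(self, data):
        # Stands in for a map/memcpy/unmap round trip.
        self.data = data
        self.uploads += 1

objects = [{"object_to_world": i} for i in range(3)]  # placeholder matrices
cameras = ["main", "shadow", "reflection"]            # multiple passes

buffers = [ConstantBuffer() for _ in objects]

# Once per frame: upload world-space object data.
for obj, cb in zip(objects, buffers):
    cb.upload({"object_to_world": obj["object_to_world"]})

# Per camera pass: just bind; no re-upload is needed, because the
# camera-dependent world-to-projection matrix lives in its own
# per-camera constant buffer.
for cam in cameras:
    for cb in buffers:
        pass  # bind(cb)
```

With camera-space lighting the inner loop would instead have to re-upload an object-to-camera matrix per object per pass, multiplying the number of map/unmap invocations by the number of cameras.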
Given all this, I wonder whether camera space is still appealing. Even for LOD purposes, length(p_view) and length(p_world - eye_world) are pretty much equivalent with regard to performance.
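The two distance measures agree not just in cost but in value, since a world-to-camera transform is rigid (a translation followed by a rotation) and rigid transforms preserve lengths. A small numeric check, with made-up eye and point positions:

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def length(v):
    return math.sqrt(sum(c * c for c in v))

def rotate_y(v, angle):
    """Rotate v about the y-axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x + s * z, y, -s * x + c * z)

eye_world = (3.0, 1.0, -2.0)   # hypothetical eye position
p_world = (7.0, 4.0, 2.0)      # hypothetical surface point

# A world-to-camera transform: translate by -eye, then rotate into
# the camera's orientation (an arbitrary rotation here).
p_view = rotate_y(sub(p_world, eye_world), math.radians(37.0))

# Rigid transforms preserve lengths, so both LOD metrics agree.
d_view = length(p_view)
d_world = length(sub(p_world, eye_world))
```

So an LOD scheme keyed on distance works identically in either space; the choice can be made on the bandwidth arguments above alone.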