That doesn't sound crazy at all. Let me see if I've got it straight:
It works a bit like ("old-school") Deferred Lighting, where light volumes (spheres/cones/...) were rendered into the scene, affecting only the geometry they intersect. I'll use the "roughness" G-Buffer produced earlier to "blur" properly (taking multiple samples and/or picking lower mips from the probe cubemap).
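To make the roughness-to-mip idea concrete, here's a minimal CPU-side sketch. The function name and the linear mapping are my own assumptions, not anything from the post; a linear roughness-to-mip mapping is just one common convention for pre-filtered probe cubemaps.

```cpp
#include <algorithm>

// Hypothetical sketch: pick a probe cubemap mip from the roughness G-Buffer
// value (0 = mirror-smooth, 1 = fully rough). Assumes a simple linear mapping.
float roughnessToMip(float roughness, int mipCount)
{
    float r = std::clamp(roughness, 0.0f, 1.0f);
    // Fractional result: trilinear filtering blends between the two nearest mips.
    return r * static_cast<float>(mipCount - 1);
}
```

In a shader this would feed straight into a textureLod-style lookup, so a mirror surface samples the sharpest mip and a rough one the blurriest.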
In the alpha channel we could store the weight ("attenuation", based on the distance between the pixel and the probe centre, relative to the probe radius). It may happen that two or more probes overlap the same pixels, so we use additive blending to accumulate both colors and weights. Finally we grab the resulting "Reflection/GI texture", normalize it (divide by the summed weight), and add it to the rest of the scene. Since I'm also using Screen-Space Reflections, I can involve those here as well: pixels that can make good use of realtime reflections should use no weight, or a lower weight, for the pre-baked reflections.
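The accumulate-then-normalize step can be sketched on the CPU like this. All names here (Accum, attenuation, addProbeSample) are illustrative, and the linear distance falloff is just one plausible choice of attenuation curve:

```cpp
#include <algorithm>

// RGB = weighted color sum, w = summed weight (what the alpha channel would hold).
struct Accum { float r = 0, g = 0, b = 0, w = 0; };

// Weight falls off with distance from the probe centre, reaching 0 at its radius.
// (A linear falloff for illustration; any smooth curve would do.)
float attenuation(float distance, float radius)
{
    return std::max(0.0f, 1.0f - distance / radius);
}

// Emulates the additive blending pass: each overlapping probe adds
// color * weight into RGB and its weight into the alpha channel.
void addProbeSample(Accum& a, float r, float g, float b, float weight)
{
    a.r += r * weight; a.g += g * weight; a.b += b * weight; a.w += weight;
}

// Final normalization pass: divide by the summed weight to get the blend.
void normalizeAccum(Accum& a)
{
    if (a.w > 0.0f) { a.r /= a.w; a.g /= a.w; a.b /= a.w; }
}
```

With two overlapping probes of weight 1.0 and 0.5, the normalized result is a 2:1 blend of their colors, which is exactly what the additive RGBA scheme buys you.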
You know what, that sounds a whole lot easier than the tiled approach I had in mind. The only downside is that I'll have to resample the G-Buffer for each probe, but then again there usually won't be that many (overlapping) probes. And I guess it's still a good idea to use a cubemap array (or bindless textures) so we don't have to render probes one by one, switching cubemap textures in between. I could, for example, first render the array of low-quality (small-resolution) cubemaps, then a high-quality array.
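The two-tier array idea boils down to giving every probe a (tier, layer) address so the whole batch needs at most two texture binds. A tiny sketch of that bookkeeping, with made-up names and an assumed "high quality" flag per probe:

```cpp
#include <vector>

// Illustrative only: tier 0 = low-res cubemap array, tier 1 = high-res array.
struct ProbeSlot { int tier; int layer; };

// Assign each probe a layer in one of the two arrays, in input order.
std::vector<ProbeSlot> assignSlots(const std::vector<bool>& highQuality)
{
    std::vector<ProbeSlot> slots;
    int lowNext = 0, highNext = 0;
    for (bool hq : highQuality)
        slots.push_back(hq ? ProbeSlot{1, highNext++} : ProbeSlot{0, lowNext++});
    return slots;
}
```

At draw time the shader would then index a samplerCubeArray with the layer, so rendering all low-res probes first and all high-res probes second becomes two simple batches instead of a bind per probe.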