You're pretty much right. What I'm trying to do is nicely described in this paper:
https://mediatech.aalto.fi/~ari/Publications/SIGGRAPH_2015_Remedy_Notes.pdf
So, the world is broken down into a tree structure. Each cell is connected to one probe, and may subdivide further into 4x4x4 (64) subcells. The advantage is that you don't need tens of thousands of probes in a uniform 3D grid. The disadvantage is, well, that you need to traverse the tree first before you know which probe to pick for any given pixel position.
In my case the traversal jumps deeper up to 3 times. So the first fetch is a large cell. A cell has an offset and a bitmask (int64), where each bit tells whether there is a deeper cell or not. Using that offset plus a count of the set bits below the target bit, we know where to fetch the next cell.
If no deeper cell is found, the same counting mechanism tells where to fetch the actual probe data. The probe in my case is basically a cubemap with 1x1 faces. It also stores a few more details, like which specular probe to use, or stuff like fog thickness. All in all, it's a lot of data (50+ MB in my case).
Currently I use "traditional" lightmaps, but I'm having several problems. There are UV-mapping issues in some cases, though those will most likely disappear if the lightmaps simply refer to a probe (your first solution). Still, it doesn't work too well for dynamic objects / particles / translucent stuff (glass).
Splatting the probes (I think that's your option 3) onto screen-space G-buffers (depth/normal/position) is probably much easier. Like deferred lighting, each probe would render a cube (sized according to the tree, plus some overlap with its neighbours to get interpolation) and apply its light data to whatever geometry it intersects.
The downside might be the large number of cubes overlapping each other, giving potential fill-rate issues. Plus, particles and such require a slightly different approach. There is also a chance of light leaking (splatting probes from neighbouring rooms), though I think we can mask that with some "Room ID" number or something.
What I did in the past was simply making a uniform 3D grid, thus LOTS of probes EVERYWHERE. I injected the probes surrounding the camera into a 32x32x32 3D texture. Simple & fast, but no G.I., popping for distant stuff, and a lot of probes (and baking time) wasted on empty space. It was also sensitive to light leaks in some cases.