Though that my old algorithm, the recursive one, ...
I thought that the new one is also recursive. You have to search the next octree nodes given the octree nodes from the last BFS pass. And this search is recursive. Or does it change if one projects the octree on a viewplane instead of a cubemap?
I'm considering whether I should keep track of the bounds of each octree node or instead track the position of the octree node and use that to recompute its bounds. It makes sense to pick the later, because I have to sort on the depth of the node and thus need to keep track of that value anyway. Though that also means that I'll need to switch to floating point computations. And I'm unsure whether floats will be accurate enough.
I also guess that the second one is faster, since (global) memory access is very expensive on the gpu, especially if its not coalesced, which is true for your shader code (i think). I can remember that one access takes 500 gpu cycles to complete, though I can't find where I read this. So recomputing stuff should be faster, after all the gpu is optimized to calculate float operation as fast as possible.
Floats should be accurate enough. I ran my code with both float and int and they produced the same output. However in the end, there is only one way to find out...