In his latest source code he doesn't have the cubemap as part of the main render loop and his pdf is the old technique which he was using.
I also don't really use a cubemap. I just identify for each screen-ray which side of the cube it would correspond. I've just drawn a link between my and bcmpinc's old algorithm in saying that I project the octree on a cubemap, what I effectively do.
which he succeeded, no perspective division at all in the code
That's impressive, but I can't seem to find where bcmpinc is mentioning this.
I'm not entirely sure why you are raycasting when attempting to mimic his algorithm
I don't try to mimic his algorithm. I just got inspiration of it for this gpu-raycaster. The benefit I see with it, is that the ray voxel intersection is much simpler and the front to back sorting of the octree is automatically given. This should theoretically be a benifit, but in practice it isn't. The question is: Why is that?
I had the same questions as to how his code works and you can view our exchange in the comment section in the following link (https://bcmpinc.wordpress.com/2013/11/03/a-sorting-based-rendering-method/). My user name was D.V.D and I ask him in more detail exactly how he does his rendering.
From what I understand, he basically projects the screen on to each corner and finds how far in each axis (l,r,t,b) it is from the screen bounds to the corner. He then uses this to interpolate in world space how these bounds change when he subdivides the octree to its children (this is all linear since its done in world space). At this point, he checks his quadtree if the current octree node is inside the frustum bounds of the current quadtree node. If it is, he subdivides the quadtree into 4 children and creates a sub frustum for each new quadtree node. He then calls the rendering function recursively on each of the quadtree children with each of the 4 sub frustum's and continues rendering. If he ever finds the octree node to be larger than the current frustum, he recursively splits the octree into its 8 children and calls the rendering function recursively on the octree children. Once he hits a leaf node for the quadtree, if the current octree node is inside the leafs frustum then he just sets the pixels color to the octree nodes color.
It basically becomes a method of splitting up the view frustum in world space until eventually each sub frustum is a pixel in size. At that point if a node is inside the frustum bounds, he just sets the pixel in the quadtree leaf. He doesn't have to do any division because all of this is done in worldspace (the frustum checks and what not).
As to the benefits of his algorithm, in complete software with simd and a single thread, he gets between 5-10 fps so this isn't a really fast algorithm unfortunately. On a GPU, I guess you can skip the quadtree entirely by just doing the frustum checks for each pixel in parallel but then that just amounts to raycasting and you retraverse the octree for each pixel. His algorithm does all of this hierarchally so it traverses the octree once but that makes it very linear. I haven't ever worked with GPU volume rendering so I'm not sure why your approach is slow but I did my own version of bcmpinc's renderer in software so if you know how his algorithm works, it might help identifying why your approach is slow.