I skimmed through the discussion and didn't see anyone post this here but Jonathan Blow has talked a lot about his issues with C++ and actually implemented his own language. He shows why his language is more well-suited for games than C++ is. He started off with lectures and then went to demos. You can find the first lectures here (the demos are on his channel)
To give a quick summary of his points about why he doesn't like C++:
1) A language should understand that the shipping code is not the way the code will look like through development. He says that certain code modifications should be fast and seamless so that we have less busy/braindead work. A few examples of these are the fact that accessing a member of a class by value uses the "." operator while accessing via pointer uses the "->" operator. In reality, both are exactly the same but as you are writing your code and your switching between accessing a struct by value or by reference, you constantly have to modify these operators even though the logic is the same (could kill a lot of time depending on how many times you switch and how many members are part of the struct).
Another example was the process of taking code that starts to repeat itself and putting it into a lambda function and then into a proper function. The idea here is that if I have a function and I notice that I'm repeating some lines of code, then maybe its useful to have a lambda function which defines that process and then call it where I need it in that one function. I don't want to make it a proper function just yet because I'm only using it in one function so when someone comes along to analyze the code, they don't have to assume that the given function can be called from anywhere. Keeping it lambda keeps it more local. Once I actually need that functionality somewhere else, than I want to easily move my lambda function out into a proper function. In C++, this requires a ton of work since due to how lambdas are defined. And then going from lambda to a proper function requires more busy work since now the layout of everything is changed again. Jonathan's approach is like pythons where you just have nested functions that have the exact same syntax as normal functions. This makes the transformation of code from one state to another as you are in the process of writing it much easier since you only copy paste the function to some other location unlike C++ where the way I define functions and lambdas is completely different.
2) Another point he makes is since this language should be for games as a priority, it would be nice to support common things game developers due for optimization. For example, if I have a struct for a model with a pointer to the vertices and pointer to the indices, then its a lot faster to allocate one block of memory for these 2 pointers and then just say that the second pointer points to where the vertices data ends (save a allocation). In C++, I have to do a lot of array indexing or pointer arithmetic to do that but he makes a keyword which tells the language that the memory 2 pointers point to should be one block of memory. Then when we deallocate the memory for the struct, we deallocate the entire block and the language knows the size and all that. This is a really specific feature but he notices that its something he and his friends use a lot so giving support for it and making it one keyword vs like 15 lines of code that you feel unsafe about is a much better option.
3) Instead of using templates or macros for preprocessing, he argues why not just run the language code directly at preprocess time. You use a keyword that runs a given function at preprocess time, computes whatever it returns, and sticks it into the line which it is found in. Instead of playing around with ugly template syntax and messing around with a different language like macros, we just run the language we are using directly but during compilation.
4) Jonathan argues to remove header files and just have one type of file like in java. The idea here is to remove the busy work of defining a function in one file, then jumping to another file to to write its implementation. He also wants to get rid of the #pragma once keywords and the order of function definitions and just have the compiler sort out dependencies itself like in many high level languages (again, getting rid of needless busy work).
... And more.
He talks about a lot more stuff and the best part is all of these features and many more have already been implemented by him and hes constantly refining it, seeing what works and what doesn't. The stuff I talked about is mostly from his 2 lecture videos but he does a whole lot more in the demo videos and its pretty amazing stuff. In my opinion, his language is a really good alternative to C++ for gamers since it focuses on getting rid of lots of busy work and spending more time coding in easy to understand syntax so that you don't feel uneasy about what you just wrote (and it seems to succeed in its goals).
In his latest source code he doesn't have the cubemap as part of the main render loop and his pdf is the old technique which he was using.
I also don't really use a cubemap. I just identify for each screen-ray which side of the cube it would correspond. I've just drawn a link between my and bcmpinc's old algorithm in saying that I project the octree on a cubemap, what I effectively do.
which he succeeded, no perspective division at all in the code
That's impressive, but I can't seem to find where bcmpinc is mentioning this.
I'm not entirely sure why you are raycasting when attempting to mimic his algorithm
I don't try to mimic his algorithm. I just got inspiration of it for this gpu-raycaster. The benefit I see with it, is that the ray voxel intersection is much simpler and the front to back sorting of the octree is automatically given. This should theoretically be a benifit, but in practice it isn't. The question is: Why is that?
From what I understand, he basically projects the screen on to each corner and finds how far in each axis (l,r,t,b) it is from the screen bounds to the corner. He then uses this to interpolate in world space how these bounds change when he subdivides the octree to its children (this is all linear since its done in world space). At this point, he checks his quadtree if the current octree node is inside the frustum bounds of the current quadtree node. If it is, he subdivides the quadtree into 4 children and creates a sub frustum for each new quadtree node. He then calls the rendering function recursively on each of the quadtree children with each of the 4 sub frustum's and continues rendering. If he ever finds the octree node to be larger than the current frustum, he recursively splits the octree into its 8 children and calls the rendering function recursively on the octree children. Once he hits a leaf node for the quadtree, if the current octree node is inside the leafs frustum then he just sets the pixels color to the octree nodes color.
It basically becomes a method of splitting up the view frustum in world space until eventually each sub frustum is a pixel in size. At that point if a node is inside the frustum bounds, he just sets the pixel in the quadtree leaf. He doesn't have to do any division because all of this is done in worldspace (the frustum checks and what not).
As to the benefits of his algorithm, in complete software with simd and a single thread, he gets between 5-10 fps so this isn't a really fast algorithm unfortunately. On a GPU, I guess you can skip the quadtree entirely by just doing the frustum checks for each pixel in parallel but then that just amounts to raycasting and you retraverse the octree for each pixel. His algorithm does all of this hierarchally so it traverses the octree once but that makes it very linear. I haven't ever worked with GPU volume rendering so I'm not sure why your approach is slow but I did my own version of bcmpinc's renderer in software so if you know how his algorithm works, it might help identifying why your approach is slow.
I'm not sure if it will make a difference but bcmpinc changed his method part way through to ditch using the cube map and instead, simply project the screen on to the octree and do frustum checks using a quadtree hierarchal z buffer. In his latest source code he doesn't have the cubemap as part of the main render loop and his pdf is the old technique which he was using. He stated that the new method of just projecting the screen on to the octree was a lot faster than creating the cube map so it might help to see what his current method is doing unless you specifically want to use the cube map and trace rays.
Lastly, the main premise of his algorithm was to get rid of divisions in the code and to not use the concept of rays (which he succeeded, no perspective division at all in the code) so I'm not entirely sure why you are raycasting when attempting to mimic his algorithm. He has a couple of posts on getting his algorithm to the GPU that might be worth checking out if your interested.
This approach should be easily extendable (from "vertices-on-outline") to "front-facing-quads" or "front-facing-triangle-strips", etc.
But it can't be propogated down an octree. For example, the vertices that make up the outline for the root of the octree are not the same vertices that make up the outline for the children of that root. Unfortunatly, it looks like i have to recalculate the outline on each traversal until my cube is in either the top left, top right, bot left, bot right (can't be in 2 parts at the same time). Unfortunatly this is probably going to be a lot more performance intensive but ill give it an implementation.
Also, i dont want to approximate a cube with a quad that encompases in screenspace because im tryingn to minimize my traversals and using quads covers more area than the object actually takes up (creating many false positives).
Sorry for leaving for a while, had some personal stuff and work eat up all my time.