Voxel Rendering Engine Tips
Your basic idea sounds valid.
I want to keep only enough loaded that it all fits within the level 1 and level 2 CPU caches, since it needs to be accessed frequently for rendering.
Voxel data is notoriously large, and trying to fit everything into the CPU cache isn't really necessary (if it works at all). A basic rendering approach would be static batching: you create a mesh from each chunk, store it on the video card (e.g. in OpenGL you would use vertex buffer objects), and only recreate it once you modify the chunk. This way you often build a mesh only once and then render it really quickly without further CPU work.
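To make the "build once, rebuild only on change" idea concrete, here is a minimal sketch. The names Chunk, build_faces and the dirty flag are my own for illustration, and the "mesh" is just a list of visible faces standing in for whatever you would upload to a vertex buffer object.

```python
from dataclasses import dataclass, field

# The six axis-aligned neighbor directions of a voxel.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def build_faces(voxels):
    """Return one (position, normal) pair per voxel face not hidden by a neighbor."""
    faces = []
    for (x, y, z) in sorted(voxels):
        for (dx, dy, dz) in NEIGHBORS:
            if (x + dx, y + dy, z + dz) not in voxels:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces


@dataclass
class Chunk:
    """Hypothetical chunk: a set of solid cells plus a cached mesh."""
    voxels: set = field(default_factory=set)
    dirty: bool = True          # True whenever the cached mesh is stale
    mesh: list = field(default_factory=list)
    rebuilds: int = 0           # counts how often the mesh was actually rebuilt

    def set_voxel(self, pos, solid):
        if solid:
            self.voxels.add(pos)
        else:
            self.voxels.discard(pos)
        self.dirty = True       # any edit invalidates the cached mesh

    def get_mesh(self):
        # Rebuild only if something changed since the last build.
        if self.dirty:
            self.mesh = build_faces(self.voxels)
            self.dirty = False
            self.rebuilds += 1
        return self.mesh
```

In a real renderer, get_mesh would be the place where the vertex data is re-uploaded to the GPU. Every frame in between just draws the existing buffer.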
I render the foliage in my game like this and it works perfectly. If you use multithreading and buffer objects, you can build meshes that are not yet visible in the background. If you use an LRU cache, you don't even need additional loading/unloading logic. Just define two LRU caches, one for loaded chunks and one for created meshes.
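A rough sketch of the two-cache setup, assuming a simple OrderedDict-based LRU. LRUCache, load_chunk and build_mesh are placeholder names I made up here. The real loaders would read from disk and fill a buffer object.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache that loads missing entries on demand."""

    def __init__(self, capacity, load, on_evict=None):
        self._capacity = capacity
        self._load = load            # called with the key on a cache miss
        self._on_evict = on_evict    # called with (key, value) on eviction
        self._items = OrderedDict()

    def __len__(self):
        return len(self._items)

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            return self._items[key]
        value = self._load(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            # Drop the entry that has gone unused the longest.
            old_key, old_value = self._items.popitem(last=False)
            if self._on_evict is not None:
                self._on_evict(old_key, old_value)
        return value


def load_chunk(coord):
    """Placeholder: a real engine would read the chunk from disk here."""
    return {"coord": coord, "voxels": set()}


def build_mesh(chunk):
    """Placeholder: a real engine would build and upload a vertex buffer here."""
    return {"coord": chunk["coord"], "vertices": []}


# One cache for loaded chunks, a smaller one for the meshes built from them.
chunks = LRUCache(capacity=64, load=load_chunk)
meshes = LRUCache(capacity=16, load=lambda coord: build_mesh(chunks.get(coord)))
```

Rendering then only ever asks for meshes.get(coord). Loading and unloading fall out of the eviction order, and the on_evict hook is where a GPU buffer would be released.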
The sizes of CPU caches are pretty limiting. I'm thinking I may be able to reuse some of the voxel world data that is loaded somehow.
Accessing the voxels in memory isn't the issue compared to the overhead of calling the graphics API. You are over-optimizing the wrong end.
If anyone has some ideas of how to better handle it, I'd like to hear them.
Well... you are trying to squeeze an elephant through a cat flap with the argument that the room in front of the cat flap is large enough...
Voxel batching & caching would be a better way of handling it (see above) vs directly rendering the voxels...
No, not really. Polygon-based voxel renderers get really slow or too complex when you start trying to render everything at a higher resolution than something like Minecraft. Also, my main goal is to remove the GPU dependency. So if I can get a complete, fast CPU version working at a moderate voxel and screen resolution, with plenty of room to spare for other game component processing, then the only problem left to solve is how best to get my render buffers into video memory, which may result in forgoing APIs like OpenGL and DirectX completely.
This is not quite true, honestly.
Polygon-based voxel renderers are still the fastest you can get at the moment, since the graphics pipeline is optimized for them.
Sure, if you have tiny, tiny cubes you've got a lot of stuff to render, but you can batch and optimize: merge larger faces, etc. That isn't too complex, especially if you use appropriate data structures like an octree. In fact, with an octree you get the "merged faces" as a bonus on top of the (in the case of a sparse tree) comparatively compact memory usage.
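As an illustration of face merging, here is a small sketch of a greedy merge over one 2D slice of faces. Note this is a plain grid-based greedy merge rather than the octree approach, and greedy_rects and its (row, col, height, width) output format are just my choice for the example.

```python
def greedy_rects(mask):
    """Merge the truthy cells of a 2D mask into rectangles.

    Returns a list of (row, col, height, width) tuples that together
    cover every truthy cell exactly once.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    used = [[False] * cols for _ in range(rows)]
    rects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or used[r][c]:
                continue
            # Grow the rectangle to the right as far as possible.
            w = 1
            while c + w < cols and mask[r][c + w] and not used[r][c + w]:
                w += 1
            # Then grow it downwards while every cell of the next row fits.
            h = 1
            while r + h < rows and all(
                mask[r + h][c + i] and not used[r + h][c + i] for i in range(w)
            ):
                h += 1
            # Mark the covered cells so they are not emitted again.
            for rr in range(r, r + h):
                for cc in range(c, c + w):
                    used[rr][cc] = True
            rects.append((r, c, h, w))
    return rects
```

Each rectangle becomes a single quad instead of one quad per voxel face, so a flat 16x16 wall of cubes collapses from 256 quads into one. The result is not guaranteed to be the minimal number of rectangles, only a cheap reduction.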
If you want smooth surfaces, you could just as well go with something like marching cubes (being the best known) or move to even more sophisticated algorithms like surface nets or other dual methods, which optimize mesh topology to some extent.
In my (admittedly modest) testing on this subject, I found polygon-based voxel rendering to be a lot faster than the alternatives like ray tracing, ray marching, etc.