volumetric flood filling

Started by
6 comments, last by rouncer 12 years, 3 months ago
I've got this voxel CSG algorithm, and it works volumetrically, because that is so easy to write: you simply switch bits on and off according to which volume is occupied. The problem is that I store my world as a surface representation, a simple point list for every 256^3 volume in a 3D grid.
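As a rough illustration of "switching bits on and off", here is a minimal sketch of volumetric CSG on occupancy bitmasks. The grid size `N`, the `bit` and `box` helpers, and the use of Python integers as bitsets are assumptions made for brevity, not the poster's actual code.

```python
N = 4  # hypothetical grid edge length, kept tiny for illustration

def bit(x, y, z):
    """Bitmask with only the bit for voxel (x, y, z) set."""
    return 1 << (x + N * (y + N * z))

def box(x0, y0, z0, x1, y1, z1):
    """Occupancy mask of the half-open box [x0, x1) x [y0, y1) x [z0, z1)."""
    mask = 0
    for z in range(z0, z1):
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask |= bit(x, y, z)
    return mask

def union(a, b):
    return a | b       # occupied in either volume

def subtract(a, b):
    return a & ~b      # occupied in a but carved away by b

def intersect(a, b):
    return a & b       # occupied in both volumes

def count(mask):
    """Number of occupied voxels in a mask."""
    return bin(mask).count("1")
```

Carving a 2x2x2 hole out of the full 4x4x4 cube would then be `subtract(box(0, 0, 0, 4, 4, 4), box(1, 1, 1, 3, 3, 3))`.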

I got around this by flood filling from the surface points to fill in the volume for the CSG to act on, and it works. The only problem is that the CPU takes so long, especially since I have to transfer from the CPU to video RAM after each chunk is done, that I'm waiting 10 seconds for a smallish operation to finish.


Anyway, I then got back to theorizing about a method that could work on the video card. I imagine some kind of mip-by-mip approach could be the answer, but the solution eludes me. Could anyone help me out?



Thanks for reading.
If it's taking 10 seconds to do that, then I'm guessing you are using a really inefficient algorithm. I've done real-time light propagation for voxels (26 directions); after tuning, it could handle 20x20x20 fills well within a single frame on a single core, including rebuilding and uploading the mesh.

Something you may have fallen for is putting all adjacent voxels onto the queue whenever the current voxel is hollow, rather than first checking whether each adjacent voxel is hollow and only then putting it in the queue. Depending on how you traverse, checking first can considerably reduce the number of iterations. Also, make sure that your voxel checks are inlined and fast.
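The check-before-enqueue advice can be sketched as a plain breadth-first fill. This is only an illustration: representing the solid voxels as a set of `(x, y, z)` tuples and the names `flood_fill` and `NEIGHBOURS_6` are assumptions, and a tuned version would use a flat array instead.

```python
from collections import deque

# 6-connected neighbour offsets (face neighbours only).
NEIGHBOURS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def flood_fill(solid, size, seed):
    """Return the set of hollow voxels reachable from `seed`.

    `solid` is a set of (x, y, z) tuples inside a size^3 grid
    (a hypothetical representation chosen for brevity).
    """
    if seed in solid:
        return set()
    filled = {seed}
    queue = deque([seed])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBOURS_6:
            n = (x + dx, y + dy, z + dz)
            # Test bounds, solidity and "already filled" BEFORE enqueueing,
            # so each hollow voxel enters the queue at most once.
            if min(n) < 0 or max(n) >= size:
                continue
            if n in solid or n in filled:
                continue
            filled.add(n)
            queue.append(n)
    return filled
```

Marking a voxel as filled at the moment it is enqueued, rather than when it is dequeued, is what keeps duplicates out of the queue.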


Well, the fill itself takes 1.5 seconds per chunk; added all together it comes to 10 seconds (which covers more than one chunk). So you think you could fill a 256x256x256 chunk in less than 1.5 seconds? I mean, the closer it gets to 10 milliseconds, the happier I'd be...
You could do it completely on the GPU.
256^3 voxels could be placed on a single 4096x4096 texture (16x16 patches of 256x256 texels). You need two of these textures and swap them as render source and target between render frames. For each texel, the shader checks either all 26 neighbor texels (when diagonal filling is allowed) or the 6 face neighbors and decides whether the target texel should be filled. [s]The worst case should be 512 render frames, which should be no problem on current hardware.[/s]

Edit: the worst case could be higher....
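One possible way to lay the 256 slices out on the 4096x4096 texture is sketched below. The row-major tile order and the function names `voxel_to_texel` and `texel_to_voxel` are assumptions; the post does not say how the patches are arranged.

```python
SIZE = 256             # voxels per axis
TILES = 16             # tiles per atlas row; 16 * 16 = 256 slices
ATLAS = SIZE * TILES   # 4096 texels per side

def voxel_to_texel(x, y, z):
    """Map a voxel to its texel; slice z becomes tile (z % 16, z // 16)."""
    tile_x, tile_y = z % TILES, z // TILES
    return tile_x * SIZE + x, tile_y * SIZE + y

def texel_to_voxel(u, v):
    """Inverse mapping, as a fragment shader would need per texel."""
    tile_x, x = divmod(u, SIZE)
    tile_y, y = divmod(v, SIZE)
    return x, y, tile_y * TILES + tile_x
```

With a layout like this, the x and y neighbors of a voxel are adjacent texels inside the same tile, while its z neighbors live in a different tile, so the shader has to go through the mapping (and treat tile edges as volume boundaries) rather than simply offsetting texture coordinates.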


Well, for filling a 256x256x256 chunk it's hard to say; we are talking 16.7M voxels. All I have to go on is the performance of my Minecraft-like lighting engine. I can create spherical holes 50x50x50 in size and propagate light into them in a single frame, if even that. I can easily do it 10 times per second and the FPS drops from ~500 to around ~350, and like I mentioned, that also includes mesh rebuild and upload. Some basic math on those rather sketchy numbers would indicate that I could repeat it 1000+ times a second if I avoided rendering. And since mine is more than 5x5x5 (125+) times smaller than yours, logic would indicate that it is indeed possible to bring it down to somewhere around 0.2-0.3 s if you flood fill something really complicated spanning the entire volume. But those numbers may also be way off; I can't really benchmark it in any meaningful way.
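The back-of-the-envelope scaling above can be written out as follows. The figure of 1000 fills per second is the poster's own rough guess, and assuming that cost grows linearly with voxel count is a simplification.

```python
big = 256 ** 3             # 16,777,216 voxels in one chunk
small = 50 ** 3            # 125,000 voxels in the 50x50x50 test hole
ratio = big / small        # about 134, i.e. the "125+" in the post

fills_per_second = 1000    # rough guess from the post, not a measurement
estimate = ratio / fills_per_second   # seconds per 256^3 fill, about 0.13
```

That lands a little under the quoted 0.2-0.3 s, which leaves some headroom for the linear-scaling assumption being optimistic.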

And light propagation is a tad more complicated, as I actually propagate light values and not just a single bit. More importantly, I also do this in all 26 directions (!), which I know for a fact slows it down significantly, even though I've written some really tricky code to speed it up. But it depends on what you are interfacing with: if it's some OO class linked from a DLL with virtual methods and such, there might not be all that much you can do unless you are able to get direct access to memory.

But really, if you want to speed it up, you need direct access and some serious fine-tuning; even switching an if-else statement around can significantly increase performance. I should also add that my code is slowed down significantly because the world is divided into blocks rather than a single continuous chunk of data, meaning there is "significant" overhead in determining block boundaries and computing block-local coordinates for indexing.
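The block-local indexing overhead mentioned here usually comes down to something like the sketch below. The 16^3 block size and the names `split_coord` and `local_index` are hypothetical; the post does not give the actual block dimensions.

```python
BLOCK_BITS = 4                  # hypothetical: 16^3 voxels per block
BLOCK_SIZE = 1 << BLOCK_BITS
BLOCK_MASK = BLOCK_SIZE - 1

def split_coord(x, y, z):
    """Split a world coordinate into (block coordinate, block-local coordinate)."""
    block = (x >> BLOCK_BITS, y >> BLOCK_BITS, z >> BLOCK_BITS)
    local = (x & BLOCK_MASK, y & BLOCK_MASK, z & BLOCK_MASK)
    return block, local

def local_index(local):
    """Flatten a block-local coordinate into an index into the block's array."""
    lx, ly, lz = local
    return lx | (ly << BLOCK_BITS) | (lz << (2 * BLOCK_BITS))
```

With power-of-two block sizes the split is a shift and a mask per axis, but it still has to run for every neighbor lookup that might cross a block boundary, which is where the extra cost comes from.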




Indeed, the worst case would be a single-tile-wide "snake" going back and forth through the entire cube. That would not be pretty, but then again it's not really something that would occur either.
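To see why the snake is the bad case for a pass-based fill, here is a small CPU simulation of the two-texture scheme, where every pass reads only the previous result. The function name `ping_pong_fill` and the set-based representation of hollow voxels are assumptions for illustration.

```python
NEIGHBOURS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def ping_pong_fill(hollow, seed):
    """Fill `hollow` from `seed` one pass at a time.

    Each pass reads only `source` and writes `target`, mimicking a shader
    that samples the source texture. Returns (filled, passes), where
    `passes` counts the passes that filled at least one new voxel.
    `seed` is assumed to be a member of `hollow`.
    """
    source = {seed}
    passes = 0
    while True:
        target = set(source)
        for x, y, z in hollow:
            if (x, y, z) in source:
                continue
            # A hollow voxel becomes filled if any face neighbour was
            # already filled in the previous pass.
            if any((x + dx, y + dy, z + dz) in source
                   for dx, dy, dz in NEIGHBOURS_6):
                target.add((x, y, z))
        if target == source:
            return source, passes
        source = target   # swap the roles of the two "textures"
        passes += 1
```

The fill front advances one voxel per pass, so the pass count equals the length of the longest shortest path from the seed, not the size of the bounding box. A corridor that winds through the whole volume therefore needs roughly as many passes as it has voxels.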


Thanks for the suggestions. I've been away thinking about other things, but I'm back on this program now and I still need a solution to this.

Actually, that snaking tunnel isn't a problem for me, because I only need to fill either space or solid, so I can make a filling element behind every single solid voxel, and the snake fill would happen in a single draw call.

I was thinking (although it's a little tricky to program) that I could take the first mip, develop a filled mip from it at half the resolution, then develop a filled mip from that at half the resolution again, and so on. That way I could get it done in fewer batches, if you can get your head around that idea.
OK, I got it from 10 seconds down to about 1 second. I sort of do it with brute-force volume passes on the GPU, but that's not all it is; if that were all, it would probably go just as slow as the CPU version, if not slower. I actually speed it up by filling a bit, then converting to a lower mip, then filling some more, then converting lower again... It's got some bugs and it's probably not as quick as it could be, but I guess it's resolved for now; at least I can get on with the rest of the program.
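The post does not show the actual GPU code, so the following is only a sketch of one way a mip hierarchy can cut down the number of fill passes. Note that it is a swapped-in variant: it fills the coarsest level first and refines upward, rather than alternating fill and reduce steps as described above. All names (`grow_in_passes`, `downsample`, `mip_fill`) are hypothetical, and the grid size is assumed to be a power of two.

```python
NEIGHBOURS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def grow_in_passes(solid, size, filled):
    """Grow `filled` in place, one voxel layer per pass; return the pass count."""
    frontier = set(filled)
    passes = 0
    while frontier:
        fresh = set()
        for x, y, z in frontier:
            for dx, dy, dz in NEIGHBOURS_6:
                n = (x + dx, y + dy, z + dz)
                if min(n) < 0 or max(n) >= size:
                    continue
                if n in solid or n in filled:
                    continue
                fresh.add(n)
        filled |= fresh
        frontier = fresh
        if fresh:
            passes += 1
    return passes

def downsample(solid):
    """Conservative mip: a coarse cell is solid if ANY of its 8 children is."""
    return {(x >> 1, y >> 1, z >> 1) for x, y, z in solid}

def mip_fill(solid, size, seed, min_size=2):
    """Return (filled, total_passes) using a coarse-to-fine fill."""
    if seed in solid:
        return set(), 0
    filled, passes = set(), 0
    if size > min_size and size % 2 == 0:
        coarse_seed = tuple(c >> 1 for c in seed)
        coarse, passes = mip_fill(downsample(solid), size // 2,
                                  coarse_seed, min_size)
        # A filled coarse cell contains no solid voxel at all, so all
        # eight of its children can be marked as filled immediately.
        for cx, cy, cz in coarse:
            for i in range(8):
                filled.add((2 * cx + (i & 1),
                            2 * cy + ((i >> 1) & 1),
                            2 * cz + (i >> 2)))
    filled.add(seed)
    # Finish at this resolution: only the detail the coarse level missed
    # still has to be grown pass by pass.
    passes += grow_in_passes(solid, size, filled)
    return filled, passes
```

Because the coarse levels are conservative, everything they fill is also reachable at full resolution, so the final result matches a plain fill. Open regions are covered in a handful of cheap low-resolution passes, and the full-resolution passes are only spent on narrow gaps.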

To prove it, here's a shot of a 256x256x256 Swiss cheese cube. Note that the program actually works with infinite volumes, not just 256^3.

speedcsg.png
A slightly cooler shot, with a bigger plot. The floor was a single plot, done in about 8 seconds.
yes2q.png
