Backwater

Marching Cubes and GPU-CPU communication


Greetings everyone, I'm new on this forum. I'm implementing a Marching Cubes-like algorithm and have run into some problems concerning efficiency, batching, and communication between the GPU and CPU.

1) The whole model/world I'd like to triangulate is encoded in an octree data structure. Many algorithms just assume a regular grid at a single level of detail and (for example) encode voxel configurations into a 3D texture which shaders fetch from. I use an octree because:
a) I have to: project specification :)
b) Large portions of the volume are empty, so they can be discarded early on (the main reason).
c) LOD is easy to perform at different depths of the octree nodes.

Unfortunately, the whole octree is:
a) too large to fit entirely into GPU memory;
b) even if the whole geometry were stored on the GPU, plenty of unnecessary data would sit in graphics memory without being displayed in the current frame.

I'm planning the algorithm as follows:
- Frustum culling at a specific node level (octree depth). This would operate on voxels of rather large size.
- A simple visibility test. There will be many full voxels (entirely covering their volume) which I'm going to use for discarding the voxels behind them, relative to the camera.
- LOD: the remaining voxels will be split into 3 LOD groups. Here LOD means nothing more nor less than a fixed depth level in the octree.

This raises a serious number of problems:
* Where should I generate triangles? On the CPU (easier), or somehow pass voxel size, position, and configuration index to the GPU and generate them there?
* Should I pack all potentially visible geometry for the current frame into a single buffer and send it to the GPU, or maybe pre-generate something and store it in constant buffers (a kind of caching, where the algorithm would then only tell which buffers should be treated as visible)? Reducing the number of Draw(...) calls is essential (tested and referenced in developer.nvidia.com/docs/IO/8230/BatchBatchBatch.pdf), but on the other hand too-big batches are also not welcome. Locking, accessing, and modifying such a big buffer would not be good either, I think.
* When triangulating, there will be many voxels at a certain octree level contributing to the triangulation, so traversing from the octree root every time in order to find the voxels at the depth corresponding to the actual LOD is not a good idea, is it? Here again, a uniform grid would be simpler to manage, but it can't be used in my project.

Are there any traditional, solved approaches to this problem, i.e. "how to remove bottlenecks from octree-driven Marching Cubes"? What would you do entirely on the CPU? What on the GPU? And how would you minimize communication each frame?

Any help, links (although I've probably dug through the whole of Google), or references will be welcome. Thanks :)
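To make the question concrete, here is a minimal sketch of the per-frame traversal described above (early discard of empty nodes, frustum culling at node level, stopping at a fixed depth that serves as the LOD). All names (`OctNode`, `intersectsFrustum`, `collectVisible`) are hypothetical, and the frustum test is stubbed out; a real implementation would test the node's box against the six frustum planes.

```cpp
#include <vector>

// Hypothetical axis-aligned bounding box for an octree node.
struct AABB { float min[3] = {}, max[3] = {}; };

struct OctNode {
    AABB bounds;
    bool empty = false;      // large empty regions are discarded early
    OctNode* child[8] = {};  // all null for leaves
};

bool isLeaf(const OctNode& n) {
    for (const OctNode* c : n.child)
        if (c) return false;
    return true;
}

// Stand-in for a real box-vs-frustum test; assumed to return false
// when the node's bounds lie fully outside the view frustum.
bool intersectsFrustum(const AABB& /*box*/) { return true; }

// One traversal per frame: discard empty nodes, frustum-cull at node
// level, and stop at the depth chosen as this region's LOD.
void collectVisible(OctNode* n, int depth, int lodDepth,
                    std::vector<OctNode*>& out)
{
    if (!n || n->empty) return;                 // early discard of empty space
    if (!intersectsFrustum(n->bounds)) return;  // frustum culling at node level
    if (depth == lodDepth || isLeaf(*n)) {      // fixed depth acts as the LOD
        out.push_back(n);
        return;
    }
    for (OctNode* c : n->child)
        collectVisible(c, depth + 1, lodDepth, out);
}
```

Collecting the surviving nodes into a flat list each frame also sidesteps part of the "traverse from the root every time" worry: the list can be handed to whatever triangulation stage (CPU or GPU) without re-walking the tree.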

This is a lot of stuff to digest, and yet it's somewhat vague, which I guess is why you didn't get an answer.

I tried to google "octree shader", and though I didn't look at any of the pages, the search results looked like there may be something useful there.

Regarding a caching scheme, I think that'd depend on how much data you have compared to card RAM. I don't have enough information about your data to make an informed suggestion. My typical suggestion is to design your program such that you could easily switch a caching scheme later. First make it work in the simplest way you can, then try various caching strategies.
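One common way to keep the caching scheme swappable, as suggested above, is to hide it behind a small interface so the renderer never depends on the eviction policy. This is only a sketch; `ChunkId`, `GpuMesh`, and `MeshCache` are hypothetical names, and the "keep everything" scheme is the simplest-possible starting point mentioned in the reply.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical handle for a chunk's geometry resident on the GPU.
using ChunkId = std::uint64_t;
struct GpuMesh { int vertexCount = 0; };

// Strategy interface: the renderer only talks to this, so the eviction
// policy (none, LRU, distance-based, ...) can be swapped later.
struct MeshCache {
    virtual ~MeshCache() = default;
    virtual GpuMesh* find(ChunkId id) = 0;
    virtual GpuMesh* insert(ChunkId id, GpuMesh mesh) = 0;
};

// Simplest possible scheme to start with: keep everything, evict nothing.
struct KeepAllCache : MeshCache {
    std::unordered_map<ChunkId, GpuMesh> store;
    GpuMesh* find(ChunkId id) override {
        auto it = store.find(id);
        return it == store.end() ? nullptr : &it->second;
    }
    GpuMesh* insert(ChunkId id, GpuMesh mesh) override {
        return &(store[id] = mesh);
    }
};
```

Once profiling shows memory pressure, a second `MeshCache` implementation (e.g. LRU) can be dropped in without touching the rendering code.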

Odd, I'm working on something that sounds quite similar to your project, also using octrees/voxels & marching cubes.

- I'm generating my triangles on the CPU: first I generate the voxel grid, then I run MC on it. I did this because running MC on the GPU, as far as I know, requires geometry shaders, and this way I can conceivably post-process the model and run an algorithm on it to reduce poly count. I run it on a job thread so it doesn't slow down my app. Generating the voxel data is also far slower than MC anyway (well, for me, since I'm using 3D noise).
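The job-thread setup described above can be sketched with `std::async`: both expensive stages run off the main thread, and the render loop picks up the mesh only once it's ready. `fillVoxels` and `marchingCubes` here are empty stand-ins for the real noise sampling and triangulation, not an actual MC implementation.

```cpp
#include <future>
#include <vector>

struct Vertex { float x = 0, y = 0, z = 0; };

// Stand-ins for the two expensive stages: sampling 3D noise into a
// voxel grid, then running marching cubes over it.
std::vector<float> fillVoxels(int dim) {
    return std::vector<float>(static_cast<std::size_t>(dim) * dim * dim, 0.0f);
}
std::vector<Vertex> marchingCubes(const std::vector<float>& /*grid*/) {
    return {};  // a real MC would emit triangles per voxel configuration
}

// Kick both stages off on a job thread; the render loop polls the
// future and uploads the mesh to the GPU only when it is ready.
std::future<std::vector<Vertex>> buildChunkAsync(int dim) {
    return std::async(std::launch::async, [dim] {
        auto grid = fillVoxels(dim);  // the slow part (noise sampling)
        return marchingCubes(grid);   // comparatively cheap
    });
}
```

Polling with `future::wait_for(std::chrono::seconds(0))` per frame avoids blocking the render loop on `get()`.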

- Not sure about single vs. multiple buffers... currently I'm using separate VBOs for each chunk. They all have the same format, and as far as I know switching between VBOs of the same format isn't so terribly slow that you must avoid it.
I reuse the buffers and just tell the GPU to discard the old contents when writing new data, so there are no locking issues. I'm also using OpenGL, not DX9, so batching isn't as much of a concern for me.
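The "discard on write" idea above (buffer orphaning in OpenGL terms, via `glBufferData` with a null pointer) can also be expressed CPU-side as a small pool that rotates through slots and simply overwrites the oldest one, never synchronizing on data the GPU might still be reading. This is an illustrative analogy, not the actual driver mechanism; `BufferPool` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Rotate through N slots; each acquire() hands back the oldest slot
// with its previous contents discarded, so the writer never waits on
// a reader. GL buffer orphaning achieves the same effect driver-side.
template <std::size_t N>
struct BufferPool {
    std::array<std::vector<float>, N> slots;
    std::size_t next = 0;

    std::vector<float>& acquire() {
        auto& slot = slots[next];
        next = (next + 1) % N;   // advance round-robin cursor
        slot.clear();            // discard, don't synchronize
        return slot;
    }
};
```

With two or three slots per chunk, the CPU can fill one slot while the GPU reads another, which is the lock-free reuse the reply describes.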

- Not sure what you mean with Q3. My problem is that I need to find a method to interpolate between the different depths of the octree; I haven't gotten to that part yet, so I can't help you there.

