 Computation of Bounding Primitives on the GPU
Very interesting snippet, thanks for that.


^^ Seconded. Although it just seems too convoluted IMHO. In this case I think a more naive implementation makes more sense: you just keep the min/max value of each coordinate and iterate over all the vertices in the mesh, updating them as you go. Hell, it might even be faster in some cases.
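A minimal sketch of that naive CPU approach, in Python for brevity. The function name `compute_aabb` and the tuple-based vertex layout are illustrative assumptions, not something from the article.

```python
def compute_aabb(vertices):
    """Return (mins, maxs) for an iterable of (x, y, z) tuples.

    Hypothetical helper: one linear pass, tracking the running
    minimum and maximum of each coordinate.
    """
    it = iter(vertices)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("mesh has no vertices")

    mins = list(first)
    maxs = list(first)
    for v in it:
        for axis in range(3):
            if v[axis] < mins[axis]:
                mins[axis] = v[axis]
            if v[axis] > maxs[axis]:
                maxs[axis] = v[axis]
    return tuple(mins), tuple(maxs)
```

The cost is one comparison pair per coordinate per vertex, with no upload or readback involved.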


I think I'm more confused as to why you'd ever do this at runtime except on things that are animated. For anything else you could just transform the AABB of the base object and then take the min/max of only the 8 corners of that transformed BB. And for an animated mesh you could just keyframe the bounding box along with everything else. At that point I would think it is much faster not to have to render the object at all.
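A rough sketch of the "transform the 8 corners" idea, assuming a 3x4 row-major affine matrix (three rows of three linear terms plus a translation). `transform_point` and `transform_aabb` are hypothetical names used only for illustration.

```python
from itertools import product


def transform_point(m, p):
    # m: 3 rows of (a, b, c, translation); p: (x, y, z)
    return tuple(r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3] for r in m)


def transform_aabb(m, mins, maxs):
    """Transform a box's 8 corners and re-fit an axis-aligned box."""
    # zip pairs up (min_x, max_x), (min_y, max_y), (min_z, max_z);
    # product then enumerates all 8 corner combinations.
    corners = product(*zip(mins, maxs))
    pts = [transform_point(m, c) for c in corners]
    new_mins = tuple(min(p[i] for p in pts) for i in range(3))
    new_maxs = tuple(max(p[i] for p in pts) for i in range(3))
    return new_mins, new_maxs
```

Note that the result is conservative: it bounds the transformed box, so under rotation it can be looser than a box fitted to the transformed vertices themselves.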



Just a thought, but if you have 10.1 features available you can do this in one pass: use 2 RTs and specify MIN for one and MAX for the other. Am I right?

The point of this is to leverage the GPU.


Don't you need to know the bounds (or an approximation) so you can place your "camera" accordingly to render it in the viewport?? Perhaps I missed something?


Quote:
Original post by Rompa
Don't you need to know the bounds (or an approximation) so you can place your "camera" accordingly to render it in the viewport?? Perhaps I missed something?

No
Take a look at the shader.
The vertex shader outputs every vertex at (0,0,0,1), so all vertices are rendered to the same point no matter where the camera is.
However, he still passes the true position on to the pixel shader, which stores it in the pixel as colour data.
This way the camera's position is meaningless, and so is setting the world, view and projection matrices.
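To make the idea concrete, here is a CPU-side emulation in Python, offered only as a sketch. The sequential loop stands in for hardware blending, the clear values of plus/minus infinity are an assumption, and `blend_pass` and `gpu_style_aabb` are made-up names.

```python
def blend_pass(vertices, blend_op, clear_value):
    """Emulate drawing every vertex onto one RGB pixel with a MIN or MAX blend."""
    pixel = [clear_value] * 3  # the single destination pixel
    for position in vertices:
        # "Vertex shader": the output position is constant, so every vertex
        # hits the same pixel; the true position is forwarded as a colour.
        colour = position
        # "Blend stage": combine the incoming colour with the stored one.
        pixel = [blend_op(dst, src) for dst, src in zip(pixel, colour)]
    return tuple(pixel)


def gpu_style_aabb(vertices):
    # Two passes, one per blend operation.
    mins = blend_pass(vertices, min, float("inf"))
    maxs = blend_pass(vertices, max, float("-inf"))
    return mins, maxs
```

Nothing in the emulation depends on a camera or a transform, which is the point being made above.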

Nice article by the way


Thanks for the feedback everyone!

InvalidPointer -- What you're proposing sounds like a CPU algorithm. Shading languages do not allow values to persist from one vertex to another because of parallel execution.

KulSeran -- Right, this typically wouldn't be done at every frame of animation. More likely, it's a pre-processing step. There are many uses for finding a bounding box, and the article does not attempt to cover them.

darkelf2k5 -- That's a great idea. Since 10.1 allows a unique blend mode per render target, you could write to two 1x1 RT's in only one pass, rather than writing to a single 2x1 RT in two passes. Nice!

Rompa -- There's no need for a camera matrix because the vertex shader is so trivial. Note that the article covers two methods: a "depth buffer" method, and a more efficient "blending" method. The latter always sets the position to (0,0,0); the former would set the position to (0,0,?), where "?" is one of the following: +X, -X, +Y, -Y, +Z, or -Z.


Would this work with DX9 systems?


Yes, DX9 also supports MIN and MAX blending operations.


I think nobody will disagree that computing the bounding box on the GPU should be a lot faster (for the actual calculation), but I'm surprised you don't mention what is IMO the real problem: bandwidth (the time to upload your mesh to the GPU) and latency (the time to read back this 1x1 buffer). Once you take those two factors into account, I'm not at all convinced it is still faster than on the CPU. Then there are unknown factors, such as the fact that your vertices (even if they're just points) are transformed and projected from 3D to 2D, while the CPU version processes them as-is. Even if that step is lightning fast, it's still additional overhead compared to the CPU.

Have you run any real benchmarks of CPU vs GPU?

Y.


Quote:
Original post by Philip Rideout
Thanks for the feedback everyone!

darkelf2k5 -- That's a great idea. Since 10.1 allows a unique blend mode per render target, you could write to two 1x1 RT's in only one pass, rather than writing to a single 2x1 RT in two passes. Nice!


You can do it single-pass in DX9 as well by computing min(x) in one MRT and min(-x) in the other, then negating the second result after readback, since -min(-x) == max(x).
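A small Python emulation of that negation trick, assuming two render targets that share a single MIN blend operation. The name `single_pass_min_only` is hypothetical, and the loop is only a stand-in for what the blend hardware would do.

```python
def single_pass_min_only(vertices):
    """Find mins and maxs using only a MIN operation, in one pass."""
    inf = float("inf")
    rt0 = [inf] * 3  # receives the position: accumulates min(x)
    rt1 = [inf] * 3  # receives the negated position: accumulates min(-x)

    for p in vertices:
        rt0 = [min(dst, src) for dst, src in zip(rt0, p)]
        rt1 = [min(dst, -src) for dst, src in zip(rt1, p)]

    mins = tuple(rt0)
    # "After readback": negate the second target, since -min(-x) == max(x).
    maxs = tuple(-v for v in rt1)
    return mins, maxs
```

The same blend operation is applied to both targets, so no per-target blend state is needed.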



Quote:
Original post by Ysaneya
I think nobody will disagree that computing the bounding box on the GPU should be a lot faster (for the actual calculation)

Y.


I'm not sure I would agree with this, at least not using the method proposed. The reason GPUs can be faster than CPUs is their massive parallelism, and writing everything to a single pixel (or two) wouldn't allow the GPU to use it. For that to be possible, the approach would probably have to be changed so that vertices are written to different pixels and then merged down to a single one in extra passes.


I don't think the actual usefulness of this approach (compared to using the CPU) is the main point of the article, as it depends heavily on the number and complexity of meshes for which bounding boxes need to be computed, the sources of these meshes, whether they're dynamic or not, the actual hardware, etc. It discusses a few techniques and "tricks" that are very useful for GPGPU.


Ysaneya -- Fair enough, the article is perhaps too academic. The intention was to inspire more ideas for GPGPU. The bandwidth issue might be less important in scenarios where you need to upload the mesh anyway, for actual rendering. The latency issue should be minimal due to the fact that 1x1 is so small.

jbarcz1 -- Genius!

Isokron_ -- Good point, to better exploit parallelism, you need to successively render into smaller and smaller RT's, rather than jumping straight to the 1x1 target.
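A sketch of that successive reduction, emulated in Python. It assumes a 1D "texture" that is halved each pass by combining adjacent pairs; a real implementation would more likely reduce 2D blocks. `reduce_pass` and `reduce_to_one` are illustrative names.

```python
def reduce_pass(texels, op):
    """One pass: combine adjacent pairs of (x, y, z) texels component-wise."""
    out = []
    for i in range(0, len(texels), 2):
        pair = texels[i:i + 2]  # the last pair may hold a single texel
        out.append(tuple(op(component) for component in zip(*pair)))
    return out


def reduce_to_one(texels, op):
    """Repeat passes until one texel remains; return it and the pass count."""
    if not texels:
        raise ValueError("nothing to reduce")
    passes = 0
    while len(texels) > 1:
        texels = reduce_pass(texels, op)
        passes += 1
    return texels[0], passes
```

Within each pass every output texel is independent of the others, which is what lets the work spread across many GPU threads; the number of passes grows only logarithmically with the vertex count.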


