Computation of Bounding Primitives on the GPU

swiftcoder (Member since: 7/3/2003; From: Boston, MA, United States)

Very interesting snippet, thanks for that.

InvalidPointer (Member since: 4/4/2007; From: Chesterland, OH, United States)

^^ Seconded. Although it just seems too convoluted IMHO. In this case I think a more naive implementation makes more sense: you just keep the min/max value of each coordinate and iterate over all the vertices in the mesh, storing what you need. Hell, it might even be faster in some cases.

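The naive CPU pass described above can be sketched roughly as follows. This is only an illustration: the function name `bounding_box` and the representation of the mesh as a list of `(x, y, z)` tuples are assumptions, not anything taken from the article.

```python
def bounding_box(vertices):
    """Return (mins, maxs) for an iterable of (x, y, z) tuples.

    Hypothetical helper: one linear pass, tracking the running
    minimum and maximum of each coordinate.
    """
    it = iter(vertices)
    first = next(it)  # raises StopIteration if the mesh is empty
    mins = list(first)
    maxs = list(first)
    for v in it:
        for axis in range(3):
            if v[axis] < mins[axis]:
                mins[axis] = v[axis]
            if v[axis] > maxs[axis]:
                maxs[axis] = v[axis]
    return tuple(mins), tuple(maxs)
```

The work is a single O(n) loop with no readback or upload cost, which is why it may well win for small meshes.
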
KulSeran (Member since: 12/9/2003)

I think I'm more confused as to why you'd ever do this at runtime except on things that are animated. For anything else you could just transform the AABB of the base object and then take the min() and max() over only the 8 verts of that transformed BB. And on an animated mesh you could just keyframe the bounding box along with everything else. At that point I would think it is much faster to not have to render the object.

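A minimal sketch of the "transform the 8 corners" idea, assuming a 3x4 row-major affine matrix whose last column is the translation. The helper names `transform_point` and `transform_aabb` are made up for this example.

```python
from itertools import product


def transform_point(m, p):
    """Apply a 3x4 affine matrix (assumed row-major, last column = translation)."""
    return tuple(
        m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3]
        for r in range(3)
    )


def transform_aabb(m, mins, maxs):
    """Return the AABB enclosing the 8 transformed corners of (mins, maxs)."""
    # zip pairs up (min, max) per axis; product enumerates all 8 corners.
    corners = product(*zip(mins, maxs))
    pts = [transform_point(m, c) for c in corners]
    new_mins = tuple(min(p[a] for p in pts) for a in range(3))
    new_maxs = tuple(max(p[a] for p in pts) for a in range(3))
    return new_mins, new_maxs
```

Note that the result is conservative: after a rotation it can be looser than the box you would get by running over every transformed vertex, but it always encloses the mesh.
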
darkelf2k5 (Member since: 9/27/2005; From: Kobe, Japan)

Just a thought, but you can do this in one pass if you have 10.1 features available: you can use 2 RTs and specify MIN for one and MAX for the other. Am I right? The point of this is to leverage the GPU.

Rompa (Member since: 10/26/2003)

Don't you need to know the bounds (or an approximation) so you can place your "camera" accordingly to render it in the viewport? Perhaps I missed something?

Matias Goldberg (Member since: 7/2/2006; From: Mar del Plata, Argentina)

Quote:

No. Take a look at the shader. It renders all vertices at (0,0,0,1), which means all vertices will be rendered at the same spot, very close to the camera. However, he still saves the true position and passes it to the pixel shader, where it is stored as the pixel's colour data. This way the camera's position is meaningless. Setting the World, View and Projection matrices is also meaningless.

Nice article, by the way.

Philip Rideout (Member since: 7/26/2008; From: Fort Collins, CO, United States)

Thanks for the feedback everyone!

InvalidPointer -- What you're proposing sounds like a CPU algorithm. Shading languages do not allow values to persist from one vertex to another because of parallel execution.

KulSeran -- Right, this typically wouldn't be done at every frame of animation. More likely, it's a pre-processing step. There are many uses for finding a bounding box, and the article does not attempt to cover them.

darkelf2k5 -- That's a great idea. Since 10.1 allows a unique blend mode per render target, you could write to two 1x1 RTs in only one pass, rather than writing to a single 2x1 RT in two passes. Nice!

Rompa -- There's no need for a camera matrix because the vertex shader is so trivial. Note that the article covers two methods: a "depth buffer" method, and a more efficient "blending" method. The latter always sets the position to (0,0,0); the former would set the position to (0,0,?), where "?" is one of the following: +X, -X, +Y, -Y, +Z, or -Z.

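For readers who have not seen the "blending" method, here is a CPU-side simulation of the idea as discussed in this thread: every vertex is written to the same pixel, its position is the colour, and the blend operation keeps the component-wise MIN (first pass) or MAX (second pass). This is only a sketch in Python, not shader code; the clear values of +inf and -inf and all function names are assumptions made for the illustration.

```python
import math


def blend_min(dst, src):
    """Component-wise MIN, standing in for a MIN blend operation."""
    return tuple(min(d, s) for d, s in zip(dst, src))


def blend_max(dst, src):
    """Component-wise MAX, standing in for a MAX blend operation."""
    return tuple(max(d, s) for d, s in zip(dst, src))


def bounds_via_blending(vertices):
    # Simulated 2x1 render target. Assumed clear values:
    # pixel 0 starts at +inf (for MIN), pixel 1 at -inf (for MAX).
    target = [(math.inf,) * 3, (-math.inf,) * 3]
    for v in vertices:  # pass 1: MIN blend into pixel 0
        target[0] = blend_min(target[0], v)
    for v in vertices:  # pass 2: MAX blend into pixel 1
        target[1] = blend_max(target[1], v)
    return target[0], target[1]
```

Because MIN and MAX are order-independent, the GPU is free to process the vertices in any order, which is what makes the trick work without values persisting between shader invocations.
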
ayanami0 (Member since: 10/25/2005; From: Taipei, Taiwan, Province of China)

Would this work with DX9 systems?

Philip Rideout (Member since: 7/26/2008; From: Fort Collins, CO, United States)

Yes, DX9 also supports MIN and MAX blending operations.

Ysaneya (GDNet+; Member since: 1/7/2000; From: Brussels, Belgium)

I think nobody will disagree that computing the bounding box on the GPU should be a lot faster (for the actual calculation), but I'm surprised you're not speaking of what is IMO the real problem: bandwidth (the time to upload your mesh to the GPU) and latency (the time to read back this 1x1 buffer). If you take those 2 factors into account, I'm not at all convinced it is still faster than on the CPU.

Then there are unknown factors, like the fact that your vertices (even if they're just points) are projected from 3D to 2D (transformed), while in the CPU version they're processed as is. Even if that's lightning fast, it's still additional overhead compared to the CPU, which can process the vertices directly.

Have you done some real benchmarks of CPU vs GPU?

Y.

jbarcz1 (Member since: 7/9/2002; From: Marlborough, MA, United States)

Quote:

You can do it single-pass in DX9 as well by computing min(x) in one MRT and min(-x) in the other (then negating it after readback).

-min(-x) == max(x)

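The identity above is easy to check with a small simulation. The sketch below assumes two render targets that both use a MIN blend and are cleared to +inf; the names `rt0`, `rt1` and `bounds_single_pass` are invented for the example.

```python
import math


def bounds_single_pass(vertices):
    # Both simulated targets use MIN blending (assumed cleared to +inf).
    rt0 = (math.inf,) * 3  # accumulates min(x)
    rt1 = (math.inf,) * 3  # accumulates min(-x)
    for v in vertices:  # a single pass writes to both targets
        neg = tuple(-c for c in v)
        rt0 = tuple(min(d, s) for d, s in zip(rt0, v))
        rt1 = tuple(min(d, s) for d, s in zip(rt1, neg))
    mins = rt0
    maxs = tuple(-c for c in rt1)  # negate after "readback": -min(-x) == max(x)
    return mins, maxs
```

Only one blend operation is needed for both targets, which is why this variant does not depend on per-render-target blend modes.
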
Isokron_ (Member since: 11/17/2008)

Quote:

I'm not sure I would agree with this, at least not using the method proposed. The reason GPUs can be faster than CPUs is their use of massive parallelism, and writing everything to a single pixel (or two) wouldn't allow the GPU to use that. For that to be possible, the approach would probably have to be changed so that vertices are written to different pixels and then merged down to a single one in extra passes.

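One way to picture the multi-pass merge suggested above is a reduction tree: each pass combines small groups of "pixels" into one, so the groups within a pass are independent and could run in parallel. The following is a sequential Python sketch under that assumption; `reduce_min_max` and the `fan_in` parameter are hypothetical, and a real GPU version would render into successively smaller targets instead.

```python
def reduce_min_max(vertices, fan_in=4):
    """Merge per-vertex bounds down to one (mins, maxs) pair in passes.

    fan_in must be at least 2, and vertices must be non-empty
    (an empty list raises IndexError at the final lookup).
    """
    # Level 0: one "pixel" per vertex, holding that vertex as both min and max.
    level = [(tuple(v), tuple(v)) for v in vertices]
    while len(level) > 1:
        nxt = []
        # Each group is independent of the others within this pass.
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            mins = tuple(min(g[0][a] for g in group) for a in range(3))
            maxs = tuple(max(g[1][a] for g in group) for a in range(3))
            nxt.append((mins, maxs))
        level = nxt
    return level[0]
```

With n vertices this takes roughly log(n) passes to the base `fan_in`, trading a few extra passes for far less contention on a single pixel.
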
hikikomori-san (Member since: 3/19/2007)

I don't think the actual usefulness of this approach (compared to using the CPU) is the main point of the article, as it heavily depends on the number and complexity of meshes for which bounding boxes need to be computed, the sources of these meshes, whether they're dynamic or not, the actual hardware, etc. It discusses a few techniques and "tricks" that are very useful for GPGPU.

Philip Rideout (Member since: 7/26/2008; From: Fort Collins, CO, United States)

Ysaneya -- Fair enough, the article is perhaps too academic. The intention was to inspire more ideas for GPGPU. The bandwidth issue might be less important in scenarios where you need to upload the mesh anyway, for actual rendering. The latency issue should be minimal due to the fact that 1x1 is so small.

jbarcz1 -- Genius!

Isokron_ -- Good point. To better exploit parallelism, you need to successively render into smaller and smaller RTs, rather than jumping straight to the 1x1 target.
