
Predictive Ray Tracing?


This is a subject I've been thinking about for a few weeks now. From my experience, the reason traditional ray tracing is so slow is that it takes a lot of time to determine the intersection of a ray with geometry (whether for rendering or for lighting). You have a broad phase trace, which takes time to iterate through the nodes of the particular structure, and then the narrow phase trace, which takes time to intersect against individual polygons. Usually there's a trade-off between the number of nodes and the amount of geometry in each node, with a preference for more nodes, since the broad phase trace can be optimized depending on the structure (octrees, BSP trees, kd-trees, etc.). On top of all that, you're doing these traces potentially hundreds of thousands of times per frame.

So it follows intuitively (for me at least) that if you want to optimize the ray tracing process, you either target the broad or narrow phase trace, or you cut down on the number of rays cast per frame. It's hard to optimize the actual tracing, though, since there are only so many ways to traverse the nodes of a fixed data structure, or to intersect a ray with polygons or implicit surfaces. But what about cutting down on the number of rays cast per frame?

To reduce the number of rays cast per frame, I've been playing around with the notion of "predicting" what geometry a ray will hit without actually casting it, using "query" rays. In a lot of graphics scenes (especially indoor ones), image-space is dominated by a handful of large surfaces - walls, ceiling(s), floor(s), etc. - each taking up thousands of pixels, while the remaining surfaces take up only a few dozen pixels each. Why should we cast those thousands of rays when they all hit the same surface?
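Just to make the two phases concrete, here's a toy sketch in Python of what I mean (the function names and the slab/Moller-Trumbore formulations are my own illustration, not any particular tracer):

```python
# Minimal sketch of the two trace phases: a ray is first tested against a
# node's axis-aligned bounding box (broad phase), and only on a hit do we
# pay for a per-triangle intersection (narrow phase).

def ray_aabb(origin, inv_dir, box_min, box_max):
    """Slab test: True if the ray (with precomputed 1/direction) hits the box."""
    tmin, tmax = 0.0, float("inf")
    for i in range(3):
        t1 = (box_min[i] - origin[i]) * inv_dir[i]
        t2 = (box_max[i] - origin[i]) * inv_dir[i]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore: hit distance t, or None on a miss."""
    def sub(a, b): return [a[i] - b[i] for i in range(3)]
    def cross(a, b):
        return [a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0]]
    def dot(a, b): return sum(a[i]*b[i] for i in range(3))

    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray parallel to triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0 or u > 1:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None
```

The broad phase is cheap per node, but you run it once per node visited, and the narrow phase once per candidate polygon - per ray, per frame, which is where the cost piles up.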
If we could cast just a single ray, and use information from the trace and the surface it hit to predict all the other pixels that would also trace to this surface, then we wouldn't need to actually trace those rays - we could use interpolation and rasterization techniques to render the other pixels instead.

The biggest hurdle is the one-time prediction cost per surface. For this idea to be feasible, the prediction not only needs to be robust, it also has to cost less than actually casting the predicted rays. The highest cost is going to be the prediction itself, which determines which pixels correspond to a given surface after an initial "query" cast. In other words, if you have a ray casting cost of CR (per pixel), a prediction cost of CP (per contiguous surface), and an interpolation/rasterization cost of CI (per pixel), then you need sum(CR + CP + (Ni - 1)(CI)) < sum(Ni(CR)), both sides summed over all screen-space surfaces, with each surface covering Ni screen-space pixels. The interpolation cost CI should be much less than the ray casting cost CR, since it's just a set of linear equations and doesn't have to traverse any data structures or determine intersections. The prediction cost CP could be quite high, but I'm hoping it can be amortized over the savings from interpolating rather than actually casting the rest of the surface's pixels, with some net decrease in cost left over.

I've had a few ideas about how this could be implemented, using BSP trees in particular. I haven't had a chance to flesh out all the details, but it involves coalescing adjacent polygons with equivalent texture properties into single, large, convex surfaces, then pre-processing portal information between leaves. As the "query" ray is traced, you back-project portals onto the image plane and determine the intersecting pixel set.
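That inequality is easy to play with numerically. Here's a back-of-envelope version (the cost numbers are made up for illustration, not measurements):

```python
# The feasibility condition from above:
#   sum over surfaces of (CR + CP + (Ni - 1) * CI)  <  sum of Ni * CR.
# predicted_cost is the left side, naive_cost the right side.

def predicted_cost(surfaces, cr, cp, ci):
    """surfaces: list of pixel counts Ni, one per contiguous screen-space surface."""
    return sum(cr + cp + (n - 1) * ci for n in surfaces)

def naive_cost(surfaces, cr):
    return sum(n * cr for n in surfaces)

# A few large surfaces (walls, floor) dominate: prediction should win easily.
indoor = [40000, 30000, 20000, 500, 200]
# Many tiny surfaces (the "blimp full of ping pong balls" case): prediction loses,
# because CP is paid once per surface and never amortized.
blimp = [4] * 20000
```

With, say, CR = 100, CP = 5000, CI = 1 (arbitrary units), the indoor scene comes out far cheaper predicted, and the blimp scene far more expensive - which is exactly the sensitivity to surface count I'm worried about.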
Once you hit a surface, you know that all the pixels in this intersecting set will also hit that surface, so you've effectively predicted them. My biggest concern is that the portal projection will be too restrictive: if the ray passes through a very small empty leaf with a very small portal, the intersection set shrinks too much, which increases the number of surfaces on the left side of the inequality. I'm working on a few ways around that.

As for the other luxuries of ray tracing (reflection, refraction, transparency, lighting), I have a few ideas on how these could be accomplished predictively, but that would probably add another two or three paragraphs to the post. I was wondering if anyone has had any experience with any kind of predictive approach to ray tracing, either through research or application, or knows of any studies that have been or are being done in this area? I did a Google search but didn't come up with anything quite like what I was looking for. My ideas are largely theoretical, but this is an area that piques my curiosity [smile]

This sounds to me like a typical adaptive subsampling scheme, with a somewhat non-traditional adaptation heuristic. I think it might work, but at sacrifices of quality and accuracy. As usual, the tradeoff in raytracing systems is speed vs. high-frequency detail/correctness; if your scene involves a lot of tiny, discrete objects (like, say, the inside of a blimp filled with flying ping pong balls) then you'll end up missing a lot of detail inadvertently compared to a naive reference render.
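To illustrate what I mean by the quality sacrifice, here's a toy 1D adaptive-subsampling pass (my own construction, just to show the failure mode): trace only at interval endpoints, and subdivide only when the two endpoint rays disagree about what they hit.

```python
# Toy adaptive subsampling along one scanline. `trace` stands in for a full
# ray cast and returns a surface id for a pixel; `out` is the scanline buffer.
# If both endpoints of an interval hit the same surface, the whole interval
# is filled without tracing - which silently misses any small object inside.

def adaptive_fill(trace, lo, hi, out):
    a, b = trace(lo), trace(hi)
    if a == b or hi - lo <= 1:
        for x in range(lo, hi + 1):
            out[x] = a if a == b else trace(x)
        return
    mid = (lo + hi) // 2
    adaptive_fill(trace, lo, mid, out)
    adaptive_fill(trace, mid, hi, out)
```

For a scene whose surfaces vary smoothly along the scanline this converges to the exact answer with far fewer traces; for a single ping-pong-ball-sized object between two agreeing endpoints, the object simply vanishes. That's the speed vs. high-frequency detail tradeoff in miniature.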

Of course, you're really actually only attacking a very specific problem: first-hit intersection testing. In reality, first-hit intersection tests are only a bottleneck for extremely boring raytracers. For any nontrivial render (i.e. anything involving GI) the first-hit costs approach insignificance. The notion of using accelerated scanline conversion to achieve first-hit intersection approximation has been done, many years ago. In fact, scanline conversion sounds like a slightly different approach to the same problem you're addressing, except it works much better - it is deterministically guaranteed to converge to a correct first-hit solution. So really this isn't the right problem to attack at all IMO.

The real issue with interesting renders is arbitrary-hit tests. These come in many flavors, but the big ones are recursive rays (from reflections or refractions) and illumination simulation traces (photons in photon mapping, samples in path tracing, et al.). Neither case will ever benefit from coherence-bound approaches like the one you describe - since the traces are, by nature, from (and to) arbitrary locations in the worldspace, you cannot exploit any statistical trend to make them more efficient. With first-hit rays you have the benefit that everything must hit the viewplane and the eye point (for certain camera types at least, but you get the idea in general) which means you can use coherence-bound techniques like scanline conversion to achieve good results.

My personal feeling is that the solution will not come from cutting away work in the tracing process, but from doing the same amount of work in a smarter manner. Just as scanline conversion engines didn't get from Wolfenstein 3D to Half Life 2 by cutting back on work, I don't think we'll go from slow raytracing to realtime, practically applicable raytracing-based rendering methods by trying to find ways to fudge on the work we do. (Actually, I don't think the issue is nearly so black and white once you get into areas like GI, but that's a totally separate discussion.)

If I had to guess, I'd say the breakthrough will come from exploiting massive parallelism. Rasterization has an interesting algorithmic property: it is highly state-bound and coherence-bound. Given a static "camera" (i.e. set of transformations from polygons into screen space) and some simple differentiations, you can do each incremental step of scanline conversion very easily. That's what made it highly suitable for hardware acceleration.

Raytracing does not have coherence, at least not for arbitrary rays (which are, arguably, the thing that makes raytracing interesting in the first place). However, it does have a much more useful algorithmic property, and that is that it is remarkably well-suited to parallelization. You can check for intersections in many nodes of a bounding volume hierarchy simultaneously, for instance, and reduce the cost of traversing the hierarchy to near-constant time given appropriate representations. You can check all of the geometry in a volume of space for ray intersections independently, which means all of it can be done in parallel; instead of your intersection-test cost growing as the sum of the cost of each individual primitive's test time, the cost stays capped at the cost of the most expensive test (assuming you parallelize sufficiently) or, in a more realistic case, at a small logarithm of the original cost.
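The key property is that the per-primitive tests share no data, so they can be mapped over parallel workers verbatim. A trivial sketch (thread pool standing in for hardware lanes; the sphere test and names are mine):

```python
# Every primitive test in a node is independent of the others, so mapping the
# test over a pool is valid as-is; the final result is just the nearest hit.

from concurrent.futures import ThreadPoolExecutor

def ray_sphere_t(origin, direction, center, radius):
    """Nearest positive hit distance for a unit-direction ray, or None."""
    oc = [origin[i] - center[i] for i in range(3)]
    b = sum(oc[i] * direction[i] for i in range(3))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - disc ** 0.5
    return t if t > 0 else None

def first_hit_parallel(origin, direction, spheres):
    """spheres: list of (center, radius) pairs, all tested concurrently."""
    with ThreadPoolExecutor() as pool:
        hits = list(pool.map(lambda s: ray_sphere_t(origin, direction, *s), spheres))
    hits = [t for t in hits if t is not None]
    return min(hits) if hits else None
```

With enough parallel units the wall-clock cost of the whole node is bounded by the single most expensive test plus the cheap min-reduction, rather than by the sum of all tests - which is the near-constant-time claim above.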

Parallelism also works in other areas; you can light and texture a ray in parallel. Individual components of various computations can be done in parallel - 3 coordinates of a 3-space vector, 3 colors of an RGB triplet, and so on. Raytracing is parallelizable across many scales.

In my opinion, that trait is where the potential breakthrough for more performant ray-based rendering lies.

My idea was more of an academic curiosity than a suggestion for how we could improve ray tracing performance in general. Obviously the future of computing is massive parallelism, and ray tracing can take advantage of that directly. I'm in complete agreement with you there.

In light of what you said, the idea I posted sounds like a kind of scanline conversion algorithm, except I didn't restrict it to scanlines. However, it wouldn't generate approximate solutions; it would generate the correct intersection just by interpolating a linear surface. Detail wouldn't be missed; rather, the responsibility for determining where the ping pong balls appear in front of the blimp (to use your example) would fall to the prediction phase I mentioned, and the efficiency of that depends somewhat on the particular data structure used. If a portion of a surface goes behind a ping pong ball, those pixels simply don't count as predicted. As the number of surfaces ("predicted regions", perhaps?) increases, performance drops, so a blimp whose curved surfaces are occupied by round objects would be a worst-case scenario. But I was considering the static scenes you typically find in games, which have a lot of large flat surfaces. Outdoor scenes and terrain can instead have a lot of small flat surfaces, which might exhibit worse performance than naive tracing, but I haven't actually implemented anything to know for sure.

I also wasn't considering using this method for GI, or any other stochastic process, because obviously you wouldn't be able to "predict" anything there. Direct illumination might be possible, however. Also, from my understanding, reflection and refraction are not necessarily arbitrary: they depend on the surface normal, angle of incidence, and index of refraction, and while I haven't sat down and done the math, I don't see why you couldn't define a mapping from one surface to another (the one being reflected or refracted onto) that allows interpolating both surfaces independently and combining the results.
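To back up the "deterministic, not arbitrary" claim: the outgoing directions really are pure functions of the incoming direction, the normal, and the index ratio. A quick sketch (plain-tuple vectors, my own helper names; eta is n1/n2):

```python
# Reflection and Snell refraction as deterministic functions of (d, n, eta).
# Both d and n are assumed to be unit vectors, with n facing the incoming ray.

def reflect(d, n):
    """Mirror direction d about unit normal n."""
    k = 2 * sum(d[i] * n[i] for i in range(3))
    return tuple(d[i] - k * n[i] for i in range(3))

def refract(d, n, eta):
    """Refract unit direction d through unit normal n; eta = n1/n2.
    Returns None on total internal reflection."""
    cos_i = -sum(d[i] * n[i] for i in range(3))
    k = 1 - eta * eta * (1 - cos_i * cos_i)
    if k < 0:
        return None          # total internal reflection
    b = eta * cos_i - k ** 0.5
    return tuple(eta * d[i] + b * n[i] for i in range(3))
```

So in principle the surface-to-surface mapping exists; whether it stays simple enough to interpolate across a whole predicted region (rather than per ray) is the part I haven't worked out.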

So you must be wondering why I even thought of this idea at all. Like I said before, academic curiosity and trying to think outside the box - and in my experience, the number of rays you cast is a huge bottleneck. So, absent an increased-throughput solution such as parallelism, I was trying to come up with ways we could "fudge" the work, as you put it, by transferring it from a brute-force process to more efficient or knowledgeable data structures.

I learned a lot from your post as well, so thank you [smile] I've admittedly only been on the ray tracing scene for about a year, so bear with me.

I think such an approach may be possible, but only as a method of rejecting collision tests that cannot possibly succeed. For example, in the same way that a kd-tree helps us reject ray-object tests against primitives not in cells the ray passes through, I think there is a possibility for such prediction to help reject other tests.

I don't see this working as a means to predict with any real precision. The problem is, if you can predict with 100% precision, you can just render. If the 100%-accurate prediction is faster than a ray trace, why trace rays in the first place? Just use the prediction. Chances are the prediction model would converge to something that is, for all intents and purposes, essentially a modified scanline rasterizer. So in that sense I think the real benefit to be had from a predictive model is in complementing spatial subdivision methods to help minimize the number of ray-primitive checks, as opposed to eliminating them at some phase in the rendering process (first-hit, most notably).

The reason that recursive rays are "arbitrary" in this sense is that they do not have strong coherence properties. Consider a scene showing the interior of a room, Cornell Box style, where one wall is a perfect mirror. If you plot out all of the rays that are traced through the scene, you'll see two main groups: the first-hit "eye" rays, which are all coherent in that they come from a single point and move in a trivially predictable direction (inside a frustum); and you'll also see the recursive rays bouncing off the mirror. Those reflection-tracing rays are not nearly as coherent. Each ray comes from a different origin point along the mirror's surface, and if the mirror has some groovy curves in it (say it's a funhouse mirror), then the reflected direction is not trivially predictable - it doesn't follow a narrow frustum like eye rays do. Worse, even if I can define some kind of shape that contains the reflected rays, I cannot accurately predict which rays will be picked in sequence, so I can't exploit that knowledge for optimization at all.
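You can see the coherence collapse numerically. A 2D toy (entirely my own construction): send parallel eye rays down onto a mirror of height h(x) = w*sin(x) and measure how much the reflected directions spread out.

```python
# Parallel rays reflected off a flat mirror (wobble = 0) stay parallel;
# off a sinusoidal "funhouse" mirror they fan out across a wide range of
# directions, destroying the frustum-like coherence eye rays enjoy.
import math

def reflect2(d, n):
    k = 2 * (d[0] * n[0] + d[1] * n[1])
    return (d[0] - k * n[0], d[1] - k * n[1])

def normal_at(x, wobble):
    """Unit normal of the mirror h(x) = wobble * sin(x), from its slope."""
    slope = wobble * math.cos(x)
    mag = math.hypot(slope, 1.0)
    return (-slope / mag, 1.0 / mag)

def direction_spread(wobble, n_rays=32):
    """Range of reflected x-components for parallel downward rays."""
    d = (0.0, -1.0)
    xs = [reflect2(d, normal_at(i * 0.3, wobble))[0] for i in range(n_rays)]
    return max(xs) - min(xs)
```

A flat mirror gives a spread of exactly zero - every reflected ray is predictable from any other - while even a modest wobble scatters the directions, which is why no shared prediction data survives the bounce.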

Suppose that we use the prediction model in this scene, assuming a funhouse mirror on one wall. Prediction from the camera/eye will be fast, because I can reuse the same prediction data for each eye ray that goes out through the view plane. However, when I go to trace the reflected rays from the mirror, I no longer have that luxury; I must recompute the prediction data for every ray. That amounts to recomputing the predictions for each pixel that shows the mirror wall, which for sake of argument we'll assume is a fairly large portion of the scene. The net result is that, if the prediction computation is not virtually instantaneous, I have a net performance loss by doing prediction for arbitrary reflected rays. And the prediction will most certainly not be cheap enough to escape that problem.

It's effective for eye rays because the prediction cost is amortized across all rays traced through the view plane. Such amortization is not available for arbitrary rays, and therefore the cost of prediction is almost guaranteed to be prohibitively high.

An existing (and highly efficient) "prediction" mechanism that operates on this test-rejection principle is the bounding volume hierarchy. For simple clumps of primitives, you create an imaginary box or sphere that contains the primitives entirely. It is cheaper to check that a ray doesn't hit the bounding volume than it is to check if the ray hits any one of the contained primitives. The first thing you do is build a group of these bounding volumes by comparing the cost of a bounding-volume check with the cost of checking each primitive in the volume. Once all primitives are bounded (or are found to be not worth bounding), you recursively apply the algorithm, treating your bounding volumes as primitives. You repeat this until you hit a point of diminishing returns.
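As a minimal sketch of that fast-reject principle (bounding spheres and my own node layout, ignoring hits behind the ray origin for brevity):

```python
# A node is (center, radius, children); leaves have children == []. The
# bounding sphere is tested first, and children - primitives or nested
# volumes - are only visited on a hit. Recursion gives the hierarchy.

def hits_sphere(origin, direction, center, radius):
    """Boolean bounding-sphere test (unit direction assumed)."""
    oc = [origin[i] - center[i] for i in range(3)]
    b = sum(oc[i] * direction[i] for i in range(3))
    c = sum(x * x for x in oc) - radius * radius
    return b * b - c >= 0

def count_tests(node, origin, direction, counter):
    """Traverses the hierarchy, counting sphere tests in counter[0]."""
    counter[0] += 1
    center, radius, children = node
    if hits_sphere(origin, direction, center, radius):
        for child in children:
            count_tests(child, origin, direction, counter)
```

A ray that misses the root volume costs exactly one test regardless of how much geometry sits inside it - that's the whole payoff of the rejection hierarchy.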

The BVH model is known to be roughly equivalent to kd-trees in the general case; for certain classes of scenes, one will of course outperform the other, largely dependent on implementation. But in general BVH is very effective, and lends itself very naturally to parallelization, since it is both recursive and free of computational dependencies (i.e. intersection tests can be done at any two points in the BVH without needing data from other areas).

Now, BVH, like prediction, is often more expensive than naive tracing in extremely simple scenes. But as soon as complexity rises, any test-rejection mechanism will start to pay off, with increasing gains the more geometry is added to a scene. Past a certain point, using fast-reject methods will pay off for first-hit tests; that point is often fairly low. Past another point, you'll see a payoff for arbitrary rays. That point is much higher, but definitely exists, and would actually be fairly "low" when considering the amount of geometry likely to be in any given interesting scene.

So, really, what you're looking for is a way to know when not to cast rays, as opposed to knowing when a given ray is more likely to hit a given surface. (The difference is subtle but critical - think about it for a bit if it isn't clear.) In that field, spatial subdivision - particularly BVH - is already a very well-understood solution. If you're interested in pursuing an alternative, where do you see the benefits coming from? What do you envision being the source of improved performance in an alternate prediction model as opposed to BVH? How can you demonstrate that the optimal performance of your new method is a genuine gain over existing methods?

