How fast is hardware-accelerated ray-tracing these days?

Raytraced games have been made for quite a while. It strongly depends on which advantages you want to get from tracing over rasterization.
Nowadays it has almost become standard to use "screenspace reflections" instead of envmaps and reflection rendering. This is achieved by tracing in screenspace.
Tracing large-scale ambient occlusion in voxel space worked for me on a GTX 460 (if I recall correctly) at 720p: http://twitpic.com/8iohd5

At half the framerate, you could probably raytrace most of today's games at the same visual fidelity. But from the other perspective: is there anything that would only be achievable with raytracing, and that is appealing enough to reject every alternative that fakes it?
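
For anyone curious what "tracing in screenspace" boils down to, here's a minimal toy sketch of my own (not from any engine): march the reflection ray in view space, project each sample into the depth buffer, and stop once the ray dips behind the stored depth. It assumes a linear depth buffer and a pinhole projection; a real implementation would step in screen space with a DDA, binary-search the hit, and add a thickness test.

[code]
#include <vector>

struct V3 { float x, y, z; };

// Returns the hit pixel index, or -1 on miss.
// depth: linear view-space depth per pixel; focal: projection scale.
int screenSpaceTrace(const std::vector<float>& depth, int w, int h, float focal,
                     V3 origin, V3 dir, int maxSteps, float stepLen)
{
    V3 p = origin;
    for (int i = 0; i < maxSteps; ++i) {
        p = { p.x + dir.x * stepLen, p.y + dir.y * stepLen, p.z + dir.z * stepLen };
        if (p.z <= 0.01f) return -1;           // ray went behind the camera
        // Pinhole projection into pixel coordinates.
        int px = (int)(p.x / p.z * focal + w * 0.5f);
        int py = (int)(p.y / p.z * focal + h * 0.5f);
        if (px < 0 || px >= w || py < 0 || py >= h) return -1;  // left the screen
        if (depth[py * w + px] < p.z)          // ray passed behind stored geometry
            return py * w + px;                // treat as a hit (no thickness test)
    }
    return -1;
}
[/code]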

While that's a common opinion in the graphics community, I disagree.

Say you have 10,000 dynamic objects: prebuild a tree per object, and at runtime build a top-level tree from only those 10,000 nodes. That's <1 ms on GPU.

Research projects rebuild the entire tree every frame, which is why they often report similar times for building and tracing.
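
To make that concrete, here's a rough CPU-side sketch of the two-level idea (all names are mine; a GPU builder would look different but do the same thing): per-object triangle trees are built once, and per frame you only rebuild a tiny top-level tree over the objects' world-space bounds, using a simple median split.

[code]
#include <algorithm>
#include <cfloat>
#include <vector>

struct Aabb {
    float min[3] = {  FLT_MAX,  FLT_MAX,  FLT_MAX };
    float max[3] = { -FLT_MAX, -FLT_MAX, -FLT_MAX };
    void grow(const Aabb& b) {
        for (int i = 0; i < 3; ++i) {
            min[i] = std::min(min[i], b.min[i]);
            max[i] = std::max(max[i], b.max[i]);
        }
    }
    float center(int axis) const { return 0.5f * (min[axis] + max[axis]); }
};

struct TopLevelNode {
    Aabb bounds;
    int left = -1, right = -1;   // child node indices; -1,-1 => leaf
    int object = -1;             // object index when leaf
};

// Recursive median-split build over object bounds. With ~10,000 objects
// this is tiny compared to rebuilding the triangle trees themselves.
int buildTopLevel(std::vector<TopLevelNode>& nodes,
                  std::vector<int>& objs,                 // object indices, reordered in place
                  const std::vector<Aabb>& worldBounds,   // per-object bounds, updated per frame
                  int begin, int end)
{
    TopLevelNode node;
    for (int i = begin; i < end; ++i)
        node.bounds.grow(worldBounds[objs[i]]);

    if (end - begin == 1) {
        node.object = objs[begin];
        nodes.push_back(node);
        return (int)nodes.size() - 1;
    }
    // Split along the longest axis at the median object centroid.
    int axis = 0;
    float ext[3];
    for (int i = 0; i < 3; ++i) ext[i] = node.bounds.max[i] - node.bounds.min[i];
    if (ext[1] > ext[axis]) axis = 1;
    if (ext[2] > ext[axis]) axis = 2;
    int mid = (begin + end) / 2;
    std::nth_element(objs.begin() + begin, objs.begin() + mid, objs.begin() + end,
        [&](int a, int b) {
            return worldBounds[a].center(axis) < worldBounds[b].center(axis);
        });

    int self = (int)nodes.size();
    nodes.push_back(node);
    nodes[self].left  = buildTopLevel(nodes, objs, worldBounds, begin, mid);
    nodes[self].right = buildTopLevel(nodes, objs, worldBounds, mid, end);
    return self;
}
[/code]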

For simple linear transformations that is pretty straightforward, but what do you do about more complex animations? You'd have to perform the vertex animation, apply it to the mesh, then take the mesh triangles and build an octree/BVH hierarchy/binary tree/sort into grid cells/whatever? In all the papers I've read this part is very slow even when performed in parallel. It also undermines your raytracing time complexity: doing an O(n) prepass before an O(log n) final trace still yields an O(n) algorithm (granted, there's more to algorithms than just big-O time complexity...).

Of course, if you have any papers, evidence, or experience to prove me wrong (and don't take my tone wrong; I'm not trying to be argumentative, I really enjoy reading the latest up-to-date papers and seeing videos of this stuff), I would LOVE to read/see them :) I love raytracing and I have a few ideas of my own I want to try when I get some time.

You'd have to perform the vertex animation, apply it to the mesh, then take the mesh triangles and build an octree/BVH hierarchy/binary tree/sort into grid cells/whatever?


Like a character? Precompute a static BVH for the character in T-pose. At runtime keep the tree structure but update the bounding boxes.
The animated tree might not be as good as a completely rebuilt tree, but it's still pretty good.
If you have 100 characters, you only need to build a top-level tree over those 100 root nodes.
I've been using this in a realtime GI solution I've been working on for many years now.
I don't know of any papers, but I'm sure I'm not the inventor of this simple idea :)
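
In code, that box update is a single bottom-up pass. A sketch, under the assumption that children are always stored at higher array indices than their parent (which a top-down builder naturally gives you), so one backwards sweep over the node array suffices:

[code]
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

struct BvhNode {
    Vec3 bmin, bmax;
    int left = -1, right = -1;   // -1,-1 => leaf
    int firstTri = 0, triCount = 0;
};

// Refit: tree topology stays fixed, only the boxes change after skinning.
void refit(std::vector<BvhNode>& nodes,
           const std::vector<int>& triIndices,     // leaf -> triangle ids
           const std::vector<Vec3>& skinnedVerts,  // post-animation positions
           const std::vector<int>& triVerts)       // 3 vertex ids per triangle
{
    for (int n = (int)nodes.size() - 1; n >= 0; --n) {
        BvhNode& node = nodes[n];
        Vec3 lo = {  1e30f,  1e30f,  1e30f };
        Vec3 hi = { -1e30f, -1e30f, -1e30f };
        if (node.left < 0) {
            // Leaf: bound the skinned triangles it owns.
            for (int t = 0; t < node.triCount; ++t) {
                int tri = triIndices[node.firstTri + t];
                for (int v = 0; v < 3; ++v) {
                    const Vec3& p = skinnedVerts[triVerts[tri * 3 + v]];
                    lo.x = std::min(lo.x, p.x); hi.x = std::max(hi.x, p.x);
                    lo.y = std::min(lo.y, p.y); hi.y = std::max(hi.y, p.y);
                    lo.z = std::min(lo.z, p.z); hi.z = std::max(hi.z, p.z);
                }
            }
        } else {
            // Internal node: merge the (already refitted) children.
            const BvhNode& a = nodes[node.left];
            const BvhNode& b = nodes[node.right];
            lo = { std::min(a.bmin.x, b.bmin.x), std::min(a.bmin.y, b.bmin.y),
                   std::min(a.bmin.z, b.bmin.z) };
            hi = { std::max(a.bmax.x, b.bmax.x), std::max(a.bmax.y, b.bmax.y),
                   std::max(a.bmax.z, b.bmax.z) };
        }
        node.bmin = lo; node.bmax = hi;
    }
}
[/code]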


Interesting... You don't find there are too many triangles in your leaf BVH nodes?


Keeping the tree structure and only updating the boxes is called BVH refitting, and it is commonly used for animated scenes. Optionally, you can rebuild the entire tree every N frames, or when there is a big change in the scene, to maintain decent ray tracing performance under deformations.
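
One simple trigger for that occasional rebuild (an illustrative heuristic, not a canonical one, reusing the BvhNode type from the refit sketch above): refitting slowly inflates the boxes over time, so compare the tree's total box surface area against what it was right after the last full build, and rebuild past some threshold.

[code]
#include <vector>

float surfaceArea(const BvhNode& n) {
    float dx = n.bmax.x - n.bmin.x;
    float dy = n.bmax.y - n.bmin.y;
    float dz = n.bmax.z - n.bmin.z;
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// areaAtLastBuild: total node area recorded right after the last full build.
bool shouldRebuild(const std::vector<BvhNode>& nodes, float areaAtLastBuild) {
    float area = 0.0f;
    for (const BvhNode& n : nodes) area += surfaceArea(n);
    return area > 1.4f * areaAtLastBuild;   // ~40% bloat => rebuild (made-up threshold)
}
[/code]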

Most of the compute in rendering global illumination is spent on indirect light. It's pretty easy to write a real-time ray caster that only handles direct light/shadows. When you add GI, the number of ray casts required goes up by a factor of at least 10-100, and on top of that they are incoherent rays, which benefit far less from ray packet traversal.

Also, it is not accurate to state that the cost of ray tracing is O(log n) vs. O(n) for rasterization. Ray tracing is O(log n) per pixel, but you have >1,000,000 pixels, especially with antialiasing/supersampling. That comparison is only correct if your framebuffer is 1x1 pixels, or if every triangle completely fills the viewport.

Another approach for animated stuff is to create a tree for every keyframe of the animation and blend the trees just like you'd blend bones. That saves you from refitting and reconstruction. As long as you don't deal with something like cloth simulation, it works pretty well.

[edit] I'm talking about something more complex like OBB hierarchies, which need some expensive fitting per keyframe. You cannot alter the hierarchy itself, of course.
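
For the simple AABB case, the blending really is just a per-node lerp, exactly like bones. A sketch (names mine): every keyframe stores boxes for the same fixed tree topology. Note that for axis-aligned boxes and linearly blended vertices the lerped box stays conservative; the OBB hierarchies mentioned in the edit need more care.

[code]
#include <vector>

struct Box { float bmin[3], bmax[3]; };

// boxesA/boxesB: node boxes baked for keyframes A and B (same topology).
// t = 0 gives keyframe A, t = 1 gives keyframe B.
void blendTreeBoxes(const std::vector<Box>& boxesA,
                    const std::vector<Box>& boxesB,
                    float t,
                    std::vector<Box>& out)
{
    out.resize(boxesA.size());
    for (size_t i = 0; i < boxesA.size(); ++i)
        for (int k = 0; k < 3; ++k) {
            out[i].bmin[k] = boxesA[i].bmin[k] + t * (boxesB[i].bmin[k] - boxesA[i].bmin[k]);
            out[i].bmax[k] = boxesA[i].bmax[k] + t * (boxesB[i].bmax[k] - boxesA[i].bmax[k]);
        }
}
[/code]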


Regarding "ray tracing is O(log n) per pixel, but you have >1,000,000 pixels": for any given frame your pixel count is constant. It's a very large constant, but it is a constant. Like I said, big-O isn't everything...

I don't see animation as a huge bottleneck at all; the upcoming Dreams managed a rough version in realtime on the PS4 just fine. Besides, offline rendering can rebuild acceleration structures without being bottlenecked by that. What offline gets bottlenecked by is simply brute force, e.g. indirect tracing and raymarching, which is pretty much the same thing realtime is going to get bottlenecked by. Raymarching requires a ton of samples, but it is already used for volumetric effects today, in quarter-res buffers to amortize the cost.

The worst part is indirect lighting, where you have incoherent rays. Your wavefronts end up incoherent and useless, and you get killed by latency chasing pointers all around a non-uniform acceleration structure; either that, or you take too many samples and burn too much RAM with a uniform one. Still, there isn't really a way to do GI well other than tracing. You can hack it if you have largely precomputed content, but if you want it fully dynamic in realtime, that seems to be a no-go. Getting realtime GI without tracing has been a dream for years now, but every solution (and there have been tons of them) has ended up with far too many tradeoffs after creating far too complex a system to be particularly useful. It's why both Crytek and Epic have just gone "screw it, we'll do tracing as cleverly as we can and brute-force what's left", and so far it actually works! (Though it's still quite expensive.)

Let me give an example of that dilemma. Say we want to calculate indirect lighting:

Rasterize the scene for primary rays.
For each pixel in the framebuffer, create 100 secondary rays in random directions.
Divide the screen into smaller squares, and for each square group rays with similar start positions and directions (see the sketch further down).

For each group, make a wavefront and do the tracing. There's a good chance each ray reads the same memory for a while because the rays follow similar paths, so the access stays data-coherent.

Downside: even if we manage to fill every wavefront completely, some rays will terminate early, so the GPU runs out of work.
We could try another approach where we build a work queue: when a thread terminates, it grabs a new ray from the queue.
But that adds another expensive resource in global memory and kills the data coherency.

On the CPU it's similar: we only need to group 4 or 8 rays for SIMD, but the rest of the problem is the same.
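
Here is roughly what that grouping step could look like on the CPU side (a sketch with made-up sizes; a GPU version would bucket rays with a radix/counting sort instead of std::sort): key each secondary ray by its screen tile (similar origin) and direction octant (similar direction), then sort so each contiguous key range becomes one wavefront.

[code]
#include <algorithm>
#include <cstdint>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; int pixel; };

// Key = screen tile (major) + direction octant (minor), so rays with
// nearby origins and similar directions land next to each other.
static uint32_t binKey(const Ray& r, int px, int py, int tileSize, int tilesX) {
    uint32_t tile = (uint32_t)(py / tileSize) * tilesX + (uint32_t)(px / tileSize);
    uint32_t octant = (r.dx < 0 ? 1u : 0u) | (r.dy < 0 ? 2u : 0u) | (r.dz < 0 ? 4u : 0u);
    return tile * 8u + octant;
}

void sortRaysForCoherence(std::vector<Ray>& rays, int width, int tileSize) {
    int tilesX = (width + tileSize - 1) / tileSize;
    std::sort(rays.begin(), rays.end(), [&](const Ray& a, const Ray& b) {
        int ax = a.pixel % width, ay = a.pixel / width;
        int bx = b.pixel % width, by = b.pixel / width;
        return binKey(a, ax, ay, tileSize, tilesX) < binKey(b, bx, by, tileSize, tilesX);
    });
}
[/code]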

Memory bandwidth will bite us either way.

So, raytracing is faster nowadays than it used to be, but it is not fast enough to be a general solution.
The reason is not technical limitations; it's simply a slow algorithm by definition, and that may never change.
It may appear simple, elegant and attractive, but under the hood it does all the things we don't want: heavy branching and unordered memory access.

We simply need to combine ALL available techniques and accept growing complexity to move on.

So exactly how fast is ray-tracing on modern GPUs? Would making a game in it be viable? Or are we just not to that point yet?

In theory rasterization is O(n), where n is the number of triangles, if you assume that rasterizing a single triangle is O(1). Since fragments are processed in parallel during rasterization, that assumption is not far from reality; this is how rasterization-based APIs like OpenGL/DirectX work. In practice every game adds top-level optimizations to make it possible to render huge amounts of geometry. Several basic ones help a lot: LOD, frustum culling, occlusion culling, ...
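
As a concrete example of one of those top-level optimizations, here is the standard AABB-vs-frustum test used for frustum culling (plane normals assumed to point inward; the test is conservative, so it may keep a box that turns out to be invisible):

[code]
struct Plane { float nx, ny, nz, d; };  // nx*x + ny*y + nz*z + d >= 0 => inside

bool aabbInFrustum(const Plane planes[6],
                   float minx, float miny, float minz,
                   float maxx, float maxy, float maxz)
{
    for (int i = 0; i < 6; ++i) {
        const Plane& p = planes[i];
        // Pick the AABB corner furthest along the plane normal; if even
        // that corner is outside, the whole box is outside.
        float x = p.nx >= 0 ? maxx : minx;
        float y = p.ny >= 0 ? maxy : miny;
        float z = p.nz >= 0 ? maxz : minz;
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0)
            return false;
    }
    return true;   // possibly visible
}
[/code]
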
Ray tracing works in a totally different way, and the main problem for using it in games is acceleration structures with dynamic geometry. You have some kind of acceleration structure over your geometry, and if anything changes from frame to frame you have to update that structure, which is what makes fully ray-traced games so hard. Currently the fastest builders are refitting, LBVH and TrBVH, which are linear, while the highest-quality builder is SBVH; game engines would have to pick between them based on the type of geometry. Brigade showed that they can render an HD image at 30 fps and rebuild a BVH with 150k triangles per frame on two GTX Titans, as I remember. If you are really interested you should read J. Bikker's thesis, "Path Tracing in Real-Time Games".
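
For a taste of why LBVH is linear: the core trick is to quantize each primitive centroid onto a 3D grid, interleave the bits into a Morton code, and sort; primitives that are close in space end up close in the sorted order, so the tree can then be emitted in linear time. A sketch of the standard 10-bits-per-axis bit-twiddling:

[code]
#include <cstdint>

// Spread the lower 10 bits of v so there are two zero bits between each.
static uint32_t expandBits(uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// x, y, z: centroid coordinates normalized to [0,1] over the scene bounds.
// Returns a 30-bit Morton code; sort primitives by this key.
static uint32_t morton3D(float x, float y, float z) {
    auto clamp01 = [](float f) { return f < 0.f ? 0.f : (f > 1.f ? 1.f : f); };
    uint32_t xi = (uint32_t)(clamp01(x) * 1023.0f);
    uint32_t yi = (uint32_t)(clamp01(y) * 1023.0f);
    uint32_t zi = (uint32_t)(clamp01(z) * 1023.0f);
    return (expandBits(xi) << 2) | (expandBits(yi) << 1) | expandBits(zi);
}
[/code]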

