## Any options for affordable ray tracing?

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

31 replies to this topic

Posted 13 February 2014 - 12:26 AM

So as someone who has dealt with real time graphics instead of prerendered for his entire career, when I look at this..

...I feel like it's still a distant dream. Everything I'm learning is about how mirrors are too hard and how Portal's whole system was simplified to work with multiple cameras/physics in the game. That just doesn't seem right to me. There must be some way, especially with all these new consoles rolling in, to handle bending light, right?

My ideal would be something like these:

http://devrajniwas.files.wordpress.com/2013/12/crystal_wine_glasses_tif.gif

http://www.homewetbar.com/images/prod/w-Crystal-Glass-Set-133769.jpg

Reflections and curved/bent light, that's really all I'm looking for. Now, that might be asking for a whole new Google/Facebook (or some equally absurd and impossible amount of work), but I really think there's a workaround I just haven't heard of yet.

If all I want is these two things, do I have any options? Maybe even ones with a camera that moves?

I'd love to have a world the player can navigate where light bends in these beautiful ways. Are there any options at all for that right now?

### #2Hodgman  Moderators

Posted 13 February 2014 - 12:39 AM

Raytracing is getting closer to being feasible.

https://developer.nvidia.com/optix-interactive-examples

If you just have a small scene of a few glasses, it's probably already feasible.

### #3ddn3  Members

Posted 13 February 2014 - 01:48 AM

Have you seen

http://arauna2.nhtv.nl/

There are a few realtime raytracers out there: some CPU based, some GPU based, plus demos from the chipmakers. Intel was pushing it a few years back for their defunct Larrabee chips.

Posted 13 February 2014 - 11:52 AM

Hodgman: fantastic! What wonderful examples - I'm really keen on that Nvidia one.

ddn3: Windows only! I was unable to open the exe, but working with Unity sounds fantastic.

Seeing Arauna's list though: Do I need any special features to have the light curve in the glass? I have no experience with that, just a lot of curiosity.

### #5Ingenu  Members

Posted 13 February 2014 - 12:32 PM

There are also these cards:

https://store.caustic.com/

-* So many things to do, so little time to spend. *-

### #6ddn3  Members

Posted 13 February 2014 - 01:19 PM

Hodgman: fantastic! What wonderful examples - I'm really keen on that Nvidia one.

ddn3: Windows only! I was unable to open the exe, but working with Unity sounds fantastic.

Seeing Arauna's list though: Do I need any special features to have the light curve in the glass? I have no experience with that, just a lot of curiosity.

Yep, Arauna is a full path tracer, which will reproduce the proper caustics of glass. That is the fun thing about raytracers: no tricks are required to reproduce these complex light-object interactions and reflections. There was an older version of Arauna; see if it's still out there.
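For the "light curving in the glass" question: the bending is refraction, and a ray or path tracer applies Snell's law each time a ray crosses a surface. Below is a minimal sketch of that one step, not code taken from Arauna. `Vec3` and `refract` are hypothetical names, and the indices of refraction used in the usage note (about 1.0 for air, about 1.5 for glass) are typical textbook values.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Refract unit direction `d` at a surface whose unit normal `n` points
// against `d`, travelling from refractive index `n1` into `n2`.
// Returns false on total internal reflection (no refracted ray exists).
inline bool refract(Vec3 d, Vec3 n, double n1, double n2, Vec3& out) {
    assert(n1 > 0.0 && n2 > 0.0);
    const double eta = n1 / n2;
    const double cosI = -dot(d, n);                        // cosine of incidence angle
    const double sin2T = eta * eta * (1.0 - cosI * cosI);  // Snell: sinT = eta * sinI
    if (sin2T > 1.0) return false;                         // total internal reflection
    const double cosT = std::sqrt(1.0 - sin2T);
    out = eta * d + (eta * cosI - cosT) * n;
    return true;
}
```

A ray entering glass from air would call `refract(d, n, 1.0, 1.5, out)`; on the way out the indices swap, and a `false` result means the ray reflects inside the glass instead.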

Good Luck!

### #7kalle_h  Members

Posted 15 February 2014 - 02:26 PM

Great video and articles, thanks kalle_h! Any idea if he plans to make the code public?

### #9/ fir   Members

Posted 15 February 2014 - 03:57 PM

This Brigade 3 demo is something; I had never seen anything like it before (the heat in the city).

One question, though, about the heavy noise (I understand it comes from randomly sending a performance-limited number of environment light rays): couldn't it be averaged out somehow? How much faster does hardware still need to get, 5x, 10x?

Is there maybe some chance that hardware companies will start supporting ray tracing in consumer electronics? In my opinion this is a good way to go; this is the future.

### #10agleed  Members

Posted 16 February 2014 - 12:25 PM

Proper real-time ray tracing (which, by the way, is just an umbrella term for more specific techniques like bidirectional path tracing) is still at least 15-20 years away from becoming a generalized solution fast enough to finally replace rasterization entirely, simply due to the lack of performance.  But I'm completely sure that we'll see hybrid solutions, with rasterization based engines using ray tracing for secondary effects this gen. At least on PC, it's basically a certain thing in my opinion, on next gen consoles I'm not so sure.

I think the first obvious thing we're going to get within the next couple of years is proper single-bounce specular reflections for dynamic geometry and light sources, without having to rely on pre-baked stuff that involves lots of artist fiddling, like localized cube maps. We're already seeing techniques based on this. For example, the voxel-based cone tracing demo that made the rounds a year or two ago is essentially a ray tracer that marches through a voxel volume: the farther a ray travels, the lower-resolution the mip level it samples its data from, which kind of mimics the behavior of a "cone", hence cone tracing instead of ray tracing.
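The mip-selection idea described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the code from the demo: `coneMipLevel` and `traceConeOpacity` are hypothetical names, and `sample(t, mip)` stands in for whatever lookup returns pre-filtered voxel opacity at distance `t` along the cone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Mip level whose voxel size roughly matches the cone's width at distance t.
// Level 0 is the finest level; each level up doubles the voxel size.
inline float coneMipLevel(float t, float halfAngleRad, float voxelSize) {
    assert(voxelSize > 0.0f);
    const float diameter = 2.0f * t * std::tan(halfAngleRad);
    return std::max(0.0f, std::log2(diameter / voxelSize));
}

// Front-to-back opacity accumulation along one cone.
// `sample(t, mip)` is a hypothetical callback returning opacity in [0, 1].
template <typename Sampler>
float traceConeOpacity(Sampler sample, float halfAngleRad, float voxelSize, float maxDist) {
    float opacity = 0.0f;
    float t = voxelSize;  // start one voxel out to avoid sampling the origin surface
    while (t < maxDist && opacity < 0.99f) {
        const float mip = coneMipLevel(t, halfAngleRad, voxelSize);
        opacity += (1.0f - opacity) * sample(t, mip);
        // Steps grow with the cone width, so distant regions cost fewer samples.
        t += std::max(voxelSize, 2.0f * t * std::tan(halfAngleRad));
    }
    return opacity;
}
```

Because both the step length and the sampled mip level grow with distance, a wide cone covers a large region in a handful of coarse lookups, which is where the technique trades accuracy for speed.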

Actually I think I read a paper on something like this a couple of weeks ago, but I'm not sure if I'm dreaming that up right now.

Either way, one thing's clear: 30 years from now nobody will be using rasterization anymore and graphics programming will be simpler in many aspects for it.

### #11/ fir   Members

Posted 16 February 2014 - 12:47 PM

Maybe sooner; I think it's just a matter of building hardware acceleration for it, and that's all (probably).

### #12kalle_h  Members

Posted 16 February 2014 - 06:34 PM

Maybe sooner; I think it's just a matter of building hardware acceleration for it, and that's all (probably).

If there were an easy solution like that, why isn't it already out there?

### #13/ fir   Members

Posted 17 February 2014 - 01:23 AM

Maybe sooner; I think it's just a matter of building hardware acceleration for it, and that's all (probably).

If there were an easy solution like that, why isn't it already out there?

I don't know. Maybe it is not "so easy" because heavy hardware has to be built, which may not be easy for a small company, and the big ones may not be interested; maybe they prefer to keep selling what they have for as long as they can. (I really don't know the details.)

### #14Hodgman  Moderators

Posted 17 February 2014 - 04:28 AM

ray-tracing [is] still at least 15-20 years away from becoming generalized solutions fast enough to finally replace rasterization entirely,
Either way, one thing's clear: 30 years from now nobody will be using rasterization anymore and graphics programming will be simpler in many aspects for it.

I'm sure there were people saying the same thing 10, 15, 20 & 30 years ago.

If real-time follows the same path as film VFX, we'll probably see micropolygon renderers / Reyes-style rendering in real-time sooner than full-blown ray-tracing.

The problem with ray-tracing, which keeps it from being "just a few more years away", is that it has terrible memory access patterns. Rasterization on the other hand can be very predictable with its memory access patterns. This is a big deal because memory is slow, and the main methods for speeding it up rely on coherent access patterns. Many of the tricks employed by real-time ray-tracers are focussed on improving the coherency of rays, or other techniques to reduce random memory accesses.

This problem never gets better.
Say that your program is slow because you can't deliver data from the RAM to the processor fast enough -- the solution: build faster RAM!
Ok, you wait a few years until our RAM is now 2x faster, however, in the same period of time our processors have gotten another 10x faster!
Now, after waiting those few years, the problem is now actually much worse than it was before. The amount of data that you can deliver to the CPU per operation, or bytes/FLOP has actually decreased.
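The arithmetic behind that claim can be spelled out in a few lines. This is only a worked illustration: `bytesPerFlop` is a hypothetical helper, and the 20 GB/s and 100 GFLOP/s starting figures in the usage note are made-up round numbers, not measurements.

```cpp
#include <cassert>

// Bytes that memory can deliver for each floating-point operation the
// processor can execute in the same amount of time.
inline double bytesPerFlop(double bytesPerSecond, double flopsPerSecond) {
    assert(bytesPerSecond >= 0.0 && flopsPerSecond > 0.0);
    return bytesPerSecond / flopsPerSecond;
}
```

With the illustrative figures, `bytesPerFlop(20e9, 100e9)` is 0.2 bytes per operation. After RAM doubles and compute grows tenfold, `bytesPerFlop(40e9, 1000e9)` is 0.04, one fifth of the starting value, even though both components got faster.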

Every year, this performance figure gets lower and lower (blue line divided by red line is bytes per op):

For people to completely ditch rasterisation and fully switch to ray-tracing methods, we need some huge breakthroughs in efficiency, in efficient scene traversal schemes with predictable memory access patterns for groups of rays.

Film guys can get away with this terrible efficiency because they aren't constrained by the size of their computers, the heat output, the cost to run, or the latency from input to image... If you need 100 computers using 100 kW of power, inside a climate controlled data-center with 3 hours of latency, so be it!

On the other hand, the main focus of the GPU manufacturers these days seems to be in decreasing the number of joules required for any operation, especially operations involved in physically moving bytes of data from one part of the chip to another... because most consumers can't justify buying a 1 kW desktop computer, and they also want 12-hour battery life from their mobile computers... so efficiency is kind of important.

But I'm completely sure that we'll see hybrid solutions, with rasterization based engines using ray tracing for secondary effects this gen. At least on PC, it's basically a certain thing in my opinion, on next gen consoles I'm not so sure.
You're right about this though. The appearance of SSAO in prev-gen was the beginning of a huge "2.5D ray-tracing" explosion, which will continue on next-gen with screen-space reflections and the like.

Other games are using more advanced 3D/world-space kinds of ray-tracing for other AO techniques on prev-gen, however, they're only tracing against extremely simple representations of the scene... e.g. ignoring everything except one character, and representing the character as less than a dozen spheroids...

Fully 3D ray-marching has shown up in other places, such as good fog rendering (shadow and fog interaction), or pre-baked specular reflection (e.g. cube-map) masking by ray-marching against a simplified version of the environment.

The voxel cone tracing stuff might ship on a few next-gen console games that require fancy specular dynamic reflections, but even with their modern GPUs it's a pretty damn costly technique.

### #15Krypt0n  Members

Posted 17 February 2014 - 05:36 PM

The voxel cone tracing stuff might ship on a few next-gen console games that require fancy specular dynamic reflections, but even with their modern GPUs it's a pretty damn costly technique.

the problem with voxel cone tracing is exactly this:

Every year, this performance figure gets lower and lower (blue line divided by red line is bytes per op):

I have been playing with voxel tracing for more than 20 years now (Comanche and later Outcast made it so appealing), but it was always memory bound. it works like 10x faster on GPUs solely because they push 300GB/s instead of the 20GB/s that the average CPU does. yet it scales badly if you extrapolate your graph into the future. I remember profiling my animated voxels ( http://twitpic.com/3rm2sa ) and literally 50% of the time when running on one core was in the instruction that was fetching the voxel. running it on 6 cores/12 threads with perfect algorithmic scaling just made it about 2-3x as fast; that one instruction got to 80%+ as soon as the voxel working set exceeded the cache size. (yes, that was CPU, but my GPU versions are similarly sensitive to memory size)

For people to completely ditch rasterisation and fully switch to ray-tracing methods, we need some huge breakthroughs in efficiency, in efficient scene traversal schemes with predictable memory access patterns for groups of rays.

that's one direction for research, but a far more promising one is to make better use of the rays you can already cast. In my research on realtime path tracing, even problematic materials like chrome ( http://twitpic.com/a2v276 ) ended up quite nice with about 25 samples/pixel; mostly you get away with ~5 - ~10 spp. once you reach 100MRay/s+ it really becomes usable.

if someone would invest the money to make a path tracing game, it would be possible today on high end machines. sure it would be a purpose made game, just like quake, doom3 or rage were, but it would work out nowadays.

the reason we don't have ray tracing games is more of a business than tech issue. games like doom2, quake, unreal were running on high end machines "kind of" smooth. that was a small market coverage, but tech sold it.

nowadays, nobody would invest in a game for 'just' 1 million units in sales to the top tier of PC users (aka gamer elite race).

it's really sad, I'm sure I'm not the only one who could create this tech.

### #16/ fir   Members

Posted 18 February 2014 - 02:37 AM

I'm not sure it is quite like that...

Isn't ray tracing better suited to parallelisation than rasterization?

You can trace every ray independently of the others, so there are probably no RAM write collisions (or am I wrong?); you only need shared RAM reads.

- for rasterization it seems less nice to me

Also, I don't know how things look today, but as someone said, you can already really run a Quake-like game on a path tracer (as far as I know a path tracer is much heavier than a simple ray tracer, so with just a ray tracer you could get much faster rendering).

(Also look at how far Carmack took the optimization of the Quake 2 rasterization engine: it had no framebuffer pixel overwriting at all, plus a huge number of other crazy optimizations. I am not sure today's path tracer people go that far, though I know you people are good anyway.)

If so, we are not too far from really using it. Wouldn't hardware acceleration aimed at tracing speed it up a few more times?

Edited by fir, 18 February 2014 - 02:40 AM.

### #17Hodgman  Moderators

Posted 18 February 2014 - 04:46 AM

I'm not sure it is quite like that...

Isn't ray tracing better suited to parallelisation than rasterization?

You can trace every ray independently of the others, so there are probably no RAM write collisions (or am I wrong?); you only need shared RAM reads.

They're both embarrassingly parallel. A common technique to split rasterization over different "cores" and avoid conflicting writes is to use tiled rendering.
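As a rough illustration of the tiling idea, the framebuffer can be cut into rectangles so that every pixel belongs to exactly one tile; each core then rasterizes only the triangles overlapping its own tiles and never writes a pixel another core owns. `Tile` and `makeTiles` are hypothetical names for this sketch, and the work distribution itself (threads, binning triangles to tiles) is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open pixel rectangle [x0, x1) x [y0, y1).
struct Tile { int x0, y0, x1, y1; };

// Split a width x height framebuffer into tiles of at most tileSize pixels
// per side. Tiles never overlap, so no two tiles share a writable pixel.
inline std::vector<Tile> makeTiles(int width, int height, int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += tileSize) {
        for (int x = 0; x < width; x += tileSize) {
            tiles.push_back({x, y,
                             std::min(x + tileSize, width),
                             std::min(y + tileSize, height)});
        }
    }
    return tiles;
}
```

Edge tiles are clamped to the framebuffer, so a 100x70 target with 32-pixel tiles yields a 4x3 grid whose last column and row are narrower than the rest.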

My post above wasn't about parallelisation though - it was about memory bandwidth, and alleviating the memory-bandwidth-bottleneck via predictable memory access patterns.

As a practical example of the importance of memory access patterns, try this test using a very large value for LARGE_NUMBER (preferably, many times larger than your CPU's L2 or L3 cache size).

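    #include <algorithm>  // std::shuffle
    #include <cmath>      // sqrtf
    #include <random>     // std::mt19937
    #include <vector>
    using std::vector;

    // (the statements below belong inside main() or a test function)
    // initialization
    vector<int> indicesA(LARGE_NUMBER);
    vector<int> indicesB(LARGE_NUMBER);
    for( int i=0; i!=LARGE_NUMBER; ++i )
    {
        indicesA[i] = i;
        indicesB[i] = i;
    }
    // std::random_shuffle was removed in C++17; std::shuffle replaces it
    std::mt19937 rng(1234);
    std::shuffle( indicesB.begin(), indicesB.end(), rng );

    //indicesA contains 0,1,2,3...
    //indicesB contains 2,0,3,1... (same as indicesA, but in a random order)

    vector<float> values(LARGE_NUMBER, 1.0f); //fill values with stuff

    // test #1: linear access
    float totalA = 0;
    for( int i=0; i!=LARGE_NUMBER; ++i )
    {
        totalA += sqrtf(values[indicesA[i]]);
    }
    // test #2: random access
    float totalB = 0;
    for( int i=0; i!=LARGE_NUMBER; ++i )
    {
        totalB += sqrtf(values[indicesB[i]]);
    }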

Both these tests should produce the same value (up to floating-point rounding, since the additions happen in a different order) -- both totalA and totalB contain the sum of the square-root of every item inside values.
The difference is that test #1 accesses the values in a predictable, linear order, whereas test #2 accesses the values in a completely random order. The CPU cache will be able to optimize the memory fetches for test #1, but it will provide very minimal help for test #2. Test #1 should be much faster.
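For anyone who wants to run the comparison, here is one possible self-contained version with timing added. It is a sketch of the same experiment rather than the exact code above: `sumInOrder`, `makeOrder` and `AccessResult` are hypothetical names, the sums are accumulated in `double` so the two totals stay close, and the shuffle uses a fixed seed so runs are repeatable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct AccessResult { double total; double milliseconds; };

// Build the index sequence 0,1,2,... and optionally shuffle it.
inline std::vector<std::size_t> makeOrder(std::size_t n, bool shuffled) {
    assert(n > 0);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    if (shuffled) {
        std::mt19937 rng(42);  // fixed seed: same "random" order every run
        std::shuffle(order.begin(), order.end(), rng);
    }
    return order;
}

// Sum sqrt(values[idx]) for every idx in `order`, timing only the loop.
inline AccessResult sumInOrder(const std::vector<float>& values,
                               const std::vector<std::size_t>& order) {
    assert(order.size() == values.size());
    const auto start = std::chrono::steady_clock::now();
    double total = 0.0;
    for (std::size_t idx : order) {
        total += std::sqrt(static_cast<double>(values[idx]));
    }
    const auto stop = std::chrono::steady_clock::now();
    return {total, std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

To see the effect, fill `values` with a count many times larger than the last-level cache (tens of millions of floats, say), call `sumInOrder` once with `makeOrder(n, false)` and once with `makeOrder(n, true)`, and compare the two `milliseconds` fields. The exact ratio depends on the machine, so no figure is claimed here.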

The super-high-level view of ray-tracing's memory accesses is:
* For each pixel (predictable access), test for ray collision with scene acceleration structure (e.g. BVH/etc) and with scene objects themselves (random access), write to pixel.

* Also, neighbouring pixels may take completely different paths through the scene (low contiguity), defeating caching.

The super-high-level view of rasterization's memory accesses is:
* For each triangle in the scene (predictable access), compute the covered pixels, write to those pixels (predictable access).

* Also, each triangle generates a list of pixels that are all neighbours (high contiguity) and can share cached values about their triangle.

i.e. the Achilles heel of ray-tracing is that ray-vs-scene collision testing does not have predictable memory access patterns. This means that your algorithms are extremely likely to be bottlenecked by RAM speeds rather than processor speeds. Increases in processor speeds do very little to speed up your algorithms -- instead you depend on increases in RAM speeds (and your processors spend most of their time sitting idle, waiting for data to arrive from RAM).

### #18/ fir   Members

Posted 18 February 2014 - 05:11 AM

All right, but if so, these slowdowns probably shouldn't apply to scenes whose RAM footprint is not terribly big, just reasonable.

(Today's caches are probably about 10MB? What if a whole scene fit in that much memory? These would be relatively simple scenes of 200k triangles or so, but they should ray trace quickly, or not?)

Do ray-traced scenes really have memory footprints gigabytes in size?

Do these BVH structures have such a large RAM footprint? (I have not done this and am not sure how it works. Is it some kind of spatial 3D grid of boxes, with a ray/box intersection test routine and some Bresenham-like traversal through the grid? Is that what consumes the RAM?)

### #19Krypt0n  Members

Posted 18 February 2014 - 06:06 AM

just tracing small scenes that fit into your cache, especially the L1 cache, will probably be fast, you could fit some Quake1 level into the cache and it would work ok.

I remember someone traced star trek elite force ages ago, there must be a vid on youtube... there:

but it's still memory limited...

you can do math with SSE on 4 elements and AVX on 8 elements, so in theory you could work on 8 rays at the same time, yet once you come to memory fetching, it's one fetch at a time. it's even worse, if you process 4 or 8 elements at a time, there will be always also 4 or 8 reads in a row. so while your out-of-order cpu can hide some of the fetch latency by processing independent instructions, once you queue 4 or 8 fetches, there is just nothing to do for the other units than to wait for your memory requests.

and while L1 requests are kind of hidden by the instruction decoding etc. if you start queuing up 4 or 8 reads and those go to the L2 cache, with ~12cycle latency on hit, you have ~50 or ~100cycles until you can work on the results.

http://embree.github.io/data/embree-siggraph-2013-final.pdf

SSE is barely doing anything to the performance as you can see.

on the other side, rasterization is just a very specialized form of ray tracing. you transform the triangles into ray space, so the rays end up being 2D dots that you can evaluate before you do the actual intersection. the intersection is done by interpolating coherent data (usually UVs and Z) and projection. you also exploit the fact that you can limit the primitive vs ray test to a small bounding rect, skipping most of the pixels (or 2D dots). the coherent interpolation also exploits the fact that you can keep your triangle data in registers for a long time (so you don't need to fetch data from memory) and that you don't need to fetch the rays either, as you know they're in a regular grid order and calculating their positions is just natural and fast. I'm not talking about scanline rasterization, but halfspace or even more homogeneous rasterization.
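A bare-bones halfspace rasterizer makes those points concrete: the "rays" are just pixel centres on a regular grid, the bounding rect limits how many of them get tested, and the three vertices stay in local variables for the whole loop. This is a simplified sketch with hypothetical names (`P2`, `edgeFunction`, `rasterizeTriangle`); it only marks coverage and omits interpolation, fill rules and depth.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct P2 { float x, y; };

// Twice the signed area of triangle (a, b, p); the sign tells which side
// of the directed edge a->b the point p lies on.
inline float edgeFunction(P2 a, P2 b, P2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Marks every pixel whose centre passes all three halfspace tests.
// Assumes the vertex order makes the edge functions non-negative inside.
// `mask` is a row-major width*height buffer; returns the covered pixel count.
inline int rasterizeTriangle(P2 v0, P2 v1, P2 v2, int width, int height,
                             std::vector<unsigned char>& mask) {
    assert(width > 0 && height > 0);
    assert(mask.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // Bounding rect of the triangle, clipped to the screen.
    const int minX = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    const int minY = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    const int maxX = std::min(width, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    const int maxY = std::min(height, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));

    int covered = 0;
    for (int y = minY; y < maxY; ++y) {
        for (int x = minX; x < maxX; ++x) {
            const P2 p{x + 0.5f, y + 0.5f};  // pixel centre, computed not fetched
            if (edgeFunction(v0, v1, p) >= 0.0f &&
                edgeFunction(v1, v2, p) >= 0.0f &&
                edgeFunction(v2, v0, p) >= 0.0f) {
                mask[static_cast<std::size_t>(y) * width + x] = 1;
                ++covered;
            }
        }
    }
    return covered;
}
```

Note that the inner loop reads no memory at all besides the output write, which is the contrast with ray-vs-scene traversal being drawn in this thread.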

if you could order rays in some way to be grouped like in rasterization, you'd get close to being as fast with tracing as you are with rasterizing. indeed, there is quite some research going on in how to cluster rays and defer their processing until enough of them try to touch similar regions. there is research in speculative tracing, e.g. you do triangle tests not with just one ray, but with 4 or 8 at a time, and although it's bogus to test random rays, it's also free with SSE/AVX, and if your rays are somehow coherent (e.g. for shadow rays or primary rays), it ends up somewhat faster.

as I said in my first post here, there is already so much research regarding faster tracing, but there is really a lot of room to improve on what you do with those rays. you can assume you'll get about 100-200MRay/s. that's what the Caustic RT hardware does, that's what you get with OptiX, that's what Intel can achieve with Embree. and even if you'd magically get 4x -> 800MRay/s, you'd just reduce monte carlo path tracing noise by 50%. on the other side, you can add importance sampling, improve your random number or ray generator with a few lines of code (and a lot of thinking) and you'll suddenly get to 5% of the noise (it's really in those big steps).
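The "4x the rays only halves the noise" figure follows from the usual Monte Carlo convergence rate, where the error of an estimate built from N independent samples shrinks as 1/sqrt(N). A small worked sketch, assuming that rate holds (the function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Fraction of the original noise that remains after multiplying the
// number of samples (rays) by `sampleMultiplier`.
inline double remainingNoise(double sampleMultiplier) {
    assert(sampleMultiplier > 0.0);
    return 1.0 / std::sqrt(sampleMultiplier);
}

// Sample-count multiplier that brute force alone would need in order to
// bring the noise down to `targetFraction` of its current level.
inline double samplesNeededFor(double targetFraction) {
    assert(targetFraction > 0.0);
    return 1.0 / (targetFraction * targetFraction);
}
```

`remainingNoise(4.0)` is 0.5, matching the 50% quoted above, while `samplesNeededFor(0.05)` comes out at roughly 400: reaching 5% of the noise by ray count alone would take about 400 times as many rays, which is why better sampling is the more attractive lever.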

further, ray tracing (and especially path tracing) is still quite an academic area. for rasterized games, we mostly just ignore the proof of correctness and some theories that we violate. texture sampling, mipmapping etc. was done properly 20 years ago in offline renderers, yet even the latest and greatest hardware will produce blurry or noisy results at some angles, and while we know exactly how to solve it in an academically correct way, we'd rather switch on 64x AF, which also solves the problem to some degree.

that's how game devs should also approach tracing. e.g. being unbiased is nice in theory (a theory that just holds in theory, as floats are biased by definition), but you can get far better/faster results using biased algorithms like photon mapping or cutting off ray depth after a fixed number of reflections.

all this talk makes me want to work tonight on my path tracer again, .... you guys !

### #20spinningcube  Members

Posted 18 February 2014 - 07:14 AM

Use less memory. Use compressed containers. Access per thread/core in a sub-tree or coherently. Use fewer pointers.
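One common way to act on the "less memory, fewer pointers" advice is to store the BVH as a flat array of small nodes that refer to each other by 32-bit index. The layout below is an illustrative sketch, not any particular library's format: `BvhNode` and its field names are hypothetical, and the 64-byte cache line mentioned in the comment is an assumption about typical hardware.

```cpp
#include <cassert>
#include <cstdint>

// 32-byte node: two nodes fit in one 64-byte cache line (assumed line size).
// Indices replace 8-byte pointers, and the two children of an interior node
// are stored next to each other so a single index addresses both.
struct BvhNode {
    float boundsMin[3];
    std::uint32_t firstChildOrPrim;  // interior: index of left child; leaf: first primitive
    float boundsMax[3];
    std::uint32_t primCount;         // 0 for interior nodes, > 0 for leaves
};
static_assert(sizeof(BvhNode) == 32, "BvhNode is expected to stay at 32 bytes");

inline bool isLeaf(const BvhNode& n) { return n.primCount != 0; }

inline std::uint32_t leftChild(const BvhNode& n) {
    assert(!isLeaf(n));
    return n.firstChildOrPrim;
}

inline std::uint32_t rightChild(const BvhNode& n) {
    assert(!isLeaf(n));
    return n.firstChildOrPrim + 1;  // siblings are adjacent in the node array
}
```

Keeping siblings adjacent also means that fetching one child usually brings the other into cache with it, which helps with the coherent-access point in the same list.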

My 10 000 USD ;-)

spinningcube

PS - oh do I miss the heydays of ompf...

Edited by jbadams, 13 April 2014 - 01:38 AM.
Restored post contents from history.
