# Ray tracing in realtime with PS3s

Thought you all would find this interesting. PS3 Raytracing

##### Share on other sites
... looks like a long way to a racing game :-)

##### Share on other sites
Is it just the crappy Youtube quality, or does the performance of that demo really suck that hard? Very disappointing if these guys can't get any faster than that on such (theoretically) potent hardware.

##### Share on other sites
Cool trick, but the PS3 hardware is laughably ill-suited for the demands of real-time raytracing. It's not going to happen in the next five years except on an ASIC.

##### Share on other sites
If your scene is small-ish, you can use all the SPUs for ray tracing. However, if your scene is real-size, the local memory on the SPU isn't enough.

1) You only get access to 256 MB (because the other 256 MB is a bank shared with graphics)
2) You don't get access to the 3D hardware, only a "dumb" 2D frame buffer

If you could use some features of the 3D hardware (say, blitting), you could do really cool things!

##### Share on other sites
Really? I would have thought the stream-based processing concept would allow some interesting scaling tricks for raytracing. Or are you referring to the terrible memory access architecture (which I can see being a massive problem if coherency tricks are not exploited well)?

 Never mind, hplus0603 ninja-answered my question [smile]

##### Share on other sites
Now if the PS3 architecture is ill-suited to raytracing, it makes me wonder how well one of those 80-core Intel chips would handle raytracing. (The Intel chips are not yet released, of course; they are still a few years down the road, if they ever do come out.)

##### Share on other sites
The problem isn't having enough CPU core time to throw at it. The problem is keeping memory throughput high enough. Raytracing has very nasty memory access patterns which thrash the living hell out of current CPU cache architectures, and generally strangle the memory bus. Special considerations for this must be made.

There are some coherency patterns which can be exploited to try and control the memory access patterns, but they only generally work on first-order rays, so anything with reflections, dynamic light, etc. (i.e. the stuff we want to use raytracing for in the first place) is not going to benefit much.

We are most likely to see performant raytracing appear on a GPU-style architecture with a dedicated memory bus and massive parallelization (i.e. not just separate cores running different blocks of the image, but parallel processing on many scales all the way down to SIMD-style operations). It is not yet clear whether or not current GPUs will be able to hybridize sufficiently to do both scanline rasterization and practical raytracing on single hardware, but definitely expect RTRT (and, far more importantly, RTGI) to appear from the video card people, not the CPU people.

As Sneftel alluded to, an ASIC is also a strong option, but the expense and difficulty of marketing such a product are immense, so chances are we will not see any such ASIC product for consumer use. There have been some academic efforts in this direction (e.g. SaarCOR's project) but frankly they made some poor decisions up-front and will probably never have a marketable product. I haven't heard anything from them in years, so it's thoroughly possible that they're already gone.

##### Share on other sites
I also noticed that the performance didn't seem that great, especially since they had to connect three machines. If we assume that a "typical game model" is 15K polys (which is an absurd figure), then their model is around 1.125 million. 1 million or so polys is doable by RSX in realtime at full speed, while still doing HD and shading and everything.

##### Share on other sites
A very bad example scene/object was chosen. Similar effects were accomplished years ago using the fixed-function pipeline. Since the PS3 is a gaming platform, you'll rarely see cars from such a small distance. Besides, current shader techniques give more than adequate quality for this type of effect.

They could have demonstrated some other effect (one that is hard to do on the current generation of gfx cards) or some extremely complex scene which would make current graphics hardware crawl, but a car?

Maybe next time.

##### Share on other sites
Even though I'm not able to see the video for some obscure reason, I think it's the raytracer I saw at Siggraph last year. It wasn't on the PS3 but on a Cell workstation. The performance isn't impressive, and neither is the quality of the rendering. At Siggraph there were evident artifacts due to some error in the geometry.

##### Share on other sites
Quote:
Original post by ApochPiQ
It is not yet clear whether or not current GPUs will be able to hybridize sufficiently to do both scanline rasterization and practical raytracing on single hardware, but definitely expect RTRT (and, far more importantly, RTGI) to appear from the video card people, not the CPU people.

It is pretty clear what is going to happen next with raytracing.
The largest hurdle and timesink in raytracing is the raytest: the actual code/hardware that tests a ray against the geometry. This is the only thing that differs between scanline rasterisation and raytracing, and there is no problem combining them (just look at the LightWave renderer).
In fact, this raytest stage can be achieved with a surprisingly small amount of code on the GF8800. Even though it would be faster than on any CPU (besides perhaps Cell), it would still be slow.
That is, unless you have specialized hardware for it.
If you look at what the SaarCOR people did, they used a modified memory design with many small sub-processors to do this; let's call them SPPUs (Single Purpose Processing Units).
Now if you take thousands of these tiny raytest SPPUs, each with a few bytes of memory (to store geometry), and put them in a massively parallel grid processing array, then maybe you could get enough speed that a raytest would roughly equal the speed of a texture lookup. If you have that tied to a fragment shader, then you have a true raytracing-capable card.
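
For readers who have not seen a "raytest" written out, here is a minimal sketch of one ray-triangle test. It uses the well-known Möller–Trumbore algorithm in plain Python, which is an assumption chosen for illustration; it is not the GF8800 shader code or the SaarCOR hardware design mentioned above, and the function names are hypothetical.

```python
# Minimal ray-triangle intersection sketch (Möller–Trumbore).
# Vectors are plain 3-tuples; all helper names are illustrative only.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:           # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t_vec, edge1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, q) * inv_det
    return t if t > eps else None    # ignore hits behind the ray origin

if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(ray_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri))
    print(ray_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri))
```

The test itself is only a handful of multiplies and adds, which is why the rest of the argument is about how many of them run and how the geometry gets fed to them.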

##### Share on other sites
Awesome, another 2 PS3s and they might have enough power to play a game

##### Share on other sites
Quote:
Original post by lc_overlord
It is pretty clear what is going to happen next with raytracing. The largest hurdle and timesink in raytracing is the raytest: the actual code/hardware that tests a ray against the geometry. This is the only thing that differs between scanline rasterisation and raytracing, and there is no problem combining them (just look at the LightWave renderer). In fact, this raytest stage can be achieved with a surprisingly small amount of code on the GF8800. Even though it would be faster than on any CPU (besides perhaps Cell), it would still be slow. That is, unless you have specialized hardware for it. If you look at what the SaarCOR people did, they used a modified memory design with many small sub-processors to do this; let's call them SPPUs (Single Purpose Processing Units). Now if you take thousands of these tiny raytest SPPUs, each with a few bytes of memory (to store geometry), and put them in a massively parallel grid processing array, then maybe you could get enough speed that a raytest would roughly equal the speed of a texture lookup. If you have that tied to a fragment shader, then you have a true raytracing-capable card.

Too many people focus on simple geometry intersection as if it mattered. Geometry hit tests aren't a big deal; there have been reliable RT-on-GPU implementations since the first Shader Model 2 hardware became available. Obviously raytracing can be done on GPUs with the current structure. Simple ray-casting techniques are already done as a matter of course in programmable shaders. There are even rudimentary global illumination solutions available.

Raytracing can also be done on my calculator and my cell-phone. That does not mean that those architectures are optimal.

It is certainly not yet clear whether or not the existing GPU architecture is optimal. A competitive, practical, marketable raytracing product would need to be heavily programmable, and support far more than simple Whitted-style tracing. In fact I would go so far as to say that without a realtime global illumination solution there is no point in deploying raytracing-centric hardware at all, because scanline rasterization in combination with programmable shaders is sufficiently powerful to outdo any Whitted-style renderer.

The problem is that RTGI is still a fairly unexplored domain. Raytracing itself is quite well understood, but we have a rather less thorough understanding of the possible GI algorithms and how to make them scale both in space and time.

Raytracing-based photon mapping methods have been done on the GPU for years already, but one of the persistent questions is if the memory requirements (in terms of access patterns and throughput) of RTGI and rasterization are fundamentally incompatible. That is, nobody yet knows for sure if a good raytracing CPU can be built on the same fundamental architecture as a good scanline rasterization CPU.

My gut feeling is that the two architectures are in fact incompatible, at least if one is genuinely interested in deploying raytracing hardware as a competitive replacement for scanline methods, and not merely a curiosity.

##### Share on other sites
Quote:
Original post by ApochPiQ
My gut feeling is that the two architectures are in fact incompatible, at least if one is genuinely interested in deploying raytracing hardware as a competitive replacement for scanline methods, and not merely a curiosity.

No, they are not incompatible.
Take the first ray intersection test from the camera's point of view as an example. Compared to scanline rendering there is no difference: they both produce exactly the same result and the same data (just in different ways), but scanline does it a lot faster, even with the right RT hardware.
It is the same thing with stencil shadows, where both methods produce the same result.

The difference only shows up from the second ray intersection onwards, where raytracing can continue testing transparency without any real effort, but with scanline you either need to do some tedious sorting or use an advanced method like depth peeling, neither of which works that well.
It's entirely possible to do the first pass with scanline and the rest with RT if needed, thus reducing the usage of RT.

Quote:
Original post by ApochPiQ
Geometry hit tests aren't a big deal

Yes they are. Even though they are pretty fast in today's GPU programs, it's the sheer number of intersection tests needed that calls for heavy optimization.
A modern game today can have about 100,000 polys in a single frame.
A 1080p screen has 2,073,600 pixels.
That means a relatively simple intersection test (let's say for shadows) needs 207,360,000,000 ray-polygon intersection tests (that is, without any optimization), or 207.36 billion tests per frame, and anything that is going to be run 200 billion times does matter.
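
The arithmetic behind that figure can be checked with a short sketch. It assumes one ray per pixel and that every ray is tested against every polygon, exactly as the numbers above do; the function name is made up for illustration.

```python
# Brute-force count: every pixel's ray is tested against every polygon.
# One ray per pixel and no acceleration structure are assumed.

def brute_force_tests(width, height, polygons, rays_per_pixel=1):
    pixels = width * height
    return pixels * rays_per_pixel * polygons

if __name__ == "__main__":
    tests = brute_force_tests(1920, 1080, 100_000)
    print(f"{1920 * 1080:,} pixels")
    print(f"{tests:,} ray-polygon tests per frame")
```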

##### Share on other sites
Well, this demonstration is just an interesting novelty for now, but it does lend credibility to the idea that the PlayStation 4, Xbox 720, or whatever consoles may exist 7-10 years from now (hell, could even be a Wii 2) could see the first real ray-traced commercial games.

Of course, if you consider this to be the case, it's likely we'll be messing around with real-time raytracing on the PC a lot more in about 3-5 years, since top-of-the-line PC hardware always outpaces the consoles.

I'm not suggesting anything like replacing rasterization but hardware that could do both reasonably.

##### Share on other sites
Quote:
Original post by lc_overlord
That means a relatively simple intersection test (let's say for shadows) needs 207,360,000,000 ray-polygon intersection tests (that is, without any optimization), or 207.36 billion tests per frame, and anything that is going to be run 200 billion times does matter.

Right. That's why nobody who knew what they were doing would do something like that without "optimization" (read: better algorithms).
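
To put a rough number on "better algorithms": the sketch below compares testing every polygon per ray against an idealized balanced binary hierarchy (for example a BVH or kd-tree). The hierarchy, the leaf size of 4, and the single root-to-leaf walk are all assumptions made for illustration; real traversals visit more nodes than this, so treat the result as an optimistic lower bound rather than a measurement.

```python
import math

def brute_force_steps_per_ray(polygons):
    # Every polygon is tested for every ray.
    return polygons

def hierarchy_steps_per_ray(polygons, leaf_size=4):
    # Idealized balanced binary tree: walk one root-to-leaf path,
    # then test the few polygons stored in that leaf.
    leaves = math.ceil(polygons / leaf_size)
    depth = math.ceil(math.log2(leaves)) if leaves > 1 else 0
    return depth + leaf_size

if __name__ == "__main__":
    n = 100_000
    print("brute force:", brute_force_steps_per_ray(n), "tests per ray")
    print("hierarchy  :", hierarchy_steps_per_ray(n), "steps per ray (idealized)")
```

Even if a real traversal costs several times the idealized figure, the per-ray work drops from a hundred thousand tests to something on the order of tens of steps.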

##### Share on other sites
Quote:
Original post by lc_overlord
No they are not incompatible. Take the first ray intersection test from the camera point of view as an example, compared to scanline rendering there is no difference, they both produce exactly the same result and the same data (just in different ways), but scanline does it a lot faster, even with the right RT hardware. It is the same thing with stencil shadows where both methods produce the same result.

That's not what I'm talking about.

Like I noted several times, there are many well-known implementations of raytracing and even global illumination via raytracing on real GPUs, right now, today.

I'm not saying that raytracing can't be done on a GPU. I'm saying that the GPU memory architecture may not be the best arrangement for dedicated ray-tracing hardware. In other words, if you want to make a raytracing "card", the best way to do it is probably to invent a new design, not to base it on existing GPU architecture.

Quote:
Original post by lc_overlord
The difference comes first after the second ray intersection where raytracing can continue testing transparency without any real effort, but with scanline you either need to do some tedious sorting or use an advanced method like depth peeling, none of which works that well. It's entirely possible to do the first pass with scanline and the rest with RT if needed, thus reducing the usage of RT.

And hybrid scanline/raytracing rendering is a waste of time if you're talking about dedicated hardware. Raytracing is predominantly interesting for visual phenomena beyond the first bounce; that is, raytracing is a powerful technique because of its recursive nature. In any real-world render, the number of recursive rays typically outnumbers first-hit rays; often times there are an order of magnitude more secondary rays than primary rays.

Eliminating the first hit is only a tiny speedup. If you want dedicated RT hardware, doing a hybrid of scanline and RT means you have to spend a lot of expensive chip space on the scanline support; it would be more efficient in many ways to just do pure RT and take the (negligible) performance hit. But that's a totally different rabbit-trail [smile]
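
A back-of-the-envelope sketch of why removing the first hit buys so little. It assumes every ray costs about the same and uses the "order of magnitude more secondary rays" figure from above as the ratio; both are simplifications, and the function name is hypothetical.

```python
def max_speedup_skipping_primary(secondary_per_primary):
    """Best-case speedup if primary rays became free and all rays cost the same."""
    total_rays = 1 + secondary_per_primary  # per pixel: one primary plus the rest
    return total_rays / secondary_per_primary

if __name__ == "__main__":
    ratio = 10  # assumed: ten secondary rays for every primary ray
    speedup = max_speedup_skipping_primary(ratio)
    saved = 1 / (1 + ratio)
    print(f"primary rays are {saved:.1%} of the work")
    print(f"best-case speedup: {speedup:.2f}x")
```

Under these assumptions the rasterized first hit saves roughly a tenth of the ray work at best, which is the sense in which the speedup is "tiny".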

Quote:
Original post by lc_overlord
Quote:
Original post by ApochPiQ
Geometry hit tests aren't a big deal

Yes they are. Even though they are pretty fast in today's GPU programs, it's the sheer number of intersection tests needed that calls for heavy optimization.
A modern game today can have about 100,000 polys in a single frame.
A 1080p screen has 2,073,600 pixels.
That means a relatively simple intersection test (let's say for shadows) needs 207,360,000,000 ray-polygon intersection tests (that is, without any optimization), or 207.36 billion tests per frame, and anything that is going to be run 200 billion times does matter.

Geometry hit tests are not the bottleneck. In theory, you can parallelize the hit tests without bound, until you're doing millions of hit tests simultaneously per ray.

In practice, however, this is not really possible, because it isn't possible with current memory bus designs to feed that much data into a chip. The challenge of RTRT in hardware is, as I said, not a matter of making ray-geometry tests fast. The challenge of RTRT in hardware is memory bandwidth.

##### Share on other sites
Quote:
Original post by ApochPiQ
The challenge of RTRT in hardware is memory bandwidth.

As with all other graphics hardware, memory bandwidth is always the big limiter. Current GPUs do have the same problem with texture sampling: I could use 50+ texture reads for each texture sample to get some nice filtering, but on current hardware I have to use a bad form of mipmapping with only 4-8 reads per sample.
The raytest grid processor is one way to reduce some of that bandwidth need, as it doesn't need to transfer all that data for each test.

Quote:
Original post by ApochPiQ
If you want dedicated RT hardware, doing a hybrid of scanline and RT means you have to spend a lot of expensive chip space on the scanline support

Sort of, but only if you use an older architecture.
The main portion of a common GPU is the programmable shader units, and you would still need those on an RTRT card, otherwise nothing will be rendered (with either of the two rendering methods). With SM4 you now have a unified shader architecture, where the vertex, geometry and fragment processors run (or can run) on the same shader units.
So currently the only circuitry specific to scanline rasterising is the actual rasteriser, and that is not a big part of the chip (in theory this too could be run on a shader unit or two).
Anyway, once they figure out how to do a fast RTRT card, I do believe they will find a way to do scanline rasterising at the same time, because that is what is needed to make these cards accepted, cheap and backwards compatible.

##### Share on other sites
Quote:
Original post by Sneftel
Cool trick, but the PS3 hardware is laughably ill-suited for the demands of real-time raytracing. It's not going to happen in the next five years except on an ASIC.

What about a top-of-the-line $1000+ Intel CPU? FWIW, Cell slaughters the $1000+ CPU WRT raytracing.
Cell ill-suited? Hmmmm, I'd be pointing the finger more at other players :)

True, though, we won't be seeing any major solely raytraced games coming to the PS3, but raytracing is used in practically all games, e.g. particle collision tests, line-of-sight tests, plus more advanced stuff such as radiosity, gathering, etc.

##### Share on other sites
Quote:
Original post by zedz
Quote:
Original post by Sneftel
Cool trick, but the PS3 hardware is laughably ill-suited for the demands of real-time raytracing. It's not going to happen in the next five years except on an ASIC.

What about a top-of-the-line $1000+ Intel CPU? FWIW, Cell slaughters the $1000+ CPU WRT raytracing.
Cell ill-suited? Hmmmm, I'd be pointing the finger more at other players :)

O...k...

What does that have to do with a custom raytracing ASIC?

##### Share on other sites
Quote:
Original post by zedz
Quote:
Original post by Sneftel
Cool trick, but the PS3 hardware is laughably ill-suited for the demands of real-time raytracing. It's not going to happen in the next five years except on an ASIC.

What about a top-of-the-line $1000+ Intel CPU? FWIW, Cell slaughters the $1000+ CPU WRT raytracing.
Cell ill-suited? Hmmmm, I'd be pointing the finger more at other players :)

True, though, we won't be seeing any major solely raytraced games coming to the PS3, but raytracing is used in practically all games, e.g. particle collision tests, line-of-sight tests, plus more advanced stuff such as radiosity, gathering, etc.

Why isn't it ill-suited? The Cell uses an asymmetric processor architecture, which means that the SPEs do not have direct access to main RAM. If you wanted to do ray tracing tests on the SPEs, you would be severely limited in the amount of geometry you could test against in 256 KB of memory. In a typical game scene, your geometry would easily number 100k polygons. Obviously you can't fit all of that in an SPE's memory. A single ray test could require multiple iterations of filling the memory with different geometry from the scene. This will introduce additional overhead on the PPE, which is already going to be heavily taxed: it is the only really capable general-purpose core, it will be handling most of the conventional game code, and it also happens to be underpowered compared to a Pentium 4 or G5.

In my opinion, the Cell is theoretically extremely powerful, but in practice the architecture is just flawed for games or almost any other general-purpose software.
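
As a rough illustration of the local-store limit described above, the sketch below estimates how many triangles fit in one fill and how many fills a 100k-polygon scene would take. The triangle layout (three vertices of three 32-bit floats, 36 bytes) and the assumption that half of the 256 KB is reserved for code, stack and ray data are guesses made for the example, not figures from the post or from any SDK.

```python
LOCAL_STORE_BYTES = 256 * 1024      # SPE local store size quoted above
BYTES_PER_TRIANGLE = 3 * 3 * 4      # assumed: 3 vertices x 3 floats x 4 bytes
RESERVED_BYTES = 128 * 1024         # assumed: space kept for code, stack, rays

def triangles_per_fill(local_store=LOCAL_STORE_BYTES,
                       reserved=RESERVED_BYTES,
                       tri_bytes=BYTES_PER_TRIANGLE):
    # Whole triangles that fit in the space left for geometry.
    return (local_store - reserved) // tri_bytes

def fills_needed(polygons, per_fill):
    # Ceiling division: a partial batch still costs a full refill.
    return -(-polygons // per_fill)

if __name__ == "__main__":
    per_fill = triangles_per_fill()
    print(per_fill, "triangles per local-store fill")
    print(fills_needed(100_000, per_fill), "fills to see a 100k-polygon scene once")
```

Whether those refills are a fatal cost or just a streaming problem depends on how many rays can be processed against each batch before it is replaced.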

##### Share on other sites
Quote:
Original post by zedz
What about a top-of-the-line $1000+ Intel CPU? FWIW, Cell slaughters the $1000+ CPU WRT raytracing. Cell ill-suited? Hmmmm, I'd be pointing the finger more at other players :) True, though, we won't be seeing any major solely raytraced games coming to the PS3, but raytracing is used in practically all games, e.g. particle collision tests, line-of-sight tests, plus more advanced stuff such as radiosity, gathering, etc.

Are you drunk?

##### Share on other sites
Quote:
Original post by GamerSg
Why isn't it ill-suited? The Cell uses an asymmetric processor architecture, which means that the SPEs do not have direct access to main RAM. If you wanted to do ray tracing tests on the SPEs, you would be severely limited in the amount of geometry you could test against in 256 KB of memory. In a typical game scene, your geometry would easily number 100k polygons. Obviously you can't fit all of that in an SPE's memory. A single ray test could require multiple iterations of filling the memory with different geometry from the scene. This will introduce additional overhead on the PPE, which is already going to be heavily taxed: it is the only really capable general-purpose core, it will be handling most of the conventional game code, and it also happens to be underpowered compared to a Pentium 4 or G5. In my opinion, the Cell is theoretically extremely powerful, but in practice the architecture is just flawed for games or almost any other general-purpose software.

Well, I wouldn't say ill-suited, just not optimal. While it's true the Cell SPUs don't have direct access to RAM, they do have indirect access, and the Cell also excels at streaming data from one processor to another. So instead of working on one pixel/ray at a time, it just does several at once and streams the geometry data between the 5 SPUs available, so you can in fact use all that processing power without running into the memory bandwidth limit if you just think of a way around that problem.
Still, Cell is underpowered (for RTRT), and you would need 5-10 Cells to get any kind of reasonable RTRT.

##### Share on other sites
Quote:
Why isn't it ill-suited?

You miss the point of my post: Cell is a lot, lot better at this than a top-of-the-line PC CPU, i.e. the $1000+ CPU is a lot more ill-suited than Cell.

BTW, WRT the above demo: true, the framerate looks about 5 fps (which is pants),
but the car consists of 1.6 million polygons (hmm, someone mentioned Cell will struggle at 100k; well, I guess not),
the resolution is 1920x1080 pixels,
and there are 96 rays cast per pixel.

see here for a summary
http://www.gametomorrow.com/minor/barry/iRT-Sumary.pdf
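
For scale, the ray throughput implied by the figures quoted in this post can be worked out directly. This is only arithmetic on the numbers given above (1920x1080, 96 rays per pixel, roughly 5 fps as eyeballed from the video); it is not a figure taken from the linked summary.

```python
def rays_per_frame(width, height, rays_per_pixel):
    return width * height * rays_per_pixel

def rays_per_second(width, height, rays_per_pixel, fps):
    return rays_per_frame(width, height, rays_per_pixel) * fps

if __name__ == "__main__":
    per_frame = rays_per_frame(1920, 1080, 96)
    per_second = rays_per_second(1920, 1080, 96, 5)  # ~5 fps is an estimate
    print(f"{per_frame:,} rays per frame")
    print(f"{per_second:,} rays per second at 5 fps")
```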

(why has it suddenly gone quiet :) )

The quality is far higher than any existing game. Sure, they could lower the quality and the framerate would improve, but I'm guessing they want to show you can do movie-quality CGI at interactive framerates. On the fastest PC doing the same scene you'd be looking at seconds per frame, not frames per second.
