Path tracing: CUDA or CPU?

3 comments, last by forsandifs 13 years, 6 months ago
I am writing a path tracer on the CPU at the moment, but I am highly concerned about performance, as I am aiming for real time rendering.

I understand that not only does CUDA allow massively parallel processing, it also allows storage of information in the shape of arrays on the GPU. Would this not allow one to perform parallel per-pixel calculations with access to the scene geometry? Hence, would it not be a good candidate for fast path tracers?

What are the pitfalls / major difficulties of using the CPU or CUDA for path tracing? What results in terms of quality and frame rates have people gotten so far with each? Basically, how should I implement my path tracer: using the CPU or using CUDA?
You can always combine both as Jacco Bikker did. Look here. XD

Implementing path tracing on the CPU should be quite straightforward but slow. On the other hand, the GPU provides more raw processing power but is limited in functionality. Namely, it is difficult to implement recursive functions on GPUs, which are a natural part of global illumination, and GPU performance drops a lot with conditionals, random memory accesses, or any other kind of incoherent processing. So using the GPU does not necessarily guarantee more performance.
You should search Google for "GPU ray tracing" to see what I'm talking about. There is already a lot of academic research in that field.
Nvidia has a GPU ray tracing API called Optix that you might want to look into. It handles all the details that come with working with Cuda (like scheduling, recursion, ray packaging) and lets you focus on the actual implementation of a path tracer. It does have roughly 20% overhead compared to raw Cuda code.

[Edited by - glaeken on October 12, 2010 6:37:04 PM]
Thank you very much guys! :)

From what I've been reading it seems that it's not completely clear-cut whether one or the other would be faster. The CPU is much more powerful per thread, but CUDA has so many more threads available. The difficulty of implementing an efficient CUDA path tracer in its limited environment is to some extent balanced by the difficulty of optimising a CPU implementation sufficiently.

One factor to consider, though, is that if one is aiming to do more than just a path tracer, i.e. have some utility or entertainment in the application beyond pretty moving graphics, then one might want to have some processing power left over for the implementation of said utility. In that case, making full use of the hardware available seems like a sensible choice. And naively, since the GPU is called the Graphics Processing Unit, it might make sense to actually use it for graphics. On the other hand, CUDA was originally developed for physics and maths calculations, but then again the ray tracing algorithm is just that...

About recursion: if I'm not mistaken, CUDA supports recursion now, as of version 3.2 RC? http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.html . "Interval Computing, demonstrating the use of interval arithmetic operators using C++ templates and recursion".
I have looked in more detail at Optix and Brigade (downloaded the demos :P ). They've done some fantastic work imo; however, I will try to implement my own version of a path tracer.

Optix, whilst it renders high quality images, is far too slow on my very modest GPU (a GTS 250, getting about 1 to 5 fps), and I don't have the cash atm to invest in a Fermi GPU, or a GTX 4xx, which I think they would recommend.

Brigade seems to give great frame rates but the images are barely recognisable imho.

I think, given that I want more than just a path tracer, it would be silly of me to develop it for the CPU and thus swamp all my processing power, leaving nothing for anything else. Hence I must develop it using CUDA. I hope to differ from the Optix implementation via lesser image quality, whilst keeping images recognisable and somewhat pleasant to look at, and a simpler implementation, with the overall aim of higher frame rates.

The objective may be very/too ambitious/impossible, but atm I think it at least possible and worth a try. Worst case scenario is that I give it up as unattainable, having gained a lot of knowledge.

Beyond that, I shall look forward to the day when there is no distinction between CPU and GPU, both being contained within a single massively parallel processing unit (kinda like the PS3 is atm, or perhaps when Intel releases its mppu?).

