CUDA questions

3 comments, last by macnihilist 13 years, 6 months ago
From http://llpanorama.wordpress.com/2008/05/21/my-first-cuda-program/, on the basics of a CUDA program:

"
1. The host initializes an array with data.
2. The array is copied from the host to the memory on the CUDA device.
3. The CUDA device operates on the data in the array.
4. The array is copied back to the host.
"

The author then goes on to showcase a simple example of this: he initialises a small array in host memory, copies it to GPU memory, operates on the GPU copy in parallel (specifically, squaring the value of every element in the array), and then copies the modified array back to host memory for printout.
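For reference, the four steps above map onto CUDA calls roughly like this (a minimal sketch in the spirit of the linked example, with hypothetical names; error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread squares one element of the array.
__global__ void square(float *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)                     // guard against out-of-range threads
        a[idx] = a[idx] * a[idx];
}

int main()
{
    const int N = 10;
    float a_h[N];                    // host array
    for (int i = 0; i < N; ++i)      // 1. the host initializes the data
        a_h[i] = (float)i;

    float *a_d;                      // device array
    cudaMalloc((void **)&a_d, N * sizeof(float));
    cudaMemcpy(a_d, a_h, N * sizeof(float),
               cudaMemcpyHostToDevice);          // 2. copy host -> device

    square<<<1, N>>>(a_d, N);                    // 3. device operates on the data

    cudaMemcpy(a_h, a_d, N * sizeof(float),
               cudaMemcpyDeviceToHost);          // 4. copy device -> host
    cudaFree(a_d);

    for (int i = 0; i < N; ++i)
        printf("%g ", a_h[i]);       // prints the squared values
    printf("\n");
    return 0;
}
```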

Questions:

1. Does this mean that I could, for example, initialize an array containing the geometry information of all the triangles in my scene, copy it to the GPU using CUDA, access that array in parallel to perform calculations (say, one for every pixel), and then send the results back to the CPU?

2. I was talking about this with someone, and they said something like "accessing global memory would be really expensive performance-wise". But in the scheme described in my question 1, wouldn't I only be accessing memory stored on the GPU, and therefore not be accessing global memory and not incurring that performance cost?

3. Can I have more than one array initialized on the CPU and copied to the GPU?
1. Yes. That's exactly what the demo does, actually. Not sure where the uncertainty arises here?

2. The biggest cost is the readback from GPU to CPU. (Note that in CUDA terminology, the memory on the graphics card *is* the "global memory": it is slow per access compared to shared memory and registers, though still far faster than transfers across the bus.) There are also ways to write really slow operations on the GPU side, but the most common pitfalls are covered in the CUDA SDK documentation.
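A hedged sketch of the pattern that answer suggests (kernel names are hypothetical): when several passes over the same data are needed, keep it resident in device memory between kernel launches and read back only the final result, rather than round-tripping through the host after each pass.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two trivial illustrative kernels.
__global__ void doubleElements(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

__global__ void addOne(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

int main()
{
    const int N = 256;
    const size_t bytes = N * sizeof(float);
    float h[N];
    for (int i = 0; i < N; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // one upload

    // Intermediate results never leave the GPU:
    doubleElements<<<1, N>>>(d, N);
    addOne<<<1, N>>>(d, N);

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // one readback
    cudaFree(d);
    printf("%g\n", h[0]);   // 1*2+1 = 3
    return 0;
}
```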

3. Sure.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Thank you very much.

A follow on question:

At the moment I am writing a path tracer as a purely CPU algorithm. However, I am very concerned about speed (to the point where the aim is a real-time path tracer), and I am wondering whether it would be advisable to put my efforts into a CUDA path tracer instead. It seems from the answers to my original questions above that such a project would be possible.

(The reason I am cautious, and ask questions where perhaps there seems to be little doubt, is that any such project requires a lot of time, and I want to be as sure as I can be before changing course.)

(Also, given my most recent question, perhaps this thread would now be better placed in the Graphics Programming section?)
I don't see any reason why you couldn't do a path tracer via CUDA; arranged properly, path tracing lends itself very well to parallel processing, which can be done fairly easily in a CUDA-style environment.

Keep in mind that the killer in realtime raytracing of any type is algorithm-level performance. General path tracing is very, very slow; using something like photon mapping or Monte Carlo subsampling will help you a lot. It's been a few years since I was in the realtime raytracing game, so my techniques are rusty and likely outdated, but from what I recall CUDA would be a great way to build such an application.

Just be careful in your architecture to follow the best-practices guidelines from the SDK docs, so you get good memory throughput and such. Otherwise you can easily end up slower than a good vectorized CPU implementation, due to the way memory access and conditional branching work on contemporary GPUs.
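As a concrete instance of the branching issue mentioned above: threads in a warp execute in lockstep, so a data-dependent if/else can force the warp to execute both paths serially. Where possible, a branchless formulation avoids this (a hedged sketch; the actual gain depends on the hardware and the compiler):

```cuda
#include <cuda_runtime.h>

// Divergent version: threads in the same warp may take different paths,
// so the warp can end up executing both branches one after the other.
__global__ void divergent(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (a[i] > 0.0f)
            a[i] = a[i] * 2.0f;
        else
            a[i] = 0.0f;
    }
}

// Branchless version: the same result, computed uniformly across the warp.
// fmaxf(a[i], 0) clamps negatives (and zero) to 0, then doubling gives
// exactly the if/else behaviour above.
__global__ void branchless(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = fmaxf(a[i], 0.0f) * 2.0f;
}
```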


To drop a few links:

The OptiX SDK has a simple path-tracing example you could play around with. That would also let you focus on the higher-level stuff and spare you the pain of implementing an efficient CUDA ray-tracing engine.

And then there is the impressive Brigade demo, which (if I understand it correctly) uses both the GPU and the CPU for ERPT (energy redistribution path tracing). (Look at Jacco Bikker's YouTube videos!)

