I am trying to write a basic ray tracer with CUDA.
What I have implemented so far takes just 1 sample per pixel, and each sample is assigned to a CUDA thread.
Each thread traces its own ray.
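The mapping described above can be sketched roughly as follows. This is only an illustration under assumptions: the `render` kernel, the `trace` placeholder, the fixed pinhole camera, and the image size are all hypothetical, not the actual implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real per-ray work (intersection + shading).
// Here it just returns a gradient based on the ray direction.
__device__ float3 trace(float3 origin, float3 dir) {
    float t = 0.5f * (dir.y + 1.0f);
    return make_float3(1.0f - 0.5f * t, 1.0f - 0.3f * t, 1.0f);
}

// One thread per pixel, one sample per thread.
__global__ void render(float3* image, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard the partial blocks at the edges

    // Assumed camera: pinhole at the origin looking down -z,
    // with the single sample taken through the pixel center.
    float u = (x + 0.5f) / width * 2.0f - 1.0f;
    float v = (y + 0.5f) / height * 2.0f - 1.0f;
    float3 dir = make_float3(u, v, -1.0f);
    float inv = rsqrtf(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    dir = make_float3(dir.x * inv, dir.y * inv, dir.z * inv);

    image[y * width + x] = trace(make_float3(0.0f, 0.0f, 0.0f), dir);
}

int main() {
    const int width = 640, height = 480;  // assumed image size
    float3* image = nullptr;
    cudaMallocManaged(&image, width * height * sizeof(float3));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    render<<<grid, block>>>(image, width, height);
    cudaDeviceSynchronize();

    // Print one pixel just to confirm the kernel wrote something.
    float3 c = image[(height / 2) * width + width / 2];
    printf("center pixel: %f %f %f\n", c.x, c.y, c.z);

    cudaFree(image);
    return 0;
}
```

The 16x16 block size is an arbitrary choice for the sketch; a 2D block keeps the threads of a warp on neighbouring pixels, whose rays tend to follow similar paths.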
I am writing to ask about more advanced and efficient ways of doing this.
For example, what's the best strategy for parallelizing all the tasks? When multiple samples are used, should I assign each thread all the samples in one pixel, or should I only parallelize the computation within a pixel and render the pixels sequentially?
I have also heard that it's better to trace rays in a breadth-first manner. Why? Is there any tutorial on how to do this?
Anyway, I would appreciate any advice and ideas. Thank you!
There's an article in GPU Pro 3 about ray tracing with compute shaders, which should be relevant for you. I would also try to see if you can dig up some implementation details on Nvidia's OptiX, since at this point it's a pretty mature library and is bound to have a lot of CUDA-specific optimizations. I'm not sure if they have public implementation details, but perhaps you could email one of the developers.
Nvidia revealed a little about OptiX in a talk at GTC 2010, which is available online here. There is also the paper by Parker et al. presented at SIGGRAPH 2010, though it mostly explains the general architecture of the engine. I expect there is more on the internet, possibly more recent material.