So I decided to implement the exact Cg code presented in the RM paper to see if their approach works better. It does, a bit, but it's still not a huge amount better than my attempt, and it's a LOT slower. My code has opportunities for early-out in the loops (thus reducing the number of samples), whereas theirs always runs every iteration.
I'll give it another crack tomorrow, but for now:
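To show the difference between the two loop styles, here's a rough CPU-side sketch in Python. It is not the Cg from the paper and not my actual shader: `relief_intersect`, `depth_at` and the `early_out` flag are all made-up names, and the ray is simplified to one that enters at depth 0 and leaves at depth 1. It only counts how many depth-map samples each style takes.

```python
def relief_intersect(depth_at, u_in, u_out, linear_steps, binary_steps, early_out):
    """Approximate where a ray first dips below a depth profile.

    depth_at(u) returns the surface depth in [0, 1] at coordinate u.
    The ray goes from (u_in, depth 0) to (u_out, depth 1), so its depth
    at parameter t is simply t. Returns (t_hit, samples_taken).
    Hypothetical helper, written only to illustrate the sample counts.
    """
    samples = 0
    step = 1.0 / linear_steps
    hit = 1.0          # default: the ray reaches the bottom of the volume
    found = False

    # Linear search: march along the ray in equal steps.
    for i in range(linear_steps):
        t = (i + 1) * step
        u = u_in + (u_out - u_in) * t
        samples += 1                      # one depth-map fetch per iteration
        if not found and t >= depth_at(u):
            hit = t                       # first sample at or below the surface
            found = True
            if early_out:
                break                     # stop fetching once we have a hit

    # Binary search: refine between the last sample above and the first below.
    lo, hi = hit - step, hit
    for _ in range(binary_steps):
        mid = 0.5 * (lo + hi)
        u = u_in + (u_out - u_in) * mid
        samples += 1
        if mid >= depth_at(u):
            hi = mid                      # still inside the surface: move back
        else:
            lo = mid                      # still above the surface: move forward
    return hi, samples


# A flat surface at depth 0.37, searched with 10 linear + 5 binary steps.
flat = lambda u: 0.37
t_fixed, n_fixed = relief_intersect(flat, 0.0, 1.0, 10, 5, early_out=False)
t_early, n_early = relief_intersect(flat, 0.0, 1.0, 10, 5, early_out=True)
print(n_fixed, n_early)
```

In this toy case both versions land on the same intersection, but the fixed-count loop takes all 15 samples while the early-out loop takes 9. How much that saves in a real shader depends on the view angle and the height map, so treat the numbers as illustrative only.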
10 Linear Search Samples, 5 Binary Search Samples: 10.61 seconds per frame
20 Linear Search Samples, 10 Binary Search Samples: 40.16 seconds per frame
40 Linear Search Samples, 20 Binary Search Samples: 78.59 seconds per frame
80 Linear Search Samples, 40 Binary Search Samples: 152.96 seconds per frame
No, those 'seconds per frame' measurements are not a joke. To get the high-quality image with 120 samples per pixel (80 linear + 40 binary), you need to wait over TWO MINUTES for the results [wow]
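For a quick sanity check on those numbers, here's a small Python snippet that divides each frame time by its total sample count. The assumption (mine, and a crude one) is that the cost is purely per-sample with no fixed per-frame overhead; `seconds_per_sample` is just a helper name for this post.

```python
# (linear samples, binary samples, seconds per frame) from the timings above
runs = [(10, 5, 10.61), (20, 10, 40.16), (40, 20, 78.59), (80, 40, 152.96)]

def seconds_per_sample(linear, binary, seconds):
    """Frame time divided by total search samples per pixel."""
    return seconds / (linear + binary)

for linear, binary, seconds in runs:
    total = linear + binary
    cost = seconds_per_sample(linear, binary, seconds)
    print(f"{total:3d} samples per pixel: {cost:.3f} seconds per sample")
```

If the per-sample figure stays roughly flat across runs, the frame time is scaling more or less linearly with the sample count.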
Someone give me a GeForce 8800 NOW!
*Punches REF Device*
Seems to be working better though, that's good news.
- Dan