First post here; I heard that this is a good forum to get help with this sort of issue.
Basically, I'm building a raytracer/pathtracer which is coming along nicely, but I wanted to speed it up using SIMD intrinsics. I have unit tested each part and it produces the correct results - but it has slowed my raytracer to a crawl.
This seems to be an issue on heavy triangle-based scenes (so memory-transfer intensive, I guess). My scene of implicit spheres runs at roughly the same speed as before (which is still disappointing), but my scene with a mesh tank is now very slow.
I fired up VTune and discovered that I'm getting some horrendous bottlenecks in areas of code where they make no sense!
Running it on my triangle-based scene for about 25 seconds, VTune reports that grabbing the second index takes almost 10 seconds in total! Yet the other indices are fine! It also shows some assembly on the left side, but I don't know anything about assembly... I have similar bottlenecks elsewhere in my code, which are not there in my non-SIMD version.
What am I doing wrong? Is there something I should look out for in my code or is there something that I shouldn't be doing with SIMD code?
Any help would be much appreciated!