Do you actually split the work between the threads or do you simply multiply the work with every thread?
I'm not sure what you mean, but I tried multiple approaches:
the one I wrote above, and this one too:
for (int m_prime = 0; m_prime < N; m_prime++) {
    threads.push_back(std::thread(&Ocean::DoFFT, this, 1, m_prime * N));
    if (threads.size() == 4) {
        // Join all 4 threads before moving to the next batch.
        for (auto& t : threads) t.join();
        threads.clear();
    }
}
But like Hogman mentioned, I am still spawning 64 threads in total even with this approach (just 4 at a time), so I'll create some kind of thread pool of 4 permanent threads instead.
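For reference, a pool of permanent worker threads could be sketched roughly as below. This is only a minimal illustration, not a tuned implementation: `ThreadPool`, `Submit`, and `WaitIdle` are names made up for this example, and it assumes each job is independent (no two jobs write to the same data).

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: workers are created once and reused every frame.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t workerCount) {
        for (std::size_t i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { WorkerLoop(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Queue one job; it runs on whichever worker becomes free first.
    void Submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        wake_.notify_one();
    }

    // Block until every submitted job has finished.
    void WaitIdle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached when stopping
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so workers execute in parallel
            {
                std::lock_guard<std::mutex> lock(mutex_);
                --pending_;
            }
            idle_.notify_all();
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex mutex_;
    std::condition_variable wake_;  // signals workers that a job is available
    std::condition_variable idle_;  // signals WaitIdle that a job finished
    std::size_t pending_ = 0;       // jobs queued or currently running
    bool stopping_ = false;
};
```

With something like this, the pool would be created once (for example `ThreadPool pool(4);` as a member of `Ocean`), and each frame the loop would call `pool.Submit([this, m_prime] { DoFFT(1, m_prime * N); });` for every row, followed by a single `pool.WaitIdle();`. No threads are created or destroyed per frame, which is the cost the batch approach above still pays.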
Side question though: assuming I get this working properly with the expected fps boost, would I get better performance with OpenMP (so the code is still executed on the CPU), or should I jump directly to an OpenCL implementation?
Thanks for your help!