
#Actual Erik Rufelt

Posted 23 November 2012 - 12:42 PM

It won't matter if you have lots of work-items, which you should have. Threads that finish the kernel just pick up another work-item and run it again, so with a million work-items and a thousand threads you won't even notice if one of those threads takes 100x longer than the others, since the others each run about 1000 times anyway. If you only have as many work-items as there are cores, then yes, all the others will wait for the slowest one. But if your kernel is that complex and runs that few times, OpenCL probably isn't the right tool for the job. In that case, either do it on the CPU instead, or try to split your kernel into a smaller kernel that runs 10x as many times and does part of the work each time.
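
To put rough numbers on that, here is a small Python sketch of the scheduling argument. It is a toy model, not how any particular OpenCL runtime actually dispatches work: it assumes each free thread simply grabs the next work-item in order, that every work-item costs 1 time unit except one that costs 100, and the names `makespan` and `slowdown` are made up for this example.

```python
import heapq

def makespan(costs, num_threads):
    """Finish time when each free thread grabs the next work-item in order."""
    # Min-heap holding the time at which each thread becomes free again.
    free_at = [0.0] * num_threads
    heapq.heapify(free_at)
    for cost in costs:
        # The earliest-free thread takes this work-item.
        heapq.heapreplace(free_at, free_at[0] + cost)
    return max(free_at)

def slowdown(num_items, num_threads, slow_factor=100.0):
    """Makespan with one slow work-item divided by makespan with none."""
    uniform = [1.0] * num_items
    skewed = [slow_factor] + [1.0] * (num_items - 1)
    return makespan(skewed, num_threads) / makespan(uniform, num_threads)

# A million work-items on a thousand threads: the slow one is hidden.
many = slowdown(num_items=1_000_000, num_threads=1000)
# Only as many work-items as threads: everyone waits for the slow one.
few = slowdown(num_items=1000, num_threads=1000)

print(f"1,000,000 work-items: {many:.3f}x")
print(f"1,000 work-items: {few:.3f}x")
```

Under those assumptions the single slow work-item costs about 0.1% extra in the first case and a full 100x in the second, which is the whole point: the more work-items per thread, the less one straggler matters.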

#3 Erik Rufelt

Posted 23 November 2012 - 12:40 PM

It won't matter, if you have lots of work-items, which you should have. The threads that complete the kernel just restart and run it again for another work-item, so if you have a million work-items and a thousand threads it won't even be noticeable if one of those threads takes 100x longer than the others, as the others will run 1000 times anyway. If you only have as many work-items as there are cores, then yes all the others will wait for the slowest one, but if you have such a complex kernel that's run so few times then OpenCL probably isn't the right tool for the job. In that case, either do it on the CPU instead or try to divide your kernel into a smaller kernel that runs 10x as many times and does part of the work at a time.

#2 Erik Rufelt

Posted 23 November 2012 - 12:39 PM

It won't matter, if you have lots of work-items, which you should have. The threads that complete the kernel just restart and run it again for another work-item, so if you have a million work-items and a thousand threads it won't even be noticeable if one of those threads takes 100x longer than the others, as the others will run 1000 times anyway. If you only have as many work-items as there are cores, then yes all the others will wait for the slowest one, but if you have such a complex kernel that's run so few times then OpenCL probably isn't the right tool for the job. In that case, either do it on the CPU instead or try to divide your kernel into 10 smaller kernels that are run 10x as many times.

#1 Erik Rufelt

Posted 23 November 2012 - 12:38 PM

It won't matter, if you have lots of work-items, which you should have. The threads that complete the kernel just restart and run it again for another work-item, so if you have a million work-items and a thousand threads it won't even be noticeable if one of those threads takes 100x longer than the others, as the others will run 1000 times anyway. If you only have as many work-items as there are cores, then yes all the others will wait for the slowest one, but if you have such a complex kernel that's run so few times then OpenCL probably isn't the right tool for the job. In that case, either do it on the CPU instead or try to divide your kernel into 10 smaller kernels that are run 10x as many times.
