OpenCL is very slow comparing to cpu.

Started by _Flame1_ January 31, 2013 02:48 PM

10 comments, last by WhiskyJoe 11 years, 2 months ago

user0

787

February 01, 2013 03:26 AM

Of course, the problem is the round trip: the data goes from memory down the PCI-e bus, gets processed, then comes back up the PCI-e bus and into memory.

One thing that I believe you're overlooking is that modern processors can also process 4 float operations at once, so that probably accounts for quite a bit of it as well. The CPU will be limited by memory in this scenario as well.

Between those two issues, the CPU could finish the work before the data has even been transferred down the PCI-e bus.
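To put rough numbers on the transfer argument above, here is a minimal sketch in plain C. It assumes an element-wise kernel over n floats with two input buffers and one output buffer; the function names are made up for illustration, and any bandwidth figure you feed in is your own assumption, not a measurement from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that cross the PCI-e bus for an element-wise kernel on n floats:
 * two input buffers go down to the device, one result buffer comes back.
 * (Assumed buffer layout, for illustration only.) */
double pcie_bytes(size_t n)
{
    return 3.0 * (double)n * (double)sizeof(float);
}

/* Seconds needed to move `bytes` over a link with the given
 * bandwidth in bytes per second. */
double transfer_seconds(double bytes, double bytes_per_second)
{
    assert(bytes_per_second > 0.0);
    return bytes / bytes_per_second;
}
```

For example, calling transfer_seconds(pcie_bytes(n), bus_bandwidth) and comparing it with transfer_seconds(pcie_bytes(n), memory_bandwidth) gives a lower bound for each path. If the kernel does very little arithmetic per element, the bus time alone can exceed the time the CPU needs to stream the same data through main memory.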

_Flame1_

Author

February 01, 2013 02:29 PM

I agree. After I did this:

clGetDeviceInfo(device[0], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &work_group, &ret_work_group);
local_ws = work_group;
global_ws = var_size / work_group;

the CPU was still faster.
But after I changed the function to:

c[iGID] = a[iGID] + sqrt(b[iGID] * b[iGID]);

then the GPU was about 3.5 times faster. So it works. Thanks for your help, guys. But I'm wondering about global and local groups. If I have that right, the local group is the number of GPU processors that run the kernel function simultaneously, and global is the count of such passes. Is that right? How should I calculate the global group properly?
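On the global/local question: in OpenCL the global work size is the total number of work-items (one per element here), and the local work size is the number of work-items in one work-group. The number of groups is global divided by local, so dividing var_size by the work-group size would launch too few work-items to cover the whole array. In OpenCL 1.x the global size must also be a multiple of the local size. A minimal sketch of the usual rounding, in plain C (round_up_global is a hypothetical helper name, not an OpenCL function):

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that is >= n.
 * n is the number of elements to process; local_size is the
 * work-group size, for example the value queried with
 * CL_DEVICE_MAX_WORK_GROUP_SIZE. */
size_t round_up_global(size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t groups = (n + local_size - 1) / local_size; /* work-groups needed */
    return groups * local_size;                        /* total work-items */
}

/* Intended use (not compiled here, since it needs an OpenCL runtime):
 *   size_t local_ws  = work_group;
 *   size_t global_ws = round_up_global(var_size, local_ws);
 *   pass &global_ws and &local_ws to clEnqueueNDRangeKernel.
 * Because global_ws may exceed var_size, the kernel should skip the
 * padding work-items, e.g. with: if (iGID < n) { ... }
 */
```

With var_size = 1000 and a work-group size of 256, this gives 4 groups and a global size of 1024. Note that the device maximum is only an upper bound; the limit for a particular kernel, which can be queried with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE, may be lower.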

WhiskyJoe

1,795

February 06, 2013 11:47 AM

Take a look at this: http://3dgep.com/?p=2192

It's an introduction to OpenCL and explains some of the concepts. The other CUDA-related articles might also be of use in terms of understanding how the GPU "works".


This topic is closed to new replies.
