OpenCL is very slow compared to the CPU.

Igor Spiridonov · 2013-01-31T14:48:41

Hello. I've written my first program with OpenCL:

    __kernel void vector_add_gpu(__global const float* a, __global const float* b, __global float* c, int iNumElements)
    {
        // get index into global data array
        int iGID = get_global_id(0);

        // bounds check (equivalent to the limit on a 'for' loop in standard/serial C code)
        if (iGID < iNumElements)
        {
            // add the vector elements
            c[iGID] = a[iGID] + b[iGID];
        }
    }

I have a fairly large buffer of numbers (about 240 MB). OpenCL takes about 5 times longer than a CPU loop. Is that expected, or is something wrong? With a more complicated function (c[iGID] = a[iGID] + sqrt(b[iGID] * b[iGID]);) the difference is much bigger (about 150 times) :)

Thank you.

P.S. Sorry, my previous case was wrong: I forgot to put the OpenCL file in the folder. :)


Started by _Flame1_ January 31, 2013 02:48 PM

10 comments, last by WhiskyJoe 11 years, 8 months ago

user0

787

February 01, 2013 03:26 AM

Of course. The problem is that the data has to travel from memory across the PCI-e bus, get processed, and then travel back across the PCI-e bus into memory.

One thing that I believe you're overlooking is that modern processors can also perform 4 float operations at once (SIMD), which probably accounts for quite a bit of it as well. The CPU will be limited by memory bandwidth in this scenario too.

Between those two issues, the CPU could finish the work before the data has even been transferred across the PCI-e bus.
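To put rough numbers on the point above, here is a back-of-the-envelope sketch in C. It only estimates how long it takes to stream the arrays at a given bandwidth, and it assumes both input vectors are uploaded and the result is read back on every run. The function name `streaming_seconds` and the bandwidth arguments are hypothetical; real figures depend on the hardware and have to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Seconds needed to stream n_arrays float arrays of n_floats elements each
   at bandwidth_gb_s gigabytes per second. Kernel launch and compute time
   are ignored, so for the GPU path this is only a lower bound. */
double streaming_seconds(size_t n_floats, int n_arrays, double bandwidth_gb_s)
{
    assert(n_arrays > 0 && bandwidth_gb_s > 0.0);
    double bytes = (double)n_floats * (double)n_arrays * (double)sizeof(float);
    return bytes / (bandwidth_gb_s * 1e9);
}
```

For c = a + b, three arrays are touched either way. Comparing `streaming_seconds(n, 3, pcie_gb_s)` with `streaming_seconds(n, 3, ram_gb_s)` shows that if the bus is slower than main memory, the transfer alone can take longer than the whole CPU loop, however fast the kernel is.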

_Flame1_

Author

99

February 01, 2013 02:29 PM

I agree. After I did this:

    clGetDeviceInfo(device[0], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &work_group, &ret_work_group);
    local_ws = work_group;
    global_ws = var_size / work_group;

the CPU was still faster.
But after I changed the function to:

    c[iGID] = a[iGID] + sqrt(b[iGID] * b[iGID]);

the GPU was about 3.5 times faster. So it works. Thanks for your help, guys. But I'm wondering about global and local groups. If I understand correctly, the local group is the number of GPU processors that run the kernel function simultaneously, and global is the number of such passes. Is that right? How should I calculate the global group size properly?
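On the global/local question, a hedged note: in OpenCL the global work size passed to clEnqueueNDRangeKernel is the total number of work-items (one per element here), and the local work size is the number of work-items in one work-group, not a count of passes. In OpenCL 1.x the global size must be a multiple of the local size, so a common approach is to round the element count up and let the kernel's `if (iGID < iNumElements)` check skip the padding. If `var_size` above is the element count, dividing it by the group size would launch only that fraction of the work-items. The helper names below are hypothetical, a minimal sketch rather than the poster's code:

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that is >= n_elements. */
size_t round_up_global_size(size_t n_elements, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = n_elements % local_size;
    return remainder == 0 ? n_elements : n_elements + (local_size - remainder);
}

/* Number of work-groups launched for a global size that is
   already a multiple of local_size. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

With 1000 elements and a local size of 256, the global size would become 1024, giving 4 work-groups; the last 24 work-items fail the bounds check and do nothing.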

WhiskyJoe

1,795

February 06, 2013 11:47 AM

Take a look at this: http://3dgep.com/?p=2192

It's an introduction to OpenCL that explains some of the concepts. The other CUDA-related articles there might also be useful for understanding how the GPU "works".

