clGetDeviceInfo(device[0], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &work_group, &ret_work_group);
local_ws = work_group;
global_ws = var_size / work_group;

then the CPU was faster anyway.
But after I changed the function to:

c[iGID] = a[iGID] + sqrt(b[iGID] * b[iGID]);

the GPU was about 3.5 times faster. So it works. Thanks for your help, guys.

But I'm wondering about the global and local groups. If I have it right, the local group is the number of GPU processors that run the kernel function simultaneously, and global is the count of such passes. Is that right? How should I calculate the global group properly?
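
On the last question, here is a minimal sketch. It assumes the usual OpenCL 1.x rule that the global work size is the total number of work-items (one per array element here) and must be a multiple of the local work size. Under that reading, local_ws is the number of work-items in one work-group, and the number of "passes" (work-groups) is global_ws / local_ws rather than global_ws itself. The helper names round_up_global and num_groups are made up for illustration; they are not OpenCL calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL): round the element count n up
   to the next multiple of the work-group size local_ws. */
static size_t round_up_global(size_t n, size_t local_ws)
{
    assert(local_ws > 0);
    return ((n + local_ws - 1) / local_ws) * local_ws;
}

/* Hypothetical helper: number of work-groups launched for a global size
   that is already a multiple of local_ws. */
static size_t num_groups(size_t global_ws, size_t local_ws)
{
    assert(local_ws > 0 && global_ws % local_ws == 0);
    return global_ws / local_ws;
}
```

With var_size = 1000 and local_ws = 256, round_up_global gives a global size of 1024, which is 4 work-groups. The 24 extra work-items then need a guard in the kernel, something like if (iGID < n), with n passed in as an extra kernel argument, so they do not read or write past the end of the arrays.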