Hi,
I'm trying to get a little experience with OpenCL. I've successfully written a kernel and it works, but I still have some questions. I've watched and read many tutorials, yet some things are still not clear.
So here's the kernel. It's supposed to add two arrays of numbers element by element and write out the results:
__kernel void add(__global float* a, __global float* b, __global float* result)
{
    int id = get_global_id(0);
    result[id] = a[id] + b[id];
}
And here's how I use it:
const int size = 512;
float* a = new float[size];
float* b = new float[size];
float* result = new float[size];
for(int c = 0; c < size; c++)
{
    a[c] = (float)(c * 2); // 0, 2, 4, 6, 8, ..., 1022
    b[c] = (float)((size - c) * 2); // 1024, 1022, 1020, ..., 2
}
cl_int error;
const int mem_size = sizeof(float) * size;
cl_mem a_buffer = clCreateBuffer(objs::get()->the_compute_context.the_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, a, &error);
assert(error == CL_SUCCESS);
cl_mem b_buffer = clCreateBuffer(objs::get()->the_compute_context.the_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, b, &error);
assert(error == CL_SUCCESS);
cl_mem result_buffer = clCreateBuffer(objs::get()->the_compute_context.the_context, CL_MEM_WRITE_ONLY, mem_size, NULL, &error);
assert(error == CL_SUCCESS);
the_render.compute.set_kernel_arg("add", 0, sizeof(cl_mem), &a_buffer);
the_render.compute.set_kernel_arg("add", 1, sizeof(cl_mem), &b_buffer);
the_render.compute.set_kernel_arg("add", 2, sizeof(cl_mem), &result_buffer);
const size_t local_ws = 64;
const size_t global_ws = 512;
the_render.compute.execute("add", objs::get()->the_compute_context.the_queue, 1, &global_ws, &local_ws);
the_render.compute.read_back(objs::get()->the_compute_context.the_queue, result_buffer, CL_TRUE, 0, mem_size, result);
for(int c = 0; c < size; c++)
{
    std::cout << result[c] << "\n";
}
So here's what happens. When I set size to 512 (as now), it outputs "1024" 512 times, which is the expected behaviour. When I change size to 513 it should output "1024" 513 times, right? But it doesn't; it outputs some random value... On top of that, I can't set global_ws (the global work size) to anything above 512, because then my app crashes. Why does that happen?
Best regards,
Yours3!f
Here I come.
"It outputs some random value...": you see values you didn't expect because your kernel was never executed at all. Why?...
Because OpenCL always validates the launch parameters before sending anything to the GPU. Which parameters have to be valid?...
Check your variables global_ws (when you set it to 513) and local_ws.
First, global_ws % local_ws must be 0, i.e. global_ws must be a multiple of local_ws.
Second, according to the OpenCL limits, local_ws must be greater than 0 and less than or equal to the value returned by clGetDeviceInfo for CL_DEVICE_MAX_WORK_ITEM_SIZES (the element for the dimension you are using).
There are other rules, but you aren't running into them yet.
This prints the per-dimension limits on the local work size:
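cl_uint dim = 0; // CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS is returned as a cl_uint
size_t lws[256] = {0}; // 256 entries should be far more than any device reports
::clGetDeviceInfo(device_id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, sizeof(dim), &dim, NULL);
::clGetDeviceInfo(device_id, CL_DEVICE_MAX_WORK_ITEM_SIZES, sizeof(lws), lws, NULL);
for(cl_uint i = 0; i < dim; i++)
    printf("local work size [%u] : %lu\n", (unsigned)i, (unsigned long)lws[i]);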
In your code you are using dimension 0 (you pass work_dim = 1).
When any of the rules above is violated, OpenCL returns one of these error codes instead of running the kernel:
CL_INVALID_WORK_DIMENSION
CL_INVALID_WORK_GROUP_SIZE
CL_INVALID_WORK_ITEM_SIZE
When you say it "crashes the application", what kind of crash is it? (For example, an "assertion failed" because some error code was not CL_SUCCESS?)
If you set local_ws to 1 it should work in any case, but the computation will probably be slower than on the CPU. (It depends on which graphics card you are using; on my PC it was slower.)
From the OpenCL specification (clEnqueueNDRangeKernel):
"local_work_size can also be NULL in which case the OpenCL implementation will determine how to be break the global work-items into appropriate work-group instances."
So if you pass NULL as the local_work_size argument of clEnqueueNDRangeKernel (you are using that function, aren't you?), you don't need to worry about how to divide the global work into work-groups; OpenCL will do it for you.
Someone in this topic asked "what is the difference between OpenCL and OpenGL?":
It's easier to ask "what do OpenCL and OpenGL have in common?": the GPU and its memory.
Using OpenCL together with OpenGL, you can access VBOs directly and process them with an OpenCL kernel (see clCreateFromGLBuffer).
You can do the same with textures and renderbuffers (see clCreateFromGLTexture2D, clCreateFromGLTexture3D, clCreateFromGLRenderbuffer).
I've successfully integrated OpenCL into my engine and used it to process large amounts of data on the GPU. OpenCL is worth learning. Good luck with your work, guys!
Best wishes, FXACE.