__kernel void debug(__write_only image2d_t resultTexture)
{
int2 imgCoords = (int2)(get_global_id(0), get_global_id(1));
int2 imgDims = (int2)(get_image_width(resultTexture), get_image_height(resultTexture));
float4 imgVal = (float4)((float)imgCoords.x / (float)imgDims.x, (float)imgCoords.y / (float)imgDims.y, 0.0f, 1.0f);
write_imagef(resultTexture, imgCoords, imgVal);
}My video card is an Nvidia Geforce 285M GTX, with this kernel running in a release build (C++) I'm getting ~750 FPS. That's not low...but it's not as high as I was expecting. I figure calculating this gradient on this card in GLSL would probably give me quite a bit more. Now I know that GLSL is optimized for this sort of thing whereas raw OpenCL is not so it could just be that, but I thought I should make sure before I get into more complex things since I have plans to really tax this card once I figure out the intricacies of OpenCL. Also here is the code I'm using each frame to execute the kernel:
void CLContext::runKernelForScreen(int screenWidth, int screenHeight) {
cl_int result = CL_SUCCESS;
cl::Event ev;
cl::NDRange localRange = cl::NDRange(32, 16);
cl::NDRange globalRange = cl::NDRange(screenWidth, screenHeight);
//make sure OpenGL isn't using anything
glFlush();
//get the OpenGL shared objects
result = _commandQueue.enqueueAcquireGLObjects(&_glObjects, 0, &ev);
ev.wait();
if(result != CL_SUCCESS) {
throw OCException(LookupErrorString(result));
}
//set the argument to be the image
_primaryKernel.setArg(0, _screenTextureImage);
//enqueue operations to perform on the texture
result = _commandQueue.enqueueNDRangeKernel(_primaryKernel, cl::NullRange, globalRange, cl::NullRange, 0, &ev);
ev.wait();
if (result != CL_SUCCESS) {
throw OCException(LookupErrorString(result));
}
result = _commandQueue.enqueueReleaseGLObjects(&_glObjects, 0, &ev);
ev.wait();
if (result != CL_SUCCESS) {
throw OCException(LookupErrorString(result));
}
_commandQueue.finish();
}I profiled this and found that the bulk of the time is spent on the ev.wait() lines, and commenting those out doesn't do any direct harm but only yields around a 100 FPS gain, also at that point the execution time is almost entirely in _commandQueue.finish() for obvious reasons.
If it matters at all, I'm initializing the OpenGL texture as such:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, screenWidth, screenHeight, 0, GL_RGBA, GL_FLOAT, NULL);
And the respective OpenCL texture object is created with:
_screenTextureImage = cl::Image2DGL(_context, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, textureId, &err);
Lastly in addition to profiling from the host side, I've also used gDebugger to try and see where the issue is, but the tool (at least as I'm capable of using it) doesn't yield much performance data other than to say that on average the kernel uses around
Hopefully I've preempted most of the questions concerning how I'm doing things and someone can make some sense of all this. Is my head on straight here? I don't think I'll be surprised either way if I hear that this is the kind of performance I should or should not expect from OpenCL on this hardware, but like I said I feel like I'd be getting a bit more at this stage from GLSL.






