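1) clSetKernelArg(kernel, 0, sizeof(vbo), &vbo) should be clSetKernelArg(kernel, 0, sizeof(cl_mem), &PositionArray): the kernel argument has to be the cl_mem object, and its size is sizeof(cl_mem).
2) Remove clWaitForEvents(1, &wait); it is not necessary.
3) clEnqueueNDRangeKernel can take NULL as the local work size argument, in which case the implementation picks the work-group size for you.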
The usual approach is to turn the BVH construction problem into a sorting problem. This can be done with Morton codes: compute one code per primitive, sort the primitives by code, and build the hierarchy from the sorted order. See for example this paper: http://mgarland.org/...pers/gpubvh.pdf
Bitonic sort and radix sort can both be implemented very efficiently on modern GPUs. For an OpenCL implementation, see for example: http://www.bealto.co...ting_intro.html
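
To make the Morton code step a bit more concrete, here is a minimal CPU-side sketch in C. It is only an illustration, not the code from the paper: it assumes the primitive centroids have already been normalized to [0, 1] on each axis, uses 10 bits per axis (a 30-bit code), and uses qsort as a stand-in for the GPU radix or bitonic sort. The names MortonEntry, expand_bits, quantize10, morton3d and sort_by_morton are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair: Morton code plus the index of the primitive it belongs to. */
typedef struct {
    uint32_t code;
    uint32_t index;
} MortonEntry;

/* Spread the lower 10 bits of v so that bit i ends up at bit 3*i. */
static uint32_t expand_bits(uint32_t v)
{
    v &= 0x000003FFu;
    v = (v | (v << 16)) & 0xFF0000FFu;
    v = (v | (v << 8))  & 0x0300F00Fu;
    v = (v | (v << 4))  & 0x030C30C3u;
    v = (v | (v << 2))  & 0x09249249u;
    return v;
}

/* Map a coordinate in [0, 1] to an integer in [0, 1023], clamping out-of-range input. */
static uint32_t quantize10(float f)
{
    if (!(f > 0.0f)) return 0u;      /* also catches NaN */
    if (f >= 1.0f)   return 1023u;
    uint32_t q = (uint32_t)(f * 1024.0f);
    return q > 1023u ? 1023u : q;
}

/* Interleave the quantized x, y, z bits into one 30-bit Morton code (x is most significant). */
static uint32_t morton3d(float x, float y, float z)
{
    uint32_t xx = expand_bits(quantize10(x));
    uint32_t yy = expand_bits(quantize10(y));
    uint32_t zz = expand_bits(quantize10(z));
    return (xx << 2) | (yy << 1) | zz;
}

/* qsort comparator: ascending by Morton code. */
static int compare_morton(const void *a, const void *b)
{
    uint32_t ca = ((const MortonEntry *)a)->code;
    uint32_t cb = ((const MortonEntry *)b)->code;
    return (ca > cb) - (ca < cb);
}

/* CPU stand-in for the GPU sort: orders primitives along the Morton curve. */
static void sort_by_morton(MortonEntry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    qsort(entries, n, sizeof(MortonEntry), compare_morton);
}
```

After sorting, primitives that are close together in space tend to be close together in the array, which is what lets the hierarchy be built from the sorted order. On the GPU you would compute the codes in one kernel and replace qsort with a radix or bitonic sort.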