
# Realtime raytracing with OpenCL II

Hello,

In the first chapter of Realtime raytracing with OpenCL, we covered how to use the OpenCL API for general computation, and I wrote a program that sums two arrays on the GPU.

Now I will explain how to write the raytracer itself.
A couple of things first:
I will not go into too much detail about the theory behind a raytracer. That's outside the scope of this entry - the point here is to show how to use OpenCL to accelerate raytracing.
If you want to know how this stuff works, follow the excellent article by Jacco Bikker (Phantom on these forums).

Also, in order to be as concise as possible, I will not explain the C++ side of things.
I will post the full source at the end, so you can check it if you have any trouble understanding how everything fits together, but if you read Part I you should be able to follow along on your own.
Let's begin.

Data structures

OpenCL allows you to create structs (as in the C language).
We are going to need several of them to organize the code a bit:

struct Material{
/* 0 - Standard diffuse color, 1 - Compute 'Chessboard' texture */
int computeColorType;
float4 color;
float reflectivity;
float refractivity;
};

struct Material createMaterial()
{
struct Material m;
m.color = (float4)(1,1,1,1);
m.computeColorType = 0;
m.reflectivity = 0;
m.refractivity = 0;
return m;
}

struct Sphere{
struct Material* m;
float3 pos;
};

struct Plane{
struct Material* m;
float3 normal;
float3 point;
};

struct Ray{
float3 origin;
float3 dir;
};

struct Light{
float3 pos;
float3 dir;
bool directional;
float4 color;
};

Not the prettiest way to do it, but it works. Note that you can't have constructors in OpenCL C, so I created the "createMaterial" function above, which just creates a new material and fills it with defaults.

struct Scene{
struct Sphere spheres[10];
int spheresCount;

struct Plane planes[10];
int planesCount;

struct Light lights[10];
int lightsCount;

struct Material standardMaterial;
};

The scene just contains all our spheres, planes and lights. The standard material is applied to geometries with no materials (where the m pointer is null).

The kernel

Now I'm going to skip all the mumbo jumbo and jump right into the kernel function to show you how to use the data structures and set up the pipeline. Then I will explain all the peripheral functions that every raytracer has.

Important note: I'm building the whole scene in the OpenCL code, not on the C++ side.
My lecturer specifically allowed this, so I took advantage of it. If you want the raytracer to be reusable, you need to move the scene creation to the C++ program and pass the scene to the OpenCL program. It will probably be faster as well (if you organize it properly).
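To give a rough idea of what moving the scene to the C++ side could look like: the host can flatten the scene into a plain float array and upload it with clCreateBuffer (e.g. with CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR). The layout below (center, radius, two material scalars per sphere) is purely illustrative - it is not the layout this article's code uses.

```c
#include <assert.h>

/* Hypothetical host-side helper: flatten spheres into a plain float
   array (center xyz, radius, reflectivity, refractivity per sphere).
   The kernel would take a __global const float* plus a count and
   rebuild its structs from this, instead of hard-coding the scene. */
#define FLOATS_PER_SPHERE 6

void packSpheres(const float centers[][3], const float radii[],
                 const float refl[], const float refr[],
                 int count, float *out)
{
    for (int i = 0; i < count; i++) {
        float *s = out + i * FLOATS_PER_SPHERE;
        s[0] = centers[i][0];   /* center x */
        s[1] = centers[i][1];   /* center y */
        s[2] = centers[i][2];   /* center z */
        s[3] = radii[i];        /* radius   */
        s[4] = refl[i];         /* material: reflectivity */
        s[5] = refr[i];         /* material: refractivity */
    }
}
```

Because every field is a plain float, the same bytes can be reinterpreted on the device side without worrying about struct padding differences between the host and device compilers.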

__kernel void main( __global float4 *dst, uint width, uint height, __global float* viewTransform, __global float* worldTransforms )

Let me explain the kernel function's parameters:
dst - the output buffer to which we write our rendered image. It is of size width*height.
width - the width of the output buffer / rendering resolution
height - the height of the output buffer / rendering resolution
viewTransform - the camera matrix
worldTransforms - an array of object transforms (there can be more than one)

So let's create our materials first:

struct Scene scene;

scene.standardMaterial = createMaterial();
scene.standardMaterial.reflectivity = 0;
scene.standardMaterial.computeColorType = 1;

struct Material floorMaterial = createMaterial();
floorMaterial.reflectivity = 0.5;
floorMaterial.computeColorType = 1;

struct Material ballMaterial1 = createMaterial();
ballMaterial1.reflectivity = 1;
ballMaterial1.color = (float4)(1,0,0,1);
struct Material ballMaterial2 = createMaterial();
ballMaterial2.reflectivity = 1;
ballMaterial2.color = (float4)(0,0,1,1);
struct Material ballMaterial3 = createMaterial();
ballMaterial3.reflectivity = 1;
ballMaterial3.color = (float4)(1,1,1,1);

struct Material refractMaterial = createMaterial();
refractMaterial.refractivity = 1;

Now fill in the geometry. Not too much to explain there.

scene.spheresCount = 2;
scene.spheres[0].pos = (float3)(0,0,0);
scene.spheres[0].m = &ballMaterial1;
scene.spheres[1].pos = (float3)(0,0,-0);
scene.spheres[1].m = &ballMaterial2;

scene.planesCount = 5;
scene.planes[0].point = (float3)(0,-5,0);
scene.planes[0].normal = (float3)(0,1,0);
scene.planes[0].m	  = &floorMaterial;
scene.planes[1].point = (float3)(0,40,0);
scene.planes[1].normal = normalize((float3)(0,-1,0));
scene.planes[1].m = 0; /* null material -> standard material */
scene.planes[2].point = (float3)(-40,-5,0);
scene.planes[2].normal = normalize((float3)(1,1,0));
scene.planes[2].m = 0;
scene.planes[3].point = (float3)(40,-5,0);
scene.planes[3].normal = normalize((float3)(-1,1,0));
scene.planes[3].m = 0;

scene.planes[4].point = (float3)(0,0,0);
scene.planes[4].normal = normalize((float3)(0,0,-1));
scene.planes[4].m = &refractMaterial;

scene.lightsCount = 2;
scene.lights[0].pos = (float3)(0,30,-20);
scene.lights[0].directional = false;
scene.lights[0].color = (float4)(1,1,1,1);
scene.lights[1].pos = (float3)(0,30,20);
scene.lights[1].dir = normalize((float3)(0,1,1));
scene.lights[1].directional = false;
scene.lights[1].color = (float4)(1,1,1,1);

Now, since our demo has two moving spheres, we want to transform their positions by the worldTransforms:
scene.spheres[0].pos = matrixVectorMultiply(worldTransforms, &scene.spheres[0].pos);
scene.spheres[1].pos = matrixVectorMultiply(worldTransforms+16, &scene.spheres[1].pos);

If you build your scene on the C++ side out of triangles, for example, you could specify all your coordinates in world space, which would make this step unnecessary.

Finally, do the raytracing (plus antialiasing) and store the resulting pixel color in the buffer:

float dx = 1.0f / (float)(width);
float dy = 1.0f / (float)(height);
float aspect = (float)(width) / (float)(height);

dst[get_global_id(0)] = (float4)(0,0,0,0);
for(int i = 0; i < kAntiAliasingSamples; i++){
for(int j = 0; j < kAntiAliasingSamples; j++){
float x = (float)(get_global_id(0) % width) / (float)(width) + dx*i/kAntiAliasingSamples;
float y = (float)(get_global_id(0) / width) / (float)(height) + dy*j/kAntiAliasingSamples;

x = (x -0.5f)*aspect;
y = y -0.5f;

struct Ray r;
r.origin = matrixVectorMultiply(viewTransform, &(float3)(0, 0, -1));
r.dir	= normalize(matrixVectorMultiply(viewTransform, &(float3)(x, y, 0)) - r.origin);
float4 color = raytrace(&r, &scene, 0);
dst[get_global_id(0)] += color / (kAntiAliasingSamples*kAntiAliasingSamples) ;
}
}
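Since every work-item writes exactly one element of dst, the x/y recovery in the loop above is just integer division and modulo. A plain-C sketch of the same mapping, easy to check off the GPU:

```c
#include <assert.h>

/* Mirror of the kernel's pixel addressing: get_global_id(0) % width
   gives the column, get_global_id(0) / width gives the row. */
void gidToPixel(int gid, int width, int *px, int *py)
{
    *px = gid % width;
    *py = gid / width;
}
```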

Now we need the following peripheral functions: matrixVectorMultiply and raytrace.
float3 matrixVectorMultiply(__global float* matrix, float3* vector){
float3 result;
result.x = matrix[0]*((*vector).x)+matrix[4]*((*vector).y)+matrix[8]*((*vector).z)+matrix[12];
result.y = matrix[1]*((*vector).x)+matrix[5]*((*vector).y)+matrix[9]*((*vector).z)+matrix[13];
result.z = matrix[2]*((*vector).x)+matrix[6]*((*vector).y)+matrix[10]*((*vector).z)+matrix[14];
return result;
}
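Note the indexing: the 16 floats are treated as a column-major 4x4 matrix (the OpenGL convention, matching the glGetFloatv data the C++ side uploads), with the translation in elements 12-14 and an implicit w = 1. A plain-C equivalent, useful for checking the layout:

```c
#include <assert.h>

/* Column-major 4x4 matrix times a point (implicit w = 1), using the
   same element indices as the kernel's matrixVectorMultiply. */
void matPoint(const float m[16], const float v[3], float out[3])
{
    out[0] = m[0]*v[0] + m[4]*v[1] + m[8]*v[2]  + m[12];
    out[1] = m[1]*v[0] + m[5]*v[1] + m[9]*v[2]  + m[13];
    out[2] = m[2]*v[0] + m[6]*v[1] + m[10]*v[2] + m[14];
}
```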

Raytrace is the function where the actual raytracing happens, so let's look at it in more detail:

float4 raytrace(struct Ray* ray, struct Scene* scene,int traceDepth)

We accept a ray, the scene and the depth at which we currently are in recursive tracing.

The following code:
void* intersectObj = 0;
int intersectObjType = 0;
float t = intersect( ray, scene, &intersectObj, &intersectObjType);

finds the first intersection of the ray in the scene and returns a pointer to the object, as well as the type of that object.
There is no polymorphism in OpenCL, so we need the type to differentiate between the objects.
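This void*-plus-tag pattern is just the C substitute for virtual dispatch: cast according to the tag. A minimal stand-alone sketch with hypothetical types (not the article's structs):

```c
#include <assert.h>

/* Tagged dispatch: 1 = sphere, 2 = plane, mirroring the article's
   convention. The types here are illustrative stand-ins. */
struct SphereT { float radius; };
struct PlaneT  { float dist; };

float firstField(void *obj, int type)
{
    if (type == 1) return ((struct SphereT *)obj)->radius;
    if (type == 2) return ((struct PlaneT *)obj)->dist;
    return 0.0f; /* unknown tag */
}
```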

Now compute the normal based on the object type and get its material.

float4 color = (float4)(0,0,0,0);
if ( t < kMaxRenderDist ){
float3 intersectPos = ray->origin+ray->dir*t ;
float3 normal;

struct Material* m = 0;

if ( intersectObjType == 1 ){
normal = normalize(intersectPos-((struct Sphere*)intersectObj)->pos);
m = ((struct Sphere*)intersectObj)->m;
}
else if (intersectObjType == 2 ){
normal = ((struct Plane*)intersectObj)->normal;
m = ((struct Plane*)intersectObj)->m;
}

if ( !m ){
m = &scene->standardMaterial;
}

If there is no material, we use the "standard material".

Time to compute the color. I used a procedural checkerboard texture for some of the planes, so we need to check the field "computeColorType".
This is a good place to plug in any texturing code you might want to add. You could, for example, use "computeColorType = 2" for textured materials and supply a texture id.

float4 diffuseColor = m->color;

if ( m->computeColorType == 1){
if ( (int)(intersectPos.x/5.0f) % 2 == 0 ){
if ( (int)(intersectPos.z/5.0f) % 2 == 0 ){
diffuseColor = (float4)(0,0,0,0);
}
}
else{
if ( (int)(intersectPos.z/5.0f) % 2 != 0 ){
diffuseColor = (float4)(0,0,0,0);
}
}
}
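The checker logic is easy to verify in isolation. This stand-alone version reproduces the parity branches above; note that the (int) cast truncates toward zero, so the cells mirror around the origin - a well-known artifact of this quick-and-dirty checker.

```c
#include <assert.h>

/* A cell is "black" when the 5-unit cells along x and z agree in
   parity - the same branches as the kernel code above. */
int isBlackCell(float x, float z)
{
    int cx = (int)(x / 5.0f) % 2;
    int cz = (int)(z / 5.0f) % 2;
    if (cx == 0)
        return cz == 0;
    return cz != 0;
}
```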

Reflection and refraction. We call raytrace recursively and increase the recursion depth:

if ( traceDepth < kMaxTraceDepth && m->reflectivity > 0 ){
struct Ray reflectRay;
float3 R = reflect(ray->dir, normal);
reflectRay.origin = intersectPos + R*0.001;
reflectRay.dir	= R;
diffuseColor += m->reflectivity*raytrace(&reflectRay, scene, traceDepth+1);
}

if ( traceDepth < kMaxTraceDepth && m->refractivity > 0 ){
struct Ray refractRay;
float3 R = refract(ray->dir, normal, 0.6);
if ( dot(R,normal) < 0 ){
refractRay.origin = intersectPos + R*0.001;
refractRay.dir	= R;
diffuseColor = m->refractivity*raytrace(&refractRay, scene, traceDepth+1);
}
}

Next, add the lights' contribution for this ray. Note that there is some room for optimization here:
We could have computed the lights' contribution first (by adding pointLit*scene->lights[i].color*max(0.0f,dot(normal, L)) to color).
Then, if color was close to black, we could skip the diffuseColor computation altogether (including reflection and refraction).

for(int i = 0; i < scene->lightsCount; i++){
float3 L = scene->lights[i].dir;
float lightDist = kMaxRenderDist;
if ( !scene->lights[i].directional ){
L = scene->lights[i].pos - intersectPos ;
lightDist = length(L);
L = normalize(L);
}

float pointLit = 1;
struct Ray shadowRay;
shadowRay.origin = intersectPos + L*0.001;
shadowRay.dir = L;
t = intersect( &shadowRay, scene, &intersectObj, &intersectObjType);
if ( t < lightDist ){
pointLit = 0;
}
color += pointLit*diffuseColor*scene->lights[i].color*max(0.0f,dot(normal, L));
}
}
return clamp(color,0,1);
We also shoot the shadow rays here. It might be a good idea to tell the intersect routine that this is a shadow ray, because that enables an additional optimization: we don't need the closest intersection, just any intersection closer than the light (if there is one).

Finally, we return the color with each component clamped to [0,1].

Now we need three more functions: reflect, refract and intersect.

float3 reflect(float3 V, float3 N){
return V - 2.0f * dot( V, N ) * N;
}

float3 refract(float3 V, float3 N, float refrIndex)
{
float cosI = -dot( N, V );
float cosT2 = 1.0f - refrIndex * refrIndex * (1.0f - cosI * cosI);
return (refrIndex * V) + (refrIndex * cosI - sqrt( cosT2 )) * N;
}
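A quick numeric sanity check of the reflect formula: a 45-degree ray hitting the plane y = 0 should bounce off with its y component flipped and its length preserved. Plain-C sketch:

```c
#include <assert.h>
#include <math.h>

/* Same formula as the kernel's reflect: V - 2*dot(V,N)*N. */
void reflect3(const float v[3], const float n[3], float out[3])
{
    float d = v[0]*n[0] + v[1]*n[1] + v[2]*n[2];
    for (int i = 0; i < 3; i++)
        out[i] = v[i] - 2.0f * d * n[i];
}
```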

Intersection is pretty straightforward. We look for the closest intersection and save the object and its type.

float intersect(struct Ray* ray, struct Scene* scene, void** object, int* type)
{
float minT = kMaxRenderDist;

for(int i = 0; i < scene->spheresCount; i++){
float t;
if ( raySphere( &scene->spheres[i], ray, &t ) ){
if ( t < minT ){
minT = t;
*type = 1;
*object = &scene->spheres[i];
}
}
}

for(int i = 0; i < scene->planesCount; i++){
float t;
if ( rayPlane( &scene->planes[i], ray, &t ) ){
if ( t < minT ){
minT = t;
*type = 2;
*object = &scene->planes[i];
}
}
}

return minT;
}

Finally, the functions that intersect a ray with a sphere and a plane:
bool raySphere(struct Sphere* s, struct Ray* r, float* t)
{
/* The listing as posted never defined the sphere radius or the
   discriminant 'd'; both are reconstructed here (the radius as an
   assumed constant, since struct Sphere carries none). */
const float radius = 4.0f;
float3 rayToCenter = s->pos - r->origin;
float dotProduct = dot(r->dir,rayToCenter);
float d = dotProduct*dotProduct - dot(rayToCenter,rayToCenter) + radius*radius;

if ( d < 0)
return false;

*t = (dotProduct - sqrt(d) );

if ( *t < 0 ){
*t = (dotProduct + sqrt(d) ) ;
if ( *t < 0){
return false;
}
}

return true;
}

bool rayPlane(struct Plane* p, struct Ray* r, float* t)
{
float dotProduct = dot(r->dir,p->normal);
if ( dotProduct == 0){
return false;
}
*t = dot(p->normal,p->point-r->origin) / dotProduct ;

return *t >= 0;
}

The full source of the OpenCL program

const int kAntiAliasingSamples  = 2;
const int kMaxTraceDepth = 2;
const float kMaxRenderDist = 1000.0f;

struct Material{
/* 0 - Standard diffuse color, 1 - Compute 'Chessboard' texture */
int computeColorType;
float4 color;
float reflectivity;
float refractivity;
};

struct Material createMaterial()
{
struct Material m;
m.color = (float4)(1,1,1,1);
m.computeColorType = 0;
m.reflectivity = 0;
m.refractivity = 0;
return m;
}

struct Sphere{
struct Material* m;
float3 pos;
};

struct Plane{
struct Material* m;
float3 normal;
float3 point;
};

struct Ray{
float3 origin;
float3 dir;
};

struct Light{
float3 pos;
float3 dir;
bool directional;
float4 color;
};

struct Scene{
struct Sphere spheres[10];
int spheresCount;

struct Plane planes[10];
int planesCount;

struct Light lights[10];
int lightsCount;

struct Material standardMaterial;
};

float3 reflect(float3 V, float3 N){
return V - 2.0f * dot( V, N ) * N;
}

float3 refract(float3 V, float3 N, float refrIndex)
{
float cosI = -dot( N, V );
float cosT2 = 1.0f - refrIndex * refrIndex * (1.0f - cosI * cosI);
return (refrIndex * V) + (refrIndex * cosI - sqrt( cosT2 )) * N;
}

bool raySphere(struct Sphere* s, struct Ray* r, float* t)
{
/* The listing as posted never defined the sphere radius or the
   discriminant 'd'; both are reconstructed here (the radius as an
   assumed constant, since struct Sphere carries none). */
const float radius = 4.0f;
float3 rayToCenter = s->pos - r->origin;
float dotProduct = dot(r->dir,rayToCenter);
float d = dotProduct*dotProduct - dot(rayToCenter,rayToCenter) + radius*radius;

if ( d < 0)
return false;

*t = (dotProduct - sqrt(d) );

if ( *t < 0 ){
*t = (dotProduct + sqrt(d) ) ;
if ( *t < 0){
return false;
}
}

return true;
}

bool rayPlane(struct Plane* p, struct Ray* r, float* t)
{
float dotProduct = dot(r->dir,p->normal);
if ( dotProduct == 0){
return false;
}
*t = dot(p->normal,p->point-r->origin) / dotProduct ;

return *t >= 0;
}

float intersect(struct Ray* ray, struct Scene* scene, void** object, int* type)
{
float minT = kMaxRenderDist;

for(int i = 0; i < scene->spheresCount; i++){
float t;
if ( raySphere( &scene->spheres[i], ray, &t ) ){
if ( t < minT ){
minT = t;
*type = 1;
*object = &scene->spheres[i];
}
}
}

for(int i = 0; i < scene->planesCount; i++){
float t;
if ( rayPlane( &scene->planes[i], ray, &t ) ){
if ( t < minT ){
minT = t;
*type = 2;
*object = &scene->planes[i];
}
}
}

return minT;
}

float4 raytrace(struct Ray* ray, struct Scene* scene,int traceDepth)
{
void* intersectObj = 0;
int intersectObjType = 0;
float t = intersect( ray, scene, &intersectObj, &intersectObjType);

float4 color = (float4)(0,0,0,0);
if ( t < kMaxRenderDist ){
float3 intersectPos = ray->origin+ray->dir*t ;
float3 normal;

struct Material* m = 0;

if ( intersectObjType == 1 ){
normal = normalize(intersectPos-((struct Sphere*)intersectObj)->pos);
m = ((struct Sphere*)intersectObj)->m;
}
else if (intersectObjType == 2 ){
normal = ((struct Plane*)intersectObj)->normal;
m = ((struct Plane*)intersectObj)->m;
}

if ( !m ){
m = &scene->standardMaterial;
}

float4 diffuseColor = m->color;

if ( m->computeColorType == 1){
if ( (int)(intersectPos.x/5.0f) % 2 == 0 ){
if ( (int)(intersectPos.z/5.0f) % 2 == 0 ){
diffuseColor = (float4)(0,0,0,0);
}
}
else{
if ( (int)(intersectPos.z/5.0f) % 2 != 0 ){
diffuseColor = (float4)(0,0,0,0);
}
}
}
if ( traceDepth < kMaxTraceDepth && m->reflectivity > 0 ){
struct Ray reflectRay;
float3 R = reflect(ray->dir, normal);
reflectRay.origin = intersectPos + R*0.001;
reflectRay.dir    = R;
diffuseColor += m->reflectivity*raytrace(&reflectRay, scene, traceDepth+1);
}

if ( traceDepth < kMaxTraceDepth && m->refractivity > 0 ){
struct Ray refractRay;
float3 R = refract(ray->dir, normal, 0.6);
if ( dot(R,normal) < 0 ){
refractRay.origin = intersectPos + R*0.001;
refractRay.dir    = R;
diffuseColor = m->refractivity*raytrace(&refractRay, scene, traceDepth+1);
}
}

for(int i = 0; i < scene->lightsCount; i++){
float3 L = scene->lights[i].dir;
float lightDist = kMaxRenderDist;
if ( !scene->lights[i].directional ){
L = scene->lights[i].pos - intersectPos ;
lightDist = length(L);
L = normalize(L);
}

float pointLit = 1;
struct Ray shadowRay;
shadowRay.origin = intersectPos + L*0.001;
shadowRay.dir = L;
t = intersect( &shadowRay, scene, &intersectObj, &intersectObjType);
if ( t < lightDist ){
pointLit = 0;
}
color += pointLit*diffuseColor*scene->lights[i].color*max(0.0f,dot(normal, L));
}
}
return clamp(color,0,1);
}

float3 matrixVectorMultiply(__global float* matrix, float3* vector){
float3 result;
result.x = matrix[0]*((*vector).x)+matrix[4]*((*vector).y)+matrix[8]*((*vector).z)+matrix[12];
result.y = matrix[1]*((*vector).x)+matrix[5]*((*vector).y)+matrix[9]*((*vector).z)+matrix[13];
result.z = matrix[2]*((*vector).x)+matrix[6]*((*vector).y)+matrix[10]*((*vector).z)+matrix[14];
return result;
}

__kernel void main( __global float4 *dst, uint width, uint height, __global float* viewTransform, __global float* worldTransforms )
{
struct Scene scene;

scene.standardMaterial = createMaterial();
scene.standardMaterial.reflectivity = 0;
scene.standardMaterial.computeColorType = 1;

struct Material floorMaterial = createMaterial();
floorMaterial.reflectivity = 0.5;
floorMaterial.computeColorType = 1;

struct Material ballMaterial1 = createMaterial();
ballMaterial1.reflectivity = 1;
ballMaterial1.color = (float4)(1,0,0,1);
struct Material ballMaterial2 = createMaterial();
ballMaterial2.reflectivity = 1;
ballMaterial2.color = (float4)(0,0,1,1);
struct Material ballMaterial3 = createMaterial();
ballMaterial3.reflectivity = 1;
ballMaterial3.color = (float4)(1,1,1,1);

struct Material refractMaterial = createMaterial();
refractMaterial.refractivity = 1;

scene.spheresCount = 2;
scene.spheres[0].pos = (float3)(0,0,0);
scene.spheres[0].m = &ballMaterial1;
scene.spheres[1].pos = (float3)(0,0,-0);
scene.spheres[1].m = &ballMaterial2;

scene.planesCount = 5;
scene.planes[0].point = (float3)(0,-5,0);
scene.planes[0].normal = (float3)(0,1,0);
scene.planes[0].m      = &floorMaterial;
scene.planes[1].point = (float3)(0,40,0);
scene.planes[1].normal = normalize((float3)(0,-1,0));
scene.planes[1].m = 0; /* null material -> standard material */
scene.planes[2].point = (float3)(-40,-5,0);
scene.planes[2].normal = normalize((float3)(1,1,0));
scene.planes[2].m = 0;
scene.planes[3].point = (float3)(40,-5,0);
scene.planes[3].normal = normalize((float3)(-1,1,0));
scene.planes[3].m = 0;

scene.planes[4].point = (float3)(0,0,0);
scene.planes[4].normal = normalize((float3)(0,0,-1));
scene.planes[4].m = &refractMaterial;

scene.lightsCount = 2;
scene.lights[0].pos = (float3)(0,30,-20);
scene.lights[0].directional = false;
scene.lights[0].color = (float4)(1,1,1,1);
scene.lights[1].pos = (float3)(0,30,20);
scene.lights[1].dir = normalize((float3)(0,1,1));
scene.lights[1].directional = false;
scene.lights[1].color = (float4)(1,1,1,1);

scene.spheres[0].pos = matrixVectorMultiply(worldTransforms, &scene.spheres[0].pos);
scene.spheres[1].pos = matrixVectorMultiply(worldTransforms+16, &scene.spheres[1].pos);

float dx = 1.0f / (float)(width);
float dy = 1.0f / (float)(height);
float aspect = (float)(width) / (float)(height);

dst[get_global_id(0)] = (float4)(0,0,0,0);
for(int i = 0; i < kAntiAliasingSamples; i++){
for(int j = 0; j < kAntiAliasingSamples; j++){
float x = (float)(get_global_id(0) % width) / (float)(width) + dx*i/kAntiAliasingSamples;
float y = (float)(get_global_id(0) / width) / (float)(height) + dy*j/kAntiAliasingSamples;

x = (x -0.5f)*aspect;
y = y -0.5f;

struct Ray r;
r.origin = matrixVectorMultiply(viewTransform, &(float3)(0, 0, -1));
r.dir    = normalize(matrixVectorMultiply(viewTransform, &(float3)(x, y, 0)) - r.origin);
float4 color = raytrace(&r, &scene, 0);
dst[get_global_id(0)] += color / (kAntiAliasingSamples*kAntiAliasingSamples) ;
}
}

}

The full source of the C++ program

#include <CL/cl.h>
#include <cstring>
#include <iostream>
#include <fstream>
#include <sstream>

#include <SDL/SDL.h>
#include <SDL/SDL_opengl.h>

const int kWidth = 1366;
const int kHeight = 768;
const bool kFullscreen = true;

size_t global_work_size = kWidth * kHeight;

float viewMatrix[16];

float sphere1Pos[3] = {0,0,10};
float sphere2Pos[3] = {0,0,-10};
float sphereVelocity = 1;
float sphereTransforms[2][16];

cl_command_queue queue;
cl_kernel kernel;
cl_mem buffer, viewTransform, worldTransforms;

void InitOpenCL()
{
// 1. Get a platform.
cl_platform_id platform;

clGetPlatformIDs( 1, &platform, NULL );
// 2. Find a gpu device.
cl_device_id device;

clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU,
1,
&device,
NULL);
// 3. Create a context and command queue on that device.
cl_context context = clCreateContext( NULL,
1,
&device,
NULL, NULL, NULL);
queue = clCreateCommandQueue( context,
device,
0, NULL );
// 4. Perform runtime source compilation, and obtain kernel entry point.
std::ifstream file("kernel.txt");
std::string source;
while(!file.eof()){
char line[256];
file.getline(line,255);
source += line;
source += "\n"; // keep line breaks - without them a // comment would swallow the rest of the source
}

cl_ulong maxSize;
clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE , sizeof(cl_ulong), &maxSize, 0);

const char* str = source.c_str();
cl_program program = clCreateProgramWithSource( context,
1,
&str,
NULL, NULL );
cl_int result = clBuildProgram( program, 1, &device, NULL, NULL, NULL );
if ( result ){
std::cout << "Error during compilation! (" << result << ")" << std::endl;
}
kernel = clCreateKernel( program, "main", NULL );
// 5. Create a data buffer.
buffer        = clCreateBuffer( context,
CL_MEM_WRITE_ONLY,
kWidth * kHeight *sizeof(cl_float4),
NULL, 0 );
viewTransform = clCreateBuffer( context,
CL_MEM_READ_ONLY, // flags argument was missing from the posted listing; CL_MEM_READ_ONLY assumed
16 *sizeof(cl_float),
NULL, 0 );

worldTransforms = clCreateBuffer( context,
CL_MEM_READ_ONLY,
16 *sizeof(cl_float)*2,
NULL, 0 );

clSetKernelArg(kernel, 0, sizeof(buffer), (void*) &buffer);
clSetKernelArg(kernel, 1, sizeof(cl_uint), (void*) &kWidth);
clSetKernelArg(kernel, 2, sizeof(cl_uint), (void*) &kHeight);
clSetKernelArg(kernel, 3, sizeof(viewTransform), (void*) &viewTransform);
clSetKernelArg(kernel, 4, sizeof(worldTransforms), (void*) &worldTransforms);
}

void Render(int delta)
{

clEnqueueNDRangeKernel(   queue,
kernel,
1,
NULL,
&global_work_size,
NULL, 0, NULL, NULL);

// 7. Look at the results via synchronous buffer map.
cl_float4 *ptr = (cl_float4 *) clEnqueueMapBuffer( queue,
buffer,
CL_TRUE,
CL_MAP_READ,
0,
kWidth * kHeight * sizeof(cl_float4),
0, NULL, NULL, NULL );

cl_float *viewTransformPtr = (cl_float *) clEnqueueMapBuffer( queue,
viewTransform,
CL_TRUE,
CL_MAP_WRITE,
0,
16 * sizeof(cl_float),
0, NULL, NULL, NULL );

cl_float *worldTransformsPtr = (cl_float *) clEnqueueMapBuffer( queue,
worldTransforms,
CL_TRUE,
CL_MAP_WRITE,
0,
16 * sizeof(cl_float)*2,
0, NULL, NULL, NULL );

memcpy(viewTransformPtr, viewMatrix, sizeof(float)*16);
memcpy(worldTransformsPtr, sphereTransforms[0], sizeof(float)*16);
memcpy(worldTransformsPtr+16, sphereTransforms[1], sizeof(float)*16);

clEnqueueUnmapMemObject(queue, viewTransform, viewTransformPtr, 0, 0, 0);
clEnqueueUnmapMemObject(queue, worldTransforms, worldTransformsPtr, 0, 0, 0);

unsigned char* pixels = new unsigned char[kWidth*kHeight*4];
for(int i=0; i <  kWidth * kHeight; i++){
pixels[i*4] = ptr[i].s[0]*255;
pixels[i*4+1] = ptr[i].s[1]*255;
pixels[i*4+2] = ptr[i].s[2]*255;
pixels[i*4+3] = 255;
}

glBindTexture(GL_TEXTURE_2D, 1);
glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glTexImage2D(GL_TEXTURE_2D, 0, 4, kWidth, kHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
delete [] pixels;

glClearColor(1,1,1,1);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

glMatrixMode(GL_PROJECTION);
glOrtho(-1,1,1,-1,1,100);
glMatrixMode(GL_MODELVIEW);

glBegin(GL_QUADS);
glTexCoord2f(0,1);
glVertex3f(-1,-1,-1);
glTexCoord2f(0,0);
glVertex3f(-1,1,-1);
glTexCoord2f(1,0);
glVertex3f(1,1,-1);
glTexCoord2f(1,1);
glVertex3f(1,-1,-1);
glEnd();

SDL_GL_SwapBuffers();
clFinish( queue );
}

void Update(int delta)
{
int count;
Uint8* keys = SDL_GetKeyState(&count);

float translate[3] = {0,0,0};
if ( keys[SDLK_DOWN] ){
translate[2] = -0.01*delta;
}
if ( keys[SDLK_UP] ){
translate[2] = 0.01*delta;
}
if ( keys[SDLK_LEFT] ){
translate[0] = -0.01*delta;
}
if ( keys[SDLK_RIGHT] ){
translate[0] = 0.01*delta;
}

int x,y;
SDL_GetMouseState(&x,&y);
int relX = (kWidth/2.0f - x)*delta;
int relY = (kHeight/2.0f - y)*delta;
SDL_WarpMouse(kWidth/2.0f, kHeight/2.0f);

glMatrixMode(GL_MODELVIEW);

glMultMatrixf(viewMatrix);
glTranslatef(translate[0],translate[1],translate[2]);

if ( relX != 0){
glRotatef(-relX/200.0f, 0, 1, 0);
}
if ( relY != 0){
glRotatef(-relY/200.0f, 1, 0, 0);
}

glGetFloatv(GL_MODELVIEW_MATRIX, viewMatrix);

// Sphere Transforms
glTranslatef(0, 0, sphere1Pos[2]);
glGetFloatv(GL_MODELVIEW_MATRIX, sphereTransforms[0]);

glTranslatef(0, 0, sphere2Pos[2]);
glGetFloatv(GL_MODELVIEW_MATRIX, sphereTransforms[1]);

sphere1Pos[2] += sphereVelocity*delta/30.0f;
sphere2Pos[2] += sphereVelocity*(-1)*delta/30.0f;

if ( sphere1Pos[2] > 50 ){
sphereVelocity = -1;
}
else if ( sphere1Pos[2] < -50 ){
sphereVelocity = 1;
}
}

int main(int argc, char* argv[])
{
InitOpenCL();

memset(viewMatrix, 0, sizeof(float)*16);
viewMatrix[0] = viewMatrix[5] = viewMatrix[10] = viewMatrix[15] = 1;

SDL_Init(SDL_INIT_EVERYTHING);

Uint32 flags = SDL_OPENGL;
if ( kFullscreen ){
flags |= SDL_FULLSCREEN;

SDL_ShowCursor(0);
}

SDL_SetVideoMode(kWidth, kHeight, 32, flags);

glEnable(GL_TEXTURE_2D);

bool loop = true;
int lastTicks = SDL_GetTicks();
while(loop){
int delta = SDL_GetTicks() - lastTicks;
lastTicks = SDL_GetTicks();
SDL_Event e;
while(SDL_PollEvent(&e)){
if ( e.type == SDL_QUIT ){
loop = false;
}
else if ( e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE){
loop = false;
}
}

Update(delta);
Render(delta);

std::stringstream ss;
ss << 1000.0f / delta ;
SDL_WM_SetCaption(ss.str().c_str(), 0);
}

return 0;
}

That's it, I hope you enjoyed the series.
I will upload the VS 2010 project and executable later tonight.


Feb 19 2012 04:25 PM
One question: on Wikipedia and some other sites it is stated that OpenCL C does not support recursion, which seems to be wrong, since you are using it and it seems to work?!
Feb 20 2012 09:15 AM
I've done some research, and you are correct - every source seems to point out that OpenCL does not support recursion.
However, in my code recursion definitely works (as seen from the reflection and refraction in the images).
My guess is that either the latest version of OpenCL recently changed this or, more likely, the ATI driver automatically emulates the recursion for you.
I'm pretty sure there is a performance overhead in doing so, and worse - the recursion rays are not computed in parallel.

So regardless of whether OpenCL supports recursion or not, you should probably use non-recursive raytracing for optimum speed.
Oct 06 2012 07:52 AM
Nice, I recently wanted to do the same. After some googling, I found this project.
I just wanted to add: definitely no recursion. I tried to compile your program on my machine (Arch Linux kernel 3.5.4-1 with AMD Catalyst drivers) and the OpenCL compiler reported error -11 (recursive function found!).

After commenting out the recursion, it runs fine at a respectable 12 frames at 640x480 (AMD E-450 APU) and 4-5 frames at 1280x768. Nice results for raytracing (in my opinion).
One question: you mention in your article that it would be nicer to pass the scene from C++ to OpenCL.
Do you have links explaining how to do so? I would be very interested in them (might want to share them?).
Feb 03 2013 04:48 PM
Good post!
I found the same problem as @jopster - the compiler gives me error -11 running on the NVIDIA driver.
I will try to convert it to a non-recursive version; if someone already has one, sharing it would be appreciated.
