One man show



Tetris in Ruby

Posted 24 February 2012 · 2,822 views

Hello,

My exams are over (finally).
One of the courses I took this semester was introduction to the Ruby language.
For the final exam, we had to make some project of moderate complexity.
The submissions were roughly 90% web apps (written using Rails), plus several console apps and several games (most of which were chess simulators, which seem to be beloved in academia).

So I figured I would do a very basic game I had never made before: Tetris.

Ruby is a really cool language for prototyping, and it's amazing how much time it can save you with its blocks passed as arguments and the extensive methods of the Enumerable module.
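
As a small illustration of that point, here is a sketch that computes a bounding box and counts the cells in a row with nothing but Enumerable calls and blocks. The Cell struct, the method names and the sample coordinates are made up for this example and are not part of the game's source.

```ruby
# Hypothetical stand-in for a game block; not taken from the game's source.
Cell = Struct.new(:x, :y)

# Bounding box as [x_min, y_min, x_max, y_max], built from Enumerable calls with blocks.
def bounds(cells)
  x_min, x_max = cells.map { |c| c.x }.minmax
  y_min, y_max = cells.map { |c| c.y }.minmax
  [x_min, y_min, x_max, y_max]
end

# How many cells sit on the row with the given y coordinate.
def cells_in_row(cells, y)
  cells.count { |c| c.y == y }
end

cells = [Cell.new(0, 0), Cell.new(32, 0), Cell.new(32, 32), Cell.new(64, 32)]
p bounds(cells)
p cells_in_row(cells, 0)
```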

For the graphics front-end/window creation and sound, I used gosu.
It's amazingly simple and fast enough for the job.

It took me a few hours to program the whole thing (+unit tests).
The result is not bad, but apparently not good enough (I got a 4 out of 6 score).

The biggest remark I got from my lecturer was that my rendering and game logic were too tightly coupled.
Which is true, of course.

However, I believe that decoupling them would complicate things quite a bit (and thus make the project bigger and take me more time).
I just don't get why professors always think an over-engineered but "extensible and maintainable" solution is better than a simple one. It's not like I'm going to turn my course work into a full-blown product.

Source
require 'gosu'

class Block
  attr_accessor :falling
  attr_accessor :x, :y, :width, :height, :color

  @@image = nil

  def initialize(game)
    # Image is loaded only once for all blocks
  	if @@image == nil
		@@image = Gosu::Image.new(game, "block.png", false) 
	end
	
	@x = 0
	@y = 0
	@width  = @@image.width
	@height = @@image.height
	@game = game
	@color = 0xffffffff
  end

  def draw
    @@image.draw(@x, @y, 0, 1, 1, @color)
  end

  def collide(block)
    # Two blocks collide only when they are at the same position, since the world is a grid
    return (block.x == @x && block.y == @y)
  end

  def collide_with_other_blocks
    @game.blocks.each do |block|
      if collide(block)
	    return block
	  end
	end
    nil
  end
end

class Shape
  attr_accessor :rotation
  def initialize(game)
    @game = game
    @last_fall_update = Gosu::milliseconds 
    @last_move_update = Gosu::milliseconds 
	
	@blocks = [Block.new(game), Block.new(game), Block.new(game), Block.new(game) ]

	@x = @y = 0
	@falling = true
	
	# Rotation is done about this block
	@rotation_block = @blocks[1]
	# How many rotations we can do before a full cycle?
	@rotation_cycle = 1
	# Current rotation state
	@rotation = 0
  end

  def apply_rotation
    # Each rotation step is a 90-degree turn in the clockwise direction
    if @rotation_block != nil
		(1..@rotation.modulo(@rotation_cycle)).each do |i|
		  @blocks.each do |block|
			old_x = block.x
			old_y = block.y
			block.x = @rotation_block.x + (@rotation_block.y - old_y)
			block.y = @rotation_block.y - (@rotation_block.x - old_x)
		  end
		end
    end
  end

  # Note that the following function is defined properly only when the object is unrotated
  # Otherwise the line of symmetry will be misplaced and wrong results will be produced
  def reverse
    # Mirror the shape by the y axis, effectively creating shape counterparts such as 'L' and 'J'
    center = (get_bounds[2] + get_bounds[0]) / 2.0
    @blocks.each do |block|
	  block.x = 2*center - block.x - @game.block_width
	end
  end

  def get_bounds
    # Go through all blocks to find the bounds of this shape
    x_min = []
	y_min = []
	x_max = []
	y_max = []
    @blocks.each do |block| 
	  x_min << block.x
	  y_min << block.y
	  
	  x_max << block.x + block.width
	  y_max << block.y + block.height
	end

	return [x_min.min, y_min.min, x_max.max, y_max.max]
  end

  # Updates to movement are done periodically to allow the player time for reaction
  def needs_fall_update?
    if ( @game.button_down?(Gosu::KbDown) )
      updateInterval = 100
	else
	  updateInterval = 500 - @game.level*50
	end
	if ( Gosu::milliseconds - @last_fall_update > updateInterval )
      @last_fall_update = Gosu::milliseconds 
	end
  end

  def needs_move_update?
	if ( Gosu::milliseconds - @last_move_update > 100 )
	  @last_move_update = Gosu::milliseconds 
    end
  end

  def draw
    get_blocks.each { |block| block.draw }
  end

  def update
    if ( @falling ) 
	  # After a movement or gravity update, we check if the moved shape collides with the world.
	  # If it does, we restore its position to the last known good position
	  old_x = @x
	  old_y = @y
	  
	  if needs_fall_update?
		@y = (@y + @game.block_height)
	  end
	  
	  # Note that we do 2 collision checks: once after moving on the y axis and once after moving on the x axis.
	  # This way we can determine which of the 2 movements is responsible for the collision and learn on which side
	  # of the colliding block the collision occurred.
	  if ( collide )
	    @y = (old_y)
		@falling = false
		@game.spawn_next_shape
		@game.delete_lines_of(self)
	  else  
	    if needs_move_update?
		  if (@game.button_down?(Gosu::KbLeft))
		    @x =  (@x - @game.block_width)
		  end
		  if (@game.button_down?(Gosu::KbRight))
			@x = ( @x + @game.block_width)
		  end
		  
		  if ( collide )
		    @x = (old_x)
		  end 
		end  
	  end
	end
  end

  def collide
    get_blocks.each do |block|
	  collision = block.collide_with_other_blocks
	  if (collision)
	    return true
	  end
    end

    bounds = get_bounds

    if ( bounds[3] > @game.height )
	  return true
    end

    if ( bounds[2] > @game.width )
	  return true
    end

    if ( bounds[0] < 0 )
	  return true
    end	
    return false
  end

end

class ShapeI < Shape
  def initialize(game)
    super(game)
	
	@rotation_block = @blocks[1]
	@rotation_cycle = 2
  end

  def get_blocks    
	@blocks[0].x = @x
	@blocks[1].x = @x
	@blocks[2].x = @x
	@blocks[3].x = @x
	@blocks[0].y = @y
  	@blocks[1].y = @blocks[0].y + @blocks[0].height
	@blocks[2].y = @blocks[1].y + @blocks[1].height
	@blocks[3].y = @blocks[2].y + @blocks[2].height
	
	apply_rotation
	
	@blocks.each { |block| block.color = 0xffb2ffff }
  end
end

class ShapeL < Shape
  def initialize(game)
    super(game)
	
	@rotation_block = @blocks[1]
	@rotation_cycle = 4
  end

  def get_blocks	
	@blocks[0].x = @x
	@blocks[1].x = @x
	@blocks[2].x = @x
	@blocks[3].x = @x + @game.block_width
	@blocks[0].y = @y
  	@blocks[1].y = @blocks[0].y + @game.block_height
	@blocks[2].y = @blocks[1].y + @game.block_height
	@blocks[3].y = @blocks[2].y
	
	apply_rotation
	
	@blocks.each { |block| block.color = 0xffff7f00 }
  end
end

class ShapeJ < ShapeL
  def get_blocks
    # Reverse also reverses the direction of the rotation that's applied in apply_rotation.
	# So we temporarily disable rotation in the super method and handle the rotation here, after the reverse.
    old_rotation = @rotation
    @rotation = 0  
	
    super
	reverse
	
	@rotation = old_rotation
	apply_rotation
	
	@blocks.each { |block| block.color = 0xff0000ff}
  end
end

class ShapeCube < Shape
  def get_blocks
	@blocks[0].x = @x
	@blocks[1].x = @x
	@blocks[2].x = @x + @game.block_width
	@blocks[3].x = @x + @game.block_width
	@blocks[0].y = @y
  	@blocks[1].y = @blocks[0].y + @game.block_height
	@blocks[2].y = @blocks[0].y 
	@blocks[3].y = @blocks[2].y + @game.block_height
	
	@blocks.each { |block| block.color = 0xffffff00}
  end
end

class ShapeZ < Shape
  def initialize(game)
    super(game)
	
	@rotation_block = @blocks[1]
	@rotation_cycle = 2
  end

  def get_blocks
	@blocks[0].x = @x
	@blocks[1].x = @x + @game.block_width
	@blocks[2].x = @x + @game.block_width
	@blocks[3].x = @x + @game.block_width*2
	@blocks[0].y = @y
  	@blocks[1].y = @y
	@blocks[2].y = @y + @game.block_height
	@blocks[3].y = @y + @game.block_height
	
	apply_rotation
	@blocks.each { |block| block.color = 0xffff0000}
  end
end

class ShapeS < ShapeZ
  def get_blocks
    # Reverse also reverses the direction of the rotation that's applied in apply_rotation.
	# So we temporarily disable rotation in the super method and handle the rotation here, after the reverse.
    old_rotation = @rotation
    @rotation = 0  
	
    super
	reverse
	
	@rotation = old_rotation
	apply_rotation
	
	@blocks.each { |block| block.color = 0xff00ff00}
  end
end

class ShapeT < Shape
  def initialize(game)
    super(game)
	
	@rotation_block = @blocks[1]
	@rotation_cycle = 4
  end

  def get_blocks	
	@blocks[0].x = @x
	@blocks[1].x = @x + @game.block_width
	@blocks[2].x = @x + @game.block_width*2
	@blocks[3].x = @x + @game.block_width
	@blocks[0].y = @y
  	@blocks[1].y = @y
	@blocks[2].y = @y
	@blocks[3].y = @y + @game.block_height
	
	apply_rotation
	@blocks.each { |block| block.color = 0xffff00ff}
  end
end

class TetrisGameWindow < Gosu::Window
  attr_accessor :blocks
  attr_reader :block_height, :block_width
  attr_reader :level
  attr_reader :falling_shape

  STATE_PLAY = 1
  STATE_GAMEOVER = 2

  def initialize
    super(320, 640, false)
	
	@block_width = 32
	@block_height = 32
	
	@blocks = []
	
	@state = STATE_PLAY
	
	spawn_next_shape
	
	@lines_cleared = 0
	@level = 0
	
	self.caption = "Tetris : #{@lines_cleared} lines"
	
    @song = Gosu::Song.new("TetrisB_8bit.ogg")
  end

  def update
    if ( @state == STATE_PLAY )
      if ( @falling_shape.collide )
        @state = STATE_GAMEOVER
	  else
        @falling_shape.update
	  end

	  @level = @lines_cleared / 10
	  self.caption = "Tetris : #{@lines_cleared} lines"
	else 
	  if ( button_down?(Gosu::KbSpace) )
	    @blocks = []
		@falling_shape = nil
		@level = 0
		@lines_cleared = 0
		spawn_next_shape
		
		@state = STATE_PLAY
	  end
	end
	
	if ( button_down?(Gosu::KbEscape) )
	  close
	end
	@song.play(true)
  end

  def draw
    @blocks.each { |block| block.draw }
	@falling_shape.draw
	
	if @state == STATE_GAMEOVER
	   text = Gosu::Image.from_text(self, "Game Over", "Arial", 40)
	   text.draw(width/2 - 90, height/2 - 20, 0, 1, 1)
	end
  end

  def button_down(id)
    # Rotate shape when space is pressed
    if ( id == Gosu::KbSpace && @falling_shape != nil )
      @falling_shape.rotation += 1
	  if ( @falling_shape.collide )
	    @falling_shape.rotation -= 1
	  end
	end
  end

  def spawn_next_shape
    # Spawn a random shape and add the current falling shape's blocks to the "static" blocks list
    if (@falling_shape != nil )
	  @blocks += @falling_shape.get_blocks 
	end
	 
	shape_classes = [ShapeI, ShapeL, ShapeJ, ShapeCube, ShapeZ, ShapeT, ShapeS]
    @falling_shape = shape_classes.sample.new(self)
  end

  def line_complete(y)
    # The window width must be divisible by block_width, otherwise there would be a gap.
	# A line is complete when the number of blocks on it equals the maximum number of blocks that fit on any line.
	@blocks.count { |item| item.y == y } == width / block_width
  end

  def delete_lines_of( shape )
    # Go through each block of the shape and check if the lines they are on are complete
    deleted_lines = []
    shape.get_blocks.each do |block|
		if ( line_complete(block.y) )
		   deleted_lines.push(block.y)
		   @blocks = @blocks.delete_if { |item| item.y == block.y }
		end
	end
	
	@lines_cleared += deleted_lines.length
	
	# This applies the standard gravity found in classic Tetris games - all blocks go down by the 
	# amount of lines cleared
	@blocks.each do |block|
	  i = deleted_lines.count{ |y| y > block.y }
	  block.y += i*block_height
	end
	
  end

end

# This global prevents creation of the window and start of the simulation when we are doing testing
if ( !$testing )
	window = TetrisGameWindow.new
	window.show
end
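
The rotation step in apply_rotation can be tried out on plain coordinates, without a Gosu window. The sketch below reuses the same formula on [x, y] pairs; the method names are made up for this example, and screen coordinates (y grows downward) are assumed, which is why the turn comes out clockwise.

```ruby
# Rotate one grid point 90 degrees about a pivot, using the same formula as apply_rotation.
# Screen coordinates are assumed: x grows to the right, y grows downward.
def rotate_quarter_turn(point, pivot)
  x, y = point
  px, py = pivot
  [px + (py - y), py - (px - x)]
end

# Apply `turns` quarter turns to every point of a shape.
def rotate_shape(points, pivot, turns)
  turns.times { points = points.map { |pt| rotate_quarter_turn(pt, pivot) } }
  points
end

# A vertical I piece on a 32-pixel grid, pivoting about its second block.
i_piece = [[0, 0], [0, 32], [0, 64], [0, 96]]
p rotate_shape(i_piece, [0, 32], 1)
```

The pivot maps onto itself, and four quarter turns bring every point back to where it started, which is why a rotation_cycle of 4 is enough for the L, J and T shapes.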

And unit tests (they were required)
$testing = true

require "./tetris.rb"
require "test/unit"

class TestTetris < Test::Unit::TestCase
  def setup
    @game = TetrisGameWindow.new
	@w = @game.block_width
	@h = @game.block_height
  end

  def test_shapes_construction
    assert_equal(4, ShapeI.new(@game).get_blocks.length, "ShapeI must be constructed of 4 blocks")
	assert_equal(4, ShapeT.new(@game).get_blocks.length, "ShapeT must be constructed of 4 blocks")
	assert_equal(4, ShapeJ.new(@game).get_blocks.length, "ShapeJ must be constructed of 4 blocks")
	assert_equal(4, ShapeZ.new(@game).get_blocks.length, "ShapeZ must be constructed of 4 blocks")
	assert_equal(4, ShapeCube.new(@game).get_blocks.length, "ShapeCube must be constructed of 4 blocks")
	assert_equal(4, ShapeS.new(@game).get_blocks.length, "ShapeS must be constructed of 4 blocks")
	assert_equal(4, ShapeL.new(@game).get_blocks.length, "ShapeL must be constructed of 4 blocks")
	
	assert_not_equal(nil, @game.falling_shape, "Falling shape shouldn't be nil")
  end

  def test_shapes_rotation
    shape = ShapeI.new(@game)
	shape.rotation = 1
	assert(shape_contain_block(shape, -2*@w, @h), "Rotation of I failed!")
	assert(shape_contain_block(shape, -@w, @h), "Rotation of I failed!")
	assert(shape_contain_block(shape, 0, @h), "Rotation of I failed!")
	assert(shape_contain_block(shape, @w, @h), "Rotation of I failed!")
	
    shape = ShapeL.new(@game)
	shape.rotation = 2
	assert(shape_contain_block(shape, -@w, 0), "Rotation of L failed!")
	assert(shape_contain_block(shape, 0, 0), "Rotation of L failed!")
	assert(shape_contain_block(shape, 0, @h), "Rotation of L failed!")
	assert(shape_contain_block(shape, 0, 2*@h), "Rotation of L failed!")
	
    shape = ShapeJ.new(@game)
	shape.rotation = 2
	assert(shape_contain_block(shape, 2*@w, 0), "Rotation of J failed!")
	assert(shape_contain_block(shape, @w, 0), "Rotation of J failed!")
	assert(shape_contain_block(shape, @w, @h), "Rotation of J failed!")
	assert(shape_contain_block(shape, @w, 2*@h), "Rotation of J failed!")
	
	shape = ShapeZ.new(@game)
	shape.rotation = 2
	assert(shape_contain_block(shape, 0, 0), "Rotation of Z failed!")
	assert(shape_contain_block(shape, @w, 0), "Rotation of Z failed!")
	assert(shape_contain_block(shape, @w, @h), "Rotation of Z failed!")
	assert(shape_contain_block(shape, 2*@w, @h), "Rotation of Z failed!")
	
	shape = ShapeS.new(@game)
	shape.rotation = 1
	assert(shape_contain_block(shape, 0, -@h), "Rotation of S failed!")
	assert(shape_contain_block(shape, 0, 0), "Rotation of S failed!")
	assert(shape_contain_block(shape, @w, 0), "Rotation of S failed!")
	assert(shape_contain_block(shape, @w, @h), "Rotation of S failed!")
	
	shape = ShapeT.new(@game)
	shape.rotation = 3
	assert(shape_contain_block(shape, @w, -@h), "Rotation of T failed!")
	assert(shape_contain_block(shape, @w, 0), "Rotation of T failed!")
	assert(shape_contain_block(shape, 2*@w, 0), "Rotation of T failed!")
	assert(shape_contain_block(shape, @w, @h), "Rotation of T failed!")
  end

  def test_block_collision
    block1   = Block.new(@game)
	block2   = Block.new(@game)
	block2.x = @w
	block2.y = 0
	
	assert_equal(false, block1.collide(block2), "Blocks should not collide")
	
	block2.x = 0
	block2.y = 0
	
	assert_equal(true, block1.collide(block2), "Blocks should collide")
  end  

  def test_line_complete
    (0.. (@game.width/@w - 1)).each do |i|
      add_block(i*@w, 0)
	end
	
	(0.. (@game.width/@w - 2)).each do |i|
      add_block(i*@w, @h)
	end
	
	assert_equal(true, @game.line_complete(0), "Line should be complete")
	assert_equal(false, @game.line_complete(@h), "Line should not be complete")
	
	shapeI = ShapeI.new(@game)
	@game.delete_lines_of(shapeI)
	
	(0.. (@game.width/@w - 1)).each do |i|
      assert_equal(false, contain_block(@game.blocks, i*@w, 0), "Line 0 should be deleted ")
	end
	
  end

  def add_block(x,y)
    block = Block.new(@game)
	block.x = x
	block.y = y
    @game.blocks << block
  end

  def contain_block(array, x, y)
    array.index { |block| block.x == x && block.y == y } != nil
  end

  def shape_contain_block(shape, x, y)
    contain_block(shape.get_blocks,x, y)
  end

end
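
The line clearing in delete_lines_of can also be sketched on plain data, which makes it easy to test without a window. This is a minimal version under my own assumptions: cells are [x, y] pairs, y grows downward, and the method name and parameters are hypothetical rather than taken from the game.

```ruby
# Cells are [x, y] pairs on a grid; y grows downward (an assumption matching screen coordinates).
# Removes every full row and shifts each remaining cell down by one row height
# for every removed row that was below it.
def clear_full_rows(cells, columns, cell_size)
  full_rows = cells.group_by { |_, y| y }
                   .select { |_, row| row.length == columns }
                   .keys
  remaining = cells.reject { |_, y| full_rows.include?(y) }
  shifted = remaining.map do |x, y|
    rows_below = full_rows.count { |row_y| row_y > y }
    [x, y + rows_below * cell_size]
  end
  [shifted, full_rows.length]
end

# Two columns wide: the bottom row (y = 64) is full, the row above it is not.
p clear_full_rows([[0, 64], [32, 64], [0, 32]], 2, 32)
```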


The game (the Ruby source file) and a Windows .exe (which lets you run the game without Ruby or gosu installed) can be downloaded here:

Attached File  tetris.rar (4.17MB)
downloads: 306

If you run the game with the Ruby 1.9 interpreter, be sure to install gosu:
gem install gosu


Realtime raytracing with OpenCL II

Posted 17 February 2012 · 23,285 views

Hello,

In the first chapter of Realtime raytracing with OpenCL, we talked about how to use the OpenCL API for general computations and I wrote a program to sum 2 arrays on the GPU.


Now I will explain how to write the raytracer itself.
A couple of things first:
I will not go into too much detail about the theory behind a raytracer. That's outside the scope of this entry; the point here is to show how to use OpenCL to accelerate raytracing.
If you want to know how this stuff works, follow the excellent article by Jacco Bikker (Phantom on these forums).

Also, in order to be as concise as possible, I will not explain the C++ side of things.
I will post the full source at the end, so you can check it if you have any trouble understanding how everything fits together, but if you read Part I you should be able to do that on your own.
Let's begin.

Data structures

OpenCL allows you to create structs (as in the C language).
We are going to need several of them to organize the code a bit :

struct Material{
	/* 0 - Standard diffuse color, 1 - Compute 'Chessboard' texture */
	int computeColorType;
	float4 color;
	float reflectivity;
	float refractivity;
};

struct Material createMaterial()
{
	struct Material m;
	m.color = (float4)(1,1,1,1);
	m.computeColorType = 0;
	m.reflectivity = 0;
	m.refractivity = 0;
	return m;
}

struct Sphere{
	struct Material* m;
	float3 pos;
	float radius;
};

struct Plane{
	struct Material* m;
	float3 normal;
	float3 point;
};

struct Ray{
	float3 origin;
	float3 dir;
};

struct Light{
	float3 pos;
	float3 dir;
	bool directional;
	float4 color;
};

Not the prettiest way to do it, but it works. You can't have constructors, by the way, so I wrote the "createMaterial" function above, which just creates a new material and fills it with defaults.

struct Scene{
	struct Sphere spheres[10];
	int spheresCount;
	
	struct Plane planes[10];
	int planesCount;
	
	struct Light lights[10];
	int lightsCount;
	
	struct Material standardMaterial;
};

The scene just contains all our spheres, planes and lights. The standard material is applied to geometries with no materials (where the m pointer is null).

The kernel

Now I'm going to skip all the mumbo jumbo and jump right into the kernel function to show you how to use the data structures and set up the pipeline. Then I will explain all the peripheral functions that every raytracer has.

Important note : I'm building the whole scene in the OpenCL code and not on the C++ side.
This was specifically allowed by my lecturer, so I used it. If you want the raytracer to be reusable, you need to move the scene creation to the C++ program and pass it to the OpenCL program. It will probably be faster as well (if you organize it properly).

__kernel void main( __global float4 *dst, uint width, uint height, __global float* viewTransform, __global float* worldTransforms )

The kernel function's parameters are:
dst - the output buffer to which we write the rendered image. It holds width*height elements.
width - the width of the output buffer, i.e. the horizontal rendering resolution
height - the height of the output buffer, i.e. the vertical rendering resolution
viewTransform - the camera matrix
worldTransforms - an array of object transforms (there can be more than one).


So let's create our materials first:

	struct Scene scene;
	
	scene.standardMaterial = createMaterial();
	scene.standardMaterial.reflectivity = 0;
	scene.standardMaterial.computeColorType = 1;
	
	struct Material floorMaterial = createMaterial();
	floorMaterial.reflectivity = 0.5;
	floorMaterial.computeColorType = 1;
	
	struct Material ballMaterial1 = createMaterial();
	ballMaterial1.reflectivity = 1;
	ballMaterial1.color = (float4)(1,0,0,1);
	struct Material ballMaterial2 = createMaterial();
	ballMaterial2.reflectivity = 1;
	ballMaterial2.color = (float4)(0,0,1,1);
	struct Material ballMaterial3 = createMaterial();
	ballMaterial3.reflectivity = 1;
	ballMaterial3.color = (float4)(1,1,1,1);
	
	struct Material refractMaterial = createMaterial();
	refractMaterial.refractivity = 1;

Now fill in the geometry. Not too much to explain there.

	scene.spheresCount = 2;
	scene.spheres[0].pos = (float3)(0,0,0);
	scene.spheres[0].radius = 3;
	scene.spheres[0].m = &ballMaterial1;
	scene.spheres[1].pos = (float3)(0,0,-0);
	scene.spheres[1].radius = 3;
	scene.spheres[1].m = &ballMaterial2;
	
	scene.planesCount = 5;
	scene.planes[0].point = (float3)(0,-5,0);
	scene.planes[0].normal = (float3)(0,1,0);
	scene.planes[0].m	  = &floorMaterial;
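	scene.planes[1].point = (float3)(0,40,0);
	scene.planes[1].normal = normalize((float3)(0,-1,0));
	scene.planes[1].m = 0; /* no material: raytrace falls back to standardMaterial */
	scene.planes[2].point = (float3)(-40,-5,0);
	scene.planes[2].normal = normalize((float3)(1,1,0));
	scene.planes[2].m = 0;
	scene.planes[3].point = (float3)(40,-5,0);
	scene.planes[3].normal = normalize((float3)(-1,1,0));
	scene.planes[3].m = 0;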
	
	scene.planes[4].point = (float3)(0,0,0);
	scene.planes[4].normal = normalize((float3)(0,0,-1));
	scene.planes[4].m = &refractMaterial;
	
	scene.lightsCount = 2;
	scene.lights[0].pos = (float3)(0,30,-20);
	scene.lights[0].directional = false;
	scene.lights[0].color = (float4)(1,1,1,1);
	scene.lights[1].pos = (float3)(0,30,20);
	scene.lights[1].dir = normalize((float3)(0,1,1));
	scene.lights[1].directional = false;
	scene.lights[1].color = (float4)(1,1,1,1);

Now, since our demo has 2 moving spheres, we transform their positions by the worldTransforms (each matrix occupies 16 floats, hence the +16 offset for the second sphere):
		scene.spheres[0].pos = matrixVectorMultiply(worldTransforms, &scene.spheres[0].pos);
		scene.spheres[1].pos = matrixVectorMultiply(worldTransforms+16, &scene.spheres[1].pos);

If you build your scene on the C++ side, out of triangles for example, you could specify all your coordinates in world space, which would make this step unnecessary.


Finally, do the raytracing (plus antialiasing) and store the resulting pixel color in the buffer:

	float dx = 1.0f / (float)(width);
	float dy = 1.0f / (float)(height);
	float aspect = (float)(width) / (float)(height);
	
	dst[get_global_id(0)] = (float4)(0,0,0,0);
	for(int i = 0; i < kAntiAliasingSamples; i++){
		for(int j = 0; j < kAntiAliasingSamples; j++){
				float x = (float)(get_global_id(0) % width) / (float)(width) + dx*i/kAntiAliasingSamples;
				float y = (float)(get_global_id(0) / width) / (float)(height) + dy*j/kAntiAliasingSamples;
				
				x = (x -0.5f)*aspect;
				y = y -0.5f;
				
				struct Ray r;
				r.origin = matrixVectorMultiply(viewTransform, &(float3)(0, 0, -1));
				r.dir	= normalize(matrixVectorMultiply(viewTransform, &(float3)(x, y, 0)) - r.origin);
				float4 color = raytrace(&r, &scene, 0);
				dst[get_global_id(0)] += color / (kAntiAliasingSamples*kAntiAliasingSamples) ;
		}
	}
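
To see which points on the view plane a single work item samples, the arithmetic of the loop above can be replayed on the CPU. The following Ruby sketch mirrors the kernel's index math; the function name is made up, and samples plays the role of kAntiAliasingSamples.

```ruby
# Lists the view-plane sample positions for one pixel, following the kernel's arithmetic.
# `samples` plays the role of kAntiAliasingSamples; the function name is made up for this sketch.
def sample_positions(global_id, width, height, samples)
  dx = 1.0 / width
  dy = 1.0 / height
  aspect = width.to_f / height
  positions = []
  samples.times do |i|
    samples.times do |j|
      # Pixel column and row recovered from the flat id, plus a sub-pixel offset.
      x = (global_id % width).to_f / width + dx * i / samples
      y = (global_id / width).to_f / height + dy * j / samples
      # Center the view plane on the origin and correct for the aspect ratio.
      positions << [(x - 0.5) * aspect, y - 0.5]
    end
  end
  positions
end

p sample_positions(0, 4, 2, 2)
```

Each pixel gets samples*samples rays on a regular sub-pixel grid, and the kernel averages their colors.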

Now we need the following peripheral functions: matrixVectorMultiply and raytrace.
float3 matrixVectorMultiply(__global float* matrix, float3* vector){
	float3 result;
	result.x = matrix[0]*((*vector).x)+matrix[4]*((*vector).y)+matrix[8]*((*vector).z)+matrix[12];
	result.y = matrix[1]*((*vector).x)+matrix[5]*((*vector).y)+matrix[9]*((*vector).z)+matrix[13];
	result.z = matrix[2]*((*vector).x)+matrix[6]*((*vector).y)+matrix[10]*((*vector).z)+matrix[14];
	return result;
}
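
The indexing in matrixVectorMultiply suggests that the matrices are stored column-major, with the translation in elements 12 to 14, and that the vector is treated as a point with w = 1. Under that assumption (it is my reading of the code, not something stated above), the same computation looks like this in Ruby, with a made-up function name:

```ruby
# Transforms a 3D point by a 4x4 matrix stored as 16 numbers in column-major order,
# treating the point as having w = 1 so that the translation in elements 12..14 is applied.
def transform_point(matrix, point)
  x, y, z = point
  [
    matrix[0] * x + matrix[4] * y + matrix[8]  * z + matrix[12],
    matrix[1] * x + matrix[5] * y + matrix[9]  * z + matrix[13],
    matrix[2] * x + matrix[6] * y + matrix[10] * z + matrix[14]
  ]
end

# Identity rotation with a translation of (1, 2, 3).
translate = [1, 0, 0, 0,
             0, 1, 0, 0,
             0, 0, 1, 0,
             1, 2, 3, 1]
p transform_point(translate, [10, 20, 30])
```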

Raytrace is the function where the actual raytracing happens, so let's look at it in more detail:

float4 raytrace(struct Ray* ray, struct Scene* scene,int traceDepth)

We accept a ray, the scene and the depth at which we currently are in recursive tracing.

The following code :
	void* intersectObj = 0;
	int intersectObjType = 0;
	float t = intersect( ray, scene, &intersectObj, &intersectObjType);

finds the first intersection of the ray with the scene and returns a pointer to the object, as well as the type of this object.
There is no polymorphism in OpenCL, so we need this to differentiate between the objects.

Now compute the normal based on the object type and get its material.

	float4 color = (float4)(0,0,0,0);
	if ( t < kMaxRenderDist ){
		float3 intersectPos = ray->origin+ray->dir*t ;
		float3 normal;
		
		struct Material* m = 0;
		
		if ( intersectObjType == 1 ){		
			normal = normalize(intersectPos-((struct Sphere*)intersectObj)->pos);
			m = ((struct Sphere*)intersectObj)->m;
		}
		else if (intersectObjType == 2 ){
			normal = ((struct Plane*)intersectObj)->normal;
			m = ((struct Plane*)intersectObj)->m;
		}
		
		if ( !m ){
			m = &scene->standardMaterial;
		}

If there is no material, we use the "standard material".

Time to compute the color. I used a procedural checkerboard texture for some of the planes, so we need to check the "computeColorType" field.
This is a good place to plug in any texturing code you might want to add. You could, for example, use "computeColorType = 2" for textured materials and supply a texture id.

		float4 diffuseColor = m->color;
		
		if ( m->computeColorType == 1){
			if ( (int)(intersectPos.x/5.0f) % 2 == 0 ){
				if ( (int)(intersectPos.z/5.0f) % 2 == 0 ){
					diffuseColor = (float4)(0,0,0,0);
				}
			}
			else{
				if ( (int)(intersectPos.z/5.0f) % 2 != 0 ){
					diffuseColor = (float4)(0,0,0,0);
				}
			}
		}
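
The nested ifs above boil down to "paint the point black when the x cell and the z cell have the same parity". Here is a small Ruby sketch of that test, assuming the same cell size of 5 units; the function name is made up, and truncate/remainder are used because they behave like the (int) cast and the % operator in C.

```ruby
# Returns true when the kernel's test would paint the point black.
# Float#truncate rounds toward zero, like the (int) cast,
# and Integer#remainder keeps the sign of the receiver, like C's % operator.
def checker_black?(x, z, cell = 5.0)
  x_even = (x / cell).truncate.remainder(2) == 0
  z_even = (z / cell).truncate.remainder(2) == 0
  x_even == z_even
end

p checker_black?(2.0, 2.0)
p checker_black?(7.0, 2.0)
```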

Reflection and refraction. We use raytrace recursively and increase the recursion depth :

		if ( traceDepth < kMaxTraceDepth && m->reflectivity > 0 ){
				struct Ray reflectRay;
				float3 R = reflect(ray->dir, normal);
				reflectRay.origin = intersectPos + R*0.001;
				reflectRay.dir	= R;
				diffuseColor += m->reflectivity*raytrace(&reflectRay, scene, traceDepth+1);
		}
		
		if ( traceDepth < kMaxTraceDepth && m->refractivity > 0 ){
				struct Ray refractRay;
				float3 R = refract(ray->dir, normal, 0.6);
				if ( dot(R,normal) < 0 ){
					refractRay.origin = intersectPos + R*0.001;
					refractRay.dir	= R;
					diffuseColor = m->refractivity*raytrace(&refractRay, scene, traceDepth+1);
				}
		}

Next, add the lights' contribution for this ray. Note that there is some room for optimization here:
We could have computed the lights' contribution first (by adding pointLit*scene->lights[i].color*max(0.0f,dot(normal, L)) to color).
Then, if color was close to black, we could skip the diffuseColor computation altogether (including reflection and refraction).

		for(int i = 0; i < scene->lightsCount; i++){
			float3 L = scene->lights[i].dir;
			float lightDist = kMaxRenderDist;
			if ( !scene->lights[i].directional ){
				L = scene->lights[i].pos - intersectPos ;
				lightDist = length(L);
				L = normalize(L);
			}
			
			float pointLit = 1;
			struct Ray shadowRay;
			shadowRay.origin = intersectPos + L*0.001;
			shadowRay.dir = L;
			t = intersect( &shadowRay, scene, &intersectObj, &intersectObjType);
			if ( t < lightDist ){
				pointLit = 0;
			}
			color += pointLit*diffuseColor*scene->lights[i].color*max(0.0f,dot(normal, L));
		}
	}
	return clamp(color,0,1);

We also shoot the shadow rays here. It might be a good idea to tell the intersect routine that this is a shadow ray, because that allows an additional optimization: we don't need to find the closest intersection, only the first intersection that's closer than the light (if there is one).
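
A rough Ruby sketch of that shadow ray idea, limited to spheres to keep it short. The helper names and the [center, radius] representation are my own choices for the example, not part of the OpenCL source.

```ruby
# Vector helpers on plain 3-element arrays.
def dot(a, b)
  a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
end

def sub(a, b)
  [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
end

# Nearest non-negative hit distance of a ray (unit-length dir) with a sphere, or nil.
def ray_sphere(origin, dir, center, radius)
  to_center = sub(center, origin)
  b = dot(dir, to_center)
  d = b * b - dot(to_center, to_center) + radius * radius
  return nil if d < 0
  root = Math.sqrt(d)
  t = b - root
  t = b + root if t < 0
  t < 0 ? nil : t
end

# Shadow test: stops at the first sphere closer than the light
# instead of searching for the closest hit.
def occluded?(origin, dir, light_dist, spheres)
  spheres.any? do |center, radius|
    t = ray_sphere(origin, dir, center, radius)
    t && t < light_dist
  end
end

spheres = [[[0.0, 0.0, 5.0], 1.0]]
p occluded?([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], 10.0, spheres)
```

The early exit matters most in scenes with many objects; with a handful of spheres and planes the saving is small.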

Finally, we return the color, clamping each component to [0,1].

Now we need 3 more functions : reflect, refract and intersect.


float3 reflect(float3 V, float3 N){
	return V - 2.0f * dot( V, N ) * N;
}

float3 refract(float3 V, float3 N, float refrIndex)
{
	float cosI = -dot( N, V );
	float cosT2 = 1.0f - refrIndex * refrIndex * (1.0f - cosI * cosI);
	return (refrIndex * V) + (refrIndex * cosI - sqrt( cosT2 )) * N;
}
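
Both formulas are easy to check on the CPU. The Ruby sketch below follows the two functions above on plain arrays; the nil return on total internal reflection is an extra guard I added for the sketch and is not in the kernel version (with a ratio of 0.6 the term under the square root stays positive, so the kernel never hits that case).

```ruby
def dot(a, b)
  a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
end

# Mirror direction v about the surface normal n (both unit length).
def reflect(v, n)
  k = 2.0 * dot(v, n)
  [v[0] - k * n[0], v[1] - k * n[1], v[2] - k * n[2]]
end

# Bend direction v through a surface with normal n; refr_index is the ratio of refractive indices.
# Returns nil on total internal reflection, a guard that the kernel version does not have.
def refract(v, n, refr_index)
  cos_i = -dot(n, v)
  cos_t2 = 1.0 - refr_index * refr_index * (1.0 - cos_i * cos_i)
  return nil if cos_t2 < 0
  k = refr_index * cos_i - Math.sqrt(cos_t2)
  [refr_index * v[0] + k * n[0], refr_index * v[1] + k * n[1], refr_index * v[2] + k * n[2]]
end

p reflect([1, -1, 0], [0, 1, 0])
p refract([0.0, -1.0, 0.0], [0.0, 1.0, 0.0], 0.6)
```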


Intersection is pretty straightforward. We look for the closest intersection and save the object and its type.


float intersect(struct Ray* ray, struct Scene* scene, void** object, int* type)
{
	float minT = kMaxRenderDist;
	
	for(int i = 0; i < scene->spheresCount; i++){
		float t;
		if ( raySphere( &scene->spheres[i], ray, &t ) ){
			if ( t < minT ){
				minT = t;
				*type = 1;
				*object = &scene->spheres[i];
			}
		}
	}
	
	for(int i = 0; i < scene->planesCount; i++){
		float t;
		if ( rayPlane( &scene->planes[i], ray, &t ) ){
			if ( t < minT ){
				minT = t;
				*type = 2;
				*object = &scene->planes[i];
			}
		}
	}
	
	return minT;
}

Finally, the functions to intersect ray with plane and sphere :
bool raySphere(struct Sphere* s, struct Ray* r, float* t)
{
	float3 rayToCenter = s->pos - r->origin ;
	float dotProduct = dot(r->dir,rayToCenter);
	float d = dotProduct*dotProduct - dot(rayToCenter,rayToCenter)+s->radius*s->radius;

	if ( d < 0)
		return false;

	*t = (dotProduct - sqrt(d) );

	if ( *t < 0 ){
		*t = (dotProduct + sqrt(d) ) ;
		if ( *t < 0){
			return false;
		}
	}

	return true;
}

bool rayPlane(struct Plane* p, struct Ray* r, float* t)
{
	float dotProduct = dot(r->dir,p->normal);
	if ( dotProduct == 0){
		return false;
	}
	*t = dot(p->normal,p->point-r->origin) / dotProduct ;

	return *t >= 0;
}
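
For reference, this is where the expressions in the two functions come from, assuming the ray direction D has unit length (the kernel normalizes its ray directions). Here O is r->origin, S is s->pos, P_0 is p->point, N is p->normal, D·C is the dotProduct variable in raySphere, and the term under the square root is d.

```latex
\begin{aligned}
&\text{Ray: } P(t) = O + tD,\quad \lVert D\rVert = 1, \qquad C = S - O \\
&\lVert P(t) - S\rVert^2 = r^2
 \;\Rightarrow\; t^2 - 2t\,(D\cdot C) + C\cdot C - r^2 = 0 \\
&t = (D\cdot C) \pm \sqrt{(D\cdot C)^2 - C\cdot C + r^2} \\[4pt]
&\text{Plane: } N\cdot\bigl(P(t) - P_0\bigr) = 0
 \;\Rightarrow\; t = \frac{N\cdot(P_0 - O)}{N\cdot D},\quad N\cdot D \neq 0
\end{aligned}
```

raySphere tries the smaller root first and falls back to the larger one when the ray starts inside the sphere; rayPlane rejects hits behind the origin.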

The full source of the OpenCL program

const int kAntiAliasingSamples  = 2;
const int kMaxTraceDepth = 2;
const float kMaxRenderDist = 1000.0f;

struct Material{
	/* 0 - Standard diffuse color, 1 - Compute 'Chessboard' texture */
	int computeColorType;
	float4 color;
	float reflectivity;
	float refractivity;
};

struct Material createMaterial()
{
	struct Material m;
	m.color = (float4)(1,1,1,1);
	m.computeColorType = 0;
	m.reflectivity = 0;
	m.refractivity = 0;
	return m;
}

struct Sphere{
	struct Material* m;
	float3 pos;
	float radius;
};

struct Plane{
	struct Material* m;
	float3 normal;
	float3 point;
};

struct Ray{
	float3 origin;
	float3 dir;
};

struct Light{
	float3 pos;
	float3 dir;
	bool directional;
	float4 color;
};


struct Scene{
	struct Sphere spheres[10];
	int spheresCount;
	
	struct Plane planes[10];
	int planesCount;
	
	struct Light lights[10];
	int lightsCount;
	
	struct Material standardMaterial;
};

float3 reflect(float3 V, float3 N){
	return V - 2.0f * dot( V, N ) * N;
}

float3 refract(float3 V, float3 N, float refrIndex)
{
    float cosI = -dot( N, V );
    float cosT2 = 1.0f - refrIndex * refrIndex * (1.0f - cosI * cosI);
	return (refrIndex * V) + (refrIndex * cosI - sqrt( cosT2 )) * N;
}

bool raySphere(struct Sphere* s, struct Ray* r, float* t)
{
	float3 rayToCenter = s->pos - r->origin ;
	float dotProduct = dot(r->dir,rayToCenter);
	float d = dotProduct*dotProduct - dot(rayToCenter,rayToCenter)+s->radius*s->radius;

	if ( d < 0)
		return false;

	*t = (dotProduct - sqrt(d) );

	if ( *t < 0 ){
		*t = (dotProduct + sqrt(d) ) ;
		if ( *t < 0){
			return false;
		}
	}

	return true;
}

bool rayPlane(struct Plane* p, struct Ray* r, float* t)
{
	float dotProduct = dot(r->dir,p->normal);
	if ( dotProduct == 0){
		return false;
	}
	*t = dot(p->normal,p->point-r->origin) / dotProduct ;

	return *t >= 0;
}

float intersect(struct Ray* ray, struct Scene* scene, void** object, int* type)
{
	float minT = kMaxRenderDist;
	
	for(int i = 0; i < scene->spheresCount; i++){
		float t;
		if ( raySphere( &scene->spheres[i], ray, &t ) ){
			if ( t < minT ){
				minT = t;
				*type = 1;
				*object = &scene->spheres[i];
			}
		}
	}
	
	for(int i = 0; i < scene->planesCount; i++){
		float t;
		if ( rayPlane( &scene->planes[i], ray, &t ) ){
			if ( t < minT ){
				minT = t;
				*type = 2;
				*object = &scene->planes[i];
			}
		}
	}
	
	return minT;
}

float4 raytrace(struct Ray* ray, struct Scene* scene,int traceDepth)
{
	void* intersectObj = 0;
	int intersectObjType = 0;
	float t = intersect( ray, scene, &intersectObj, &intersectObjType);
	
	float4 color = (float4)(0,0,0,0);
	if ( t < kMaxRenderDist ){
		float3 intersectPos = ray->origin+ray->dir*t ;
		float3 normal;
		
		struct Material* m = 0;
		
		if ( intersectObjType == 1 ){		
			normal = normalize(intersectPos-((struct Sphere*)intersectObj)->pos);
			m = ((struct Sphere*)intersectObj)->m;
		}
		else if (intersectObjType == 2 ){
			normal = ((struct Plane*)intersectObj)->normal;
			m = ((struct Plane*)intersectObj)->m;
		}
		
		if ( !m ){
			m = &scene->standardMaterial;
		}
		
		float4 diffuseColor = m->color;
		
		if ( m->computeColorType == 1){
			if ( (int)(intersectPos.x/5.0f) % 2 == 0 ){
				if ( (int)(intersectPos.z/5.0f) % 2 == 0 ){
					diffuseColor = (float4)(0,0,0,0);
				}
			}
			else{
				if ( (int)(intersectPos.z/5.0f) % 2 != 0 ){
					diffuseColor = (float4)(0,0,0,0);
				}
			}
		}
		if ( traceDepth < kMaxTraceDepth && m->reflectivity > 0 ){
				struct Ray reflectRay;
				float3 R = reflect(ray->dir, normal);
				reflectRay.origin = intersectPos + R*0.001;
				reflectRay.dir    = R;
				diffuseColor += m->reflectivity*raytrace(&reflectRay, scene, traceDepth+1);
		}
		
		if ( traceDepth < kMaxTraceDepth && m->refractivity > 0 ){
				struct Ray refractRay;
				float3 R = refract(ray->dir, normal, 0.6);
				if ( dot(R,normal) < 0 ){
					refractRay.origin = intersectPos + R*0.001;
					refractRay.dir    = R;
					diffuseColor = m->refractivity*raytrace(&refractRay, scene, traceDepth+1);
				}
		}
		
		for(int i = 0; i < scene->lightsCount; i++){
			float3 L = scene->lights[i].dir;
			float lightDist = kMaxRenderDist;
			if ( !scene->lights[i].directional ){
				L = scene->lights[i].pos - intersectPos ;
				lightDist = length(L);
				L = normalize(L);
			}
			
			float pointLit = 1;
			struct Ray shadowRay;
			shadowRay.origin = intersectPos + L*0.001;
			shadowRay.dir = L;
			t = intersect( &shadowRay, scene, &intersectObj, &intersectObjType);
			if ( t < lightDist ){
				pointLit = 0;
			}
			color += pointLit*diffuseColor*scene->lights[i].color*max(0.0f,dot(normal, L));
		}
	}
	return clamp(color,0,1);
}

float3 matrixVectorMultiply(__global float* matrix, float3* vector){ 
	float3 result;
	result.x = matrix[0]*((*vector).x)+matrix[4]*((*vector).y)+matrix[8]*((*vector).z)+matrix[12];
	result.y = matrix[1]*((*vector).x)+matrix[5]*((*vector).y)+matrix[9]*((*vector).z)+matrix[13];
	result.z = matrix[2]*((*vector).x)+matrix[6]*((*vector).y)+matrix[10]*((*vector).z)+matrix[14];
	return result;
}

__kernel void main( __global float4 *dst, uint width, uint height, __global float* viewTransform, __global float* worldTransforms )                                 
{                                                                            
	struct Scene scene;
	
	scene.standardMaterial = createMaterial();
	scene.standardMaterial.reflectivity = 0;
	scene.standardMaterial.computeColorType = 1;
	
	struct Material floorMaterial = createMaterial();
	floorMaterial.reflectivity = 0.5;
	floorMaterial.computeColorType = 1;
	
	struct Material ballMaterial1 = createMaterial();
	ballMaterial1.reflectivity = 1;
	ballMaterial1.color = (float4)(1,0,0,1);
	struct Material ballMaterial2 = createMaterial();
	ballMaterial2.reflectivity = 1;
	ballMaterial2.color = (float4)(0,0,1,1);
	struct Material ballMaterial3 = createMaterial();
	ballMaterial3.reflectivity = 1;
	ballMaterial3.color = (float4)(1,1,1,1);
	
	struct Material refractMaterial = createMaterial();
	refractMaterial.refractivity = 1;
	
	scene.spheresCount = 2;
	scene.spheres[0].pos = (float3)(0,0,0);
	scene.spheres[0].radius = 3;
	scene.spheres[0].m = &ballMaterial1;
	scene.spheres[1].pos = (float3)(0,0,-0);
	scene.spheres[1].radius = 3;
	scene.spheres[1].m = &ballMaterial2;
	
	scene.planesCount = 5;
	scene.planes[0].point = (float3)(0,-5,0);
	scene.planes[0].normal = (float3)(0,1,0);
	scene.planes[0].m      = &floorMaterial;
	scene.planes[1].point = (float3)(0,40,0);
	scene.planes[1].normal = normalize((float3)(0,-1,0));
	scene.planes[2].point = (float3)(-40,-5,0);
	scene.planes[2].normal = normalize((float3)(1,1,0));
	scene.planes[3].point = (float3)(40,-5,0);
	scene.planes[3].normal = normalize((float3)(-1,1,0));
	
	scene.planes[4].point = (float3)(0,0,0);
	scene.planes[4].normal = normalize((float3)(0,0,-1));
	scene.planes[4].m = &refractMaterial;
	
	scene.lightsCount = 2;
	scene.lights[0].pos = (float3)(0,30,-20);
	scene.lights[0].directional = false;
	scene.lights[0].color = (float4)(1,1,1,1);
	scene.lights[1].pos = (float3)(0,30,20);
	scene.lights[1].dir = normalize((float3)(0,1,1));
	scene.lights[1].directional = false;
	scene.lights[1].color = (float4)(1,1,1,1);
	
	scene.spheres[0].pos = matrixVectorMultiply(worldTransforms, &scene.spheres[0].pos);
	scene.spheres[1].pos = matrixVectorMultiply(worldTransforms+16, &scene.spheres[1].pos);
	
	float dx = 1.0f / (float)(width);
	float dy = 1.0f / (float)(height);
	float aspect = (float)(width) / (float)(height);
	
	dst[get_global_id(0)] = (float4)(0,0,0,0);
	for(int i = 0; i < kAntiAliasingSamples; i++){
		for(int j = 0; j < kAntiAliasingSamples; j++){
				float x = (float)(get_global_id(0) % width) / (float)(width) + dx*i/kAntiAliasingSamples;
				float y = (float)(get_global_id(0) / width) / (float)(height) + dy*j/kAntiAliasingSamples;
				
				x = (x -0.5f)*aspect;
				y = y -0.5f;
				
				struct Ray r;
				r.origin = matrixVectorMultiply(viewTransform, &(float3)(0, 0, -1));
				r.dir    = normalize(matrixVectorMultiply(viewTransform, &(float3)(x, y, 0)) - r.origin);
				float4 color = raytrace(&r, &scene, 0);
				dst[get_global_id(0)] += color / (kAntiAliasingSamples*kAntiAliasingSamples) ;
		}
	}

} 
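
If you want to sanity-check the intersection math without a GPU, raySphere ports to plain C++ almost line for line. A minimal sketch, assuming a normalized ray direction; Vec3, Ray, Sphere and raySphereCpu are names made up for this example, not part of the kernel:

```cpp
#include <cmath>

// Plain C++ stand-ins for the kernel's float3, Ray and Sphere (hypothetical names).
struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray    { Vec3 origin; Vec3 dir; };   // dir is assumed to be normalized
struct Sphere { Vec3 pos; float radius; };

// Same math as raySphere in the kernel: solve |origin + t*dir - pos|^2 = radius^2
// and return the nearest non-negative t.
bool raySphereCpu(const Sphere& s, const Ray& r, float* t)
{
    Vec3 rayToCenter = sub(s.pos, r.origin);
    float b = dot(r.dir, rayToCenter);
    float d = b * b - dot(rayToCenter, rayToCenter) + s.radius * s.radius;
    if (d < 0.0f) return false;       // the ray misses the sphere
    float root = std::sqrt(d);
    *t = b - root;                    // near hit
    if (*t < 0.0f) {
        *t = b + root;                // origin is inside the sphere: take the far hit
        if (*t < 0.0f) return false;  // sphere is entirely behind the ray
    }
    return true;
}
```

A ray starting at the origin and pointing down +z hits a radius 3 sphere centered at (0,0,10) at t = 7, which is an easy case to verify by hand.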

The full source of the C++ program

#include <CL/cl.h>
#include <cstring>   // memcpy, memset
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

#include <SDL/SDL.h>
#include <SDL/SDL_opengl.h>

const int kWidth = 1366;
const int kHeight = 768;
const bool kFullscreen = true;

size_t global_work_size = kWidth * kHeight;

float viewMatrix[16];

float sphere1Pos[3] = {0,0,10};
float sphere2Pos[3] = {0,0,-10};
float sphereVelocity = 1;
float sphereTransforms[2][16];

cl_command_queue queue;
cl_kernel kernel;
cl_mem buffer, viewTransform, worldTransforms;

void InitOpenCL()
{
	// 1. Get a platform.
	cl_platform_id platform;

	clGetPlatformIDs( 1, &platform, NULL );
	// 2. Find a gpu device.
	cl_device_id device;

	clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU,
								1,
								&device,
								NULL);
	// 3. Create a context and command queue on that device.
	cl_context context = clCreateContext( NULL,
											1,
											&device,
											NULL, NULL, NULL);
	queue = clCreateCommandQueue( context,
													device,
													0, NULL );
	// 4. Perform runtime source compilation, and obtain kernel entry point.
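	std::ifstream file("kernel.txt");
	std::string source;
	std::string line;
	// Keep the line breaks: without them a // comment or a #define in the
	// kernel file would swallow the code that follows it.
	while(std::getline(file, line)){
		source += line;
		source += '\n';
	}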

	cl_ulong maxSize;
	clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE , sizeof(cl_ulong), &maxSize, 0);

	const char* str = source.c_str();
	cl_program program = clCreateProgramWithSource( context,
													1,
													&str,
													NULL, NULL );
	cl_int result = clBuildProgram( program, 1, &device, NULL, NULL, NULL );
	if ( result ){
		std::cout << "Error during compilation! (" << result << ")" << std::endl;
	}
	kernel = clCreateKernel( program, "main", NULL );
	// 5. Create a data buffer.
	buffer        = clCreateBuffer( context,
									CL_MEM_WRITE_ONLY,
									kWidth * kHeight *sizeof(cl_float4),
									NULL, 0 );
	viewTransform = clCreateBuffer( context,
									CL_MEM_READ_WRITE,
									16 *sizeof(cl_float),
									NULL, 0 );

	worldTransforms = clCreateBuffer( context,
									CL_MEM_READ_WRITE,
									16 *sizeof(cl_float)*2,
									NULL, 0 );

	clSetKernelArg(kernel, 0, sizeof(buffer), (void*) &buffer);
	clSetKernelArg(kernel, 1, sizeof(cl_uint), (void*) &kWidth);
	clSetKernelArg(kernel, 2, sizeof(cl_uint), (void*) &kHeight);
	clSetKernelArg(kernel, 3, sizeof(viewTransform), (void*) &viewTransform);
	clSetKernelArg(kernel, 4, sizeof(worldTransforms), (void*) &worldTransforms);
}

void Render(int delta)
{

	clEnqueueNDRangeKernel(   queue,
							kernel,
							1,
							NULL,
							&global_work_size,
							NULL, 0, NULL, NULL);

	// 7. Look at the results via synchronous buffer map.
	cl_float4 *ptr = (cl_float4 *) clEnqueueMapBuffer( queue,
											buffer,
											CL_TRUE,
											CL_MAP_READ,
											0,
											kWidth * kHeight * sizeof(cl_float4),
											0, NULL, NULL, NULL ); 

	cl_float *viewTransformPtr = (cl_float *) clEnqueueMapBuffer( queue,
											viewTransform,
											CL_TRUE,
											CL_MAP_WRITE,
											0,
											16 * sizeof(cl_float),
											0, NULL, NULL, NULL ); 

	cl_float *worldTransformsPtr = (cl_float *) clEnqueueMapBuffer( queue,
											worldTransforms,
											CL_TRUE,
											CL_MAP_WRITE,
											0,
											16 * sizeof(cl_float)*2,
											0, NULL, NULL, NULL ); 


	memcpy(viewTransformPtr, viewMatrix, sizeof(float)*16);
	memcpy(worldTransformsPtr, sphereTransforms[0], sizeof(float)*16);
	memcpy(worldTransformsPtr+16, sphereTransforms[1], sizeof(float)*16);


	clEnqueueUnmapMemObject(queue, viewTransform, viewTransformPtr, 0, 0, 0);
	clEnqueueUnmapMemObject(queue, worldTransforms, worldTransformsPtr, 0, 0, 0);

	unsigned char* pixels = new unsigned char[kWidth*kHeight*4];
	for(int i=0; i <  kWidth * kHeight; i++){
		pixels[i*4] = ptr[i].s[0]*255;
		pixels[i*4+1] = ptr[i].s[1]*255;
		pixels[i*4+2] = ptr[i].s[2]*255;
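		pixels[i*4+3] = 255; // fully opaque
	}
	// Release the mapping of the output buffer now that the pixels are copied.
	clEnqueueUnmapMemObject(queue, buffer, ptr, 0, 0, 0);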

	glBindTexture(GL_TEXTURE_2D, 1);
	glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
	glTexParameterf( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
	glTexImage2D(GL_TEXTURE_2D, 0, 4, kWidth, kHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
	delete [] pixels;

	glClearColor(1,1,1,1);
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

	glMatrixMode(GL_PROJECTION);
	glLoadIdentity();
	glOrtho(-1,1,1,-1,1,100);
	glMatrixMode(GL_MODELVIEW);
	   
	glLoadIdentity();
	glBegin(GL_QUADS);
	glTexCoord2f(0,1);
	glVertex3f(-1,-1,-1);
	glTexCoord2f(0,0);
	glVertex3f(-1,1,-1);
	glTexCoord2f(1,0);
	glVertex3f(1,1,-1);
	glTexCoord2f(1,1);
	glVertex3f(1,-1,-1);
	glEnd();

	SDL_GL_SwapBuffers();
	clFinish( queue );
}

void Update(int delta)
{
	int count;
	Uint8* keys = SDL_GetKeyState(&count);

	float translate[3] = {0,0,0};
	if ( keys[SDLK_DOWN] ){
		translate[2] = -0.01*delta;
	}
	if ( keys[SDLK_UP] ){
		translate[2] = 0.01*delta;
	}
	if ( keys[SDLK_LEFT] ){
		translate[0] =- 0.01*delta;
	}
	if ( keys[SDLK_RIGHT] ){
		translate[0] = 0.01*delta;
	}

	int x,y;
	SDL_GetMouseState(&x,&y);
	int relX = (kWidth/2.0f - x)*delta;
	int relY = (kHeight/2.0f - y)*delta;
	SDL_WarpMouse(kWidth/2.0f, kHeight/2.0f);

	glMatrixMode(GL_MODELVIEW);

	glLoadIdentity();
	glMultMatrixf(viewMatrix);
	glTranslatef(translate[0],translate[1],translate[2]); 

	if ( relX != 0){
		glRotatef(-relX/200.0f, 0, 1, 0);
	}
	if ( relY != 0){
		glRotatef(-relY/200.0f, 1, 0, 0);
	}

	glGetFloatv(GL_MODELVIEW_MATRIX, viewMatrix);

	// Sphere Transforms
	glLoadIdentity();
	glTranslatef(0, 0, sphere1Pos[2]);
	glGetFloatv(GL_MODELVIEW_MATRIX, sphereTransforms[0]);

	glLoadIdentity();
	glTranslatef(0, 0, sphere2Pos[2]);
	glGetFloatv(GL_MODELVIEW_MATRIX, sphereTransforms[1]);

	sphere1Pos[2] += sphereVelocity*delta/30.0f;
	sphere2Pos[2] += sphereVelocity*(-1)*delta/30.0f;

	if ( sphere1Pos[2] > 50 ){
		sphereVelocity = -1;
	}
	else if ( sphere1Pos[2] < -50 ){
		sphereVelocity = 1;
	}
}

int main(int argc, char* argv[])
{
	InitOpenCL();

	memset(viewMatrix, 0, sizeof(float)*16);
	viewMatrix[0] = viewMatrix[5] = viewMatrix[10] = viewMatrix[15] = 1;


	SDL_Init(SDL_INIT_EVERYTHING);

	Uint32 flags = SDL_OPENGL;
	if ( kFullscreen ){
		flags |= SDL_FULLSCREEN;

		SDL_ShowCursor(0);
	}

	SDL_SetVideoMode(kWidth, kHeight, 32, flags);

	glEnable(GL_TEXTURE_2D);

	bool loop = true;
	int lastTicks = SDL_GetTicks();
	while(loop){
		int delta = SDL_GetTicks() - lastTicks;
		lastTicks = SDL_GetTicks();
		SDL_Event e;
		while(SDL_PollEvent(&e)){
			if ( e.type == SDL_QUIT ){
				loop = false;
			}
			else if ( e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE){
				loop = false;
			}
		}

		Update(delta);
		Render(delta);

		std::stringstream ss;
		ss << 1000.0f / delta ;
		SDL_WM_SetCaption(ss.str().c_str(), 0);
	}


	return 0;
}
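
The float-to-byte conversion in Render relies on the kernel having already clamped every channel to [0, 1]. If you reuse that loop with a kernel that doesn't clamp, out-of-range values wrap around when converted to unsigned char. A small sketch of a defensive version; toByte and packPixel are hypothetical helpers, and rounding to nearest (instead of truncating, as the loop in Render does) is a choice made for the example, not something the program requires:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not in the original program): convert one color channel
// in [0, 1] to a byte, clamping out-of-range values instead of letting them wrap.
inline std::uint8_t toByte(float channel)
{
    float clamped = std::min(1.0f, std::max(0.0f, channel));
    return static_cast<std::uint8_t>(clamped * 255.0f + 0.5f);  // round to nearest
}

// Pack RGBA floats into 4 bytes at out[0..3]; alpha is forced to fully opaque.
inline void packPixel(const float rgba[4], std::uint8_t* out)
{
    out[0] = toByte(rgba[0]);
    out[1] = toByte(rgba[1]);
    out[2] = toByte(rgba[2]);
    out[3] = 255;
}
```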

That's it, hope you enjoyed the series :)
I will upload the VS 2010 project and executable later tonight.

Thanks for reading!




Realtime raytracing with OpenCL I

Posted by , 10 February 2012 - - - - - - · 8,344 views

Hello,

With my university's exams under way, I have been busy working on various course projects.
This semester I took a Raytracing course. Raytracing is something I played with some time ago, and I thought I could get some credits without paying much attention (which is important due to my lack of time and motivation to go to lectures :D ).

Anyway, my course task is to "Use OpenCL to write a real-time raytracer" - which was very fun in my opinion.
Since that's something I've never done before and I believe some folks over here would find it interesting, I decided to write 2 short journal entries about my experience.




Introduction to OpenCL
OpenCL (Open Computing Language) is a hardware-independent language for performing computations in a massively parallel manner.
In general you could run it on anything that has drivers for it - GPU or CPU.
The language itself is very similar to C (not C++) with a few additions and exceptions.
It also provides a small standard library of functions (mainly math functions) and native types for tuples (float3/float4).

In this journal, I will write and explain a small demo of how to use OpenCL for general computations. The actual raytracer will be explained in the next journal.

Compiling and linking OpenCL applications
First you need to decide on what hardware you will run your app. Since I have an ATI 5650 GPU, I downloaded the AMD APP SDK from
http://developer.amd...es/default.aspx
This (by default) installed the OpenCL.lib file in C:\Program Files\AMD APP\lib\x86 and all headers in C:\Program Files\AMD APP\include\CL .

Write the following application and link against OpenCL.lib:

#include <CL/cl.h>
int main()
{
   // 1. Get a platform.
   cl_platform_id platform;
  
   clGetPlatformIDs( 1, &platform, NULL );
}

If it builds and links, proceed to the next step:

Building a minimal application
So now that you can use the OpenCL API, it's time to write a short OpenCL app.
We need 2 things: first, to write an "OpenCL kernel" - a small function written in the OpenCL language that is executed many times in parallel, once per work item - and second, to init the OpenCL library properly and pass data to the "kernel" - the data that our GPU will be processing in this case.

If you are interested in a detailed description of an OpenCL API function, visit http://www.khronos.o...ocs/man/xhtml/.
I'm gonna keep this as brief as possible, but mostly this is boilerplate code you will probably want to copy/paste every time.

As you already saw, we need to get an OpenCL platform :
// 1. Get a platform.
   cl_platform_id platform;
  
   clGetPlatformIDs( 1, &platform, NULL );

The first argument sets how many platforms we want to get (more than 1 only makes sense if the second param is an array), and the last parameter can be used to query the number of platforms available.
After that we need to request a "device". A device is the actual hardware we are going to use - in our case our GPU:
   // 2. Find a gpu device.
   cl_device_id device;

   clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
We pass our platform as the 1st parameter and the type of the device as the 2nd.
Now we need a context and a command queue.
  // 3. Create a context and command queue on that device.
  cl_context context = clCreateContext( NULL, 1, &device, NULL, NULL, NULL);
  cl_command_queue queue = clCreateCommandQueue( context, device, 0, NULL );

If you are interested in the parameters, look at the specification.
Now we need to create the actual OpenCL program. It's advisable to store it in a file and load it from there. Let's call this file kernel.txt.
Here is how we are gonna read its contents and load it:


  // 4. Perform runtime source compilation, and obtain kernel entry point.
  std::ifstream file("kernel.txt");
  std::string source;
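  std::string line;
  // Keep the line breaks: without them a // comment or a #define in the
  // kernel file would swallow the code that follows it.
  while(std::getline(file, line)){
    source += line;
    source += '\n';
  }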
  const char* str = source.c_str();
  cl_program program = clCreateProgramWithSource( context, 1, &str, NULL, NULL );
  cl_int result = clBuildProgram( program, 1, &device, NULL, NULL, NULL );
  if ( result ){
    std::cout << "Error during compilation! (" << result << ")" << std::endl;
  }
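
One thing to watch when loading the kernel text: the line breaks have to survive in the string passed to clCreateProgramWithSource, otherwise a // comment or a #define in kernel.txt would swallow whatever code follows it. A minimal sketch of reading the file in one go using only the standard library; readTextFile is a hypothetical helper name, not an OpenCL function:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole text file into a string, newlines included.
// Returns false if the file could not be opened.
bool readTextFile(const std::string& path, std::string* out)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file) return false;
    std::ostringstream contents;
    contents << file.rdbuf();   // copies the entire stream, line breaks and all
    *out = contents.str();
    return true;
}
```

The boolean return also gives you a place to report a missing kernel.txt before the build step fails with a less obvious error.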

Note that I haven't done any error checking until now. However, the result of clBuildProgram should always be checked, since a mistake in the OpenCL code itself will be detected here.
Let's now load our "kernel" function - if we compiled our program successfully this should work.
   cl_kernel kernel = clCreateKernel( program, "main", NULL );

OK, it's time to create the data that we are gonna pass into our kernel function and the data we are gonna use as output from the function.
This is an important step to remember, since you will want to modify it for each application you write.
In this example I'm going to sum the contents of two arrays (of equal size) and display the result on the screen.
So I need three "OpenCL buffers".


  // 5. Create data buffers.
  cl_mem output  = clCreateBuffer( context, CL_MEM_WRITE_ONLY, 10*sizeof(cl_int), NULL, 0 );
  cl_mem buffer1 = clCreateBuffer( context, CL_MEM_READ_WRITE, 10*sizeof(cl_int), NULL, 0 );
  cl_mem buffer2 = clCreateBuffer( context, CL_MEM_READ_WRITE, 10*sizeof(cl_int), NULL, 0 );
  clSetKernelArg(kernel, 0, sizeof(output), (void*) &output);
  clSetKernelArg(kernel, 1, sizeof(buffer1), (void*) &buffer1);
  clSetKernelArg(kernel, 2, sizeof(buffer2), (void*) &buffer2);

Now we need to fill the input buffers that we are going to process.

  // 6. Fill input data buffers
  cl_int *buffer1Ptr = (cl_int *) clEnqueueMapBuffer( queue,
      buffer1,
      CL_TRUE,
      CL_MAP_WRITE,
      0,
      10 * sizeof(cl_int),
      0, NULL, NULL, NULL );

  cl_int *buffer2Ptr = (cl_int *) clEnqueueMapBuffer( queue,
      buffer2,
      CL_TRUE,
      CL_MAP_WRITE,
      0,
      10 * sizeof(cl_int),
      0, NULL, NULL, NULL );

  for(int i = 0; i < 10; i++){
    buffer1Ptr[i] = i;
    buffer2Ptr[i] = i;
  }

  clEnqueueUnmapMemObject(queue, buffer1, buffer1Ptr, 0, 0, 0);
  clEnqueueUnmapMemObject(queue, buffer2, buffer2Ptr, 0, 0, 0);


We pass the buffer as the second param to the clEnqueueMapBuffer function. The third param denotes that we want to "block" until the operation is complete.
This keeps things simple; otherwise we would have to use events to sync. The next params set the operation we are doing on the buffer (CL_MAP_WRITE here), the offset into it, and the amount of memory we are going to map.

The clEnqueueMapBuffer function returns a pointer which we can write through directly (using the for loop above). Alternatively we could use memcpy.
Finally we unmap the buffers, denoting that we are done writing information to them.

Next we need to execute our "kernel" function :
  size_t global_work_size = 10;

  // 7. Execute the kernel
  clEnqueueNDRangeKernel( queue,
      kernel,
      1,
      NULL,
      &global_work_size,
      NULL, 0, NULL, NULL);




We set the work size (which is usually the same as the number of elements in our output buffer).
Finally, read the result and print it to the console:

  // 8. Look at the results via synchronous buffer map.
  cl_int *resultBufferPtr = (cl_int *) clEnqueueMapBuffer( queue,
      output,
      CL_TRUE,
      CL_MAP_READ,
      0,
      10 * sizeof(cl_int),
      0, NULL, NULL, NULL );

  for(int i = 0; i < 10; i++){
    std::cout << "ptr[" << i << "] = " << resultBufferPtr[i] << std::endl;
  }


The full source of our program :


#include <iostream>
#include <fstream>
#include <string>
#include <CL/cl.h>
int main()
{
   // 1. Get a platform.
   cl_platform_id platform;

   clGetPlatformIDs( 1, &platform, NULL );

   // 2. Find a gpu device.
   cl_device_id device;

   clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

   // 3. Create a context and command queue on that device.
   cl_context context = clCreateContext( NULL, 1, &device, NULL, NULL, NULL);
   cl_command_queue queue = clCreateCommandQueue( context, device, 0, NULL );

   // 4. Perform runtime source compilation, and obtain kernel entry point.
   std::ifstream file("kernel.txt");
   std::string source;
   std::string line;
   // Keep the line breaks so a // comment in the kernel file can't swallow code.
   while(std::getline(file, line)){
      source += line;
      source += '\n';
   }
   const char* str = source.c_str();
   cl_program program = clCreateProgramWithSource( context, 1, &str, NULL, NULL );
   cl_int result = clBuildProgram( program, 1, &device, NULL, NULL, NULL );
   if ( result ){
      std::cout << "Error during compilation! (" << result << ")" << std::endl;
   }
   cl_kernel kernel = clCreateKernel( program, "main", NULL );

   // 5. Create data buffers.
   cl_mem output  = clCreateBuffer( context, CL_MEM_WRITE_ONLY, 10*sizeof(cl_int), NULL, 0 );
   cl_mem buffer1 = clCreateBuffer( context, CL_MEM_READ_WRITE, 10*sizeof(cl_int), NULL, 0 );
   cl_mem buffer2 = clCreateBuffer( context, CL_MEM_READ_WRITE, 10*sizeof(cl_int), NULL, 0 );
   clSetKernelArg(kernel, 0, sizeof(output), (void*) &output);
   clSetKernelArg(kernel, 1, sizeof(buffer1), (void*) &buffer1);
   clSetKernelArg(kernel, 2, sizeof(buffer2), (void*) &buffer2);

   // 6. Fill input data buffers
   cl_int *buffer1Ptr = (cl_int *) clEnqueueMapBuffer( queue,
         buffer1,
         CL_TRUE,
         CL_MAP_WRITE,
         0,
         10 * sizeof(cl_int),
         0, NULL, NULL, NULL );
   cl_int *buffer2Ptr = (cl_int *) clEnqueueMapBuffer( queue,
         buffer2,
         CL_TRUE,
         CL_MAP_WRITE,
         0,
         10 * sizeof(cl_int),
         0, NULL, NULL, NULL );
   for(int i = 0; i < 10; i++){
      buffer1Ptr[i] = i;
      buffer2Ptr[i] = i;
   }
   clEnqueueUnmapMemObject(queue, buffer1, buffer1Ptr, 0, 0, 0);
   clEnqueueUnmapMemObject(queue, buffer2, buffer2Ptr, 0, 0, 0);

   // 7. Execute the kernel
   size_t global_work_size = 10;
   clEnqueueNDRangeKernel( queue,
         kernel,
         1,
         NULL,
         &global_work_size,
         NULL, 0, NULL, NULL);

   // 8. Look at the results via synchronous buffer map.
   cl_int *resultBufferPtr = (cl_int *) clEnqueueMapBuffer( queue,
         output,
         CL_TRUE,
         CL_MAP_READ,
         0,
         10 * sizeof(cl_int),
         0, NULL, NULL, NULL );
   for(int i = 0; i < 10; i++){
      std::cout << "ptr[" << i << "] = " << resultBufferPtr[i] << std::endl;
   }
   return 0;
}

Now we need one more thing: the kernel file.
Open up your text editor and create a kernel.txt with the following contents:
__kernel void main( __global int *dst, __global int* buffer1, __global int* buffer2 )
{
   dst[get_global_id(0)] = buffer1[get_global_id(0)] + buffer2[get_global_id(0)];
}


There are a couple of strange things above:
get_global_id(0) returns the id of the current job (work item). Since we set our work size to 10, we will get 10 parallel executions of our function with ids 0-9.
__kernel denotes that the function is a kernel.
__global means that the memory is allocated in the global memory pool.
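
To make the execution model concrete, here is a CPU-only sketch of what the runtime does with this kernel: it calls the same function body once per id. addKernel and runAddKernel are names invented for the example, the loop runs sequentially where a real device would run the ids in parallel, and both inputs are assumed to have the same length:

```cpp
#include <cstddef>
#include <vector>

// CPU-only stand-in for the kernel body: "gid" plays the role of get_global_id(0).
void addKernel(std::size_t gid, int* dst, const int* buffer1, const int* buffer2)
{
    dst[gid] = buffer1[gid] + buffer2[gid];
}

// Emulates clEnqueueNDRangeKernel for a 1D range by calling the body once per id.
// On a real device these calls run in parallel and in no guaranteed order.
// Assumes a and b have the same number of elements.
std::vector<int> runAddKernel(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> dst(a.size());
    for (std::size_t gid = 0; gid < a.size(); ++gid) {
        addKernel(gid, dst.data(), a.data(), b.data());
    }
    return dst;
}
```

Because every id writes only its own element of dst, the order in which the ids run doesn't matter, which is exactly what lets the GPU run them all at once.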

So that was it!
Hope somebody finds this helpful. I will post the Raytracer code with explanations next week.


New year, Robica and Trine 2

Posted by , in Uncategorized, Robica 01 January 2012 - - - - - - · 838 views

First of all happy new year! I hope that 2012 will bring new emotions and sensations.
I wish you all to be productive at your hobby, but still have much time for your friends and families, and other things you love.

So much stuff to write about...
I've done a fair amount of work during the holidays.
I carried my laptop during the holiday trips. You never know when the muse to do stuff will strike you...

Game status

One of the major technical improvements with Robica is Frustum culling.
This was already started ages ago, but I needed some tweaks to make it usable.
For anyone unfamiliar with the technique I recommend this article http://www.lighthous...rustum-culling/ .
Basically you try to determine which objects are outside the visible portion of the scene and skip rendering them.
Since in Robica the view is top down - there are only about 300-400 tiles you can see at once, which makes this very effective.
While this is just a small technical improvement - the effect on the gameplay is huge, because this now allows me to create much bigger levels.
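
For anyone who prefers code to prose, the core test can be sketched in a few lines of C++. This is not Robica's actual code: it assumes each object has a bounding sphere and that the frustum is given as planes whose normals are unit length and point into the visible volume. Vec3f, FrustumPlane and sphereInFrustum are names made up for the example:

```cpp
#include <cstddef>

struct Vec3f { float x, y, z; };

// Plane in the form dot(normal, p) + d = 0, with the normal pointing INTO the frustum.
struct FrustumPlane { Vec3f normal; float d; };

inline float signedDistance(const FrustumPlane& pl, const Vec3f& p)
{
    return pl.normal.x * p.x + pl.normal.y * p.y + pl.normal.z * p.z + pl.d;
}

// Returns false only when the bounding sphere is completely behind some plane.
// This is conservative: objects near frustum corners may be kept although invisible.
bool sphereInFrustum(const FrustumPlane* planes, std::size_t planeCount,
                     const Vec3f& center, float radius)
{
    for (std::size_t i = 0; i < planeCount; ++i) {
        if (signedDistance(planes[i], center) < -radius) {
            return false;   // entirely outside this plane: skip rendering
        }
    }
    return true;
}
```

Anything that fails the test is skipped before it ever reaches the renderer, so the cost per object is a handful of dot products.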

Another aspect of the game I've been working on is the Level editor. Currently the levels are stored as plain text tile maps such as this:
xxx
x0x
xxx
This is easy to edit in a text editor - so I've been doing all the puzzles up until now in Notepad. However, as the game becomes more complex, the levels become more complex too, and I need to edit various other properties of the tiles by hand, which is hard.
After some thinking, I've decided to do a very crude in-game level editor.
This will be explained in more detail in a following entry.

The third thing I'm working on is the story line. Robica will have a simple and not too deep story that will drive the player to explore the strange world he will be set in. This will also be explained in a separate blog post.

Finally, since 2011 is over - I'm setting a new deadline for Robica's completion - 6 months.
This (if met) will bring the total development time to a little over 2 years.
It will be pretty hard to do though.

Other stuff

On another note, unrelated to Robica, there is a game called Trine 2 that I've been playing during the holidays.
I must say I'm very impressed with Frozenbyte's creation (as well as with their other games).
The game is awesome on many levels and I strongly recommend it if you like platformers with puzzles.

http://www.youtube.com/watch?v=rAx9Q9z4Pdo


Robica - Teleporter robot and new puzzle

Posted by , in Uncategorized, Robica 13 December 2011 - - - - - - · 570 views

Hi,

I've been working some more on my puzzle/adventure game called Robica .
I'm showing a new challenge and a new robot (The Teleporter) in this video :




The teleporter is a robot that can teleport anywhere that he can get his "teleportation beam" to point to.
This mechanic allows him to teleport to walls far behind pitfalls, or to move objects and teleport to them.
I created a special kind of tile (the white wall), which prevents teleportations. This is used to restrict this ability and create puzzles.
The actual model of the Robot is done by Alex Anke (XDigital on this forum).

Thanks to anyone who is following my progress!


Fall - 0h game jam

Posted by , 30 October 2011 - - - - - - · 374 views

Hi world,

Some folks came up with the idea to create a game in zero hours (because of daylight saving time).
So the challenge was to wait until one hour before the DST shift and make the game before the clock is set 1 hour back.
In my case, this was 3 AM (for England and Germany, 2 AM).

I thought up the design a few hours beforehand - I decided to make something like Icy Tower.
I knew however that the jumping and random level generation would be too much for 1 hour, so I decided to twist the idea a bit -
instead of going up (jumping), you go down (falling). Thus I called the game Fall.

I had everything set up and tried my best. Unfortunately I wasn't able to finish on time. By DST shift I had a randomly generated level,
that goes up, with new platforms being generated when the old ones go out of scope.

This is the full time lapse
http://www.youtube.com/watch?v=BlX4tVg9dO4

And gameplay video:

http://www.youtube.com/watch?v=3qZkg29tmQc


The full thing took me about 4 hours to do. I had something playable by the third hour.
It was really fun and I actually completed a game this year! Woo-ho!

You can download Fall here http://kamen.zzl.org/Fall/Fall.rar

Thanks for reading and playing:)


Robica - Another Puzzle

Posted by , in Uncategorized, Robica 12 October 2011 - - - - - - · 405 views

Hello gd world,

Progress on my Puzzle/Adventure game Robica is going along.
If you are unfamiliar with my game, please check my previous 2 entries.

Here is some of the new stuff that I implemented since the last journal:
- "Magneto" robot can attract objects
- Created a robot-swapping block. The player can walk with a robot over it and press "TAB" to switch to another robot.
- Created a "Dead man's switch" block. It's a handle that activates devices (such as doors) while something is standing on top of it,
and deactivates them when nothing is on top of it. It's a concept similar to Portal's red platforms.
Posted Image


The game is taking shape, but creating puzzles is harder and slower than I thought. Right now it takes me about a week to create
a puzzle. A puzzle is like a room in Portal - the final level will be built using several of these puzzles.
As you can see in the videos, I solve these puzzles in about 2 minutes.
So if I continue to produce them at the speed I currently do, it would take me 7-8 months (!) to create just enough puzzles for 1 hour of gameplay.
The original goal was to finish the game before 31 December 2011. This means I either have to make the game shorter or somehow improve my workflow.
The latter will be hard considering I have a full time job and school started last week.

Here is the mandatory video of the puzzle I created:

http://www.youtube.com/watch?v=WoAdgyZm6_0


Robica - New robot & video

Posted by , in Uncategorized, Robica 21 September 2011 - - - - - - · 394 views

Hello gamedev world,

Sadly I didn't have enough time this week to make another level. In case you missed the video & intro I posted last week, you can
check my previous entry.

What I managed to do this week is the model of another robot.

Posted Image
Posted Image

This robot is intended to have the ability to attract and repel objects at a distance. His code name is "Magneto".

Also a video of the robot in action :
http://www.youtube.com/watch?v=0_lIp8-6FMo

Hopefully I will be able to do more work during the holidays and write about it.
Thanks for reading & commenting.


Robica - Intro & Video

Posted by , in Uncategorized, Robica 13 September 2011 - - - - - - · 409 views

Hello world,



It took me quite a lot longer to post the follow-up to my "New journal land scribe", but here I am.
So today I decided to finally let you know about my "top secret" project and give a video at the end of the entry.

Robica is a game that I and a fellow from these forums (Alex Anke) started last year. It was actually an entry for the Intel level up 2010.
The game was not selected as a finalist, which means the motivation behind the project gradually went down to cold zero.
Fortunately (for Alex) his game Cycle was selected as a finalist and eventually won ( read the end of the thread).

Long story short, Robica was frozen for a long time, and for the past month or two I've been continuing it (all by myself).
So what is it about?


In the game there are different robots.
The player controls one of them at any moment, and tries to get to the next level by reaching the level exit.
In a level, there are various obstacles that challenge the player, such as locked doors, enemies and automatic turrets, and puzzles that need quick reactions
or more in-depth thought.
In each level, there are special ‘robot changers’ which the player can use to swap one robot for another, so he can use that robot’s abilities.


The robots and their abilities are not determined yet, but I will share my thoughts on them in another entry.
Any suggestions on cool robot abilities that could be used in puzzles? This could be anything from weapons such as flamethrowers, grenade launchers, lasers, etc.
to more passive skills such as cloak, shields or time dilation...

http://www.youtube.com/watch?v=nbhcmJJOOHc

You can also see the older videos from my profile that were recorded last year
http://www.youtube.com/watch?v=MUEsDckJ2os&feature=related
http://www.youtube.com/watch?v=k2vRtg4MM30&feature=related


Why I'm starting this journal ?

Posted by , 22 August 2011 - - - - - - · 393 views



Hi, folks,

First of all if you are reading this, then I’m terribly sorry for your time loss.
Seriously.

I want to outline why I’m programming games and why I’m starting this blog (I have never maintained one before).

During the past 5 years, I’ve created several games, mostly uninspired arcade clones.
While the people who played these games could probably be counted on my fingers and toes, I had fun building them and learning from them.
I learned programming because of them.

Over time, I discovered that two questions consistently come up when creating games: “how” to create what I wanted to create and “why” I wanted to create it.

The “how” was – and still is – easier to answer, because I had to answer it just once and it stayed answered.
The “why” I had to answer many times, and yet it stayed unanswered.

Weird.

I believe that the answers to “why” change with time (just as the answers to “how” change, thanks to gaining knowledge).
What was once a reason for creating a game may become invalid. And new reasons may come up, too (money being the most frequent, though).

When I was younger and created my Breakout clone (my first game), I was amazed by how I could write some text in a file and suddenly my paddle started to reflect the ball. Or I could move the paddle with my mouse. Or the ball destroyed a brick and bounced back.


Programming was for me the ultimate sandbox – one without limits.
And that’s what kept me working on games all these years.
Because I like to play in the sand.


As I said, however, reasons change and arise, and one of the newer reasons, which I am admittedly ashamed to confess, is a desire for acknowledgment.
Or pride perhaps.

And this leads me to the reason why I started this blog.
I thought that by doing so, I would get some people to play the game I’m currently developing (if I get it to a “playable” state).

More on that will come in another entry.
I’m currently on vacation, so I will have plenty of time to write about it.














