Member Since 05 Feb 2010
Offline Last Active Jun 23 2013 12:00 AM

Posts I've Made

In Topic: OpenCL or OpenMP for Streaming in a Game Engine

08 March 2013 - 04:53 PM

Now, I don't have a deep understanding of OpenCL and OpenMP, but I believe these are tools to help spawn and manage many, many threads on either the CPU or GPU.


OpenCL isn't designed for task parallelism, but OpenMP is actually pretty well suited to forking off a single thread at a time.


And compared to OpenCL, it's very easy to use OpenMP in an existing C++ project.


#pragma omp task would be a good place to start.
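
As a rough sketch of what that could look like, assuming a compiler with OpenMP 3.0 or later (tasks aren't part of OpenMP 2.0): one thread spawns a task per asset and the rest of the team picks them up. Chunk, loadChunk and streamChunks are made-up names for illustration, not part of any engine or of OpenMP itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a streamed asset.
struct Chunk {
    int id;
    std::vector<float> data;
};

// Hypothetical loader: a real engine would read from disk here.
// This one just fills the chunk with placeholder values.
Chunk loadChunk(int id) {
    Chunk c;
    c.id = id;
    c.data.assign(1024, static_cast<float>(id));
    return c;
}

// Spawns one OpenMP task per chunk and waits for all of them to finish.
std::vector<Chunk> streamChunks(int count) {
    assert(count >= 0);
    std::vector<Chunk> chunks(count);
    #pragma omp parallel
    {
        // Only one thread creates the tasks; any thread may execute them.
        #pragma omp single
        {
            for (int i = 0; i < count; ++i) {
                // Each task writes to its own slot, so no locking is needed.
                #pragma omp task firstprivate(i) shared(chunks)
                chunks[i] = loadChunk(i);
            }
            // Block until every task spawned above has completed.
            #pragma omp taskwait
        }
    }
    return chunks;
}
```

Calling streamChunks(4) returns four loaded chunks. If OpenMP support is switched off, compilers typically ignore the pragmas and the same code just runs serially.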

In Topic: about OpenMP

07 March 2013 - 02:38 AM

OpenMP is supported in Visual C++ 2010, but it isn't enabled by default when you make a new project.
Go to Project->Properties, then Configuration Properties, then C/C++, Language, and set Open MP Support to Yes.
Kambiz and Yourself are correct, too - the compiler is optimising away the code branches that don't go anywhere.
Try switching on OpenMP support in your project and running this:


#include <math.h>
#include <omp.h>
#include <iostream>

int main(void)
{
    double t1 = omp_get_wtime( );

    float sum = 0.0f;//new!

    // Serial version: 8 independent chunks of work.
    for(int i = 0;i < 8;i ++)
    {
        float a = 0;

        for(int j = 0;j < 10000000;j++)
        {
            a += sqrtf((float)j);
        }

        sum += a;//new!
    }

    double t2 = omp_get_wtime( );

    std::cout<<"time: "<<t2 - t1<<std::endl;
    std::cout<<"sum: "<<sum<<std::endl<<std::endl;//new!

    sum = 0.0f;//new!

    // Parallel version: the 8 outer iterations are shared between threads.
    // reduction(+:sum) gives each thread a private copy of sum and adds the
    // copies together at the end, so the threads don't race on sum.
#pragma omp parallel for reduction(+:sum)
    for(int i = 0;i < 8;i ++)
    {
        float a = 0;

        for(int j = 0;j < 10000000;j++)
        {
            a += sqrtf((float)j);
        }

        sum += a;//new!
    }

    double t3 = omp_get_wtime( );

    std::cout<<"time: "<<t3 - t2<<std::endl;
    std::cout<<"sum: "<<sum<<std::endl<<std::endl;//new!

    std::cout<<"speed improvement: "<<((t2-t1)/(t3-t2))<<"x"<<std::endl;//new!

    return 0;
}

You should see a speed improvement that's close to your CPU's core-count. I get around 3.8x on a quad-core.

In Topic: Why can't Bilinear/Bicubic resize transparent images properly?

04 October 2012 - 04:19 AM

The problem is that the transparent pixels bordering the opaque pixels are still light grey in their RGB values, even when they have zero alpha.

When you take anything other than a nearest-neighbour sample of those borders, you get a blend of each pixel and its neighbours: a blend of alpha values and a blend of RGB values.

Say you get a 50-50 blend of opaque warrior and transparent background. That's a 50-50 blend of alphas (giving 50% alpha here) and a 50-50 blend of RGB values, giving a shade of grey that's between the background colour and the warrior's colour. The result you're getting is in fact completely correct - it's your source material that's gone wrong.

The solution is to always, always save your sprite art with a black background, even where it's completely transparent. Some programs will solve the problem for you automatically, but most will not. Don't expect a game engine or realtime graphics API to get it right, because alpha blending always expects the foreground layer's transparent areas to fade to black.
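
To make the arithmetic concrete, here's a small sketch of that 50-50 blend. It assumes straight (non-premultiplied) RGBA values in the 0-1 range and a plain per-channel linear blend, which is what bilinear filtering does between two neighbouring pixels. Rgba and blend are illustrative names, not from any particular image library.

```cpp
#include <cassert>

// Straight (non-premultiplied) RGBA pixel, components in [0, 1].
struct Rgba {
    float r, g, b, a;
};

// Per-channel linear blend of two pixels.
// t = 0 returns p, t = 1 returns q, t = 0.5 is the 50-50 case.
Rgba blend(const Rgba& p, const Rgba& q, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    Rgba out;
    out.r = p.r + (q.r - p.r) * t;
    out.g = p.g + (q.g - p.g) * t;
    out.b = p.b + (q.b - p.b) * t;
    out.a = p.a + (q.a - p.a) * t;  // alpha is filtered like any other channel
    return out;
}
```

Blending an opaque red pixel {1, 0, 0, 1} with a transparent light grey one {0.75, 0.75, 0.75, 0} at t = 0.5 gives {0.875, 0.375, 0.375, 0.5}: the grey has leaked into the colour. With a transparent black neighbour {0, 0, 0, 0} the result is {0.5, 0, 0, 0.5}, which is the original red scaled by the blended alpha.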

In Topic: SPH simulation - pressure force doesn't work

15 September 2012 - 12:28 AM

Is there any particular reason you're dividing the pressure force by the particle density?

particles[ i ].velocity += ( particles[ i ].force / particles.density + gravity ) * dt;

That means that as particles pile up, the pressure force will decrease. That's the opposite of what you want to happen.

You should instead divide the force by each particle's mass, such that F = ma. Density is not a substitute for mass in this case. Density shouldn't even be considered as a per-particle value - you're only storing it so that you can sample the density field via a smoothing kernel.

Simply removing that reference to density in the integration step should get you some fluid-like behaviour.
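
For reference, the integration step with mass in place of density might look something like this. It's only a sketch: Vec2, the mass member and the integrate function are assumptions of mine (your particle struct will differ), and the position update is a guess at what follows the velocity line in your code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 {
    float x, y;
};

// Hypothetical particle layout. force and density follow the snippet above;
// mass is an assumed extra member.
struct Particle {
    Vec2 position;
    Vec2 velocity;
    Vec2 force;
    float density;  // kept only for sampling the density field
    float mass;
};

// Advances every particle by one time step of length dt.
void integrate(std::vector<Particle>& particles, Vec2 gravity, float dt) {
    for (std::size_t i = 0; i < particles.size(); ++i) {
        Particle& p = particles[i];
        assert(p.mass > 0.0f);
        // F = ma, so a = F / m; gravity is added as a constant acceleration.
        p.velocity.x += (p.force.x / p.mass + gravity.x) * dt;
        p.velocity.y += (p.force.y / p.mass + gravity.y) * dt;
        // Assumed position update using the new velocity.
        p.position.x += p.velocity.x * dt;
        p.position.y += p.velocity.y * dt;
    }
}
```

With this form the acceleration from a given pressure force no longer shrinks as particles pile up, because density doesn't appear in the update at all.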

In Topic: OpenCL ports and wrappers

01 February 2012 - 07:41 PM

Did you know that there's already an official C++ wrapper for object-oriented use of OpenCL?
It's in here: http://www.khronos.org/registry/cl/
specifically: http://www.khronos.org/registry/cl/api/1.2/cl.hpp
docs here: http://www.khronos.org/registry/cl/specs/opencl-cplusplus-1.1.pdf

It's relatively simple and painless to use. First, include cl.hpp.
Then you create a cl::Context, some cl::Kernel objects, and some cl::Buffer or cl::Image objects, and finally you load your kernels into a cl::CommandQueue for execution.

There are OpenGL-friendly versions of the buffer & image classes in cl.hpp that you can use as VBOs or FBOs for speedy display, too, so if you already have an OpenGL project that you can use for "fast prototyping", you should be able to incorporate a CL context with some texture-processing kernels and see results straight away.

Last I checked, though, NVidia weren't using cl.hpp in their GPU computing SDK, and their OpenCL examples were pretty crudely ported from their CUDA examples, accessing the OpenCL API through some rather impenetrable C code.

ATI's Stream SDK (aka AMD APP SDK) has some decent examples of proper use of the C++ wrapper, though. ATI seem to be a lot more interested in OpenCL than NVidia are, and this is reflected in their sample code. So even if you're on or targeting NVidia hardware specifically, you may as well install the AMD APP SDK just to have better sample code to learn from. Most of it should still compile & run with few modifications, regardless of your hardware setup - but bear in mind that there are OpenCL extensions that may be unsupported on one platform or the other.