

Member Since 30 Aug 2006
Offline Last Active Nov 20 2014 03:23 PM

#5191669 Is there a known bug with glMemoryBarrier() when using the latest AMD Catalys...

Posted by JoeJ on 07 November 2014 - 08:45 AM

The same happens to me here on the 280X :(

It compiles when using the Intel CPU driver, but it also fails when using the AMD CPU driver.


Now you have a nice collection of driver bugs to submit.


I hope AMD soon releases a new driver with generic 2.0 support that also fixes this.

Also, once both AMD and Intel have 2.0, maybe NV will be willing to update their OpenCL too.


Edit: Same for cl_khr_spir, cl_khr_image2d_from_buffer, cl_khr_dx9_media_sharing, ... ?


What a mess



See http://devgurus.amd.com/thread/155539

Very similar problem, so maybe you can just remove the pragma line and use the extension?

#5191533 Is there a known bug with glMemoryBarrier() when using the latest AMD Catalys...

Posted by JoeJ on 06 November 2014 - 10:02 AM

Maybe this one:




I'm not sure because I also downloaded the OpenCL 2.0 beta drivers (which did not work for me, because they need Win 8).

But i believe this is the right file - it may be a beta driver, but i don't think so.


I'm using R9 280x

#5191529 Is there a known bug with glMemoryBarrier() when using the latest AMD Catalys...

Posted by JoeJ on 06 November 2014 - 09:35 AM

What I had in mind is using ping-pong images, meaning CL writes to imgB while GL reads from imgA, swapping the pointers at the start of each frame.

The image should contain only the stuff that has been updated, so I still need a compute shader to copy this data to the very large full scene lightmap.

But there should be no sync issues, do you agree? The lag of one frame should not matter for me.
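The ping-pong idea above could be sketched roughly like this. It only shows the index swapping; `SharedImage` and `PingPong` are hypothetical names, and in a real program the handle would wrap the GL texture and the CL image created from it. The acquire/release calls for CL/GL sharing are not shown.

```cpp
#include <array>

// Hypothetical handle type standing in for a GL texture shared with CL.
struct SharedImage { int id; };

class PingPong {
public:
    PingPong(SharedImage a, SharedImage b) : images_{{a, b}} {}

    // Call once at the start of each frame to exchange the two roles.
    void swap() { readIndex_ ^= 1; }

    // Image GL reads from this frame (written by CL during the previous frame).
    const SharedImage& readImage() const { return images_[readIndex_]; }

    // Image CL writes into this frame.
    SharedImage& writeImage() { return images_[readIndex_ ^ 1]; }

private:
    std::array<SharedImage, 2> images_;
    int readIndex_ = 0;
};
```

With this layout GL always sees data that is one frame old, which matches the one frame of lag mentioned above.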


Edit: cl_khr_gl_event is supported on AMD according to my log:


Selected Platform Vendor: Advanced Micro Devices, Inc.
Version: OpenCL 1.2 AMD-APP (1573.4)

Device: Tahiti
Version: OpenCL 1.2 AMD-APP (1573.4)
Adress Bits: 32
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Global Cache Size: 16384
Cache Line: 64
Local Memory Size: 32768
Max Work Group Size: 256
Max Work Item Sizes: [256,256,256]
Max Image Width: 16384
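Instead of reading the log by eye, an extension can be looked up in the extension string in code. This is only a sketch: `hasExtension` is a made-up helper, and it assumes the string is the space-separated list returned for CL_DEVICE_EXTENSIONS. It compares whole tokens, so a prefix such as cl_khr_gl does not match cl_khr_gl_sharing.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole space-separated token in `extensions`.
bool hasExtension(const std::string& extensions, const std::string& name) {
    std::istringstream stream(extensions);
    std::string token;
    while (stream >> token) {
        if (token == name) return true;
    }
    return false;
}
```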

#5191521 Is there a known bug with glMemoryBarrier() when using the latest AMD Catalys...

Posted by JoeJ on 06 November 2014 - 08:27 AM

I remember better now - indeed I needed to add the fence when switching from NV to AMD. On NV the glMemoryBarrier() alone worked.

With the fence everything worked as expected, no need for coherent.


I hope moving data to GL will not be expensive for me either. I plan to share an image between CL and GL and assumed there should be no slowdown.

#5191395 Is there a known bug with glMemoryBarrier() when using the latest AMD Catalys...

Posted by JoeJ on 05 November 2014 - 01:55 PM

I've had similar issues on NV cards too, the code below is what I ended up calling after glDispatchCompute.

Just to let you know, i gave up on compute shaders for now and use OpenCL because it's much faster.
I do realtime GI and my stuff covers a lot of different algorithms from complex tree traversal to simple brute force stuff.
On NV cards OpenCL is about twice as fast as compute shaders (!).
On ATI OpenCL is about 20% faster.

And... ATI R9 280X is 2.3x faster than Geforce Titan.

void GPUwait ()
{
	// Insert a fence behind all GL commands issued so far.
	GLsync syncObject = glFenceSync (GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
	// Flush and block the CPU until the fence is signaled (timeout is 1 second, given in nanoseconds).
	GLenum ret = glClientWaitSync(syncObject, GL_SYNC_FLUSH_COMMANDS_BIT, 1000*1000*1000);
	if (ret == GL_WAIT_FAILED || ret == GL_TIMEOUT_EXPIRED)
		SystemTools::Log ("glClientWaitSync failed.\n");
	glMemoryBarrier (GL_ALL_BARRIER_BITS);
	glDeleteSync (syncObject);
}

#5182229 Compute shader runs more than once?!

Posted by JoeJ on 22 September 2014 - 02:59 PM

At least this problem helped me to solve mine :)


I've not read much yet about sync objects, but adding one to my code fixed my synchronisation issues:


void GPUwait ()
{
    // Insert a fence behind all GL commands issued so far.
    GLsync syncObject = glFenceSync (GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    // Flush and block the CPU until the fence is signaled (timeout is 1 second, given in nanoseconds).
    GLenum ret = glClientWaitSync(syncObject, GL_SYNC_FLUSH_COMMANDS_BIT, 1000*1000*1000);
    if (ret == GL_WAIT_FAILED || ret == GL_TIMEOUT_EXPIRED)
        SystemTools::Log ("glClientWaitSync failed.\n");
    glMemoryBarrier (GL_ALL_BARRIER_BITS);
    glDeleteSync (syncObject);
}


Maybe you should try to add the glMemoryBarrier (i assumed this alone should block until shader is finished, but seems i was wrong)

And in my paranoia i've made all my data coherent in shader:


layout(std430, binding = 0) coherent buffer


Just try, i don't know what i'm talking about :)

My confusion got too high last time and I went to OpenCL... it seems a much easier way to learn GPGPU.

#5162992 Can't use imageBuffer with integer data (Compute Shader)?

Posted by JoeJ on 26 June 2014 - 05:47 AM

Ooops, solved:


layout (binding = 4, rgba32ui) uniform uimageBuffer TpackInt;


...I just forgot to add the u

#5065328 How do I get the euler angle from a matrix?

Posted by JoeJ on 27 May 2013 - 01:53 PM

I once extended that so it can handle various orders. I did not check if it works with orders like XYX, but XYZ is ok.

Note that it's for OpenGL matrices, so you need to swap matrix indices.


static v3f ToEulerAngles (matrix4f &m, int order = 0x012) // 0x012 = xyz, 0x201 = zxy and so on...
{
  // One hex digit per axis: a0, a1, a2 are the axis indices in rotation order.
  int a0 = (order>>8)&3;
  int a1 = (order>>4)&3;
  int a2 = (order>>0)&3;

  v3f euler;
  // Assuming the angles are in radians.
  if (m.m[a0][a2] > 0.9999)
  { // singularity at north pole
    euler[a0] = -atan2(m.m[a2][a1], m.m[a1][a1]);
    euler[a1] = -PI/2;
    euler[a2] = 0;
    return euler;
  }
  if (m.m[a0][a2] < -0.9999)
  { // singularity at south pole
    euler[a0] = -atan2(m.m[a2][a1], m.m[a1][a1]);
    euler[a1] = PI/2;
    euler[a2] = 0;
    return euler;
  }
  euler[a0] = -atan2(-m.m[a1][a2], m.m[a2][a2]);
  euler[a1] = -asin ( m.m[a0][a2]);
  euler[a2] = -atan2(-m.m[a0][a1], m.m[a0][a0]);
  return euler;
}
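To make the order parameter easier to follow, here is the same decoding pulled out into a small standalone helper. `decodeOrder` is a hypothetical name used only for illustration; the shifts and masks are the ones from the function above.

```cpp
#include <array>

// Splits an order code such as 0x012 into its three axis indices (0 = x, 1 = y, 2 = z).
std::array<int, 3> decodeOrder(int order) {
    return { (order >> 8) & 3, (order >> 4) & 3, (order >> 0) & 3 };
}
```

So 0x012 decodes to the axes 0, 1, 2 (xyz) and 0x201 decodes to 2, 0, 1 (zxy).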


#5051464 Realistic/alternative first person carrying of objects

Posted by JoeJ on 09 April 2013 - 06:01 AM

Have you already implemented an interaction model?

I ask because - usually you get the effects of your ideas by unwanted accident, while developing :)

It is harder to create a responsive and stiff interaction than a laggy one.

If your model is responsive, it is easy to make it more smooth, laggy, or feeling heavy as you want.
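As a rough illustration of that last point, a responsive target can be made to feel laggy or heavy with a simple smoothing step. This is only a sketch and not a physics model: `followTarget` and its `stiffness` parameter are assumptions made up for this example.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Moves `current` toward `target` by a frame-rate independent fraction.
// `stiffness` (in 1/seconds) is a hypothetical tuning value: large values follow
// the target almost rigidly, small values feel laggy or heavy.
Vec3 followTarget(Vec3 current, Vec3 target, float stiffness, float dt) {
    float t = 1.0f - std::exp(-stiffness * dt); // blend factor, 0 when stiffness is 0
    return { current.x + (target.x - current.x) * t,
             current.y + (target.y - current.y) * t,
             current.z + (target.z - current.z) * t };
}
```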

I like the ideas, I first saw most of them done very well in the game 'Penumbra'.

It's more physics related than Amnesia and you should take a look, if you don't know it already.

HL2, Dead Space... are not very good examples. They use Gravity Guns and other magic forces to operate on objects.

I think this is not a question of game design, it's done to hide the physics engine's weakness (push an object against a wall and it starts jittering... but you don't notice, because you think it's the 'magic force').

If you're free to choose the physics engine - take a look at Newton!

And if you wanna be really innovative - think most about how to handle the rotation of the objects :)

#5044833 Shaders and VBOs, Performance and Relation

Posted by JoeJ on 20 March 2013 - 04:16 AM

Downgrade? Don't treat my disagreement as an attack on you or the things you said.

I just wanted to point out that both methods should result in equal GPU instructions and performance.


Except for the performance difference, all the other questions already had good answers.

We all know we should not use deprecated stuff and how to put a percent value in relation with a fixed number.

#5044802 Shaders and VBOs, Performance and Relation

Posted by JoeJ on 20 March 2013 - 02:09 AM

I don't agree with all of that.


Both options calculate a combined modelview / projection matrix once per frame.

I assume Danicco simply forgot to post the 2nd projection setup, and matrix setup does not affect performance because he uses lots of vertices (?)


And both should do a single matrix mult per vertex.


Option 1 does it in shader and option 2 too, because OGL will add the necessary instructions automatically (If it doesn't know it should not), right?

Thus my assumption that this accidentally happens to option 1 too.

Maybe there's even a driver bug forcing this to happen all the time - who knows?


Also, pure FPS should be accurate enough to measure a 25% difference, if there are enough vertices in the test.
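To illustrate the point about a single matrix multiplication per vertex: combining projection and modelview once per frame and then transforming each vertex once gives the same result as applying both matrices to every vertex. The sketch below uses made-up helper names (`multiply`, `transform`) and column-major storage as in OpenGL; it is not taken from the code discussed in the thread.

```cpp
#include <array>

using Mat4 = std::array<float, 16>; // column-major: element (row, col) is at index col * 4 + row
using Vec4 = std::array<float, 4>;

// Returns the matrix product a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

// Returns the matrix-vector product m * v.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k)
            r[row] += m[k * 4 + row] * v[k];
    return r;
}
```

Calling multiply(projection, modelview) once per frame and transform(combined, vertex) once per vertex is the work both options should end up doing.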

#5044523 Android game loop issues

Posted by JoeJ on 19 March 2013 - 04:08 AM

Edit: "i've all that in one class/file, if you wanna keep Java as less as possible" means:

I use only one activity (GLSurfaceView), and create the second thread within it - so only one activity but multiple threads.


Some personal advice: Understanding the lifecycle is somewhat frustrating - don't mix that stuff with your game code - don't divide your game code in a way you think Android might require.

Keep the interface between OS and game as small as possible. That way the game stays portable and fun to develop, while the frustrating OS section stays small and exchangeable :)

#5044517 Android game loop issues

Posted by JoeJ on 19 March 2013 - 03:42 AM

You should consider adding multithreading from the start - it was some work for me to add it later.

Because on mobiles calls to OGL do not return until the GPU has finished its job, even single cores benefit from it.


For me it looks somewhat like this (i've all that in one class/file, if you wanna keep Java as less as possible):




float accelV[3]; // global sensor data

PhysicsThreadFunc () // called from thread 1
{
      if update request flag -> do physics using accelV, game logic, setup render stuff and store results
}

GraphicsThread () // onDrawFrame called from thread 2
{
      if results are ready, copy them, render and set request flag // while rendering (CPU mostly waiting), next physics step executes in parallel
}

AccelerometerMessage () // called frequently from thread 3 - you can set the update rate only by symbolic definitions ('GAME', 'FASTEST'...)
{
      accelV[0] = ... // simply update - writing a float is atomic, so no need to synchronize with simultaneous read access from other threads
}
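The same handshake could look roughly like this in C++. It is a sketch under assumptions, not the code from my project: the names `GameLoopExchange` and `FrameResults` are made up, a plain mutex protects the stored results, and the sensor value is a std::atomic<float> because in C++ an unsynchronized float shared between threads is a data race (the "writing a float is atomic" remark above applies to the Java side).

```cpp
#include <atomic>
#include <mutex>

struct FrameResults { int frameNumber = 0; float tilt = 0.0f; };

class GameLoopExchange {
public:
    // Sensor thread: simply overwrite the latest accelerometer value.
    void onAccelerometer(float x) { accelX_.store(x, std::memory_order_relaxed); }

    // Physics thread: run one step only if the renderer requested one.
    bool physicsStep() {
        if (!updateRequested_.exchange(false)) return false;
        FrameResults next;
        next.tilt = accelX_.load(std::memory_order_relaxed); // stands in for physics + game logic
        std::lock_guard<std::mutex> lock(mutex_);
        next.frameNumber = results_.frameNumber + 1;
        results_ = next;
        resultsReady_ = true;
        return true;
    }

    // Render thread: copy finished results (if any) and request the next step,
    // so physics can run in parallel while this frame is being rendered.
    bool beginFrame(FrameResults& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!resultsReady_) return false;
        out = results_;
        resultsReady_ = false;
        updateRequested_.store(true);
        return true;
    }

private:
    std::atomic<float> accelX_{0.0f};
    std::atomic<bool> updateRequested_{true}; // the first physics step runs without a request
    std::mutex mutex_;
    FrameResults results_;
    bool resultsReady_ = false;
};
```

Each thread would call its own method in a loop; the physics thread could sleep or wait briefly whenever physicsStep() returns false.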


#5044380 How to center a rotated line between two angles

Posted by JoeJ on 18 March 2013 - 04:46 PM

I want to calculate the distance d that moves the beta angled blue line out of unit circle center so that both alpha angles are equal.

Seems not as easy as I initially thought - please help :)



#5040430 How feasible is using the NDK?

Posted by JoeJ on 07 March 2013 - 10:12 AM

Using the NDK is a must for me, as it lets me share 99.9% of the code with iOS, and I can develop my game with Visual Studio 99.9% of the time :)

Native activity is only available on updated devices, so I avoid it and use Java for these things:


* Loading files (C file functions work, but can't access the compressed package contents so easily)

* Sound (OpenAL is available but relatively new to Android)

* Threading (never tried pthreads, but they would be available on iOS too!)

* Creating the OGL context

* Reading sensors

* (Network, server communication... if you need)


Expect a little bit of pain getting all these basic things to work, but then it shouldn't be a problem to run any OGL engine with a few #ifdefs, no matter how complex it is.

The NDK itself is just C++ as you know it - Nvidia has some nice installers that set up all you need (Cygwin...)
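For illustration, the kind of #ifdef meant above might look like this. `platformName` is a made-up function; the macros are the ones the respective toolchains predefine, and a real engine would switch file loading, sound and so on in the same way.

```cpp
#include <string>

// Returns a label for the platform the code was compiled for.
std::string platformName() {
#if defined(__ANDROID__)
    return "android";   // predefined by the NDK toolchain
#elif defined(__APPLE__)
    return "apple";     // predefined by Apple toolchains (iOS and macOS)
#elif defined(_WIN32)
    return "windows";   // e.g. when developing in Visual Studio
#else
    return "other";
#endif
}
```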