

Member Since 30 Aug 2006

#5312370 Is OpenCL slowly dying?

Posted by on 24 September 2016 - 11:29 PM

I've recently noticed that support for OpenCL seems to be slowly dropped in favor of Vulkan (although it might be only in the game industry, as I assume OpenCL is still used in places where rendering is not the goal),


Agreed, but even for non-rendering tasks compute shaders will be preferred, because you can run graphics and compute async if you use one graphics API for everything.

Also, Nvidia's lack of support makes OpenCL impractical, because 1.2 has no indirect dispatch.


I work on a large project in both OpenCL and Vulkan (to get some profiler data; CodeXL does not work for Vulkan yet).

Ignoring the dispatch problem and looking only at GPU time on AMD, the performance varies by about 20-30%. Sometimes Vulkan wins, sometimes OpenCL.

Vulkan will get data sharing next, so personally I'm still considering OpenCL as an alternative in extreme cases.

#5312323 Branching in compute kernels

Posted by on 24 September 2016 - 10:19 AM

In my Vulkan implementation, it just crashes after the compute fence times out because of resource locking.

So the shader finishes, and you get the crash afterwards?

Try a vkQueueWaitIdle() after vkQueueSubmit() to quickly ensure all work has finished, and see if that prevents the crash.


Or does the shader never finish? Usually this causes a bluescreen or an unstable system. (Do NOT save your source files in this case; reboot instantly. I've lost some work this way.)

Probably an infinite loop; implement an additional max-iteration counter to prevent this and see if the crash goes away.
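A shader-side guard would be the same idea in GLSL; here it is as a plain C++ sketch (the Collatz-style loop is just a stand-in for any iteration whose exit condition might never trigger, and all names are mine):

```cpp
#include <cassert>

// Returns how many iterations the loop ran before hitting the exit
// condition or the safety cap. If the result equals maxIterations,
// the loop itself is the likely cause of the GPU timeout.
int guardedLoop(int value)
{
    const int maxIterations = 1024; // generous bound for the expected workload
    int counter = 0;
    while (value != 1 && counter < maxIterations)
    {
        value = (value % 2 == 0) ? value / 2 : 3 * value + 1;
        ++counter;
    }
    return counter;
}
```

For an input like 0 the exit condition never triggers, but the counter still ends the loop instead of hanging the queue until the fence times out.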



Can you post some performance comparison if you get it to work?

#5312164 How to get voxel opacity during gpu surface voxelization

Posted by on 23 September 2016 - 02:05 PM

One common representation of aniso-voxel is 6-axis color. But I think the SH representation is more appropriate for this situation




I made this pic to show the problem you get with SH.


On the left is a voxel of a solid wall. If we want to know the visible surface area from any direction, we simply intersect the direction with the perfect blue circle to get the correct answer.

(I should have added this to the picture; probably not very clear, but the circle is exactly the signal we want to encode here to calculate accurate occlusion.)

The red sketch of a 2-band SH is pretty accurate here.


In the middle is a double sided wall, so we want circles in both directions.


On the right we see the 2-band red sketch again: a unit circle without any directional information, so we get half occlusion no matter whether we expect full or zero occlusion :(

The green is what a 3-band SH would look like: better, but still bad.


The big advantage is that SH is rotationally invariant. It can represent any surface normal exactly (as long as only one surface is in the voxel). No snapping to main axis directions.

But it definitely cannot prevent color bleeding at low resolution.


Maybe the VTR feature in dx11.3 is the cure!

... in five years, maybe. Up to now no AMD hardware supports it. Also, AFAIK any tiled resources need to be managed from the CPU, which is... terrible?


It's a bit of a dilemma. I still think the only way to use VCT is to carefully design levels so that either walls are thick enough or leaks are no big problem.

#5311218 Physically Based Rendering

Posted by on 17 September 2016 - 11:20 AM

Maybe this helps: https://www.shadertoy.com/results?query=pbr

I guess it's easier to go from an OpenGL shader than from a full DirectX sample.

#5311114 How to get voxel opacity during gpu surface voxelization

Posted by on 16 September 2016 - 01:19 PM

I don't know the technical MSAA details, but if you can read the number of covered samples in the pixel shader, density would simply be surfaceAlpha * coveredSamples / 8.0.

If this works, you can accumulate density-weighted color samples: targetVoxel += vec4(surfaceRGB * density, density),

and in a final pass 'normalize' all voxels: voxel.xyz /= voxel.w; voxel.w = min(1.0, voxel.w);

This should give a good average if the voxel is covered by multiple surface samples, and also a transparent value if it's only partially covered.
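A CPU-side sketch of the whole scheme, assuming 8x MSAA (Vec4 and the function names are mine, not from any engine or the original shader):

```cpp
#include <algorithm>
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Coverage-weighted density for one fragment, assuming 8x MSAA.
float fragmentDensity(float surfaceAlpha, int coveredSamples)
{
    return surfaceAlpha * coveredSamples / 8.0f;
}

// Accumulate a density-weighted color sample into a voxel, i.e. the
// shader's targetVoxel += vec4(surfaceRGB * density, density).
void accumulate(Vec4& voxel, float r, float g, float b, float density)
{
    voxel.x += r * density;
    voxel.y += g * density;
    voxel.z += b * density;
    voxel.w += density;
}

// Final pass: normalize color by total density and clamp opacity to 1.
void normalizeVoxel(Vec4& voxel)
{
    if (voxel.w > 0.0f) {
        voxel.x /= voxel.w;
        voxel.y /= voxel.w;
        voxel.z /= voxel.w;
    }
    voxel.w = std::min(1.0f, voxel.w);
}
```

A half-covered opaque red fragment contributes density 0.5, so after normalization the voxel keeps the full red color but only 0.5 opacity.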


The quality improvement would be less popping on dynamic geometry (of course you would still get better results from the binary approach by increasing voxel resolution).


You would need to describe (or show) your kind of 'big problem' with more detail.

There are issues with voxel tracing for GI; I don't think it will become a generic, practical solution suited for every kind of game.

#5311061 How to get voxel opacity during gpu surface voxelization

Posted by on 16 September 2016 - 06:19 AM

Guess you could use coverage from multisampling?

#5309537 Making realtime raytracer interactive?

Posted by on 05 September 2016 - 11:43 AM

As there is no way to update the position of the camera or objects via user input without sending info from the host, which slows it down, I am at a loss.


You can upload your data for next_frame+1 instead of for next_frame (or even next_frame+2 if you use triple buffering).

Basically it's a trade-off between frequency (FPS) and latency (lag), but usually you can get rid of any slowdown in practice without noticeable lag. (It becomes more of an issue with VR, where lag is more likely to be noticed.)
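The bookkeeping is just a ring of buffer slots; a minimal sketch assuming three buffers and a GPU that consumes data uploaded two frames earlier (all names hypothetical, and real code would gate slot reuse with fences):

```cpp
#include <cassert>

const int kFramesInFlight = 3; // triple buffering

// Slot the CPU writes this frame's camera/object data into.
int uploadSlot(int frame) { return frame % kFramesInFlight; }

// Slot the GPU is reading this frame: the data uploaded two frames ago.
// Valid for frame >= 2, once the pipeline is primed.
int gpuSlot(int frame) { return (frame - 2) % kFramesInFlight; }
```

Because the two indices always differ by 2 modulo 3, the CPU never writes the slot the GPU is currently reading, so neither side stalls.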


Additionally, modern APIs allow uploading data while both CPU and GPU are busy with other things; OpenGL probably has this too, but I'm not sure.

#5309498 Spherical camera movement for planets

Posted by on 05 September 2016 - 04:39 AM

Calculate the camera orientation by using the up vector pointing from the planet center to the player (the gravity direction).

This up vector is the only thing you can trust in;

e.g. an additional tangent towards the south pole (or any other fixed point on the planet surface) will flip as soon as you move over that point.

But the up vector alone is enough: looking left/right becomes rotating about the up vector, and looking up/down becomes rotating about cross(upVector, characterFrontDir).Unit().
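Both rotations can be done with Rodrigues' rotation formula; a minimal sketch with a hypothetical Vec3 (a real engine would use its own quaternion or matrix types):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Rotate v about a unit-length axis by angle (radians), Rodrigues' formula.
// Looking left/right: axis = up vector. Looking up/down: axis = cross(up, front).
Vec3 rotateAboutAxis(const Vec3& v, const Vec3& axis, float angle)
{
    const float c = std::cos(angle), s = std::sin(angle);
    const Vec3 k = cross(axis, v);            // axis x v term
    const float d = dot(axis, v) * (1.0f - c); // projection onto the axis
    return { v.x * c + k.x * s + axis.x * d,
             v.y * c + k.y * s + axis.y * d,
             v.z * c + k.z * s + axis.z * d };
}
```

E.g. rotating the front direction (1,0,0) about the up vector (0,0,1) by 90 degrees turns the camera to face (0,1,0).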


So after that you have a matrix or quaternion storing camera orientation in worldspace, and you need to get 3 Euler angles from that.

Getting Euler angles directly from player input would be just pain; don't try to do so, it's a waste of time.


Make sure your engine really does not support any other camera option than Euler angles (pan, tilt, roll; pitch, yaw, roll; rotate x, y, z... mathematically that's all just Euler angles).

To convert from matrix / quaternion to Euler you also have to know (but mostly have to guess) the order of the 3 rotations defined by those 3 angles.

This can become a frustrating source of trial and error.


If your engine can take any kind of projection matrix (and it really should; otherwise, what engine are we talking about?), use that instead.

#5308604 Working code doesnt work in opencl

Posted by on 30 August 2016 - 01:01 AM

The only thing that helps here is to add additional code so you can output numbers and compare with the CPU.

The more time you invest in this upfront, the more you save later.


In this case, I would first identify a certain ray/sphere combination and output its data to make sure positions, directions, etc. match

(does indexing memory work? does memory contain the values I expect?),

and if so, start to output values calculated inside the function...

#5308602 Seriously weird bug in opencl

Posted by on 30 August 2016 - 12:49 AM

I would tell you that learning OpenCL is currently a waste of time when it comes to game programming.


I think this is very bad advice.


OpenCL is the easiest way to do and learn GPU compute: you need one line of code to upload to GPU memory, while with Vulkan you need 50.

Same ratio for the simple thing of executing a kernel.


OpenGL compute shaders are also a lot more cumbersome to use than OpenCL,

and at least in the past OpenCL was twice(!) as fast on Nvidia, and a bit faster on AMD.

You would not make such a statement if you had taken the time to test this yourself.

OpenCL is not popular in game dev, but that's simply our own fault.


AFAIK SPIR-V is core with OpenCL 2.1; NV is at 1.2, AMD is at 2.0.

Only Vulkan really uses it now, but no one writes SPIR-V directly; it's just intermediate byte code emitted by compilers, so it's not relevant for learning.

Feature wise OpenCL 1.2 language and GLSL are very similar - anything you learn can be adapted to the other easily.

#5308096 Bullet - position a collision capsule for an animated character

Posted by on 26 August 2016 - 02:59 PM

I'll try to give an example. Say the AI wants the initially still-standing character to walk a straight line from a to b with a constant velocity of 5.

We then calculate the force necessary to change the capsule velocity from zero to 5.


But we don't take ground friction into account, so we reach only a velocity of 3,

and because of that, the capsule position ends up between the predicted animation target and its initial position.


Let's call this the position error and assume this error is smaller than our threshold,

so we render the character at the predicted position and use the animation to set our next simulation target.


After some simulation steps the error becomes smaller and smaller and everything is fine.

So this is the compensation part of the example.


Then something bad happens: there is a big, heavy crate in the way, and the AI was not aware of it; the character tries to walk through, which is impossible.

The capsule gets stuck, the difference between capsule and animation becomes larger and larger each step, and when it's larger than the threshold (10 cm or something),

it's time to change the animation (keep a max distance of 10 cm from the rendered character to the capsule), notify the AI of the obstacle to stop walking, and so on.
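The threshold step might look like this (minimal sketch; Pos and the function name are illustrative, and the 10 cm value is just the example threshold from above):

```cpp
#include <cassert>
#include <cmath>

struct Pos { float x, y, z; };

// Keep the rendered character within maxDist of the physics capsule.
// Returns the animation target, pulled back toward the capsule if the
// accumulated error exceeds the threshold (e.g. 0.1 m).
Pos clampToCapsule(const Pos& animTarget, const Pos& capsule, float maxDist)
{
    const float dx = animTarget.x - capsule.x;
    const float dy = animTarget.y - capsule.y;
    const float dz = animTarget.z - capsule.z;
    const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (dist <= maxDist)
        return animTarget;              // error is small: trust the animation
    const float s = maxDist / dist;     // pull back onto the threshold sphere
    return { capsule.x + dx * s, capsule.y + dy * s, capsule.z + dz * s };
}
```

When the capsule is stuck at the crate, the returned position stops 10 cm from it no matter how far the animation has walked ahead, which is also the moment to notify the AI.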



This should work for the moving-platform example as you expect (although the trick might filter out some of the cool sliding you expect, similar to how a low-pass filter removes details).

Clamping the force should always be done, also to prevent the capsule from moving too-heavy crates and to keep the simulation stable.


I think you can get some robust mechanics from that, but it will not help to get lifelike animations; combining animation and simulation always makes this harder :)

#5307979 Bullet - position a collision capsule for an animated character

Posted by on 26 August 2016 - 12:39 AM

I think you should not precompute for the whole animation; stuff will drift apart not only due to friction, but also because of integration errors.

Instead I'd make the body follow the animation and allow an error threshold.

As long as the body/animation difference is smaller than the threshold, keep the animation and compensate the error in physics in the next timestep;

otherwise, change the animation so the difference is not larger than the threshold (but keep trying to compensate the physics error).


Setting force instead of velocity, as you say, should work better (the engine can fix bad input, like trying to push something inside a wall).

It depends on the physics engine and maybe the selected solver, but you should see improvements using forces, like less jitter in the wall example.


Here's some code to get the force from a target velocity.

currentVel is the actual body velocity.

Finally, you should clamp the resulting force to a maximum magnitude to prevent supermen and physics blow-ups.

If you set this maximum large enough, the capsule should still follow with ground friction, and it will push lightweight obstacles out of the way.

E.g. for the friction case it will automatically calculate a larger force, because the actual measured velocity will initially be low.

There will be some lag, oscillations or even jitter, but the error threshold talked about above should be able to hide those things.



inline sVec3 ConvertLinVelToForce (const sVec3 &targetVel, const sVec3 &currentVel, float timestep, float mass)
{
    // F = m * a = m * dv / dt: the force needed to reach targetVel within one timestep
    sVec3 force ((targetVel - currentVel) * (mass / timestep));
    return force;
}
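The clamping mentioned above could be sketched like this (self-contained; MyVec3 stands in for the sVec3 type, and the maximum value is illustrative):

```cpp
#include <cassert>
#include <cmath>

struct MyVec3 { float x, y, z; };

// Clamp a force vector to a maximum magnitude to prevent "supermen"
// (the capsule shoving heavy crates aside) and physics blow-ups.
MyVec3 ClampForce(const MyVec3& force, float maxMagnitude)
{
    const float mag = std::sqrt(force.x * force.x + force.y * force.y + force.z * force.z);
    if (mag <= maxMagnitude || mag == 0.0f)
        return force;                    // already within the limit
    const float s = maxMagnitude / mag;  // scale back onto the limit sphere
    return { force.x * s, force.y * s, force.z * s };
}
```

E.g. with mass 80 and a timestep of 1/60, a requested velocity change of 5 yields 80 * 5 * 60 = 24000 units of force from the function above, exactly the kind of spike worth clamping.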

#5307819 Bullet - position a collision capsule for an animated character

Posted by on 25 August 2016 - 05:19 AM

Havok has animated ragdolls built in (at least it was so 10 years ago).

You could take a look how their stuff works (how it reacts to obstacles etc).


Personally, I do it the hard way using the Newton physics engine: full simulation of the whole skeleton, a balancing controller, etc.

Newton allows creating powered joints that work stably enough for a walking character (but I need at least 90 Hz).

There are plans for a built in simulated character feature in the (near?) future.


Because you are only interested in animation, it's a lot easier, and that should work with Bullet too (although in my experience it's the worst engine I know when it comes to stability).

The main problem is the question of how simulation should affect things, i.e. how to handle feedback from physics in procedural animation.

Your question about friction may be only the first of an infinite number...




Oh sorry - you talk about a SINGLE capsule, not one capsule per bone as I thought :)


Have you tried not moving the capsule by the character, but the other way around?

I made a simple capsule character controller by attaching an up-vector constraint to keep it upright despite friction, and applying forces to move it at the target speed.

You can tweak this easily to your needs and then use the capsule velocity to calculate matching animation speed.

#5305981 render huge amount of objects

Posted by on 15 August 2016 - 09:14 AM

Storing trees in linear arrays is always good, as long as most of the tree structure remains static (e.g. a character).

That does not mean you have to process the whole tree even if there are only a few changes.

The advantage is cache-friendly linear memory access. You get this for partial updates too, if you use a nice memory order (typically sorted by tree level as Hodgman said, with all children of any node stored contiguously).


However, 100 is a small number, and I can't imagine tree traversal or transformation causing such low FPS.

Do you upload each transform individually to the GPU? It seems you map/unmap buffers for each object; that's slow and is probably the reason.

Use a single buffer containing all transforms instead so you have only one upload per frame.

Also make sure all render data (transforms and vertices) is in GPU memory and not in host memory.

#5305794 Blade Runner-ish city mood not working, could use some direction

Posted by on 14 August 2016 - 02:22 PM

In addition, I miss fog and city density.

Your city looks sparse and still too bright; Blade Runner is a dark, dense, never-ending city: no horizon, just more buildings everywhere.