
Member Since 22 Aug 2011
Offline Last Active Oct 27 2014 04:40 AM

Topics I've Started

tree traversal of LBVH with and without stack

03 August 2014 - 05:37 AM


I have implemented a tile-based and clustered deferred shading pipeline and am currently profiling and optimizing it.

I am constructing an LBVH for all the light sources from scratch every frame.

I follow the approach described here: http://devblogs.nvidia.com/parallelforall/thinking-parallel-part-iii-tree-construction-gpu/

In short:

(1) compute a conservative AABB for all lights

(2) calculate Morton codes for each light

(3) sort the lights by their corresponding Morton codes

(4) construct an LBVH using a sparse tree representation
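
Step (2) can be sketched on the CPU as follows. This is a minimal C++ sketch assuming 30-bit Morton codes and light positions already normalized to [0, 1] inside the AABB from step (1); it follows the bit-interleaving scheme of the linked article, and the function names are only illustrative, not necessarily what the actual shader uses.

```cpp
#include <algorithm>
#include <cstdint>

// Spread the lower 10 bits of v so that two zero bits separate
// every pair of neighbouring input bits.
inline std::uint32_t expandBits(std::uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point whose coordinates lie in [0, 1].
// Assumption: the caller has already mapped the light position into
// the unit cube using the scene AABB.
inline std::uint32_t morton3D(float x, float y, float z) {
    // Quantize each coordinate to 10 bits (0..1023).
    x = std::min(std::max(x * 1024.0f, 0.0f), 1023.0f);
    y = std::min(std::max(y * 1024.0f, 0.0f), 1023.0f);
    z = std::min(std::max(z * 1024.0f, 0.0f), 1023.0f);
    const std::uint32_t xx = expandBits(static_cast<std::uint32_t>(x));
    const std::uint32_t yy = expandBits(static_cast<std::uint32_t>(y));
    const std::uint32_t zz = expandBits(static_cast<std::uint32_t>(z));
    // Interleave as x2 y2 z2 x1 y1 z1 ...
    return xx * 4u + yy * 2u + zz;
}
```

Sorting the lights by this key (step 3) places lights that are close in space close together in memory, which is what the tree construction in step (4) relies on.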


This works pretty well and takes 1 to 1.5 ms in total for up to 100,000 lights.

The traversal of the LBVH, on the other hand, takes a lot of time, primarily because I have to do it twice: a first pass to count the lights in each cluster, and a second pass, after I have partitioned my index texture, to write the actual light indices into it.
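
The count-then-fill pattern behind the two passes can be sketched like this. It is a simplified CPU version in C++ under stated assumptions: the list of (cluster, light) pairs stands in for what the two LBVH traversals would produce on the GPU, and all names (`Hit`, `ClusterLightLists`, `buildClusterLightLists`) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// (cluster index, light index): stand-in for one traversal hit.
using Hit = std::pair<std::uint32_t, std::uint32_t>;

struct ClusterLightLists {
    // Lights of cluster c are indices[offsets[c]] .. indices[offsets[c + 1] - 1].
    std::vector<std::uint32_t> offsets;
    std::vector<std::uint32_t> indices;
};

inline ClusterLightLists buildClusterLightLists(std::size_t clusterCount,
                                                const std::vector<Hit>& hits) {
    ClusterLightLists lists;

    // Pass 1: count how many lights touch each cluster.
    std::vector<std::uint32_t> counts(clusterCount, 0u);
    for (const Hit& h : hits) {
        ++counts[h.first];
    }

    // Partition the index storage with an exclusive prefix sum.
    lists.offsets.assign(clusterCount + 1, 0u);
    for (std::size_t c = 0; c < clusterCount; ++c) {
        lists.offsets[c + 1] = lists.offsets[c] + counts[c];
    }

    // Pass 2: write each light index into the slot range of its cluster.
    lists.indices.resize(hits.size());
    std::vector<std::uint32_t> cursor(lists.offsets.begin(),
                                      lists.offsets.end() - 1);
    for (const Hit& h : hits) {
        lists.indices[cursor[h.first]++] = h.second;
    }
    return lists;
}
```

The second pass needs the offsets from the first, which is why the traversal cost is paid twice in this scheme.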


I have several different implementations of the traversal:

traversal with stack: http://pastebin.com/GZnzGrPw

stackless traversal: http://pastebin.com/4NE5UqVG


There are more variants of the stackless traversal (with fewer texture memory accesses), but I think they are not relevant for the time being.


My question now is: why is the stackless traversal faster (twice as fast!) than the one with a stack, even though it performs a lot more texture memory reads? As far as I can tell, the order in which the nodes are traversed is the same.

My theory goes as follows:

GPUs utilize fast context switches to gain performance. The stack (the 32-entry array) uses up a lot of registers, so the context switch is actually pretty slow. This is pure guessing, though, and I have no proof whatsoever.


I tried to squeeze in as much info as possible without bloating the post, so thanks to all those who have read up to this line.

I'm very interested in your explanations as well.


Edit: I should have told you how the tree is represented in memory.

It's a sparse tree representation: a texture holds the two children of node N at position N.

The same goes for the parents: the parent of node N is found by reading the parent texture at position N.
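
For illustration, here is a minimal CPU-side C++ sketch of the two traversal styles over such a layout. It is not the pastebin code: the arrays stand in for the child and parent textures, the `overlaps` flags stand in for the per-node bounding box test, leaves are assumed to be marked by child index -1, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Tree {
    std::vector<std::array<int, 2>> children;  // children[n] = {left, right}, {-1, -1} for a leaf
    std::vector<int> parent;                   // parent[n], -1 for the root
    std::vector<bool> overlaps;                // stand-in for the bounding box test of node n
};

// Traversal with an explicit fixed-size stack (32 entries, as in the post).
inline std::vector<int> traverseWithStack(const Tree& t, int root) {
    std::vector<int> visited;
    int stack[32];
    int top = 0;
    stack[top++] = root;
    while (top > 0) {
        const int node = stack[--top];
        if (!t.overlaps[node]) continue;       // prune this subtree
        const int left = t.children[node][0];
        const int right = t.children[node][1];
        if (left < 0) { visited.push_back(node); continue; }  // leaf
        assert(top + 2 <= 32);
        stack[top++] = right;                  // pushed first, popped last
        stack[top++] = left;
    }
    return visited;
}

// Stackless traversal: the parent links replace the stack. The previously
// visited node tells us whether we arrived from above, from the left child
// or from the right child. Assumes root is the tree root (parent == -1).
inline std::vector<int> traverseStackless(const Tree& t, int root) {
    std::vector<int> visited;
    int prev = t.parent[root];
    int node = root;
    while (node >= 0) {
        const int up = t.parent[node];
        const int left = t.children[node][0];
        const int right = t.children[node][1];
        int next;
        if (prev == up) {                      // arrived from the parent
            if (!t.overlaps[node]) next = up;  // prune this subtree
            else if (left < 0) { visited.push_back(node); next = up; }
            else next = left;
        } else if (prev == left) {             // left subtree finished
            next = right;
        } else {                               // right subtree finished
            next = up;
        }
        prev = node;
        node = next;
    }
    return visited;
}

// Small example: node 0 is the root, node 1 an inner node, nodes 2-4 leaves.
inline Tree makeExampleTree() {
    Tree t;
    t.children.push_back({1, 2});
    t.children.push_back({3, 4});
    t.children.push_back({-1, -1});
    t.children.push_back({-1, -1});
    t.children.push_back({-1, -1});
    t.parent = {-1, 0, 0, 1, 1};
    t.overlaps.assign(5, true);
    return t;
}
```

Both functions report the leaves in the same order; the stackless one re-reads the child and parent entries every time it passes through a node again, which is where the extra memory reads come from.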

glsl syntax highlight and auto completion

16 June 2014 - 01:22 PM

Yes, that's right: the never-ending story...


I've gone through a lot by now and tested different editors, tools and plugins, and there was always something that didn't work the way I wanted. I think most people here know what I mean.

So recently I began searching for another solution (again). I came across a pretty nice one in a forum post from ages ago.


The main idea is to make your C compiler think that .glsl files are to be parsed as header files. Nothing overwhelmingly new so far. But then you can go ahead and write another include file that defines all the GLSL names and symbols, and you are basically done, because the C compiler does the rest.
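
A minimal C++ sketch of the idea, assuming a tiny shim that covers only `vec4`, `dot`, and a few qualifiers. It does not reproduce the contents of the linked repository; the shim, the example shader and the `#undef` block at the end are purely illustrative.

```cpp
// --- hypothetical glsl_shim.h: only the editor's C++ parser ever sees this ---
struct vec4 {
    float x, y, z, w;
    vec4(float x_ = 0.0f, float y_ = 0.0f, float z_ = 0.0f, float w_ = 0.0f)
        : x(x_), y(y_), z(z_), w(w_) {}
};

inline vec4 operator*(const vec4& v, float s) {
    return vec4(v.x * s, v.y * s, v.z * s, v.w * s);
}

inline float dot(const vec4& a, const vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// GLSL qualifiers that C++ does not know are defined away.
#define in
#define out
#define uniform
#define layout(...)

// --- contents of a hypothetical shader.glsl, parsed as if it were a header ---
layout(location = 0) uniform vec4 lightColor;

vec4 shade(in vec4 normal, in vec4 lightDir) {
    return lightColor * dot(normal, lightDir);
}

// Undefine the qualifier macros again so they cannot leak into
// unrelated C++ code that is parsed afterwards.
#undef in
#undef out
#undef uniform
#undef layout
```

With declarations like these in place, the editor can offer completion and error squiggles for the shader text, even though the real GLSL compiler never sees the shim.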


I took a liking to it and spent a whole night crawling the GLSL reference pages, copy-pasting function definitions and so on. I'll gladly share the result with you: https://github.com/Wh0p/Wh0psGarbageDump

You are free to test and improve this yourself! (However, I am totally new to all this git stuff and might need some time to figure it out.)


Just have a look at the example on the bottom, how it looks in Visual Studio...


There are still some drawbacks: you have to write a preprocessor that resolves or strips the "#include" directives from the .glsl file.

The syntax for uniform buffers is somewhat broken.


Sooo, tell me if you like/hate/ignore it, or even have a better solution to this. Personally I think I have found a solution I can be happy with (for the time being).









image1D as function parameter in glsl

14 June 2014 - 05:38 AM

Hi, like the topic suggests, I am trying to find a way to pass an image1D (or 2D, or whatever) in the parameter list of a GLSL function, like this:

vec4 myImageLoad (in image1D tex, int pos) {...}

This is simply for convenience when I need to do some more calculation on the texel value.


However, I could not get my driver (AMD HD6950) to compile this; the error is "missing or invalid layout qualifier".

When I add a layout qualifier like this:

vec4 myImageLoad (layout (rgba32f) in image1D paramtex, int light) {...}

the compiler reports: "parse error on 'layout'".


The OpenGL wiki says this should be possible but gives no syntax example (http://www.opengl.org/wiki/Image_Load_Store#Image_variables), so maybe one of you can give me a hint.


image layouts in openGL compute shader

02 May 2014 - 03:49 AM

Hi, in short I am trying to access an integer texture (GL_RGBA32I) within a compute shader.


This is how I declare the uniform:

layout (rgba32i) uniform image1D some_texture;

and my GLSL compiler reports the following error for this line of code:



error C1318: can't apply layout(rgba32i) to image type "image1D"


Since the shader doesn't even compile, I have no clue how to fix this (apparently none of the int and uint formats compile). I do have a context set up with OpenGL 4.4.

Floating-point formats work just fine (I've been working with them for quite a while now, but I'd like to store cluster information in 32-bit ints), whereas the *i and *ui formats are troublesome.

Did I miss something in the specs? It would be a huge drawback if there were no integer formats in compute shaders.


I appreciate your help.


glBlendFunc vs glBlendFunci

25 February 2014 - 10:43 AM

Hello, right now I'm porting my engine from DX to OpenGL and I have encountered some difficulties with glBlendFunc.


Primarily I have two questions.



From the docs:



glBlendFunc defines the operation of blending for all draw buffers when it is enabled. glBlendFunci defines the operation of blending for a single draw buffer specified by buf when enabled for that draw buffer.


Which of the two calls takes precedence over the other? In other words, how will my render target in slot 0 be blended if I set glBlendFunc (GL_ZERO, GL_ONE) AND glBlendFunci (0, GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)? Is this behaviour well defined?



I have tested a simple scene using blending. Configuring the pipeline state with glBlendFunci instead of glBlendFunc causes serious performance issues, dropping the frame rate from >250 fps to 0.5 fps. I find this hard to explain, since the functionality should be supported (I'm running a 3.3 core context on a GeForce 9800 GT). Does anyone here have any hints regarding this?


Thanks in advance!