
Member Since 22 Aug 2011
Offline Last Active Jan 08 2016 08:15 PM

#5201525 scene graph render order

Posted by on 03 January 2015 - 08:12 AM

Nice, that's sufficient for a start. I guess I can be creative on this one - like doing things in parallel.

#5201170 [GLSL] Prevent lights from casting onto shadows

Posted by on 01 January 2015 - 02:19 PM

The "min" approach seems legit to me. Multiplying the Lambert lighting term by the shadow scale will just dim the shadowed areas. I think of shadowed areas as areas where only the ambient term affects the shading from the light source, and the diffuse and specular terms are zero.
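To illustrate that idea, a minimal fragment-shader sketch (the uniform names, the `shade` function, and the Phong-style terms are assumptions for the example, not your code):

```glsl
// Hypothetical shading sketch: shadow = 0 means fully shadowed, 1 means lit.
uniform vec3  lightColor;
uniform vec3  albedo;
uniform vec3  ambient;
uniform float shininess;

vec3 shade(vec3 N, vec3 L, vec3 V, float shadow)
{
    float lambert = max(dot(N, L), 0.0);
    vec3 diffuse  = lambert * lightColor * albedo;
    vec3 specular = pow(max(dot(reflect(-L, N), V), 0.0), shininess) * lightColor;
    // only ambient survives in shadow; diffuse and specular are scaled away
    return ambient * albedo + shadow * (diffuse + specular);
}
```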

Since that doesn't solve your problem, I can only recommend this article from NVIDIA on edge-blurred soft shadows:



There's a neat explanation with code and pretty decent results.

#5185105 Two Questions about DirectX 11 (XMVECTOR & Creating Input Layouts)

Posted by on 05 October 2014 - 05:37 AM

Regarding your 1st Question:

from DirectXMath.h:

// Fix-up for (1st-3rd) XMVECTOR parameters that are pass-in-register for x86, ARM, and Xbox 360; by reference otherwise
#if ( defined(_M_IX86) || defined(_M_ARM) || defined(_XM_VMX128_INTRINSICS_) ) && !defined(_XM_NO_INTRINSICS_)
typedef const XMVECTOR FXMVECTOR;
#else
typedef const XMVECTOR& FXMVECTOR;
#endif

So in most cases FXMVECTOR is a const reference to the XMVECTOR (and on x86/ARM a by-value register argument), and all functions I know of take FXMVECTORs as arguments.

I don't want to start a discussion about the difference between pointers and references, but since we are talking about function arguments here, think of a reference as a pointer that can't be assigned NULL, which offers some nice advantages over pointers.


Edit: Too slow :o

#5180083 Poor Performance with tiled shading , Problem with bounding test

Posted by on 13 September 2014 - 09:57 AM

I don't know if I understood you correctly, but it sounds like you are performing the light culling on the CPU?

If yes, don't do that anymore, since you can use a compute shader. Doing the culling on the GPU is a LOT faster, especially for high light counts.

In a compute shader you could compute an AABB for each light and then project it into screen space, resulting in one culling rectangle for each light source.

After that you have to build the tile indices for each light. There are several ways to do this; the easiest would be a compute shader with one thread per tile that iterates over all culling rects and performs the test. If the test is positive, save the light's index in a texture buffer.
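A minimal compute-shader sketch of that per-tile test (the buffer layouts, names, and the 16x16-pixel tile size are all assumptions for illustration; the post mentions a texture buffer, an SSBO is used here for brevity):

```glsl
// One thread per 16x16-pixel tile; every thread tests all screen-space
// culling rects against its tile and writes the matching light indices.
layout(local_size_x = 16, local_size_y = 16) in;

struct CullRect { vec2 lo; vec2 hi; };  // screen-space rectangle per light
layout(std430, binding = 0) readonly buffer CullRects  { CullRect rects[]; };
layout(std430, binding = 1) writeonly buffer TileLights { uint tileLightIndices[]; };

uniform uint lightCount;
uniform uint maxLightsPerTile;

void main()
{
    uvec2 tile  = gl_GlobalInvocationID.xy;
    vec2 tileLo = vec2(tile) * 16.0;
    vec2 tileHi = tileLo + vec2(16.0);

    uint tilesX = gl_NumWorkGroups.x * gl_WorkGroupSize.x;
    uint base   = (tile.y * tilesX + tile.x) * maxLightsPerTile;

    uint count = 0u;
    for (uint i = 0u; i < lightCount && count + 1u < maxLightsPerTile; ++i)
    {
        // rectangle overlap test in screen space
        if (rects[i].lo.x <= tileHi.x && rects[i].hi.x >= tileLo.x &&
            rects[i].lo.y <= tileHi.y && rects[i].hi.y >= tileLo.y)
        {
            tileLightIndices[base + 1u + count] = i;  // slot 0 holds the count
            ++count;
        }
    }
    tileLightIndices[base] = count;
}
```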


As I already said, I am using an LBVH to cull the light sources; that is a huge overhead, but it can still be done in 4-5 milliseconds (including BVH construction and traversal). I will briefly outline what I am doing every frame (all on the GPU):


1) Render to the GBuffer

2) Calculate the min/max for each tile

3) Calculate the AABBs in view space for each light (like above)

4) Construct an LBVH from the lights' AABBs (this requires assigning a Morton code to each AABB, sorting the AABBs by those codes, and then applying a fully parallel tree construction algorithm)

5) Traverse the tree to calculate the number of lights present in each tile

6) Resize the tile index buffer and calculate the offsets where each tile's light indices start

7) Traverse the tree again and save the light indices into the tile index texture buffer

8) Render the lights using the GBuffer and the tile index texture (pretty much like you are doing)


This is a whole other level compared to the linear probing you are currently doing, but this way you are able to apply lighting for light counts in the tens of thousands (don't be discouraged :D).


Actually I wrote my bachelor thesis on this topic not so long ago; you are welcome to read it (but it's written in German, so there might be a lot of Google Translate involved).


Some resources for LBVH construction and traversal:

http://devblogs.nvidia.com/parallelforall/thinking-parallel-part-iii-tree-construction-gpu/ (you will also need a parallel sorting algorithm; I implemented bitonic sort) The article is about realtime collision detection, but the technique applies to this lighting algorithm as well.

http://jcgt.org/published/0002/01/03/paper.pdf (the stackless traversal algorithms are described pretty nicely here)



I have uploaded my source code and thesis here:

(sorry, the source is a mess because I maintained it very reluctantly -- no guarantee whatsoever)

Source: https://www.dropbox.com/sh/osccc1ynbgzqa09/AABreixi0dG8NrNJ8daORgfUa?dl=0

Thesis: https://www.dropbox.com/s/xuqsc678fm4ihyq/clustereddeferred.pdf?dl=0


Hopefully I could help you.

#5179672 Problems when moving from Nvidia to ATI card / GPGPU performance comparision

Posted by on 11 September 2014 - 02:08 PM

So this might be super obvious, but since there are no indications in the code above: do you call glUseProgram(0) before doing immediate-mode rendering (and disable all the other state that needs to be disabled, like VAOs...)? I had no real problem getting it to work in a simple demo on an AMD HD 6950. From my experience, Nvidia cards are more forgiving when it comes to stuff like this.

#5179670 Poor Performance with tiled shading , Problem with bounding test

Posted by on 11 September 2014 - 01:56 PM

I am sorry, but I only roughly scanned through your shader code.


First thing I noticed: synchronizing threads is usually very costly. And in your case one thread in each tile is copying the tile's light index struct while all the others are idle. ---> Let every thread copy a portion of the memory.


One point light in tile-based rendering is very close to the worst case. Tile-based rendering is all about clipping light sources and therefore introduces overhead for calculating the light indices in each tile. Again, when you sync your threads, the whole workgroup is waiting for just one thread to copy the indices of one point light -- not what you ideally desire.


Furthermore, get rid of the indirection in your while loop. Use shared memory to hold the LightInfo itself, not the indices to the light info; that's much more cache efficient.
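A sketch of what that cooperative copy could look like (the struct layout, buffer bindings, and names are made up for the example):

```glsl
// Every thread copies a strided portion of the tile's lights into shared
// memory, so the whole workgroup later shades from fast shared storage.
struct LightInfo { vec4 posRadius; vec4 color; };

layout(std430, binding = 0) readonly buffer Lights      { LightInfo lights[]; };
layout(std430, binding = 1) readonly buffer TileIndices { uint tileLightIndices[]; };

const uint MAX_LIGHTS_PER_TILE = 64u;
shared LightInfo tileLights[MAX_LIGHTS_PER_TILE];

void cacheTileLights(uint tileLightCount, uint firstIndex)
{
    uint threads = gl_WorkGroupSize.x * gl_WorkGroupSize.y;
    // thread i copies lights i, i + threads, i + 2 * threads, ...
    for (uint i = gl_LocalInvocationIndex; i < tileLightCount; i += threads)
        tileLights[i] = lights[tileLightIndices[firstIndex + i]];
    barrier();  // wait once, then every thread reads from tileLights[]
}
```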


Then again, the way you iterate over the tile's indices is kind of ineffective regarding branch divergence. Maybe it helps to separate the point and spot lights and not branch in the loop. Also, terminating the loop with a check against the tile's light count instead of its next light index might help.



All this is very vague, I guess, but without knowing the whole pipeline this is what I can give you.

How's the performance when adding a few thousand more light sources? Did you profile the computation time of each step in the algorithm? My implementation (clustered deferred) runs 40-50 thousand point lights at a smooth 60 fps (on a GTX 770). It is in OpenGL, but if you want I can supply you with some code (also, I'm using an LBVH for the clip test).


Hopefully I could give at least some good advice.

#5171366 Camera Creation Question

Posted by on 04 August 2014 - 02:47 AM

D3DXMatrixPerspectiveFovLH(&m_matProj, D3DX_PI/4, fAspect, 1.0f, fFar );

Looks like your near clipping plane is at 1.0f; obviously you won't see any object that is closer to the camera than that.

#5165269 VBO and glIsBuffer

Posted by on 07 July 2014 - 09:05 AM

Global variables (like 'quad' in the example) have static storage duration: they are constructed before the program enters main() and destroyed only after it returns.

VBOQuad quad;
int main () { ... }

I can't say much about the error, but have you checked the stack trace of your application when it crashed? This might give a clue which functions are called that lead to the destruction of this object.

#5165212 [GLSL] send array receive struct

Posted by on 07 July 2014 - 03:10 AM

I had a similar problem. Make sure glGetUniformLocation does not return -1 (that's what I got when using an array of structs).

Generally I would suggest keeping away from the array-of-structs idea and instead using a struct of arrays, that is, a struct like

struct Lights
{
  vec3 positions[MAX_LIGHTS];
  vec3 intensity[MAX_LIGHTS];
};

In many cases this will be more cache efficient (think about updating on the client).


And then there is another approach: use a samplerBuffer for each component in your shader and texelFetch your lights from it. This would enable you to have a variable number of light sources.
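A sketch of that samplerBuffer approach (the uniform names and the per-component buffer layout are assumptions for the example):

```glsl
// One buffer texture per light component, fetched by light index,
// so the light count can vary at runtime without recompiling the shader.
uniform samplerBuffer lightPositions;    // one texel (xyz) per light
uniform samplerBuffer lightIntensities;  // one texel (xyz) per light
uniform int lightCount;

vec3 shadeAllLights(vec3 fragPos, vec3 N)
{
    vec3 result = vec3(0.0);
    for (int i = 0; i < lightCount; ++i)
    {
        vec3 pos       = texelFetch(lightPositions, i).xyz;
        vec3 intensity = texelFetch(lightIntensities, i).xyz;
        vec3 L = normalize(pos - fragPos);
        result += max(dot(N, L), 0.0) * intensity;
    }
    return result;
}
```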


Hope this helps.

#5163037 SOLVED: Render a vector to screen in opengl

Posted by on 26 June 2014 - 10:49 AM

Yes, but make sure you call glBindFramebuffer(GL_FRAMEBUFFER, 0) before glGetTexImage(), or else it will fail. Mapping/binding textures that are attached to a currently bound FBO will fail.

#5163029 SOLVED: Render a vector to screen in opengl

Posted by on 26 June 2014 - 10:18 AM

If you are creating a framebuffer object, you will most likely attach a texture (previously generated with glGenTextures()...) as color render target using the glFramebufferTexture() function.


I don't know if I understood you right, but I guess you want to do something like this:

1 Bind the fbo for offscreen rendering

2 Render anything you like

3 Unbind the fbo

4 Retrieve the texture data of your rendered image into main memory (you can do that using glGetTexImage() with the texture you attached as color target to your FBO)

5 Modify the texture data in main memory and update the texture on your graphics card (with glTexSubImage2D())

6 Render the modified texture on a fullscreen quad to your window backbuffer



This will work, although I highly recommend not doing it this way; do whatever modification you want to your texture on the GPU instead.

If it is just a per-pixel operation you want to do, then you can do the same as described above but omit steps 4 and 5 and do all modifications in the fragment shader stage.


#5163019 SOLVED: Compute shader atomic shared variable problem

Posted by on 26 June 2014 - 09:29 AM

Check the reference on atomicMin () http://www.opengl.org/sdk/docs/man/


It states: performs an atomic comparison of data to the contents of mem, writes the minimum value into mem, and returns the original contents of mem from before the comparison occurred.


The second part is the interesting one in your case. You are passing minDepth as the 'mem' parameter. So AFTER the function returns, minDepth already has the minimum assigned. But immediately after that you assign minDepth the value it had BEFORE the function call. I guess this is what breaks the synchronized behaviour.

Why are you assigning the value anyway?


My guess is: leave the assignment away and it will work fine. (At least my implementation does, and it looks pretty much the same in the parts you showed me here.)
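In code, the difference could look like this (a shared-memory sketch; the minDepth variable and the surrounding reduction are assumed from your description):

```glsl
shared uint minDepth;

void reduceMinDepth(uint depth)
{
    // broken: the return value is the OLD contents of minDepth, so writing it
    // back overwrites the atomic updates of the other threads:
    // minDepth = atomicMin(minDepth, depth);

    // correct: call it for its side effect only and ignore the return value
    atomicMin(minDepth, depth);
    barrier();
}
```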


Hope I could help you there.


Unfortunately I'm terrible at reading posts... at least I did some explaining of the error :)

#5162778 Shared vertices or not for primitives

Posted by on 25 June 2014 - 09:23 AM

by working normally I mean making a beautiful sphere

When applying a rectangular texture to a sphere, you will always suffer a certain deformation of the image you are texturing.


I don't know the exact representation of your sphere, but I will just guess that it consists of "stacks and slices" (like starting with one vertex at the bottom and then creating concentric circles around an axis with first growing and then shrinking radii).

So let's just look at the body of the sphere (excluding the start and end points). The only thing you have to take care of here are the vertices located at the seam (the points where the UVs (0, t) and (1, t) meet). Since they have the same position and normal but different UVs, you have to save two vertices here (like L. Spiro described). Note that the image will be interpolated to fit the sphere (just like the opposite, which happens when you try to draw a world map onto a rectangular area: the image is stretched).

For the top and bottom vertices you might want to save them multiple times so that each face can address its own vertex (you would then have as many vertices as there are slices, with the same position and normal but different UVs, for the top or the bottom vertex).


If it's for learning, just test around and you'll find your way.

Maybe try texturing an image that looks like this one: http://www.mediavr.com/belmorepark1left.jpg; there the textured result would look a little nicer, because the image itself is stretched beforehand.


Hopefully I am not confusing you further.

#5162583 Shared vertices or not for primitives

Posted by on 24 June 2014 - 11:37 AM

I don't see any problem with using texture coordinates the way you described, although it greatly depends on how you do the mapping of your texture coordinates.

When using a sphere map you could assign texture coordinates for each vertex just like you described, no need for duplicating any vertices.

If you go ahead and try to project a rectangular texture (or other more fancy things) onto the sphere, you have to duplicate the "top" and "bottom" of the sphere, where you have that triangle-fan-like structure.

Hope that answers your question.

#5160904 glsl syntax highlight and auto completion

Posted by on 16 June 2014 - 01:22 PM

Yes, that's right, the never-ending story...


I've gone through a lot now, tested different editors, tools, and plugins, and there's always something not the way I wanted it. I think most people here know what I mean.

So recently I began searching for another solution (again). I came across a pretty nice solution in a forum post from ages ago.


The main idea is to make your C compiler think that .glsl files are to be parsed as header files. Not overwhelmingly new so far. But then you can go ahead and write another include file that defines all the GLSL names and symbols, and you are basically done, for the C compiler does the rest.


I took a liking to it and spent a night crawling the GLSL reference pages, copy-pasting function definitions and so on. I'll gladly share the result with you: https://github.com/Wh0p/Wh0psGarbageDump

You are free to test and improve this yourself! (However, I am totally new to all this git stuff and might need some time to figure this out)


Just have a look at the example on the bottom, how it looks in Visual Studio...


Still there are some drawbacks: you have to write a preprocessor that resolves or strips the "#include" directives from the .glsl file.

The syntax for uniform buffers is somewhat broken.


Sooo, tell me if you like/hate/ignore it, or even have a better solution to this. Personally, I think I have found a solution I can be happy with (for the time being).