The "min" approach seems legit to me. Multiplying the Lambert lighting term by the shadow scale will just dim the shadowed areas. I think of shadowed areas as areas where only the ambient term affects the shading, while the diffuse and specular terms are zero.
Since I haven't solved your problem with this, I can only recommend this article from NVIDIA on edge-blurred soft shadows:
// Fix-up for (1st-3rd) XMVECTOR parameters that are pass-in-register for x86, ARM, and Xbox 360; by reference otherwise
#if ( defined(_M_IX86) || defined(_M_ARM) || defined(_XM_VMX128_INTRINSICS_) ) && !defined(_XM_NO_INTRINSICS_)
typedef const XMVECTOR FXMVECTOR;
#else
typedef const XMVECTOR& FXMVECTOR;
#endif
So in most cases FXMVECTOR is a reference to an XMVECTOR, and all functions I know of take FXMVECTORs as arguments.
I do not want to start a discussion over the difference between pointers and references, but since we are talking about function arguments here, think of a reference as a pointer that can't be assigned NULL, which offers some nice advantages over pointers.
I don't know if I understood you correctly, but it sounds like you are performing the light culling on the CPU?
If yes, don't do that anymore, since you can use a compute shader. Doing the culling on the GPU is a LOT faster, especially for high light counts.
In a compute shader you could compute an AABB for each light and then project it into screen space, resulting in one culling rectangle per light source.
After that you have to build the tile indices for each light. There are several possibilities; the easiest one would be a compute shader with one thread per tile that iterates over all culling rects and performs the test. If the test is positive, save the light's index in a texture buffer.
As I already said, I am using an LBVH to cull the light sources. That is a huge overhead, but it can still be done in 4-5 milliseconds (including BVH construction and traversal). I will briefly outline what I am doing every frame (all on the GPU):
1) Render to the GBuffer
2) Calculate the min/max depth for each tile
3) Calculate the AABBs in view space for each light (like above)
4) Construct an LBVH from the lights' AABBs (this requires assigning a Morton code to each AABB, sorting the AABBs accordingly, and then applying a fully parallel tree construction algorithm)
5) Traverse the tree to calculate the number of lights present in each tile
6) Resize the tile index buffer and calculate the offsets at which each tile's light indices start
7) Traverse the tree again and save the light indices to the tile index texture buffer
8) Render the lights utilising the GBuffer tile index texture (pretty much like you are doing)
This is a whole other level compared to the linear probing you are currently doing, but this way you are able to apply lighting for light source counts in the tens of thousands (don't be discouraged).
Actually, I wrote my bachelor thesis on this topic not long ago; you are welcome to read it (but it's written in German, so there might be a lot of Google Translate involved).
Some resources for LBVH construction and traversal:
So this might be super obvious, but since there are no indications in the code above: do you call glUseProgram(0) before doing immediate rendering (and disable all the other stuff that needs to be disabled, like VAOs...)? I had no real problem getting it to work in a simple demo on an AMD HD6950. From my experience, Nvidia cards are more forgiving when it comes to stuff like this.
I am sorry, but I only roughly scanned through your shader code.
First thing I noticed is that synchronizing threads is usually very costly. And in your case one thread in each tile is copying the tile's light index struct while all the others are idle. ---> Let every thread copy a portion of the memory.
1 point light in a tile-based renderer is very close to the worst case. Tile-based rendering is all about clipping light sources and therefore introduces overhead for calculating light indices in each tile. Again, when you sync your threads, the whole workgroup is waiting for just one thread to copy the indices of one point light; not what you ideally want.
Furthermore, get rid of the indirection in your while loop. Use shared memory to hold the LightInfo itself, not the indices into the light info; that's much more cache efficient.
Then again, the way you iterate over the tile's indices is somewhat inefficient regarding branch divergence. Maybe it helps to separate the point and spot lights and not branch in the loop. Also, terminating the loop with a check against the tile's light count rather than its next light index might help.
All this is very vague, I guess, but without knowing the whole pipeline, this is what I can give you.
How's the performance when adding a few thousand more light sources? Did you profile the computation time of each step in the algorithm? My implementation (clustered deferred) handles 40-50 thousand point lights at a smooth 60 fps (on a GTX 770). It is in OpenGL, but if you want I can supply you with some code (I'm also using an LBVH to do the clip test).
I had a similar problem. Make sure glGetUniformLocation does not return -1 (that's what I got when using an array of structs).
Generally I would suggest keeping away from the array-of-structs idea and instead using a struct of arrays, that is, a struct like
in many cases this would be more cache efficient (think about updating on the client).
And then there is another approach: use a samplerBuffer for each component in your shader, and texelFetch your lights from it. This would enable you to have a variable number of light sources.
It states: "performs an atomic comparison of data to the contents of mem, writes the minimum value into mem and returns the original contents of mem from before the comparison occurred".
The second part is the interesting one in your case. You are passing minDepth as the parameter 'mem'. So AFTER the function returns, minDepth already has the minimum assigned. But immediately after that you are assigning minDepth the value it had BEFORE the function call. I guess this is what breaks the synchronized behaviour.
Why are you assigning the value anyway?
My guess is: leave the assignment out and it will work fine. (At least my implementation does, and it looks pretty much the same in the parts you showed me here.)
Hope I could help you there.
Unfortunately I'm terrible at reading posts... at least I did some explaining of the error.
By working normally I mean making a beautiful sphere.
When applying a rectangular texture to a sphere you will always suffer a certain deformation of the image you are texturing.
I don't know the exact representation of your sphere, but I'll just guess that it consists of "stacks and slices" (like starting with one vertex at the bottom and then creating concentric circles around an axis with first growing and then shrinking radii).
So let's just look at the body of the sphere (excluding the start and end points). The only thing you have to take care of here are the vertices located at the seam (the points where the UVs (0, t) and (1, t) meet). Since they have the same position and normal but different UVs, you have to store 2 vertices there (like L. Spiro described). Note that the image will be interpolated to fit the sphere (just like the opposite that happens when you draw a world map onto a rectangular area: the image is stretched).
For the top and bottom vertices you might want to store them multiple times, so that each face can address its own vertex (you would then have number-of-slices vertices with the same position and normal but different UVs for the top or bottom vertex).
If it's for learning, just experiment and you'll find your way.
I don't see any problem with using texture coordinates the way you described, although it greatly depends on how you do the mapping of your texture coordinates.
When using a sphere map you could assign texcoords for each vertex just like you described; no need for duplicating any vertices.
If you go ahead and try to project a rectangular texture (or other fancier things) onto the sphere, you have to duplicate the "top" and "bottom" of the sphere, where you have that triangle-fan-like structure.
I've gone through a lot now: tested different editors, tools, plugins, and there's always something not the way I want it. I think most people here know what I mean.
So recently I began searching for another solution (again). I came across a pretty nice one in a forum post from ages ago.
The main idea is to let your C compiler think that .glsl files are to be parsed as header files. Not overwhelmingly new so far. But then you can go ahead and write another include file that defines all the GLSL names and symbols, and you are basically done, for the C compiler does the rest.
I took a liking to it and spent a night crawling the GLSL reference pages, copy-pasting function definitions and so on. I'll gladly share the result with you: https://github.com/Wh0p/Wh0psGarbageDump
You are free to test and improve this yourself! (However, I am totally new to all this git stuff and might need some time to figure this out)
Just have a look at the example at the bottom to see how it looks in Visual Studio...
Still, there are some drawbacks: you have to write a preprocessor that resolves or strips the "#include" directives from the .glsl file.
The syntax for uniform buffers is somewhat broken.
Sooo, tell me if you like/hate/ignore it, or even have a better solution. Personally, I think I've found a solution I can be happy with (for the time being).