Member Since 11 Apr 2005
Offline Last Active Apr 18 2016 02:32 PM

Topics I've Started

Doing SubSurfaceScattering / backlight

12 March 2016 - 04:53 PM

So we want to use SSS for materials like wax, candles, snow, jelly, but also thin objects like plant leaves or curtains. And of course, skin.
Papers with lots of formulas are too hard for me, plus I don't have time to read them all in detail. But from what I understand, SSS in general boils down to:
* Blurring radiance maps (diffuse light)
** To get less harsh shadows,
** Either in screen- or texture space
** Result = blurred texture + specular
* And for backlighting, render an object inverted (reverse culling), and shade the backsides
** Compare light shadowmaps with eye-depthMap to get the theoretical distance a ray travels through an object
I got it... globally at least. But I miss/don't understand a couple of things, and in terms of actually implementing it, backlight especially seems really expensive. In my case only a few objects actually need SSS, and many of the lights don't have shadowmaps, or are baked into a static lightmap / probes. Well, to the questions:
* When/where/how to blur?
I started gauss-blurring the diffuse-parts of the screen wherever there are "SSS pixels". The blur-radius depends on camera distance, which increases as we get closer. Then the final composition simply does
result := blurredDiffuse + specular(not blurred!).
Which obviously... sucks. It looks, well, blurry. Softer yes, but also less detailed. I could just as well blur the albedoMap, or soften the normals. 
Shouldn't we involve the normals / view angles? To decide the blur radius, and/or to mix between the blurred & original (harsh) diffuseMap? Looking at Unreal, where I just made a very basic SSS material, it doesn't seem to do any blurring (or it is so subtle that I can't see it). Instead, it looks more like some sort of partial view-angle dependent ambient term: unshaded parts will get greenish if I set the "SSS tint" to green. This generates a nice transition from lit to SSS-color (instead of black) at the edges. In code terms, something like this:
float  lightInfluence = saturate( dot( normal, lightVector ) );
float3 lightDiffuse   = lightInfluence * lightColor;

// Check if we are looking straight at the light
float lookingAtSrc = saturate( dot( eyeToLightVector, -lightVector ) );
// Wrap it with some magic numbers
lookingAtSrc = 0.25 + lookingAtSrc * 0.75;
// Not sure if this really happens, but it seems translucency reduces when the light is on it
// ...or the light is just so much stronger that the SSS contribution becomes hardly visible
float3 SSSresult = (translucentColorRGB * lookingAtSrc) * (1 - lightInfluence);

// Compose result
float3 result = lightDiffuse + SSSresult;

But this is just some fake magic. My gut says Unreal is doing something more sophisticated.

* Color tints / Layers
Reading about skin, there seems to be a shift towards red, as that wavelength penetrates further. One way to fake that is to multiply the blurred diffuse with a (per material parameter) color. But again, when to do so? What are the parameters or values that tell when to use more/less tint and/or blur?
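One possible driver for the tint is the distance the light travels inside the object. A minimal sketch, assuming a Beer-Lambert style exponential falloff per color channel (a common approximation, not something taken from this post): red gets the lowest absorption value, so the result shifts to red by itself as thickness grows. The `sigma` values are invented for illustration, not measured skin data.

```cpp
#include <cmath>

struct Rgb { float r, g, b; };

// Beer-Lambert falloff per color channel: T = exp(-sigma * thickness).
// 'sigma' is a hypothetical per-material absorption color; a smaller value
// means that channel penetrates further.
inline Rgb transmittance(float thickness, Rgb sigma) {
    return { std::exp(-sigma.r * thickness),
             std::exp(-sigma.g * thickness),
             std::exp(-sigma.b * thickness) };
}
```

For example, `transmittance(1.0f, {1.0f, 3.0f, 4.0f})` leaves roughly 37% of the red but only about 5% of the green and 2% of the blue, while a thickness of 0 leaves the light untinted.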
* Fake backlight
Ears, fingertips, sides and nostrils are typically more reddish, because light penetrates here. But are games really calculating backlight / comparing shadowmaps to see where rays travel through? Because it's A: expensive, and B: doesn't involve ambient/indirect light. It may work if you have the sun as 1 dominant light, but indoor scenes with lots of weak lights that don't cast shadows would be a nightmare.
Or do they simply do some sort of cheap Sheen effect? Or use an artist-made "thickness-texture", to mix between the original and tinted/blurred diffuse textures? I guess a mixture of both, but just checking.
* Better backlight
For thin materials like curtains or plant leaves, backlight is extra important. I've seen how they did it cheaply in Crysis(1) vegetation, but again, it didn't really involve indirect light. In my case an important portion of light is baked into lightmaps and probes.
How about:
1- Render translucent stuff inverted (backfaces)
2- Apply lighting on it, as usual, but diffuse-only (and you can use pre-calculated probes or lightmaps as well of course)
3- Write Z values of inverted geometry in a buffer, used for thickness calculations later on
Now you could simply add the back-results to the front. But I guess it's too bright, and also too sharp, as if it was glass. Again we need some blur here; the thicker the object, the more blur. Up to the point where light just can't penetrate.
4- Add back-result to the screen-diffuseBuffer (see first question) BEFORE BLURRING IT
5- Addition-strength depends on thickness comparison with front-depth. Thicker = less addition of the back / more internal scattering
6- Blur the buffer, apply your tint-colors
7- Run your final shaders to compose the end-result ( (translucent)blurredDiffuse + specular), as asked above
I think it might work, but rendering & lighting everything twice in a dense jungle... Smarter tricks here?

ComputeShader Performance / Crashes

27 January 2016 - 03:04 PM

Made a (looong) GLSL ComputeShader for Tiled-Deferred rendering. On my laptop with a 2013 nVidia graphics card, it works fine. But now I'm sending the program to some other guys. And as you know, that's always where the headaches start :) I can't debug it there at all, only guess. I need your experience or guessing-powers to give me some directions!



Guy1 had an nVidia card. Not too old, certainly not new either. The video card driver hung / crashed when running this particular ComputeShader. Disabling loops "fixed" it:

// Requires OpenGL 4.3
#version 430
#define TILE_SIZE 32
layout (local_size_x = TILE_SIZE, local_size_y = TILE_SIZE) in;
shared uint _cntLightsPoint;                            // Number of PointLights found for this tile
shared uint _indxLightsPoint[ MAXLIST_LIGHTS_POINT ];   // Found PointLights, indexes to UBO lightArray
// 1. Let each pixel inside a tile check ONE pointlight, see if it intersects the tile-frustum. If so, add to a shared list
uint thrID = gl_LocalInvocationID.x + gl_LocalInvocationID.y * TILE_SIZE;  // Each task gets a number (0,1,2, ... 1023)
if ( thrID == 0u ) _cntLightsPoint = 0u;   // Shared variables are not zeroed automatically
barrier();
if ( thrID < counts1.x ) { // "counts1" comes from a UBO parameter. Would be "2", if there were 2 active lights in the scene
   if ( pointLightIntersects( tileFrustum, lightPoint[ thrID ].posRange ) ) {
      // Add lightIndex to list
      uint index = atomicAdd( _cntLightsPoint, 1u );
      if ( index < MAXLIST_LIGHTS_POINT )
           _indxLightsPoint[ index ] = thrID;
   }
}
barrier();   // All tasks must be done with step 1 before the list is read
// 2. Loop through the lights we found
for (uint i=0u; i < _cntLightsPoint; i++) {
    uint index = _indxLightsPoint[i];
    addPointLight( brdf, surf, lightPoint[index] );
} // for

Compiles, starts, hangs the video-driver. If I simplify all this code to a fixed "addPointLight( ... lightPoint[ 0 ] )", it works. And a damn lot faster as well (even though I only had 1 or 2 lights in the scene anyway). If I re-enable "barrier" or some of the atomic operations, the FPS crumbles again. My first thought was that the "FOR LOOP" went crazy, counting to an extremely high number. But even if I put a hard-coded number here, it still crashes. The other suspect might be an out-of-range array read, but I can't see how.
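One path to an out-of-range read that may be worth ruling out: `atomicAdd` keeps incrementing the counter after the list is full, so a loop that runs up to the raw counter can walk past the end of the shared array. Whether that is what hangs the driver here is unknown; with only 1 or 2 lights it should not trigger. The sketch below is a sequential CPU simulation of the two steps (no real GPU threads), with `MAXLIST_LIGHTS_POINT` set to a made-up value of 4 and hypothetical helper names.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <vector>

constexpr std::uint32_t MAXLIST_LIGHTS_POINT = 4;  // hypothetical, small on purpose

struct TileLightList {
    std::atomic<std::uint32_t> counter{0};             // plays the role of _cntLightsPoint
    std::uint32_t indices[MAXLIST_LIGHTS_POINT] = {};  // plays the role of _indxLightsPoint
};

// Step 1: every "task" tests one light and appends its index if it hits the tile.
// intersects[i] stands in for pointLightIntersects() on light i.
inline void collect(TileLightList& list, const std::vector<bool>& intersects) {
    for (std::uint32_t thrID = 0; thrID < intersects.size(); ++thrID) {
        if (!intersects[thrID]) continue;
        std::uint32_t index = list.counter.fetch_add(1);  // still counts when the list is full
        if (index < MAXLIST_LIGHTS_POINT) list.indices[index] = thrID;
    }
}

// Step 2: the number of list entries that are safe to loop over.
inline std::uint32_t safeCount(const TileLightList& list) {
    return std::min(list.counter.load(), MAXLIST_LIGHTS_POINT);
}
```

With 10 intersecting lights the counter ends at 10 while only 4 entries exist, so the loop in step 2 would need `safeCount` rather than the raw counter as its bound.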


Could it be that "older" cards (2010..2012) have issues with (GLSL) Barriers or Atomic operations? Or maybe the hard-coded Tilesize (32x32) is too big? Although I would expect a compiler crash in that case.


Guy1 now has a new AMD card. But it seems it doesn't support some OpenGL 4.5.0 features (though all shaders use 430). Got stranded after that.




Guy2 had a 2011 nVidia card, don't know which one exactly. Everything works, but the graphics seem more blurry (anisotropic / mipmapping settings?). Moreover, the framerate is horrible. Mine is ~50 .. 60 FPS at a larger resolution, his is 5. I expected a drop, but not that much. As usual there could be a billion things wrong, but my main suspects are:


- ComputeShader setup (tilesize 32x32 too big)

- ComputeShader operations (atomicAdd / atomicMin / atomicMax / FOR LOOP / Barrier )

- I assume 24+ texture units are available ( ie "layout(binding=20) uniform sampler2D gBufferXYZ;" ). I know older cards only have 16 or so. But again I would expect a crash then.

- Not using glMemoryBarrier( GL_ALL_BARRIER_BITS ); (properly), prior or after calling the CS



My gut says to replace the ComputeShader with good old Fragment shaders and such. Then again it just works well on my own computer. And since it's quite a job to change, it would suck if something very different turns out to be the party-crasher.



Many (IBL) CubeMaps / Bindless Textures

14 September 2015 - 01:08 PM

So I want Image Based Lighting. Reflections / GI is baked into cubemap textures (preferably with variable sizes between 32^2 x6 .. 256^2 x6) - with MipMaps btw.


That works, but the real challenge is to apply *the right* cubemap for any given pixel. Because there might be 10, maybe even 100 cubemaps loaded. A compute-shader can be used to sort out which probes are used where. But how to read these cubemaps? I can't bind 100+ textures "the old way".


I could make a texture-array with many layers, but...

- I can't put cubeMaps into an array (or sample them like you would sample/filter a (mipmapped!) cubeMap), can I?

- Is the size gonna be a problem (it's DDS compressed, yet I think the overall size of all probes together would consume 50 .. 75 MB)
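The size estimate can be checked with a bit of arithmetic. A minimal sketch, assuming block-compressed DDS data (4x4 texel blocks of 8 bytes for DXT1/BC1 or 16 bytes for DXT5/BC3) and a full mip chain down to 1x1; the function name is made up.

```cpp
#include <algorithm>
#include <cstdint>

// Bytes for one block-compressed cubemap including its full mip chain.
// faceSize: width/height of one face at mip 0 (e.g. 32 .. 256).
// bytesPerBlock: 8 for DXT1/BC1, 16 for DXT5/BC3; one block covers 4x4 texels.
inline std::uint64_t cubemapBytes(std::uint32_t faceSize, std::uint32_t bytesPerBlock) {
    std::uint64_t blocks = 0;
    for (std::uint32_t s = faceSize; s >= 1; s /= 2) {
        // Mips smaller than 4x4 still occupy one whole block.
        std::uint64_t perSide = std::max<std::uint32_t>(1, (s + 3) / 4);
        blocks += perSide * perSide;
    }
    return blocks * 6 * bytesPerBlock;  // 6 faces
}
```

Under these assumptions a single 256^2 x6 probe takes 262224 bytes as DXT1 and 524448 bytes as DXT5, so 100 probes at the largest size land at roughly 25 MB and 50 MB respectively.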



Now I've read things about "Bindless Textures", though I haven't been able to successfully implement it yet (weird errors, yet the nVidia demos work, and the card supports GL 4.5). A few questions about that. If I understand it right, I'll:

- make my cubemaps as usual. MipMapped, possibly with different resolutions

- I put all their addresses into an array of some sort, and push that into my "lighting shader"

- In the (compute!) shader, I get the right texture via the address, and then just read


Would this be the solution?




One thing I don't quite get, is making textures "resident". I suppose any texture we're about to use, should be resident. But can I just make all my cubemaps resident just once, and leave it that way? Or do I have to "un- and re-resident" them each cycle? Also, is there a limit or penalty when making (too) many textures resident? As said, all probes together consume quite some megabytes.



PBR Specular reflections, next question(s)

13 August 2015 - 05:18 PM

Where to start... with the help of you guys on my earlier "PBR / Specular reflectance" questions here.
I'll probably never fully understand it, as I slept too much during math/physics classes a billion years ago, but I made some steps nevertheless. Instead of just summing up good old Lambert & Blinn, I looked into Cook Torrance.
I find it hard to verify if I'm doing it correctly. Every implementation I see is different, and since I was stuck in the past with simple Phong or Blinn, the results are quite different anyway. Yeah, a pointLight generates a wide specular reflection on rough surfaces, and a "sharp" highlight on smooth surfaces. But overall, 4 issues are bugging me mainly:
1- Cook Torrance producing negative values or values higher than 100%
Mainly the "Distribution" term in the formula is confusing me. For example, I'm currently using this:
float roughness2= roughness * roughness;
float roughness4= roughness2 * roughness2;
float denom = NdotH * NdotH * ( roughness4 - 1.f) + 1.f;
float Distr = roughness4 / (3.14159f * denom * denom);
From the example here (the first non-optimized version):
The "Distr" result can result in a very high value at a glancing angle + lower roughness. And yes, NdotH is clamped between 0 and 1 (so are all dot products I use). Also found other ways, but I'm not sure if that falls correctly in the formula as a whole.
Maybe I should ask it differently: is there a nice demo program somewhere with HLSL or GLSL code included, so I have a good reference? I know, plenty of papers and samples out there, yet I didn't really find a working program so far.
2- Fresnel & Reflections for highly (non metal) reflective, polished materials
Got some bathroom tiles and a polished wooden floor, which should be pretty reflective. Being a non-metal, they have a F0 value of 0.03. The (Schlick) Fresnel formula produces near black values, except at glancing angles. 
As expected. But this means you'll never have a reflection in front of you. I guess this is sort of correct for most cases, but I really have to get down on the floor to see the window getting reflected on the floor, or take a big distance. And then the reflections usually rapidly get way too bright for my taste.
Yet in reality, I do see my ugly mug when looking straight forward at some bathroom tiles... (though vague / dark). Decreasing the roughness will produce "more" reflection. I like my materials not being too shiny/reflective, but a high roughness practically gives no reflections at all.
Again, I guess this is correct, but I miss the degree of control, and the more important it is to have the math correct. Right now the reflections are either gone, or too sharp & colorful (in the past I could control the color / saturation of the reflected color as well via material-parameters). I'm only encoding a "metal y/n" and "roughness" value into my G-Buffers btw. When looking at my oak floor here, the reflections are somewhat vague, and dark. The TV for example gets reflected, but the color seems "saturated". But in my program, reflections appear at full color when the angle is steep enough. Which doesn't look good.
3- IBL lighting (Diffuse / Specular ambient)
Examples usually only show how to do a pointlight or something. But I also have IBL "Probes" (cubeMaps) for Specular reflections and Diffuse (irradiance cubeMap).
Should I just use the same CookTorrance / Lambert formulas? Normally I would feed my "getLight" function with vectors such as NdotL, NdotV, HalfAngle, et cetera. But in the case of probes, what should those vectors be, as the light doesn't come from 1 specific point-in-space here? Or should I just simplify and do it like this:
iblSpecular = probeSpecular( reflVector, lodBasedOnRoughness ) * Fresnel( NdotV );
iblDiffuse  = probeDiffuse( normal ) * materialDiffuseColor;

result = iblSpecular + (1-F0) * iblDiffuse;

But that wouldn't take the roughness into account to control the amount of reflection. Note the specularProbe LOD level is still based on the roughness though.
4- Energy conservation
Right now I do:
result = cookTorranceSpecular + (1 - F0) * lambertDiffuse * materialDiffuse
Where "materialDiffuse" is black for metals (thus no diffuse light at all... is that correct???), and F0 is the *input* for the Fresnel formula (IOR converted). Thus 0.03 for non-metals, a relatively high (RGB) value for metals.
But... as for metals, don't I miss some degree of diffuse, since "materialDiffuse" is black for them? And for non-metals, I can still end up with very bright results if the specular part was near 100% (at a glancing angle). Since F0 is "always" around 0.03, I could get
F0 = 0.03
ONE_MIN_F0 = 97%

specular = ~100% (smooth surface, glancing angle)
diffuse  = 100% (light shining straight on it)

result = specular + 97% * diffuse = 197%
- edit -
Bad example. If the light shines straight on it, the Fresnel outcome would be low... see attached pic for a better example please.
Oh btw, if you guys would like to see the shader code, just call. And as always, Sorry for the long post!

Implementing (Lua) Script efficiently

26 June 2015 - 01:08 PM

I'm at the point of adding Lua script support to my engine. To override object events (onCollision, onUpdate, onWhatever), trigger events, UI interaction, and so on. About half of these scripts are only called "sometimes", when needed, yet other events may occur relatively often, which makes me worry about the performance a bit.



I did something similar a long time ago with Python, and then chose to collect all code pieces first, dump them in 1 large file, pre-compile it, get all functions from the script and register all my "API" (stuff the script can call from my engine) functions. So the whole pre-compilation and setup only had to be done once.


Grabbing all code from all possible entities, and then (re)naming it to ensure each entity class had its own unique functions was quite a task though. So before going that route again, I wonder if it's worth it. Plus I'm new to Lua anyway.



So let's say if we have 100 different entities (monster, shotgun, medkit, door, lightswitch, ...), each with a bunch of functions (onClick, onHit, onUpdate, onXYZ, ...). What would be the best way to deal with these scripts ->


A: Just call & compile the individual functions on the fly when they're needed

B: Give each entity its own script (and do the registering for each script again)

C: Collect all code, make 1 huge file (with a lot of functions inside!), register once

D: Something different? 


My feeling says B is the clean & comfy way... but maybe not the most efficient way, although if it only means a slightly longer initial loading time... I'm fine with that.
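A middle road between A and B could be to compile each handler the first time it is needed and cache it per entity class, so frequent events only pay for a lookup. The sketch below shows just that caching pattern in plain C++. The `Compiler` callback is a hypothetical stand-in for the real script-loading step; with Lua it would presumably load the chunk once and keep a reference to the resulting function, which is an assumption on my part.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using Handler = std::function<void()>;

// Hypothetical stand-in for "compile the script function for this class + event".
using Compiler = std::function<Handler(const std::string& className,
                                       const std::string& eventName)>;

class ScriptCache {
public:
    explicit ScriptCache(Compiler compiler) : compiler_(std::move(compiler)) {}

    // Runs the handler for (class, event); compiles it only on first use.
    void dispatch(const std::string& className, const std::string& eventName) {
        // Keying on "class.event" keeps entity classes apart without renaming functions.
        std::string key = className + "." + eventName;
        auto it = handlers_.find(key);
        if (it == handlers_.end()) {
            it = handlers_.emplace(key, compiler_(className, eventName)).first;
            ++compileCount_;
        }
        it->second();
    }

    int compileCount() const { return compileCount_; }

private:
    Compiler compiler_;
    std::unordered_map<std::string, Handler> handlers_;
    int compileCount_ = 0;
};
```

Dispatching `onUpdate` for a door a thousand times would then trigger a single compile, and 100 entity classes with a handful of events each stay at a few hundred cached handlers.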