

Member Since 11 Apr 2005
Offline Last Active Feb 12 2016 01:43 PM

Posts I've Made

In Topic: ComputeShader Performance / Crashes

28 January 2016 - 12:55 PM

Thanks for taking time to wrestle through my code pieces Joe!


>> you forget to do a memory barrier on shared memory as well.

All right. Adding "memoryBarrierShared()" in addition to "barrier()" would do the job (to ensure the index-array is done filling before starting the second half)?


Btw, besides crashes, is it possible that missing or incorrect barriers, as you suggested, can cause such a huge slowdown? Like I said, everything seems fine on my computer; on another machine it works as expected too, just very slowly.



>> because OpenCL was two times faster on Nvidia and slightly faster on AMD 1-2 years ago

Now that concerns me. Especially because I used OpenCL before, removed it completely from the engine, and swapped it for OpenGL compute shaders (easier integration, more consistency)...  Doh!


Is it safe to assume that modern/future cards will overcome these performance issues? Otherwise I can turn my Deferred Rendering approach back to an "old" additive style. Does anyone have experience with whether Tiled Deferred Rendering is that much of a win? And then I'm talking about indoor scenes which have relatively many lights, but certainly not hundreds or thousands.


The crappy part is that I'm adapting code to support older cards now, even though I'm far away from a release, so maybe I shouldn't put too much energy into that and bet on future hardware instead.



>> Unroll

I suppose that can't happen if the size isn't hardcoded (counts.x comes from an outside (CPU) variable)?



Well, let's try the shared-barrier, different workgroup size, and avoiding unrolling. And see if these video-cards start smiling... But I'm afraid not hehe.

In Topic: Many (IBL) CubeMaps / Bindless Textures

15 September 2015 - 04:15 AM

That doesn't sound crazy at all. Let's see if I get it straight:


It works a bit like ("old") Deferred Lighting, where light volumes (spheres/cones/...) were rendered into the scene, only affecting the geometry they intersect. I'll use a "roughness" G-Buffer produced earlier to "blur" properly (taking multiple samples and/or picking lower mipmaps from the probe).


In the alpha channel we could store the weight (based on the distance between pixel & probe centre and the probe's radius; "attenuation"). It may happen that 2 or even more probes overlap the same pixels, so we use additive blending to sum up both colors & weights. Finally we grab the produced "Reflection/GI texture", normalize it, and add it to the rest of the scene. Since I'm also using Screen Reflections, I can involve that here as well: pixels that can make good use of realtime reflections should use no weight, or a lower one, for pre-baked reflections.
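To check that I get the blending right, here is a minimal CPU-side sketch of the accumulate-then-normalize idea. It is only an illustration in Python, not shader code: the linear falloff in `attenuation` and the names `attenuation` and `blend_probes` are my own assumptions, and the real thing would happen in the blend stage plus a final full-screen pass.

```python
def attenuation(distance, radius):
    """Assumed falloff: 1 at the probe centre, 0 at or beyond its radius."""
    if radius <= 0.0:
        return 0.0
    return max(0.0, 1.0 - distance / radius)


def blend_probes(samples):
    """samples: list of (rgb, distance_to_probe_centre, probe_radius).

    Mimics additive blending of (rgb * weight, weight) into an RGBA target,
    followed by a normalize pass that divides the colour by the summed weight.
    Returns (normalized_rgb, summed_weight).
    """
    acc_rgb = [0.0, 0.0, 0.0]
    acc_w = 0.0
    for rgb, dist, radius in samples:
        w = attenuation(dist, radius)
        for i in range(3):
            acc_rgb[i] += rgb[i] * w   # colour channels: sum of weighted colours
        acc_w += w                     # alpha channel: sum of weights
    if acc_w <= 1e-6:
        # No probe touches this pixel; nothing to normalize.
        return (0.0, 0.0, 0.0), 0.0
    return tuple(c / acc_w for c in acc_rgb), acc_w


# A red probe right at the pixel and a blue probe halfway out of its radius.
color, weight = blend_probes([((1.0, 0.0, 0.0), 0.0, 10.0),
                              ((0.0, 0.0, 1.0), 5.0, 10.0)])
```

The summed weight is kept around on purpose: it could be scaled down for pixels where the screen-space reflections already give a good result.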




You know what, that sounds a whole lot easier than the tiled approach I had in mind. The only downside is that I'll have to resample the G-Buffer for each probe. Then again, there usually won't be that many (overlapping) probes. And I guess it's still a good idea to use a cubeMap array (or bindless textures) so we don't have to render probes one-by-one, switching cubeMap textures in between. But then I could first render the low-quality (small-res cubemaps) array, then a high-quality array, for example.

In Topic: Many (IBL) CubeMaps / Bindless Textures

15 September 2015 - 01:34 AM

>> Do you have enough cubemaps so that you have a Performance gain with compute shaders and tiled lighting?

Not sure what you mean by this...


The reason why I'm using a Tiled approach for IBL, is that I honestly don't know how else to do it. If I have a certain pixel, I need to know which 1 (or 2!) probes affect it. Looping through all 100 (the actual number might be a bit lower or higher) is not an option obviously. And since I'm doing tiled lighting already anyway, the same shader can just as well collect a list of probes per tile.


A tile has 32x32 or 64x64 pixels in my case, so that's at least 1024 pixels/tasks running in parallel, per tile. Each pixel will test 1 probe. If the sphere (with variable radius) intersects the tile frustum, it gets added to a list.


After that, each pixel loops through the "small", filtered list to do a final test and sampling. Does it work well? No idea, first I'll need an array of CubeMaps to begin with, which is why I asked here :)
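Roughly what I have in mind, as a plain Python sketch rather than the actual compute shader. Note the simplification: instead of a real tile frustum I use an axis-aligned view-space box per tile, which is a swapped-in, conservative stand-in. All names (`sphere_intersects_aabb`, `build_tile_probe_list`, `probes_for_pixel`) are made up for the example.

```python
def sphere_intersects_aabb(center, radius, box_min, box_max):
    """Sphere vs. axis-aligned box: clamp the centre to the box, compare distances."""
    d2 = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        nearest = min(max(c, lo), hi)
        d2 += (c - nearest) ** 2
    return d2 <= radius * radius


def build_tile_probe_list(probes, box_min, box_max):
    """Step 1 (per tile): in the shader each thread would test one probe in
    parallel and append to a shared list; here it is a simple loop."""
    return [i for i, (center, radius) in enumerate(probes)
            if sphere_intersects_aabb(center, radius, box_min, box_max)]


def probes_for_pixel(pixel_pos, tile_list, probes):
    """Step 2 (per pixel): loop over the short tile list and keep only the
    probes whose sphere actually contains this pixel's position."""
    hits = []
    for i in tile_list:
        center, radius = probes[i]
        d2 = sum((p - c) ** 2 for p, c in zip(pixel_pos, center))
        if d2 <= radius * radius:
            hits.append(i)
    return hits


# Three probes as (centre, radius); one tile box in view space.
probes = [((0.0, 0.0, 5.0), 2.0),
          ((10.0, 0.0, 5.0), 1.0),
          ((1.5, 0.0, 5.0), 1.0)]
tile_list = build_tile_probe_list(probes, (-1.0, -1.0, 0.0), (1.0, 1.0, 10.0))
pixel_hits = probes_for_pixel((0.0, 0.0, 5.0), tile_list, probes)
```

The point is that the expensive test against all ~100 probes happens once per tile, and each pixel only walks the filtered list.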



>> 2 arrays

I think I'll take that route. Bindless Textures is still fresh, new, (buggy?) and probably not supported by a whole lot of cards anyway. And if there are no clear advantages for this particular scenario... Easy does it!

In Topic: Many (IBL) CubeMaps / Bindless Textures

14 September 2015 - 01:29 PM

Ok, didn't know about that, thanks!


Can't read that fast, but using "ARB_texture_cube_map_array" requires all cubemaps to use the same size, right (or eventually multiple arrays)? It's not a deal-breaker, but... being able to use a bigger cubemap at some special spots with very reflective surfaces would be nice.



Other than that, what benefits (or disadvantages) does the "Bindless Textures" approach bring compared to an array of cubemaps? In my context:

- 50..75 MB of cubemap data

- Just one pass (so I won't be toggling textures between render-calls)

- Compute shader is doing a "Tiled" approach, thus sorting out for each tile which probes may affect it, then apply the probes per pixel.

In Topic: PBR Specular reflections, next question(s)

21 August 2015 - 07:02 AM


Thanks again man, gonna try your advice tonight or tomorrow!




The compression sucks indeed, but I'm afraid I don't have much of a choice with OpenGL (afaik, it only supports DXT1 / 3 / 5). It's either huge files or compressed stuff. Then on the other hand, since most reflections are pretty blurry or distorted due to normal mapping, you won't notice it that quickly. Maybe, like Hodgman said, using a pow instead of a multiplier gives a bit more detail in the (more common) lower color ranges.
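A quick way to see why a pow would help, sketched in Python. This only models squeezing an HDR value into 8 bits before compression, not the DXT block compression itself, and `MAX_HDR` and `GAMMA` are values I picked for the example, not anything from the actual probes.

```python
MAX_HDR = 8.0   # assumed brightest value stored in a probe
GAMMA = 2.0     # assumed exponent for the pow-based encoding


def encode_linear(v):
    """Multiplier approach: scale [0, MAX_HDR] linearly onto 0..255."""
    return round(min(v, MAX_HDR) / MAX_HDR * 255)


def decode_linear(b):
    return b / 255 * MAX_HDR


def encode_pow(v):
    """Pow approach: apply a 1/GAMMA curve first, so dark values get more codes."""
    return round((min(v, MAX_HDR) / MAX_HDR) ** (1.0 / GAMMA) * 255)


def decode_pow(b):
    return (b / 255) ** GAMMA * MAX_HDR


def roundtrip_error(encode, decode, v):
    """Absolute error after storing v in 8 bits and reading it back."""
    return abs(decode(encode(v)) - v)


dark = 0.05
err_linear = roundtrip_error(encode_linear, decode_linear, dark)
err_pow = roundtrip_error(encode_pow, decode_pow, dark)
```

With these numbers the range 0..1 gets far more of the 256 codes under the pow encoding than under the plain multiplier, at the cost of coarser steps near `MAX_HDR`.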



@Frenetic Pony

Thanks for the papers. Don't know if it's due to pointlights, bad input from the HDR probes, faulty normals in the G-Buffer, or remaining bugs in the Cook-Torrance code, but indeed my head sometimes explodes when looking at glancing angles, or at a badguy pixel.


I probably got stuck in the past with ancient shader-code, but with a pointlight, you mean all light comes from a single (infinitely small) point in space, right? Thus, feeding your formula with a single position. Which is what I do for omnilights ("point lights" in my dictionary) & spotlights. Didn't read the papers yet, but do they suggest sampling from multiple points, or making adjustments to the formula? I mean, we should be able to use omnilights, right?