Lens Flares: We're in business [demo included]

Posted by Bacterius, 06 October 2013 · 2,530 views

Hello everyone,
it's been a long time since my last update on lens flare rendering. Too long, arguably, since most of the work I am about to unveil in this journal entry was completed in the past two weeks or so, once everything clicked into place. First of all, here is the demo (File->Download), which requires Windows along with a DX11-capable graphics card, as the demo makes extensive use of the DX11 implementation of the Fast Fourier Transform (ID3DX11FFT). It comes bundled with the latest build of SlimDX straight from the SVN repository, which you will need to run the demo (you cannot use the Jan 2012 runtime, as it has a critical bug that renders the FFT implementation unusable; that bug was fixed in late 2012). It also comes with a set of hand-drawn apertures which you can play around with. Please let me know if the program does not work for you when it should, so I can fix it.

How to use the demo

The tech demo is fairly self-contained and simple to use. Upon starting the program, you'll be asked to select an aperture; pick whichever one you prefer in the apertures folder. At this point the aperture has been preprocessed and is ready to use. You are given four possible views (types of display):

Aperture Transmission Function: simply displays the aperture you are using, and formally represents the amount of light transmitted through each point of the aperture (white = all light passes through, black = light is blocked). The Load Aperture button lets you choose another aperture (other settings will be maintained).

Aperture Convolution Filter: shows the "convolution filter", which is essentially the distribution of diffracted light around a central beam of light. For most apertures, most light diffracts near the center, with most of the light remaining unperturbed exactly at the central pixel (this display is tonemapped and the brightness scale is not linear). This is an RGB image, with each channel representing a different diffraction distribution. Red tends to be diffracted farther away than green or blue as it has a longer wavelength.

Original Frame Animation: shows a simple synthetic scene, which is what you would observe if light did not diffract (note I did not bother with anti-aliasing).

Convolved Frame: shows the same scene as above, but with diffraction effects added in.

There are a few configuration options:

Observation plane distance: this represents, in some sense, the distance between the aperture and the sensor which collects the diffracted light, on an inverse scale, so the smaller it is the longer diffracted waves travel, and so the larger the lens flare appears (up to some limit).

Exposure level: this is self-explanatory and controls the exposure setting of the tonemapped displays (extreme values may lead to unrealistic and/or glitchy results).

Animation speed: controls the speed at which the synthetic scene plays out (it can be set to zero to pause all movement).

Animation Selection: lets you select among a few (hardcoded) scenes.

The aperture definition settings are a bit more involved, but basically let you select at which wavelengths to sample the diffraction distribution of the aperture (other wavelengths are interpolated) and also let you associate a custom color to each wavelength if you so desire. The default settings are fairly close to reality, but are not perfectly calibrated. Right click the list to play around with these settings.

The demo should run at 60 fps on most mainstream cards, and it may be a bit slow for those of you with slower cards, but it should hopefully still be interactive. Let me know if it is unacceptably slow for you, as the bottleneck is believed to be in the FFT convolution stage which is essentially compute-bound, so I'd be interested to know where to focus optimization efforts with some hard statistics.

The code is not yet mature enough to be released, and there are still a few bugs in my implementation (in particular, a nasty graphical corruption of the central horizontal and vertical line of the diffraction distribution at high exposures which probably comes from a subtle off-by-one bug in one of the shaders) but overall the algorithm is rather robust. The cornerstone of the approach is of course the convolution step which uses the FFT and the Convolution Theorem to great effect to efficiently convolve the diffraction distribution with the image in order to achieve very convincing occlusion and diffraction effects.
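As a rough illustration of the Convolution Theorem mentioned above, here is a minimal 1D sketch in Python. It is not the demo's code (the demo works on 2D images through ID3DX11FFT). The naive `dft` helper stands in for a real FFT, and every name and value is made up for the example. It zero-pads both inputs, multiplies their transforms point by point, and transforms back, which should match a direct convolution up to rounding.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here in place of a real FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def direct_convolve(a, b):
    """Reference linear convolution computed straight from the definition."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def dft_convolve(a, b):
    """Linear convolution via the Convolution Theorem."""
    n = len(a) + len(b) - 1                      # minimum size that avoids wrap-around
    pa = list(a) + [0.0] * (n - len(a))          # zero-pad both inputs to that size
    pb = list(b) + [0.0] * (n - len(b))
    product = [u * v for u, v in zip(dft(pa), dft(pb))]
    return [c.real for c in dft(product, inverse=True)]

# Hypothetical 1D "scene" with two point lights and a small blur-like kernel.
scene = [0.0, 1.0, 0.0, 0.0, 2.0]
kernel = [0.25, 0.5, 0.25]

direct = direct_convolve(scene, kernel)
via_dft = dft_convolve(scene, kernel)
print(all(abs(d - f) < 1e-9 for d, f in zip(direct, via_dft)))
```

With a real FFT, the transform route costs O(n log n) instead of the O(n·m) of the direct loop, which is why it pays off for large kernels.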

Posted Image


For those of you who cannot run the demo, I have also compiled a short video to illustrate:


Closing notes

Note that this algorithm almost certainly exists in high profile renderers - I haven't checked but while I essentially derived the theory and implementation on my own, I am confident that this is already implemented in one form or another somewhere - and is not quite ready for video games yet in its current form. The preprocessing step done for each aperture is fine and while I opted for accuracy here, it can be significantly accelerated and apertures can feasibly be dynamically updated every frame, but the real killer is the convolution step which involves at least 6 large Fourier Transforms of dimensions at least equal to the dimensions of the target image + the dimensions of the aperture (minus 1). To give you an order of magnitude, we're looking at 2500 × 2500 transforms for a 1080p game with a 512 × 512 aperture, which is not happening today (but will tomorrow). So hacks are required to approximate the convolution, such as heavily blurring the diffraction distribution and pasting it on top of prominent light sources in the player's field of view, which is good enough for games (and happens to be a near perfect approximation for unoccluded spherical light sources).
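To make the size arithmetic above concrete, here is a small Python sketch of the "image + aperture - 1" rule. The function name is hypothetical and the sizes are just the ones quoted in the text.

```python
def min_transform_size(image_size, aperture_size):
    """Smallest transform dimensions for a full (non-wrapping) convolution.

    Each axis needs image + aperture - 1 samples.
    """
    return tuple(i + a - 1 for i, a in zip(image_size, aperture_size))

# 1080p frame with a 512 x 512 aperture, as in the text.
full_hd = min_transform_size((1920, 1080), (512, 512))
print(full_hd)  # (2431, 1591)
```

The width of 2431 is what pushes the estimate toward the 2500 figure quoted above. The exact padded size would depend on which dimensions the FFT implementation handles well.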

One note on the Fourier Transform dimensions. Because all general purpose FFT algorithms are extremely sensitive to the size of their inputs, some dimensions work better than others (for instance, power of two dimensions are the fastest). Fortunately, the convolution step can accept dimensions larger than the minimum required without any loss of accuracy, which gives us some leeway in choosing transform dimensions which will give good performance. For instance, in the case of my demo, both the aperture and image were 600 pixels by 600 pixels, giving a minimum convolution transform dimension of (600 + 600 - 1) = 1199 pixels by 1199 pixels. However, such a transform is not efficient (and gave me around 12 fps). But by simply padding the transform to 1280 pixels by 1280 pixels, framerate shot up to a steady 60 fps. In practice, you want to select dimensions with many small prime factors, such as 2 or 3. As you can see 1199 = 11 × 109 while 1280 = 2^8 × 5.
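One possible way to pick a padded size automatically is to search upward from the minimum for the first size whose prime factors are all small. The Python sketch below does that. The function names and the `max_prime=5` threshold are assumptions for illustration, not something the demo is known to do, and which sizes are actually fastest depends on the FFT implementation.

```python
def largest_prime_factor(n):
    """Largest prime factor of n by trial division (returns 1 for n == 1)."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:                      # whatever remains is itself prime
        largest = max(largest, n)
    return largest

def next_smooth_size(minimum, max_prime=5):
    """Smallest size >= minimum whose prime factors are all <= max_prime."""
    n = minimum
    while largest_prime_factor(n) > max_prime:
        n += 1
    return n

print(largest_prime_factor(1199))            # 109, since 1199 = 11 * 109
print(largest_prime_factor(1280))            # 5, since 1280 = 2^8 * 5
print(next_smooth_size(1199))                # 1200 = 2^4 * 3 * 5^2
print(next_smooth_size(1199, max_prime=2))   # 2048, the next power of two
```

Under this criterion 1200 would also qualify, and so does the 1280 used in the demo. Whether one beats the other in practice would have to be measured.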

If you are wondering what happens if you use smaller dimensions than the minimum required by the convolution, the answer is that, because what we are doing is essentially a circular convolution and the discrete Fourier Transform is periodic, the convolution would "wrap around" from the left edge to the right edge and from the top edge to the bottom edge and vice versa. This is not what we want. So if you can guarantee that no lens flare will ever come close enough to the edges to bleed offscreen, you can get away with a smaller convolution, but this is of course very situational.
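The wrap-around is easy to see in one dimension. The Python sketch below computes the circular convolution directly rather than through an FFT (the result is the same, only slower). It uses made-up values and a one-sided kernel for brevity, whereas a real diffraction kernel would be centered.

```python
def circular_convolve(signal, kernel, n):
    """Circular convolution of length n: indices past the end wrap to the start."""
    out = [0.0] * n
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[(i + j) % n] += s * k
    return out

# Hypothetical 1D scene: a single light at the right edge of a 6-pixel row.
scene = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
flare = [0.5, 0.3, 0.2]

# Transform size equal to the image size: the flare bleeds onto the left edge.
too_small = circular_convolve(scene, flare, len(scene))
print(too_small)   # [0.3, 0.2, 0.0, 0.0, 0.0, 0.5]

# Transform size image + kernel - 1: the spill lands in the padding instead.
big_enough = circular_convolve(scene, flare, len(scene) + len(flare) - 1)
print(big_enough)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.3, 0.2]
```

Cropping `big_enough` back to the first six samples gives the image with the flare cut off at the edge, as expected, rather than reappearing on the opposite side.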




Very nice. Congrats. Runs fine on my Geforce 560 by the way.

Nice demo! I implemented FFT-based lens flares for our engine at work about 2 years ago, and it's worked out pretty well for us. I made a tool to let the artists stack together multiple kernels that can be offset and scaled in screenspace, so I ended up having to settle for doing the convolutions at a fairly low resolution (512x512) to keep everything within a millisecond or so. Consequently your implementation looks quite a bit better!

Thanks guys! MJP, if I may ask, what FFT algorithm/implementation did you use for the convolution step in your engine? Also, I just checked with the reference device for the graphical artifact I had and it doesn't show, so I am thinking it's either due to floating point inaccuracy or a driver bug (do you have it? it looks like pseudorandom noise along the axis of the light source at very high exposure levels).

I based mine off the FFT Ocean sample from the Nvidia SDK. I don't see any artifacts on my PC, which has an AMD 7950. I'll try it on my work PC tomorrow, which has an Nvidia GTX 670.

Cheers MJP, thanks a bunch - just for reproducibility, this is the glitch I'm seeing on my HD 6950: http://i.imgur.com/ydo13ik.png

No glitch on my NVidia either (GTX 560 Ti). Hmmm, there's this IEEE strictness flag for the shader compiler. Does this affect float accuracy at all? I can't find anything about current cards' compliance on Wikipedia (and haven't played with this either). What about ID3DXFFT? Anyway, if the reference is fine you can probably blame the driver.

 


...I based mine off the FFT Ocean sample...

 

Interesting. There still doesn't seem to be another public FFT for DX11 compute shaders available then (I searched around earlier this year without success). Pity, ID3DXFFT is quite nice, but very black box. If one had the source, one could e.g. omit the manual copy between textures and raw buffers, among other things. Also, the one from NVidia doesn't use shared mem at all. It's about 5 times slower than ID3DXFFT if I can trust my profiling (but it's also cs_4_0).

Everything seems to be fine on my GTX 670.
 


Yeah, I had to make a few changes to make it fast enough for our purposes. IIRC it ran quite a bit faster just by compiling the shaders as cs_5_0 instead of cs_4_0.

Hmm, thanks guys, I guess it really is my graphics card screwing up (it wouldn't be the first time). I'll try the IEEE strictness flag this evening and if it doesn't work I'll just... pretend the artifacts are not there.

 

As for ID3DXFFT, I agree it is opaque and somewhat of a pain to use, and also imposes some tedious requirements on the user due to the use of raw buffers, though using existing implementations probably beats writing your own compute-based FFT (especially if you want to allow arbitrary dimensions, which is.. tricky to say the least). I wish there were more implementations of the FFT for the graphics card given the importance of the algorithm but here you have it.. things could be worse though, I am quite happy with ID3DXFFT at the moment.

Are the artifacts only on ATI cards? ATI and NVIDIA seem to have completely different software, such that things can work well on one but poorly on the other. It wouldn't be the first time.

@Shane I don't think so, since MJP also tested on his 7950 without issues. Ok, the IEEEStrictness flag did absolutely nothing and changing the optimization level did not help, so I'm chalking this one up to the driver or hardware. In any case I am very relieved it is not a problem with the code.

Doesn't run on my radeon 6000 series


Sorry to hear that, does it say anything?

I get the same artifact on my 6950.

Looks fine on Radeon HD 7770.

 

Usually when a GPU fails (like the 6000 being reported) it's either a HW error (rare), a limitation on the number or size of UAVs, or simply a race condition in your code.


That is quite possible. I am probably doing some very bad things to the GPU with this program, it's pretty much the first time I used raw buffers in DirectX so it's likely I'm doing something wrong at some point that just happens to sort-of kind-of work. I'm working on an improved version at the moment so hopefully I'll get that sorted out and it works properly everywhere! :)
