It's been a long time since my last update on lens flare rendering. Too long, arguably, since most of the work I am about to unveil in this journal entry was completed in the past two weeks or so, once everything clicked into place. First of all, here is the demo (File->Download). It requires Windows and a DX11-capable graphics card, as it makes extensive use of the DX11 implementation of the Fast Fourier Transform (ID3DX11FFT). It comes bundled with the latest build of SlimDX straight from the SVN repository, which you will need to run the demo (you cannot use the Jan 2012 runtime, as it has a critical bug which renders the FFT implementation unusable; that bug was fixed in late 2012). It also comes with a set of hand-drawn apertures which you can play around with. Please let me know if the program does not work for you when it should, so I can fix it.
How to use the demo
The tech demo is fairly self-contained and simple to use. Upon starting the program, you'll be asked to select the aperture you want to use; pick whichever one you prefer from the apertures folder. At this point the aperture has been preprocessed and is ready to use. You are given four possible views (types of display):
Aperture Transmission Function: simply displays the aperture you are using, and formally represents the amount of light transmitted through each point of the aperture (white = all light passes through, black = light is blocked). The Load Aperture button lets you choose another aperture (other settings will be maintained).
Aperture Convolution Filter: shows the "convolution filter", which is essentially the distribution of diffracted light around a central beam of light. For most apertures, most of the light diffracts near the center, and the largest share remains unperturbed exactly at the central pixel (this display is tonemapped and the brightness scale is not linear). This is an RGB image, with each channel representing a different diffraction distribution. Red tends to be diffracted farther away than green or blue as it has a longer wavelength.
Original Frame Animation: shows a simple synthetic scene, which is what you would observe if light did not diffract (note I did not bother with anti-aliasing).
Convolved Frame: shows the same scene as above, but with diffraction effects added in.
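To put rough numbers on the remark that red spreads farther than green or blue, here is a minimal sketch using the textbook single-slit relation sin(theta) = wavelength / slit width for the first dark fringe. This is not how the demo computes its filter (it works from the full 2D aperture); the wavelengths and the 10 micron slit width are illustrative assumptions of mine.

```python
import math

def first_minimum_angle(wavelength_nm, slit_width_um):
    """Angle in degrees of the first dark fringe of a single slit.

    Uses sin(theta) = wavelength / slit_width, so a longer wavelength
    gives a wider pattern.
    """
    wavelength_m = wavelength_nm * 1e-9
    slit_m = slit_width_um * 1e-6
    return math.degrees(math.asin(wavelength_m / slit_m))

# Hypothetical values: representative wavelengths and a 10 micron slit.
for name, nm in (("red", 650), ("green", 550), ("blue", 450)):
    angle = first_minimum_angle(nm, 10.0)
    print(f"{name:5s} {nm} nm -> first minimum at {angle:.2f} degrees")
```

The red entry comes out with the largest angle, which matches the wider red channel you can see in the filter view.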
There are a few configuration options:
Observation plane distance: this represents, in some sense, the distance between the aperture and the sensor which collects the diffracted light, on an inverse scale, so the smaller it is the longer diffracted waves travel, and so the larger the lens flare appears (up to some limit).
Exposure level: this is self-explanatory and controls the exposure setting of the tonemapped displays (extreme values may lead to unrealistic and/or glitchy results).
Animation speed: controls the speed at which the synthetic scene plays out (it can be set to zero to pause all movement).
Animation Selection: lets you select among a few (hardcoded) scenes.
The aperture definition settings are a bit more involved, but basically let you select at which wavelengths to sample the diffraction distribution of the aperture (other wavelengths are interpolated) and also let you associate a custom color to each wavelength if you so desire. The default settings are fairly close to reality, but are not perfectly calibrated. Right click the list to play around with these settings.
The demo should run at 60 fps on most mainstream cards. It may be a bit slow on slower cards, but it should hopefully still be interactive. Let me know if it is unacceptably slow for you: I believe the bottleneck is the FFT convolution stage, which is essentially compute-bound, and some hard statistics would tell me where to focus optimization efforts.
The code is not yet mature enough to be released, and there are still a few bugs in my implementation (in particular, a nasty graphical corruption of the central horizontal and vertical lines of the diffraction distribution at high exposures, which probably comes from a subtle off-by-one bug in one of the shaders), but overall the algorithm is rather robust. The cornerstone of the approach is of course the convolution step, which uses the FFT and the Convolution Theorem to efficiently convolve the diffraction distribution with the image and achieve very convincing occlusion and diffraction effects.
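For readers who have not met the Convolution Theorem before, here is a minimal 1D sketch of the idea: transform both signals, multiply them pointwise, and transform back. It is not the demo's code. A naive O(n^2) DFT stands in for a real FFT so that the example only needs the standard library, and the toy `scene` and `flare` arrays are made up for illustration.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out

def convolve_direct(signal, kernel):
    """Reference linear convolution, straight from the definition."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def convolve_fourier(signal, kernel):
    """Linear convolution via the Convolution Theorem.

    Both inputs are zero-padded to len(signal) + len(kernel) - 1 samples,
    transformed, multiplied pointwise, and transformed back.
    """
    size = len(signal) + len(kernel) - 1
    a = list(signal) + [0.0] * (size - len(signal))
    b = list(kernel) + [0.0] * (size - len(kernel))
    product = [x * y for x, y in zip(dft(a), dft(b))]
    return [v.real for v in dft(product, inverse=True)]

scene = [0.0, 0.0, 4.0, 0.0, 0.0, 1.0]  # two point lights (toy data)
flare = [0.25, 0.5, 0.25]               # toy diffraction distribution

print(convolve_direct(scene, flare))
print([round(v, 6) for v in convolve_fourier(scene, flare)])
```

Both lines print the same values up to rounding. With a real FFT the second route costs on the order of n log n instead of n^2, which is the whole point at image sizes.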
For those of you who cannot run the demo, I have also compiled a short video to illustrate:
[youtube]
Closing notes
Note that this algorithm almost certainly already exists in high-profile renderers. I derived the theory and implementation essentially on my own and haven't checked, but I am confident it is already implemented in one form or another somewhere. It is also not quite ready for video games in its current form. The preprocessing step done for each aperture is fine: I opted for accuracy here, but it can be significantly accelerated, and apertures can feasibly be dynamically updated every frame. The real killer is the convolution step, which involves at least 6 large Fourier Transforms whose dimensions are at least the dimensions of the target image plus the dimensions of the aperture, minus 1. To give you an order of magnitude, we're looking at roughly 2500 x 2500 transforms for a 1080p game with a 512 x 512 aperture, which is not happening today (but will tomorrow). So hacks are required to approximate the convolution, such as heavily blurring the diffraction distribution and pasting it on top of prominent light sources in the player's field of view, which is good enough for games (and happens to be a near perfect approximation for unoccluded spherical light sources).
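As a back-of-the-envelope check on the sizes above, here is a small sketch that computes the minimum transform dimensions and compares very rough operation counts for a direct convolution against the FFT route. The counts ignore constant factors and colour channels, and the function names are mine, so treat the output as an order of magnitude only.

```python
import math

def min_transform_size(image_dim, aperture_dim):
    """Smallest transform size that avoids wrap-around along one axis."""
    return image_dim + aperture_dim - 1

def rough_costs(image_w, image_h, aperture, transform_w, transform_h, transforms=6):
    """Very rough operation counts, ignoring constant factors.

    direct: one multiply-add per image pixel per aperture pixel.
    fft:    `transforms` transforms of N = transform_w * transform_h points,
            each costing about N * log2(N).
    """
    direct = image_w * image_h * aperture * aperture
    n = transform_w * transform_h
    fft = transforms * n * math.log2(n)
    return direct, fft

w = min_transform_size(1920, 512)  # 2431
h = min_transform_size(1080, 512)  # 1591
direct, fft = rough_costs(1920, 1080, 512, 2500, 2500)

print(f"minimum transform size: {w} x {h}")
print(f"direct ~ {direct:.1e} operations, FFT-based ~ {fft:.1e} operations")
```

Even with square 2500 x 2500 transforms, the FFT route comes out a few hundred times cheaper than the direct sum in this crude model. It is still a lot of work to fit into one frame.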
One note on the Fourier Transform dimensions. Because all general purpose FFT algorithms are extremely sensitive to the size of their inputs, some dimensions work better than others (for instance, power of two dimensions are the fastest). Fortunately, the convolution step can accept dimensions larger than the minimum required without any loss of accuracy, which gives us some leeway in choosing transform dimensions which will give good performance. For instance, in the case of my demo, both the aperture and image were 600 pixels by 600 pixels, giving a minimum convolution transform dimension of (600 + 600 - 1) = 1199 pixels by 1199 pixels. However, such a transform is not efficient (and gave me around 12 fps). But by simply padding the transform to 1280 pixels by 1280 pixels, framerate shot up to a steady 60 fps. In practice, you want to select dimensions with many small prime factors, such as 2 or 3. As you can see 1199 = 11 x 109 while 1280 = 2^8 x 5.
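If you want to pick a padded size automatically, a minimal sketch is to search upward from the minimum for the first size whose prime factors are all small. The set of allowed primes (2, 3, 5) and the helper names are my assumptions, not something the demo does; which candidate is actually fastest depends on the FFT implementation, so it is worth timing a few.

```python
def prime_factors(n):
    """Prime factorization by trial division, smallest factors first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def next_smooth_size(minimum, allowed=(2, 3, 5)):
    """Smallest size >= minimum whose prime factors are all in `allowed`."""
    size = minimum
    while any(p not in allowed for p in prime_factors(size)):
        size += 1
    return size

print(prime_factors(1199))     # [11, 109]
print(prime_factors(1280))     # [2, 2, 2, 2, 2, 2, 2, 2, 5]
print(next_smooth_size(1199))  # 1200
```

Note that this search lands on 1200 = 2^4 x 3 x 5^2, which is smaller than the 1280 I used. I have only measured 1280, so whether 1200 is as fast on a given card is something to benchmark rather than assume.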
If you are wondering what happens if you use smaller dimensions than the minimum required by the convolution, the answer is that what we are doing is essentially a circular convolution: because the discrete Fourier Transform is periodic, the convolution would "wrap around" from the left edge to the right edge and from the top edge to the bottom edge, and vice versa. This is not what we want. So if you can guarantee that no lens flare will ever come close enough to the edges to bleed offscreen, you can get away with a smaller convolution, but this is of course very situational.
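The wrap-around is easy to see in one dimension. The sketch below computes a circular convolution directly with modular indexing, which gives the same result an FFT of that size would, and compares a large enough size with one that is too small. The toy `scene` and one-sided `flare` are made up for illustration.

```python
def circular_convolve(signal, kernel, size):
    """Convolution on a periodic domain of `size` samples.

    Indices past the end wrap back to the start, which is what an
    FFT-based convolution of that size computes.
    """
    out = [0.0] * size
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[(i + j) % size] += s * k
    return out

scene = [0.0, 0.0, 0.0, 0.0, 8.0]  # one bright light at the right edge
flare = [0.5, 0.25, 0.25]          # toy one-sided diffraction streak

full = circular_convolve(scene, flare, len(scene) + len(flare) - 1)
small = circular_convolve(scene, flare, len(scene))

print(full)   # [0.0, 0.0, 0.0, 0.0, 4.0, 2.0, 2.0]
print(small)  # [2.0, 2.0, 0.0, 0.0, 4.0]
```

With the full size the streak runs off past the right edge, where it can simply be cropped. With the smaller size the same energy reappears at the left edge, which is exactly the artifact described above.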
Very nice. Congrats. Runs fine on my Geforce 560 by the way.