Argh argh argh, as a shader programmer the example code makes me want to claw my eyes out :| Achieving acceptable performance in graphics is all about batching, and the CPU is a vastly better place to do that kind of rearranging. For starters, I'd suggest making 'light bins' for your CPU representation: basically small queues of lights keyed by light type (in this case point/omni, spot, directional, and maybe area if you're getting really fancy). This works even better if you push that distinction up a level and keep global per-type lists of lights for the whole scene.
From there, you can basically increment counters to allocate slots in the queues (no sorting needed, which is really nice), then just copy light information into the index you received. This has a number of benefits: it combines very naturally with the shader permutation ideas suggested above, and swapping between pass-per-light and something like pass-per-n-lights boils down to changing some loop conditions and shader constant update parameters. You could even combine the two to cut down on the number of shader permutations: have a specialized version that handles four lights of the given type in parallel, and once fewer than four lights remain in the queue, do them one at a time.
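Roughly like this; the draw callbacks stand in for whatever your renderer actually calls to bind the 4-light and 1-light permutations, and countPasses just shows the loop-condition arithmetic:

```cpp
#include <cstddef>
#include <vector>

struct Light { float position[3]; float color[3]; };

struct LightQueue {
    std::vector<Light> slots;

    // Allocation is just an increment: grab the next slot, copy in later.
    size_t allocate() {
        slots.emplace_back();
        return slots.size() - 1;
    }
};

// Full groups of 4 go to the 4-light permutation; the remainder is
// mopped up one light at a time by the single-light shader.
template <class Draw4, class Draw1>
void dispatch(const LightQueue& q, Draw4 draw4, Draw1 draw1) {
    size_t i = 0;
    for (; i + 4 <= q.slots.size(); i += 4)
        draw4(&q.slots[i], size_t(4));
    for (; i < q.slots.size(); ++i)
        draw1(&q.slots[i], size_t(1));
}

// Pass count for n lights under that split.
size_t countPasses(size_t n) {
    return n / 4 + n % 4;
}
```

So nine point lights cost two 4-light passes plus one single-light pass, and switching to pass-per-light is just a matter of routing everything through the second loop.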
As for updating the queues, the best way to do this, IMHO, is to recompute the set whenever an entity moves or is placed into the world. This is sort of a dumb trick I saw in a CoD: Black Ops presentation, and in hindsight it's really obvious. You can also apply some of the caching tricks I'll touch on at the end of this.
Re: doing the lighting, you have a few options. You can compute lighting for each source individually, as has been mentioned, *or* merge sources into a low-frequency representation, or any combination thereof. As an example, Valve picked the ~2 most important lights and sent them into the shader for analytical evaluation, sticking the rest into the 'ambient cube' used for precomputed lighting. Computationally, that works out to six dot products per light (one for each face of the cube) on the CPU, multiplying by the light color, and adding the result to the accumulated cube face contribution. Toss that into a shader and index, and voila: very close to constant-time lighting in practical performance terms.
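A hedged sketch of that cube-injection step. The plain clamped-dot face weighting below is my simplification; Valve's actual weighting may differ (their papers describe a squared-cosine variant), but the six-dot-products-per-light shape is the same:

```cpp
#include <algorithm>
#include <cmath>

struct AmbientCube {
    // Face order: +x, -x, +y, -y, +z, -z. One RGB color per face.
    float face[6][3] = {};
};

// dir: normalized direction *toward* the light; col: light color,
// already attenuated for the sample position.
void addLightToCube(AmbientCube& cube, const float dir[3], const float col[3]) {
    static const float normals[6][3] = {
        { 1,0,0}, {-1,0,0}, {0,1,0}, {0,-1,0}, {0,0,1}, {0,0,-1}
    };
    for (int f = 0; f < 6; ++f) {
        // One dot product per face: six total per light.
        float d = normals[f][0]*dir[0] + normals[f][1]*dir[1] + normals[f][2]*dir[2];
        float w = std::max(0.0f, d);
        for (int c = 0; c < 3; ++c)
            cube.face[f][c] += w * col[c];
    }
}
```

The shader side then just blends the six face colors by the surface normal, which is why the per-pixel cost stays flat no matter how many lights you folded in.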
Epic does something slightly different with current-generation UE3. Unlike Valve, they project *all* lighting information into a low-frequency basis, this time using spherical harmonics. From there, they use some Stupid SH Tricks (EDIT: GDNet keeps eating my link; that should go to www.ppsloan.org/publications/StupidSH36.pdf) to pull out the n most dominant directional lights for analytical evaluation, and feed the most important one into their modulative shadow system, as described here.
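For the "pull out a dominant light" step, the linear SH band alone gets you the direction. A sketch, assuming a real-SH convention where the three l = 1 coefficients are stored in (Y_{1,-1}, Y_{1,0}, Y_{1,1}) order and map to (y, z, x) with no Condon-Shortley sign flip; check your SH library's convention before trusting the signs:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// shL1 holds the l = 1 projections of (say) the luminance channel,
// in (Y_{1,-1}, Y_{1,0}, Y_{1,1}) order. The optimal linear direction
// is just these coefficients reinterpreted as a vector and normalized.
Vec3 dominantDirection(const float shL1[3]) {
    Vec3 d { shL1[2], shL1[0], shL1[1] };  // (x, y, z)
    float len = std::sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
    if (len > 1e-6f) { d.x /= len; d.y /= len; d.z /= len; }
    return d;
}
```

You'd evaluate that direction analytically, subtract its contribution from the SH, and repeat if you want more than one extracted light.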
clb: At the end of 2012, the positions of Jupiter, Saturn, Mercury, and Deimos are aligned so as to cause a denormalized flush-to-zero bug when computing Earth's gravitational force, slinging it into the sun.