I implemented this back in... 2006, I think, on an entry-level GeForce 6.
I was thinking that a way around this would be to create an image at run time where each pixel encodes the XYZ position and intensity of one light as RGBA, and send that to the shader instead. That way I could theoretically have a thousand lights with no problem.
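The packing described above can be sketched roughly like this. This is a minimal illustration, not code from the original setup: the `pack_lights` helper and the (x, y, z, intensity) light layout are my own assumptions, and the resulting flat float array is what you would then upload as an RGBA float texture (e.g. via `glTexImage2D` with a `GL_RGBA32F` internal format).

```python
def pack_lights(lights):
    """Pack lights into a flat RGBA float array, one texel per light.

    lights: list of (x, y, z, intensity) tuples.
    Layout per texel: R=x, G=y, B=z, A=intensity.
    (Hypothetical helper; layout is an assumption, not from the post.)
    """
    texels = []
    for (x, y, z, intensity) in lights:
        texels.extend([x, y, z, intensity])
    return texels

# Two lights -> 8 floats, i.e. 2 RGBA texels.
lights = [(1.0, 2.0, 0.5, 0.8), (-3.0, 0.0, 4.0, 1.0)]
data = pack_lights(lights)
```

In the shader, each light is then recovered with one texture fetch per texel instead of occupying uniform registers, which is what makes the "thousand lights" figure plausible in principle.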
The main problem with that generation of hardware is that it cannot loop arbitrarily. It can loop 255 times at best, and even when a shader fits within the hardware's resource limits, it might eventually time out and abort. Hopefully that restriction is gone now. If memory serves, it was possible to process perhaps 50 lights per pass.
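A per-pass cap like that usually means splitting the light list into batches and accumulating one additive render pass per batch. A minimal sketch of that batching, assuming the rough figure of 50 lights per pass from above (the helper name and cap constant are illustrative):

```python
# Assumed cap, based on the post's rough recollection of ~50 lights per pass.
MAX_LIGHTS_PER_PASS = 50

def light_batches(lights, cap=MAX_LIGHTS_PER_PASS):
    """Yield successive slices of at most `cap` lights.

    Each slice would drive one additive render pass, with the
    framebuffer accumulating the contribution of every batch.
    """
    for i in range(0, len(lights), cap):
        yield lights[i:i + cap]

# e.g. 120 lights split into passes of 50, 50 and 20.
passes = [len(batch) for batch in light_batches(list(range(120)))]
```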
Performance was interactive, but far from viable.
The real problem (which holds even now) is that float textures take quite a lot of bandwidth, and a float texture lookup is often slower than an integer lookup.