I'm quite sure the cost will go up if you have divergence within a single warp. Try randomizing your index based on the pixel coordinate and see how it fares.
Just tried rendering a full-screen quad that uses a hash of the screen coordinates to pick a random index into an array of 262,144 sampler handles to 1x1 textures. I was only getting 8 fps. I would have liked to try an even larger array, but I was getting GL_INVALID_OPERATION at 521,728 textures. With 256 textures I get 22 fps. My understanding is that Kepler supports this while AMD's GCN flat out does not.
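For anyone who wants to see how divergent that kind of indexing is, here is a rough CPU-side sketch in Python (not the shader used for the test above). It hashes each pixel coordinate into an index and counts how many distinct textures a group of 32 pixels would touch. Everything in it is an assumption made for illustration: the mixer constants, the function names, and especially the 8x4 pixel tile standing in for a warp, since how pixels map to warps is hardware-specific. It says nothing about frame rate; it only shows how spread out the indices are.

```python
def hash_u32(x: int) -> int:
    """Generic 32-bit multiply-xorshift mixer (illustrative constants)."""
    x &= 0xFFFFFFFF
    x ^= x >> 16
    x = (x * 0x7FEB352D) & 0xFFFFFFFF
    x ^= x >> 15
    x = (x * 0x846CA68B) & 0xFFFFFFFF
    x ^= x >> 16
    return x


def texture_index(px: int, py: int, width: int, num_textures: int) -> int:
    """Map a pixel coordinate to a pseudo-random slot in the handle array."""
    return hash_u32(py * width + px) % num_textures


def distinct_per_tile(width, height, num_textures, tile_w=8, tile_h=4):
    """Count distinct indices in each tile_w x tile_h block of pixels.

    The 8x4 tile is a hypothetical stand-in for one 32-thread warp.
    """
    counts = []
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            seen = {
                texture_index(x, y, width, num_textures)
                for y in range(ty, min(ty + tile_h, height))
                for x in range(tx, min(tx + tile_w, width))
            }
            counts.append(len(seen))
    return counts


if __name__ == "__main__":
    for n in (1, 256, 262144):
        counts = distinct_per_tile(64, 64, n)
        print(n, "textures: avg distinct per tile =", sum(counts) / len(counts))
```

With a single texture every tile touches exactly one index, which is the fully coherent case. As the array grows, the count per tile approaches the tile size, meaning nearly every thread in the group samples a different handle.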