Is your build distributed? 2 mins sounds fast for 240 shader compiles.
No, and there wasn't a network cache either. As in Krypt0n's experience, each permutation only took about a second.
Once you took compiling for multiple platforms into account, the number got higher! I really would've loved to have had a distributed build system, or at least a network cache at the time though...
In short the algorithm is as follows:
* Create render block (with geometry, shader, constants, texture list, flags, etc.)
* Add to render queue
The problem is that there are so many objects that are rendered in different ways; it's not as simple as just passing an index and vertex buffer. And what happens with all the data for that object that needs to be rendered?
As you just wrote, you don't just pass an index and vertex buffer, you also pass shaders, constants, textures, flags, etc...

I've posted in some other threads here and here, but the approach I've taken is to create a "state group", which is a container that stores a (variable) number of render-states, such as blend modes, cbuffer bindings, texture bindings, high-level flags (like discussed in this thread), etc...
When you submit a draw-call, you also submit a stack of these state-groups. If the same state is set by multiple groups in the stack, then the version that's higher up in the stack will be used.
Typically the stack looks like:
* Per-instance group -- binds cbuffer containing world matrix, etc.
* Material group -- binds textures and material cbuffers.
* Shader group -- binds shader and cbuffers containing default values (which are usually overridden by material group).
When processing a group of submitted draw-calls, there's also a default group, which is implicitly put at the bottom of every stack. This group contains sensible defaults, like "AlphaBlend = false".
So I guess my "render blocks" are a draw-call paired with a stack of state-groups (and a sorting key, etc).
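To make the override rule concrete, here's a rough sketch of how a stack could be flattened into one set of states. This is not my engine's actual code; StateId, StateEntry, resolve and get are names made up for the example, and every state is reduced to a plain integer for simplicity.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical state slots; a real engine would have many more.
enum class StateId : std::size_t { AlphaBlend, Shader, Texture0, WorldCBuffer, Count };
constexpr std::size_t kStateCount = static_cast<std::size_t>(StateId::Count);

struct StateEntry {
    StateId id;
    std::uint32_t value;  // stand-in for a blend mode, resource handle, etc.
};

// A state group holds a variable number of state settings.
using StateGroup = std::vector<StateEntry>;
using ResolvedState = std::array<std::optional<std::uint32_t>, kStateCount>;

// stack[0] is the top of the stack (highest priority).
// The default group is treated as the implicit bottom of every stack.
ResolvedState resolve(const std::vector<const StateGroup*>& stack,
                      const StateGroup& defaults) {
    ResolvedState out{};
    auto apply = [&out](const StateGroup& group) {
        for (const StateEntry& e : group) {
            auto& slot = out[static_cast<std::size_t>(e.id)];
            if (!slot) slot = e.value;  // first writer wins, so higher groups override lower ones
        }
    };
    for (const StateGroup* g : stack) apply(*g);
    apply(defaults);
    return out;
}

// Convenience lookup for one resolved state.
std::optional<std::uint32_t> get(const ResolvedState& r, StateId id) {
    return r[static_cast<std::size_t>(id)];
}
```

With a stack of {instance, material, shader}, a texture set by both the material and shader groups resolves to the material's value, and anything nobody sets falls through to the default group.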
Generally, the pattern for drawing anything looks like:
At creation time: create any required cbuffers, create any required state-groups to bind them and set states.
At draw-time: copy new state into cbuffers, submit draw-calls paired with state-groups.
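As a sketch of that creation-time / draw-time split (again with made-up names: Object, DrawItem, submit and sortQueue are hypothetical, the cbuffer is just a CPU-side struct, and the sort key is an opaque integer):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Matrix4 { float m[16]; };

// CPU-side stand-in for a GPU constant buffer holding per-instance data.
struct InstanceCBuffer { Matrix4 world; };

// Stand-in for a state group; here it only records which cbuffer to bind.
struct StateGroup { const InstanceCBuffer* cbuffer; };

struct DrawItem {
    std::uint64_t sortKey;                      // hypothetical sorting key
    std::vector<const StateGroup*> stateStack;  // top of the stack first
    std::uint32_t indexCount;
};

// Creation time: the cbuffer and the group that binds it are made once.
struct Object {
    InstanceCBuffer cbuffer;
    StateGroup instanceGroup;
    Object() : cbuffer{}, instanceGroup{&cbuffer} {}
    Object(const Object&) = delete;             // the group points at our own cbuffer
    Object& operator=(const Object&) = delete;
};

// Draw time: only the cbuffer contents change; the groups are reused as-is.
void submit(std::vector<DrawItem>& queue, Object& obj, const Matrix4& world,
            const StateGroup& material, std::uint64_t sortKey,
            std::uint32_t indexCount) {
    obj.cbuffer.world = world;
    queue.push_back(DrawItem{sortKey, {&obj.instanceGroup, &material}, indexCount});
}

// Order the queue by key before the draw-calls are processed.
void sortQueue(std::vector<DrawItem>& queue) {
    std::sort(queue.begin(), queue.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.sortKey < b.sortKey; });
}
```

The point is that nothing is allocated or rebuilt per frame except the queue entries themselves.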
Depending on how you implement this stuff, it can either be very efficient or very slow. At an old job, we spent about 30% of our CPU time inside state-group related functions... In my current engine, it's about 1.5%. In my current implementation, I've made state-groups immutable, which makes management a lot simpler -- even though state-groups are variable size, they can't be resized after they've been created.
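One way to get "variable size, but fixed after creation" is to only allow the contents to be supplied at construction. This is just an illustrative sketch, not how my engine stores them; the StateGroup class below and its find helper are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

struct StateEntry {
    std::uint32_t id;     // hypothetical state identifier
    std::uint32_t value;  // stand-in for the state's payload
};

class StateGroup {
public:
    // The number of entries is chosen here and never changes afterwards.
    StateGroup(std::initializer_list<StateEntry> entries) : entries_(entries) {}

    std::size_t size() const { return entries_.size(); }
    const StateEntry* begin() const { return entries_.data(); }
    const StateEntry* end() const { return entries_.data() + entries_.size(); }

    // Returns the entry for the given id, or nullptr if this group doesn't set it.
    const StateEntry* find(std::uint32_t id) const {
        for (const StateEntry& e : entries_) {
            if (e.id == id) return &e;
        }
        return nullptr;
    }

private:
    const std::vector<StateEntry> entries_;  // const member: no resizing, no reassignment
};
```

Because nothing can mutate a group after it's built, the same group can be shared between many draw-calls without any bookkeeping.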