The best option, and the one that is applied most frequently in production, is to simply use some mechanism that will sort your render graph by major state changes. A predicated tree, for instance, such that each level of the tree applies a predicate matching the next sorting parameter and deposits the elements into the appropriate sorted bucket for the next layers predicate. Then you simply have the leaf nodes be lists (pointer array, etc), and link the leafs in a linked list of sorts. State chances could then be applied by simply walking each leaf and changing states only when you move to the next leaf node. A example, listed by probably the best priorities, would be to sort in the following manner:
-Pixel shader changes.
-Pixel shader constant changes.
-Vertex shader changes.
-Vertex shader constant changes.
-Render target changes.
-Vertex format changes (SetVertexDeclaration).
-Sampler state changes.
-Vertex and index buffer changes (without changing the format).
-Texture changes.
Clearly this doesn't cover every possibilities, and there may be other things you might want to sort by (complex materials for instance, transparency, occlusion culling, z-ordering). The goal of this is to essentially batch up your state changes to minimize the total number of major states that have to change per frame.
I nabbed the list from
here, while the FAQ is for DirectX, the underlying behavior between OpenGL and DirectX will be pretty much identical in many respects for the hardware, as such anything as abstract as this list will apply to both platforms equally in most cases.