I've read that the order in which objects are rendered can make a big difference to performance, but there are two different strategies, and I'd like to know whether one is better than the other or whether it depends on the scene. They can be combined as major and minor sort keys, but which should be the major key and which the minor? I'm most interested in OpenGL ES 2.0, but I assume the same principles apply to DirectX and other APIs.
The first strategy is to minimise the number of OpenGL state changes, because they are allegedly expensive. I wouldn't have thought it would make much difference, but Mario Zechner (who wrote quite a good book about Android game development and libgdx, so I think he knows his stuff) says it can make a huge difference and advocates sprite batchers for 2D rendering (where the alternative, depth sorting, is irrelevant). So if you have a number of objects in different positions that share the same mesh and material/shader/textures, you should bind their VBOs etc. only once per frame and render them all together before rendering objects of another type. You can go a step further and group all objects that share a shader but differ in mesh etc.
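To make the first strategy concrete, here is a minimal sketch of what I mean, written in plain Python rather than real GL calls. The `Draw` record, its field names and the scene contents are all made up for illustration, and the choice of shader as the major key, then texture, then mesh, is only my assumption about relative cost. It simply counts how many binds a given draw order would trigger.

```python
from collections import namedtuple

# Hypothetical draw record; the field names are illustrative, not from any library.
Draw = namedtuple("Draw", ["shader", "texture", "mesh", "name"])

def count_state_changes(draws):
    """Count how many shader/texture/mesh binds a draw order would need."""
    changes = 0
    bound = {"shader": None, "texture": None, "mesh": None}
    for d in draws:
        for field in bound:
            value = getattr(d, field)
            if bound[field] != value:
                # Stands in for a real bind such as glUseProgram,
                # glBindTexture or glBindBuffer.
                bound[field] = value
                changes += 1
    return changes

def sort_by_state(draws):
    """Group draws by shader first, then texture, then mesh (assumed cost order)."""
    return sorted(draws, key=lambda d: (d.shader, d.texture, d.mesh))

scene = [
    Draw("lit", "brick", "cube", "wall_a"),
    Draw("unlit", "sky", "dome", "sky"),
    Draw("lit", "brick", "cube", "wall_b"),
    Draw("lit", "wood", "cube", "crate"),
    Draw("lit", "brick", "cube", "wall_c"),
]

unsorted_changes = count_state_changes(scene)
sorted_changes = count_state_changes(sort_by_state(scene))
print(unsorted_changes, sorted_changes)  # 11 7
```

This says nothing about how expensive each bind actually is on a given GPU or driver, only how many of them a draw order causes.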
The other strategy (for 3D only) is depth sorting. Checking the Z-buffer and not overwriting a "nearer" pixel is supposedly much quicker than updating the framebuffer, so, somewhat counter-intuitively, you should render objects in near-to-far order. But are a few wasted writes to the framebuffer really slower than applying every object's MVP to its centre point and sorting? Can that be done on the GPU? I don't actually know whether GLSL variables can be used as outputs to be read back by the CPU after running a shader, but I suspect not. And I presume that anything more complicated than sorting on centre points, ignoring whether objects overlap in XY camera space, only considerably increases the pain for decreasing gain?
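For scale, this is roughly the CPU-side work I have in mind for the centre-point sort, again as a plain Python sketch with made-up object names and positions. It assumes the centres are already in world space and that the projection is a standard OpenGL-style perspective matrix, in which case clip-space w equals the distance in front of the camera, so only the last row of the matrix is needed for the sort key.

```python
# Hypothetical scene data: (name, world-space centre). Names are illustrative only.
objects = [
    ("far",  (0.0, 0.0, -9.0)),
    ("near", (1.0, 0.0, -2.0)),
    ("mid",  (-1.0, 0.5, -5.0)),
]

# Row-major view-projection matrix, assuming a camera at the origin looking
# down -Z with a standard OpenGL-style perspective projection
# (90 degree field of view, aspect 1, near = 1, far = 100).
near, far = 1.0, 100.0
view_proj = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
    [0.0, 0.0, -1.0, 0.0],
]

def clip_w(matrix, point):
    """Clip-space w: dot product of the matrix's last row with (x, y, z, 1)."""
    x, y, z = point
    r = matrix[3]
    return r[0] * x + r[1] * y + r[2] * z + r[3]

def sort_near_to_far(items, matrix):
    """Sort by clip-space w, which grows with distance for this kind of projection."""
    return sorted(items, key=lambda item: clip_w(matrix, item[1]))

ordered = sort_near_to_far(objects, view_proj)
print([name for name, _ in ordered])  # ['near', 'mid', 'far']
```

Under those assumptions the per-object cost is four multiply-adds plus the sort itself. Combining this with state sorting would just mean using a tuple as the sort key, which is exactly where the major/minor question comes in.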