This question is about OpenGL in general, but I think it's most relevant to the mobile side of things, where efficiency is especially important. Over the years, I've developed my OpenGL ES games with the mindset that almost any gl* call will query the GPU, since mobile implementations of OpenGL are completely hardware-accelerated, like modern desktop implementations. That being the case, I assume any thread making a gl* call is halted while the CPU queries the GPU and waits for a response. Is this correct so far?
I understand that it's important to cut down on glEnable/glDisable calls by sorting renderable elements with similar states, and also by wrapping those two calls in my own state manager. I do the same when binding buffers and textures. For example, you could call SetTexture(GL_TEXTURE_2D, 0, &tex->name) to bind a 2D texture to the first texture unit. If that particular texture name has already been bound to that target at that texture unit, it won't bind it again. This comes in handy when rendering multiple instances of the same model, because it calls glBindTexture(), glBindBuffer(), etc. once for the first instance, but skips them for all subsequent instances, since they all use the same texture/buffer parameters that are common to the loaded model they share. The same goes for shaders. It's pretty common for multiple models in a scene to use the same shader. Imagine rendering dozens of individual models to the screen, but only having to call glUseProgram() twice each frame instead of once per instance rendered. Since I'm still using OpenGL 2.1 (OpenGL ES 2.0 on mobile), glDrawElements() is called once per mesh per instance of the model drawn. For example, drawing 12 instances of a model with 5 meshes takes 60 draw calls. This could be heavy on mobile until I learn about instancing in higher versions of OpenGL and support OpenGL ES 3.0 on mobile.
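To make the idea concrete, here is a minimal sketch of the kind of redundant-bind filter described above. It is not my actual SetTexture(); the names TextureCache, cached_bind_texture and demo_shared_texture_binds are hypothetical, it only tracks GL_TEXTURE_2D, and the two stub_gl* functions are stand-ins that count calls so the sketch compiles without GL headers. In a real build they would be glActiveTexture() and glBindTexture().

```c
#include <assert.h>

/* Stand-in types and constants so this compiles without <GLES2/gl2.h>. */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
#define GL_TEXTURE_2D 0x0DE1
#define GL_TEXTURE0   0x84C0

/* Arbitrary size for the sketch; real code would query
   GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS. */
#define MAX_UNITS 8

/* Counters that let us see how many calls would reach the driver. */
int bind_calls = 0;
int active_calls = 0;

/* Hypothetical stubs standing in for glActiveTexture / glBindTexture. */
void stub_glActiveTexture(GLenum texture) { (void)texture; active_calls++; }
void stub_glBindTexture(GLenum target, GLuint name) {
    (void)target; (void)name; bind_calls++;
}

/* Shadow copy of the texture binding state. Zero-initialised, it matches
   a fresh context: unit 0 active, texture 0 bound everywhere. */
typedef struct {
    GLuint active_unit;          /* unit index last made active */
    GLuint bound_2d[MAX_UNITS];  /* name last bound to GL_TEXTURE_2D per unit */
} TextureCache;

/* Binds `name` to `target` on `unit` only if the cache says it differs.
   Returns 1 if a bind was issued, 0 if it was skipped. */
int cached_bind_texture(TextureCache *c, GLenum target, GLuint unit, GLuint name) {
    assert(target == GL_TEXTURE_2D);  /* sketch tracks 2D textures only */
    assert(unit < MAX_UNITS);
    if (c->bound_2d[unit] == name)
        return 0;                     /* already bound: skip the GL calls */
    if (c->active_unit != unit) {
        stub_glActiveTexture(GL_TEXTURE0 + unit);
        c->active_unit = unit;
    }
    stub_glBindTexture(target, name);
    c->bound_2d[unit] = name;
    return 1;
}

/* Simulates drawing `instances` copies of a model with `meshes` meshes that
   all share one texture, and returns how many binds actually went through. */
int demo_shared_texture_binds(int instances, int meshes) {
    TextureCache cache = {0};
    int before = bind_calls;
    for (int i = 0; i < instances; i++) {
        for (int m = 0; m < meshes; m++) {
            cached_bind_texture(&cache, GL_TEXTURE_2D, 0, 7u);
            /* glDrawElements(...) would be issued here, once per mesh */
        }
    }
    return bind_calls - before;
}
```

With the 12 instances and 5 meshes from my example, the 60 draws would still happen, but only the first one triggers a bind. The catch with a shadow copy like this is that it goes stale if anything binds or deletes a texture without going through it, so every bind has to be routed through the cache.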
My question is: is managing OpenGL state internally in my engine worthwhile? Is it a huge performance hit to call glBindTexture() constantly (especially on mobile), or do OpenGL implementations usually check for redundant binds already? Should I just focus on keeping draw calls down, or is the way I'm managing my states pretty important too?
From what I've read about OpenGL 4.5, it's going through a significant rewrite to be closer to vendor-specific APIs such as AMD's Mantle, NVIDIA's CUDA and even iOS's soon-to-be Metal API (OK, so that one's OS-specific), with the aim of providing more efficient OpenGL drivers under the hood, so that we could set a texture for a specified target at a specific texture unit in one function call instead of two.