That way you can just pack all your static meshes into one big buffer, managing the offsets yourself (which is fun), and have only a couple of VAO switches. Since you're essentially doing memory management there, you need to keep things like memory fragmentation in mind (i.e., if you pack 500 meshes and then remove 200 of them at random from the same buffer, things get fragmented), so beware.
So, basically I need 3 "global" buffers (1 for vertices, 1 for normals, 1 for uv coordinates), then pack all static (Why just static? My dynamic meshes have the same format, can't I just include them as well?) mesh data into those three. During rendering I then just bind these three buffers once at the beginning (= 1 VAO switch) and use glDrawElementsBaseVertex for each mesh with the appropriate offset.
Is that about right?
How are you measuring time? I'm guessing that's total CPU per frame?
No, it's just the time for the render loop (the pseudocode above). I've used std::chrono::high_resolution_clock to measure it, so it's just the CPU time. I'll give ARB_timer_query a try.
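For reference, the timer-query approach looks roughly like this — a sketch only, assuming a current GL 3.3+ context and an extension loader, so it isn't runnable standalone (`renderScene` stands in for your existing draw calls):

```cpp
// One-time setup:
GLuint query;
glGenQueries(1, &query);

// Around the render loop body:
glBeginQuery(GL_TIME_ELAPSED, query);
renderScene();  // your existing draw calls
glEndQuery(GL_TIME_ELAPSED);

// Reading the result stalls until the GPU has finished, so in real code
// you'd double-buffer queries and read last frame's result instead.
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
double gpuMs = elapsedNs / 1.0e6;
```

Unlike std::chrono, which only measures how long the CPU spends *submitting* commands, this measures how long the GPU actually spends executing them.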
According to the profiler "Very Sleepy", the main CPU bottleneck is with "DrvPresentBuffers". I'm not sure if that means it's the GPU itself, or the synchronization/data transfer from CPU to GPU.
If your problem is that your GPU time per frame is the bottleneck, then you'll have to optimize your shaders / data formats / overdraw / etc.
If your problem is that your CPU time per frame is the bottleneck, then it's a more traditional optimization problem. Measure your CPU-side code to see where the time is going.
I'm pretty sure the shaders aren't the problem; the fps stays the same even if I simply discard all fragments and reduce the vertex shader to a pass-through.
Changing the resolution also changes nothing (I've tried switching between 640x480 and 1920x1080, fps is the same), so I think I can also rule out overdraw as a possible candidate?