If it's not caused by the state queries from glGet*() calls, then it's probably related to the draw calls, unless it's something somewhere else entirely.
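When you switched the indices to unsigned short, did you also change every GL_UNSIGNED_INT to GL_UNSIGNED_SHORT and every sizeof(unsigned int) to sizeof(unsigned short), in both the buffer setup calls (e.g. glBufferData) and the draw calls (e.g. glDrawElements)? If the type passed to the draw call doesn't match the data in the element buffer, the indices are read with the wrong width and you end up drawing garbage vertices.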
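One way to avoid that kind of mismatch is to define the index type once and derive both the GL enum and every byte size from it. This is a minimal sketch, not code from your project: Index, INDEX_GL_TYPE and the two helper functions are hypothetical names, and the two enum values are copied from the standard GL headers only so the snippet compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from the standard GL headers, repeated here only so the sketch
   compiles without them; include your loader's header in real code. */
#define GL_UNSIGNED_SHORT 0x1403
#define GL_UNSIGNED_INT   0x1405

/* Hypothetical single point of truth for the index type.
   Change this one line to uint32_t to go back to 32-bit indices. */
typedef uint16_t Index;

_Static_assert(sizeof(Index) == 2 || sizeof(Index) == 4,
               "Index must be 16 or 32 bits");

/* GL type enum derived from the C type, so the two cannot drift apart. */
#define INDEX_GL_TYPE (sizeof(Index) == 2 ? GL_UNSIGNED_SHORT : GL_UNSIGNED_INT)

/* Size argument for glBufferData(GL_ELEMENT_ARRAY_BUFFER, ...). */
static size_t index_buffer_bytes(size_t count) {
    return count * sizeof(Index);
}

/* Byte offset for the last argument of glDrawElements when the draw starts
   at index `first` of the bound element buffer. */
static size_t index_byte_offset(size_t first) {
    return first * sizeof(Index);
}
```

The calls then become glBufferData(GL_ELEMENT_ARRAY_BUFFER, index_buffer_bytes(n), indices, GL_STATIC_DRAW) and glDrawElements(GL_TRIANGLES, count, INDEX_GL_TYPE, (const void *)index_byte_offset(first)), so nothing has to be edited by hand when the index type changes.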
In your first post you said the fps figures aren't accurate. Does the scene actually feel sluggish? Can you profile where the time is being spent? If not, at least make sure the fps measurement is correct so you have a solid number to go by.
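If you want a steadier number than an instantaneous fps counter, averaging frame times over a time window is a common approach. A minimal sketch, assuming a POSIX system for clock_gettime; FrameTimer and its functions are hypothetical names. It measures CPU-side wall-clock time per frame, and since GL calls are asynchronous it only reflects GPU cost where the driver blocks (often at the buffer swap). For GPU-side timing, timer queries (GL_TIME_ELAPSED, core since OpenGL 3.3) are the usual tool.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: averages frame durations over a time window so one
   unusually fast or slow frame does not dominate the reading. */
typedef struct {
    double window_s;      /* report once this much time has accumulated */
    double accumulated_s; /* seconds summed in the current window */
    int frames;           /* frames counted in the current window */
    double avg_ms;        /* latest average, in milliseconds per frame */
} FrameTimer;

static void frame_timer_init(FrameTimer *t, double window_s) {
    t->window_s = window_s;
    t->accumulated_s = 0.0;
    t->frames = 0;
    t->avg_ms = 0.0;
}

/* Adds one frame's duration; returns 1 when a new average is ready. */
static int frame_timer_add(FrameTimer *t, double frame_s) {
    t->accumulated_s += frame_s;
    t->frames += 1;
    if (t->accumulated_s < t->window_s)
        return 0;
    t->avg_ms = 1000.0 * t->accumulated_s / t->frames;
    t->accumulated_s = 0.0;
    t->frames = 0;
    return 1;
}

/* Monotonic clock in seconds; unaffected by wall-clock adjustments. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

In the render loop, take now_seconds() at the same point every frame (for example right after the buffer swap), pass the difference from the previous frame to frame_timer_add, and print avg_ms whenever it returns 1; fps is then 1000 / avg_ms. Milliseconds per frame is usually easier to reason about than fps because costs add up linearly.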
You might try bypassing VAOs and just binding buffers and enabling attribute arrays at draw time. I once had a problem with vertex array objects that turned out to be related to the way GLEW was loading the GL functions: it required glewExperimental to be set to GL_TRUE before calling glewInit(). I'm not sure whether that was on AMD or NVIDIA, and I don't know how you're loading GL functions, but it may be relevant. Valve Software also released a document, "Porting Source to Linux", in which they said VAOs were slower than glVertexAttribPointer on all implementations.
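Binding at draw time means re-specifying every attribute before each draw call. One way to keep that manageable is to describe the vertex layout once, as a table of exactly the arguments glVertexAttribPointer takes, and loop over it per draw. A sketch under these assumptions: Vertex, AttribDesc, kVertexAttribs and bind_vertex_attribs are hypothetical names, the layout is a made-up interleaved position/normal/uv format, and the GL calls are passed in as a callback only so the snippet runs without a GL context. Note that in a core profile context (3.2 and later) a vertex array object must be bound for these calls to be valid, so "bypassing VAOs" there means binding a single VAO once and re-specifying the attributes on it per draw.

```c
#include <stddef.h>

/* GL_FLOAT / GL_FALSE values from the standard GL headers, repeated so the
   sketch compiles standalone. */
#define GL_FLOAT 0x1406
#define GL_FALSE 0

/* Hypothetical interleaved vertex format. */
typedef struct {
    float position[3];
    float normal[3];
    float uv[2];
} Vertex;

/* One entry per attribute, mirroring glVertexAttribPointer's parameters. */
typedef struct {
    unsigned index;
    int size;
    unsigned type;
    unsigned char normalized;
    int stride;
    size_t offset; /* byte offset into the bound GL_ARRAY_BUFFER */
} AttribDesc;

static const AttribDesc kVertexAttribs[] = {
    {0, 3, GL_FLOAT, GL_FALSE, (int)sizeof(Vertex), offsetof(Vertex, position)},
    {1, 3, GL_FLOAT, GL_FALSE, (int)sizeof(Vertex), offsetof(Vertex, normal)},
    {2, 2, GL_FLOAT, GL_FALSE, (int)sizeof(Vertex), offsetof(Vertex, uv)},
};
#define NUM_VERTEX_ATTRIBS (sizeof kVertexAttribs / sizeof kVertexAttribs[0])

/* In a real renderer the callback would do:
       glEnableVertexAttribArray(d->index);
       glVertexAttribPointer(d->index, d->size, d->type, d->normalized,
                             d->stride, (const void *)d->offset);
   with the mesh's buffers already bound via
       glBindBuffer(GL_ARRAY_BUFFER, vbo);
       glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);  */
typedef void (*AttribFn)(const AttribDesc *d, void *user);

/* Applies every attribute of the layout; call once per draw, after binding
   the mesh's buffers and before glDrawElements. */
static void bind_vertex_attribs(AttribFn fn, void *user) {
    for (size_t i = 0; i < NUM_VERTEX_ATTRIBS; ++i)
        fn(&kVertexAttribs[i], user);
}
```

If different meshes use different layouts, it is usually safest to disable (glDisableVertexAttribArray) any attribute arrays left enabled by the previous draw that the current mesh doesn't use.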
Depending on your targeted GL version, you might want to look at the GL_ARB_vertex_attrib_binding extension (core since OpenGL 4.3), which separates the vertex format from the buffer it is read from.
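With vertex attrib binding, the vertex format (glVertexAttribFormat, glVertexAttribBinding) is recorded once, and switching between meshes that share that format only needs glBindVertexBuffer(bindingindex, buffer, offset, stride). A minimal sketch, assuming several meshes with the same hypothetical Vertex layout are packed into one buffer; MeshRange, mesh_binding_stride and mesh_binding_offset are hypothetical names, and the GL calls appear only in comments so the snippet compiles without a context.

```c
#include <stddef.h>

/* Hypothetical vertex layout shared by all meshes in the buffer. */
typedef struct {
    float position[3];
    float uv[2];
} Vertex;

/* Hypothetical description of one mesh inside a shared vertex buffer. */
typedef struct {
    size_t first_vertex; /* index of the mesh's first vertex in the buffer */
    size_t vertex_count; /* count for the draw call */
} MeshRange;

/* Setup, done once while the VAO is bound. Attributes 0 and 1 both read
   from binding point 0; relativeoffset is per attribute:
       glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, offsetof(Vertex, position));
       glVertexAttribFormat(1, 2, GL_FLOAT, GL_FALSE, offsetof(Vertex, uv));
       glVertexAttribBinding(0, 0);
       glVertexAttribBinding(1, 0);
       glEnableVertexAttribArray(0);
       glEnableVertexAttribArray(1);
   Per draw, only the buffer binding changes (stride is per binding point):
       glBindVertexBuffer(0, vbo, mesh_binding_offset(&mesh),
                          mesh_binding_stride());  */

/* Stride argument for glBindVertexBuffer. */
static int mesh_binding_stride(void) {
    return (int)sizeof(Vertex);
}

/* Byte offset argument for glBindVertexBuffer, so that vertex 0 as seen by
   the draw call is the mesh's first vertex. */
static size_t mesh_binding_offset(const MeshRange *m) {
    return m->first_vertex * sizeof(Vertex);
}
```

Compared with re-specifying every attribute through glVertexAttribPointer, the per-draw work drops from one call per attribute to one call per binding point; whether that is measurably faster depends on the driver.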