Think I found my problem!
In my display() function, I was calling glGetUniformLocation() every frame for the projection, world and texture locations. That seems to be what blocks, not glBindBuffer() (I had been overlooking those calls in the profiled region).
If I instead cache the uniform locations on init after linking the shader, I no longer stall in my display() function. Now SwapBuffers() is what takes the 16.6 ms as expected.
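For anyone hitting the same thing, here is a rough sketch of the caching idea, not my exact code. The struct, the function names and the uniform names ("uProjection", "uWorld", "uTexture") are all made up for illustration. The lookup is passed in as a function pointer with the same shape as glGetUniformLocation(program, name) so the sketch compiles and runs without a GL context; stub_lookup is only a stand-in for the real call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as glGetUniformLocation(GLuint program, const GLchar *name).
   Plain C types are used so this compiles without the GL headers. */
typedef int (*uniform_lookup_fn)(unsigned int program, const char *name);

/* Hypothetical container for the three locations used in display(). */
typedef struct {
    int projection;
    int world;
    int texture;
} UniformCache;

/* Call once at init, after the program has linked successfully.
   The uniform names are assumptions; use the names from your own shader. */
static void cache_uniform_locations(UniformCache *cache, unsigned int program,
                                    uniform_lookup_fn lookup)
{
    assert(cache != NULL && lookup != NULL);
    cache->projection = lookup(program, "uProjection");
    cache->world      = lookup(program, "uWorld");
    cache->texture    = lookup(program, "uTexture");
}

/* Stand-in for the real GL query so the sketch runs without a context.
   Like glGetUniformLocation, it returns -1 for a name it does not know. */
static int stub_lookup_calls = 0;

static int stub_lookup(unsigned int program, const char *name)
{
    (void)program;
    stub_lookup_calls++;
    if (strcmp(name, "uProjection") == 0) return 0;
    if (strcmp(name, "uWorld") == 0)      return 1;
    if (strcmp(name, "uTexture") == 0)    return 2;
    return -1;
}
```

In a real program the lookup argument would be glGetUniformLocation itself (or a thin wrapper around it if your loader or calling convention needs one), and display() would only read cache.projection, cache.world and cache.texture when calling glUniform*(), so no location queries happen per frame.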
I am a little surprised that getting a uniform location would block like that, but maybe it's documented somewhere?