So to answer some perf questions:
1. I am currently testing my engine/app in 2 settings.
A. One is where I let my game loops run hot with only a sleep(0) at the end of each to make sure they play well with the OS. In this setup on my laptop I get ~500 fps in an 800x600 window, and about double that in full screen mode. That number is big and abstract enough that it really means nothing on its own. Unfortunately I don't have a low-end machine to do real testing on, so for now it will have to do. My goal here has been to increase this number while keeping all in-game actions smooth.
B. Two is where I clamp all of my game loops at 60 fps and sleep the remaining time. While doing this I measure CPU utilization as one means of measuring perf. My goal here has been to minimize this number while keeping all in-game actions smooth.
2. Things get a little more complex because I am running in a multi-threaded engine environment. That means I can, for example, set my AI loop running hot while clamping the rest. This lets me tackle one loop at a time and see how performance is affected. It also means I am able to balance things fairly well between my engine systems, i.e. I know my AI loop doesn't need to run nearly as often as my physics loop, so I can adjust things around. So far I've found that my renderer and physics engine are the only things that really benefit from running as fast as possible. As such I'm targeting the renderer first for perf work.
3. I am definitely CPU bound even when I set my renderer loop hot. Most of my scenes consist of 100-200 verts, so the GPU has no problems at all with that. Because the bottleneck is on the CPU side, my latest batching work actually gave me about a 10% improvement in frame rate.
4. All this work may be moot as I am using my current laptop as my target machine. Things will undoubtedly change once I can get my hands on an older system. In that case the perf improvements might need to happen elsewhere.
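
For anyone who wants to try the two setups from points 1 and 2, here is a minimal C++ sketch of a loop that either runs hot or is clamped to a target rate and sleeps the remaining time. It is not my engine's actual code: `run_loop` and `run_for` are hypothetical names, the empty step function is a placeholder for real work, and `std::this_thread::yield()` is used as a portable stand-in for sleep(0).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: calls `step` repeatedly until `stop` is set and
// returns how many iterations ran.
// target_hz == 0 means "run hot": no clamp, only a yield per iteration.
// target_hz  > 0 means "clamped": sleep whatever is left of each period.
template <typename Step>
long run_loop(double target_hz, const std::atomic<bool>& stop, Step step) {
    assert(target_hz >= 0.0);
    const bool clamped = target_hz > 0.0;
    const Clock::duration period = clamped
        ? std::chrono::duration_cast<Clock::duration>(
              std::chrono::duration<double>(1.0 / target_hz))
        : Clock::duration::zero();

    long iterations = 0;
    auto next = Clock::now() + period;
    while (!stop.load()) {
        step();
        ++iterations;
        if (clamped) {
            std::this_thread::sleep_until(next);  // sleep the remaining time
            next += period;
            const auto now = Clock::now();
            if (next < now) next = now + period;  // overran: do not try to catch up
        } else {
            std::this_thread::yield();            // stand-in for sleep(0)
        }
    }
    return iterations;
}

// Hypothetical helper: runs one loop on its own thread for `how_long`
// with an empty step, then stops it and reports the iteration count.
long run_for(double target_hz, std::chrono::milliseconds how_long) {
    std::atomic<bool> stop{false};
    long iterations = 0;
    std::thread worker([&] { iterations = run_loop(target_hz, stop, [] {}); });
    std::this_thread::sleep_for(how_long);
    stop.store(true);
    worker.join();
    return iterations;
}
```

Since each system gets its own call to `run_loop` on its own thread, each can have its own rate. For example, an AI loop could be given a low `target_hz` while the renderer gets 0 (hot), which is the kind of one-loop-at-a-time testing described in point 2. The specific rates are up to you; nothing here reflects the values I actually use.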