Hi cozzie!
I tested on an Intel onboard chip that is not suited for gaming at 720p, and it runs at 30-40 FPS. It is hard to tell exactly because the displayed framerate changes very rapidly. Maybe you should update the counter only once a second. Still, even on this PC, it is maybe just a bit too slow.
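To illustrate the "update it only once a second" suggestion, here is a minimal sketch of a counter that accumulates frames and only publishes a new value when the interval has elapsed. The class name `FpsCounter` and its fields are made up for this example and are not taken from cozzie's engine.

```python
class FpsCounter:
    """Averages the framerate over a fixed interval instead of per frame.

    Hypothetical helper for illustration only.
    """

    def __init__(self, interval=1.0):
        self.interval = interval  # seconds between display updates
        self.elapsed = 0.0        # time accumulated since the last update
        self.frames = 0           # frames counted since the last update
        self.fps = 0.0            # last published value (what you draw on screen)

    def tick(self, dt):
        """Call once per frame with that frame's duration in seconds.

        Returns True when the published value was refreshed.
        """
        self.elapsed += dt
        self.frames += 1
        if self.elapsed >= self.interval:
            self.fps = self.frames / self.elapsed
            self.elapsed = 0.0
            self.frames = 0
            return True
        return False


counter = FpsCounter()
for _ in range(32):
    counter.tick(1.0 / 32.0)  # 32 frames of 31.25 ms each
print(counter.fps)  # 32.0
```

The displayed number then changes at most once per interval, which makes it much easier to read than a per-frame value.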
@Tom KQT: I would actually expect higher than 30 FPS on your machine.
The scene's not that complex, and I've been doing quite a bit of profiling and optimizing.
Could it be a laptop with an 8600M GPU?
No, it's a desktop PC and it can handle some games pretty well.
VSync IMHO is the problem. Don't forget that with VSync enabled, if the PC cannot handle 60 FPS but only, let's say, 55, then you won't really get 55, but half of the VSync rate, which is 30.
I also verified that I had the DirectX Debug Runtimes off, because normally they are on, as I mostly program 3D graphics here.
I don't get the VSync issue. It is either an engine configuration issue or a myth. If you have a refresh rate of 60 Hz and the engine can't keep up, then in order to get 30 FPS you would need the engine to miss the refresh with a probability of 100%. If it only misses it once in a while, you should not get 30.
I have been writing engines for a while now (2D and 3D) and I always use VSync because screen tearing is annoying, and I get anywhere from 0.3 to 60 FPS. When the engine is almost able to keep up, but not quite, I commonly get 53 FPS. Even NVIDIA says in some of its documents that you go from 60 to 30, but I have never experienced this. If I overload my engine with more than it can handle, it will commonly drop to 40-45, and I see this both with the in-engine framerate counter and with external framerate counters like FRAPS.
Let's take a simple example and consider that frames occur at direct multiples of 16.66 ms for simplicity: 0, 16.66, 33.33, etc.
So your first frame should be ready to send to the GPU somewhere before the 16.66 mark. You do all the CPU and GPU work that is needed, and if you make the mark, you will get the full 60 FPS. If you never make the mark and wait for the GPU present call to finish, every frame will take two refresh intervals, with the second one partly wasted on idling, and you get 30 FPS. This is probably the origin of the myth, or a simplification based on the simplest behaviors, which are no longer valid today in modern engines/hardware.
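The simple scenario above can be written down in a few lines. This is only a sketch of the simplest scheme, assuming a blocking present, no extra buffering, and a constant cost per frame; the function name `vsynced_fps` is made up for this example.

```python
import math


def vsynced_fps(frame_ms, refresh_hz=60.0):
    """Framerate for a constant per-frame cost under a blocking, vsynced present.

    Assumption: every frame starts right after a refresh and is shown at the
    first refresh after its work is done, so its cost is rounded up to a whole
    number of refresh intervals.
    """
    interval_ms = 1000.0 / refresh_hz
    # Number of refresh intervals each frame occupies (at least one).
    intervals = max(1, math.ceil(frame_ms / interval_ms))
    return refresh_hz / intervals


print(vsynced_fps(10.0))  # 60.0: the frame makes the 16.66 ms mark
print(vsynced_fps(18.2))  # 30.0: roughly "55 FPS" worth of work, always misses
print(vsynced_fps(40.0))  # 20.0: needs three intervals per frame
```

Note that the 55-to-30 drop only falls out because every single frame is assumed to cost the same 18.2 ms.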
But with double buffering, triple buffering, adaptive VSync, etc., this simple scenario becomes much more complicated. First, you have the number of frames that you can produce, which is related to and informed by the VSync framerate, but not 100% coupled to it. This is CPU time + bus talk time + offscreen GPU render-target-based postprocessing (which in modern games can take a lot of time if you add together HDR, FXAA/MLAA/SMAA/TXAA with DOF, SSAO, etc.). You may wish to artificially limit this one. You then have the number of frames that you can present to the GPU. This might be 60 or 30, but because of the aforementioned decoupling it will be somewhere in between.
So basically, if every single frame you have takes the same time, this time is over 16.66 ms, and you use the simplest of present schemes, you will get 30 FPS. Otherwise, you will get somewhere in between. The next step down from 30 is 20 FPS. If once every 9-10 seconds one single frame no longer takes that constant amount of time, but 3 times as much, you won't go down to 20 FPS. You'll lose a few frames once in that interval and your average framerate will barely be affected.
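Here is a rough sketch of that averaging argument, still using the simplest blocking-present model (each frame rounded up to whole refresh intervals, which is an assumption and not how a triple-buffered engine behaves). The function name `average_fps` and the frame costs are invented for illustration.

```python
import math


def average_fps(frame_costs_ms, refresh_hz=60.0):
    """Average framerate over a list of per-frame costs in milliseconds.

    Assumption: blocking vsynced present, so each frame occupies a whole
    number of refresh intervals (at least one).
    """
    interval_ms = 1000.0 / refresh_hz
    total_intervals = 0
    for cost in frame_costs_ms:
        total_intervals += max(1, math.ceil(cost / interval_ms))
    total_seconds = total_intervals / refresh_hz
    return len(frame_costs_ms) / total_seconds


# Every frame misses the mark by a little: locked at 30.
print(average_fps([18.0] * 600))             # 30.0

# Same load, but one frame in ~10 seconds takes 3 times as long.
print(average_fps([18.0] * 299 + [54.0]))    # about 29.9, nowhere near 20

# Only 13 frames out of 100 miss the mark: somewhere in between 30 and 60.
print(average_fps([15.0] * 87 + [18.0] * 13))  # about 53.1
```

In this model a counter reading in the low 50s just means that roughly one frame in eight missed its refresh, not that the engine fell to half rate.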