Proxy DLL update (See previous entry for what this is about). I've been doing what NVPerfHUD does to see what affects the frame rate (measured by PIX):
Forcing Z-culling has no effect. So I'm not pixel shader or fillrate bound.
Forcing a 1x1 scissor rect also has no effect.
Forcing 2x2 dummy textures has no effect, so I'm not texture bandwidth bound.
Skipping every DrawPrimitive() call makes my frame rate go up from 1.9-2.0 FPS to > 100 FPS.
So, it looks like I'm vertex shader bound, which is bad, since it should be using the fastest vertex processing possible [sad], and I don't think there's anything I can do about it really [sad]
I've yet to try my cached vertex buffers idea; that requires deriving from IDirect3DVertexBuffer9, and I'm all derived-out after deriving from IDirect3DDevice9 and its hundred-odd functions...
It's only an idea but maybe you could give it a try.
A couple of months ago I tested my framework/engine on an Acer Aspire One. The application is usually vertex shader bound there, since vertex shaders run in software. I tried a reference scene and it could easily sustain 25-30 fps. As a note, the hardware is shader model 2.0 and there's no way to enable HW vertex processing. It's powered by a 1.6 GHz Atom, so I guess my rig is far worse than your laptop.
When I enabled an optimized scene representation, the frame rate went down to 4-6 fps. That made no sense at all, as I had never seen that behaviour before: the optimized scene was faster on every other piece of hardware I tested.
The problem was in my DrawIndexedPrimitive call: I wasn't passing MinIndex and NumVertices correctly. They were always set to 0 and NumberOfVerticesInVB, even when the call only used a small part of a big shared VB.
By just sending the correct values the optimized scene now renders at an average framerate of 45fps, with peaks at 65.
The problem doesn't affect the reference unoptimized scene, which has a VB/IB pair for each piece of geometry, thus 0 and NumberOfVerticesInVB ARE THE CORRECT VALUES.
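To make the MinIndex/NumVertices point concrete, here is a minimal sketch of deriving the two values from the slice of indices a draw call actually uses. It assumes 16-bit indices held in a std::vector; VertexRange and ComputeVertexRange are made-up names for illustration, not part of Direct3D. The results are meant for the MinVertexIndex and NumVertices arguments of IDirect3DDevice9::DrawIndexedPrimitive, both of which are relative to BaseVertexIndex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the vertex range touched by one indexed draw.
struct VertexRange {
    std::uint32_t minIndex;     // candidate value for MinVertexIndex
    std::uint32_t numVertices;  // candidate value for NumVertices
};

// Scan indices [startIndex, startIndex + indexCount) and return the
// tightest vertex range that covers every index in that slice.
inline VertexRange ComputeVertexRange(const std::vector<std::uint16_t>& indices,
                                      std::size_t startIndex,
                                      std::size_t indexCount)
{
    assert(indexCount > 0 && startIndex + indexCount <= indices.size());

    std::uint32_t lo = indices[startIndex];
    std::uint32_t hi = lo;
    for (std::size_t i = startIndex + 1; i < startIndex + indexCount; ++i) {
        const std::uint32_t v = indices[i];
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    // The range is inclusive, hence the +1.
    return VertexRange{ lo, hi - lo + 1 };
}
```

Doing this once when the batched buffers are built and storing the result per sub-mesh would avoid rescanning the indices every frame. With one VB/IB pair per piece of geometry the scan simply returns 0 and the full vertex count, which matches the unoptimized case above.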
I guess you could check which values get passed to DrawIndexedPrimitive. If there are only 3 huge VBs and 160 SetTexture calls, then this could be a similar problem.
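Since the proxy already sees every call, one rough way to check is to log the arguments and flag draws whose declared vertex range is far larger than the primitives could possibly reference. The sketch below is only a heuristic and assumes triangle lists; DrawCallLog, LooksOverDeclared and the slack factor are all hypothetical names and values, not anything from Direct3D or PIX.

```cpp
#include <cstdint>

// Hypothetical record of the arguments seen for one indexed draw call.
struct DrawCallLog {
    std::uint32_t minVertexIndex;  // MinVertexIndex as passed by the app
    std::uint32_t numVertices;     // NumVertices as passed by the app
    std::uint32_t primitiveCount;  // number of triangles (triangle list assumed)
};

// A triangle list with P triangles can reference at most 3 * P distinct
// vertices. If the declared range is many times larger than that, the call
// is probably declaring the whole vertex buffer instead of its own slice.
// 'slack' is an arbitrary tolerance for ranges that legitimately have gaps.
inline bool LooksOverDeclared(const DrawCallLog& call, std::uint32_t slack = 4)
{
    const std::uint64_t maxReferenced =
        3ull * static_cast<std::uint64_t>(call.primitiveCount);
    return call.numVertices > maxReferenced * slack;
}
```

A draw of 12 triangles that declares 50000 vertices would be flagged, while the same draw declaring 24 vertices would not. A flagged call isn't necessarily wrong, it's just worth a look.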
I don't know if creating VBs every frame is your bottleneck, as I don't do such weird things in my pipeline.
My 2 cents.