There are some Windows hotkeys that can get in the way of a game. I have problems with the Windows key in Oblivion, for example. Alt-Tab might interfere with gameplay in Oblivion too: Tab is your inventory, and you use it a LOT, even in the middle of combat, to drink healing potions and select spells.
So add an option to your configuration screen where people can opt-in to disabling these keys.
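As a rough sketch of what that opt-in could look like: on Windows, one common approach is a low-level keyboard hook (`SetWindowsHookEx` with `WH_KEYBOARD_LL`) whose callback returns non-zero to swallow a key. The decision logic itself is just a small pure function, shown below. `HotkeyOptions` and `shouldSuppressKey` are made-up names for illustration, and the constants are copies of the Win32 values so the snippet compiles without the Windows headers.

```cpp
#include <cstdint>

// Copies of the Win32 values, redefined so this sketch builds anywhere.
constexpr std::uint32_t kVkTab       = 0x09; // VK_TAB
constexpr std::uint32_t kVkLWin      = 0x5B; // VK_LWIN
constexpr std::uint32_t kVkRWin      = 0x5C; // VK_RWIN
constexpr std::uint32_t kAltDownFlag = 0x20; // LLKHF_ALTDOWN

// Hypothetical settings filled in from the game's configuration screen.
// Everything defaults to off, so the player has to opt in.
struct HotkeyOptions {
    bool blockWinKey = false;
    bool blockAltTab = false;
};

// Returns true if the key event should be swallowed instead of being
// passed on to Windows. Intended to be called from the hook callback
// with the vkCode and flags fields of the keyboard event.
bool shouldSuppressKey(std::uint32_t vkCode, std::uint32_t flags,
                       const HotkeyOptions& opts) {
    if (opts.blockWinKey && (vkCode == kVkLWin || vkCode == kVkRWin)) {
        return true;
    }
    // Tab on its own must still reach the game (it is the inventory key);
    // only Tab pressed while Alt is held is blocked.
    if (opts.blockAltTab && vkCode == kVkTab && (flags & kAltDownFlag) != 0) {
        return true;
    }
    return false;
}
```

You'd probably also want to apply this only while the game window has focus, so the keys behave normally once the player has switched away.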
However, unless the "restore" from system mem to vidram is triggered explicitly by the developer (and I believe it's automatic), that would seem to require a check before using a resource in vidram, to make sure it is in sync with the system mem copy first.
No, it can use the same mechanism that you'd use on the application side to deal with vidram losses -- the lost device flag is checked once per frame upon present, and only when it is raised do you iterate the list of managed resources and restore them from the sysram copy. There's no per-resource check at draw time.
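To make that concrete, here's a toy model of the idea. It is not how any real runtime or driver is implemented; `ManagedResource`, `Device`, `simulateLoss` and `present` are invented names, and "video memory" is just a second vector. The point is only that the cost is one flag test per frame, with the restore loop running when the flag is actually set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical managed resource: a system-memory master copy plus a
// pretend video-memory copy that can be lost at any time.
struct ManagedResource {
    std::vector<unsigned char> sysCopy;
    std::vector<unsigned char> vidCopy;
    bool vidValid = false;

    void restore()    { vidCopy = sysCopy; vidValid = true; }
    void invalidate() { vidCopy.clear();   vidValid = false; }
};

// Hypothetical device that tracks every managed resource it created.
struct Device {
    std::vector<ManagedResource*> managed;
    bool lost = false;

    // Stand-in for whatever makes the OS take video memory away.
    void simulateLoss() {
        lost = true;
        for (ManagedResource* r : managed) {
            assert(r != nullptr);
            r->invalidate();
        }
    }

    // Called once per frame. Returns how many resources were restored.
    std::size_t present() {
        if (!lost) {
            return 0; // common case: one flag test, nothing else
        }
        std::size_t restored = 0;
        for (ManagedResource* r : managed) {
            assert(r != nullptr);
            r->restore(); // re-upload from the system-memory copy
            ++restored;
        }
        lost = false;
        return restored;
    }
};
```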
Lost device wouldn't be an issue if I didn't have to use Windows to talk to the vidcard. But then I wouldn't get the benefits of using Windows to talk to the vidcard.
Exactly. On the PS3 I had the luxury of not having to go through the OS to talk to the GPU... and it was a nightmare. I had the option of using higher level APIs, but I was writing an engine, so I may as well go as close to the metal as I could, right?
* Packing bits and bytes manually into structures in order to construct command packets, instead of just calling SetBlahState() -- not fun. Yeah, slightly fewer clock cycles, but not enough to matter. Profiling the code showed it wasn't a hot-spot, so time-consuming micro-optimisations there are a waste. I'm talking about boosting the framerate from 30FPS up to 30.09FPS, at a huge development cost. I could've spent that time optimising an actual bottleneck. Also, any malformed command packet would simply crash the GPU, without any nice OS notification of the device failure, or device restarts, or debug logs... The amount of time required to debug these systems was phenomenal, which again means less time that I could use to optimize parts that actually mattered.
* Dealing with vidram resource management myself -- not fun. Did you know that any of your GPU resources, such as textures, may exist as multiple allocations? In order to achieve a good level of parallelism without stalls, the driver programmer (or poor console programmer) often intentionally introduces a large amount of latency between the CPU and GPU. When you ask the GPU to draw something, the driver puts this command into a queue that might not be read for upwards of 30ms. This means that if you want to CPU-update a resource that's in use by the GPU, you can either stall for 30ms (no thanks), or allocate a 2nd block of memory for it. Then you need to do all of the juggling that makes n different vidram objects appear to be a single object to the application developer. The guys that write drivers for your PC are really good at this stuff and know how to do it efficiently. There are also lots of sub-optimal strategies that seem like a good idea to everyone else, so your GPU driver probably solves these issues more efficiently than you would anyway.
* Then there's porting. Repeat the above work for every single GPU that you want to support...
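To give a feel for the "multiple allocations" juggling in the second bullet, here's a heavily simplified sketch. It is not how any particular driver or console SDK does it; `RenamedBuffer` and the frame-counter scheme are assumptions made up for illustration. One logical buffer owns several physical allocations, each tagged with the frame whose queued draw commands last referenced it. An update reuses an allocation only if the GPU has already finished that frame, otherwise it grabs a fresh block instead of stalling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: one logical buffer backed by several physical allocations.
class RenamedBuffer {
public:
    explicit RenamedBuffer(std::size_t bytes) : bytes_(bytes) {}

    // cpuFrame: frame the CPU is currently recording commands for.
    // gpuCompletedFrame: last frame the GPU has fully finished.
    // Returns the index of the physical allocation that was written.
    std::size_t update(const std::vector<std::uint8_t>& data,
                       std::uint64_t cpuFrame,
                       std::uint64_t gpuCompletedFrame) {
        assert(data.size() == bytes_);
        std::size_t idx = allocs_.size();
        for (std::size_t i = 0; i < allocs_.size(); ++i) {
            // Safe to overwrite only if the GPU is done with it.
            if (allocs_[i].lastUsedFrame <= gpuCompletedFrame) {
                idx = i;
                break;
            }
        }
        if (idx == allocs_.size()) {
            // Everything is still in flight: allocate instead of stalling.
            allocs_.push_back(Alloc{std::vector<std::uint8_t>(bytes_), 0});
        }
        allocs_[idx].data = data;
        allocs_[idx].lastUsedFrame = cpuFrame;
        current_ = idx;
        return idx;
    }

    // What the application sees: always the most recent contents.
    const std::vector<std::uint8_t>& current() const {
        assert(!allocs_.empty());
        return allocs_[current_].data;
    }

    std::size_t allocationCount() const { return allocs_.size(); }

private:
    struct Alloc {
        std::vector<std::uint8_t> data;
        std::uint64_t lastUsedFrame;
    };
    std::size_t bytes_;
    std::vector<Alloc> allocs_;
    std::size_t current_ = 0;
};
```

A real implementation also has to cap the number of allocations, free them again, and handle partial updates, which is exactly the kind of work you get for free from the driver.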
Giving up a few clock cycles to the API has turned out to be a necessary evil. The alternative just isn't feasible any more.
Profile your code and optimize the bits that matter. Also, your obsession with clock cycles as a measure of performance is a bit outdated. Fetching a variable from RAM into a register can stall the CPU for hundreds of clock cycles if your memory organization and access patterns aren't optimized -- on a CPU that I used recently, reading a variable from RAM could potentially cost as much as 800 float multiplications, if you had sub-optimal memory access patterns!
The number one optimisation target these days is memory bandwidth, not ALU operations.
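A classic way to see this for yourself is to walk the same 2D array in two different orders. Both functions below do exactly the same number of additions and return the same result; the only difference is the order in which memory is touched. This is just an illustrative sketch (the function names are made up), and how big the gap is depends entirely on your CPU, cache sizes and array dimensions, so time it on your own machine with a large array rather than trusting anyone's numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Matrix stored row-major in one flat vector: element (r, c) is m[r * cols + c].

// Walks memory in the order it is laid out, so consecutive reads tend
// to hit the same cache line.
long long sumRowMajor(const std::vector<int>& m,
                      std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            sum += m[r * cols + c];
        }
    }
    return sum;
}

// Same arithmetic, but each read jumps `cols` elements ahead, which on
// large arrays tends to touch a different cache line every time.
long long sumColumnMajor(const std::vector<int>& m,
                         std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            sum += m[r * cols + c];
        }
    }
    return sum;
}
```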