Right now, I'm working on a way to quickly grab the average screen color and send it to my Arduino to control the RGB strips in my room.
So far I've mostly been using device contexts: GetDC(NULL) to get the screen, then StretchBlt to "stretch" it down to a single pixel. But that seems like a ludicrous amount of work just to get one color. I see about 15 fps at 6% CPU usage. (When grabbing a single pixel directly, it's probably 100+ fps at under 1% usage.)
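For comparison, here is a minimal sketch of doing the averaging step yourself instead of with StretchBlt. It assumes the screen has already been copied into a 32-bit buffer with bytes in B, G, R, X order (the in-memory layout of a 32-bpp BI_RGB DIB, e.g. as filled by GetDIBits). The capture code is left out because it is Windows-only, and the names average_bgra and Rgb are hypothetical. One thing worth checking either way: in StretchBlt's default stretch mode, shrinking discards or boolean-combines pixels rather than averaging them; SetStretchBltMode(hdc, HALFTONE) is the mode that approximates an average.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for one averaged color.
struct Rgb {
    std::uint8_t r, g, b;
};

// Averages pixel_count 32-bit pixels stored as B, G, R, X bytes.
// The fourth (reserved/alpha) byte is ignored.
Rgb average_bgra(const std::uint8_t* pixels, std::size_t pixel_count) {
    if (pixel_count == 0) return Rgb{0, 0, 0};

    // 64-bit sums cannot overflow for any realistic screen size.
    std::uint64_t sum_b = 0, sum_g = 0, sum_r = 0;
    for (std::size_t i = 0; i < pixel_count; ++i) {
        const std::uint8_t* p = pixels + i * 4;
        sum_b += p[0];
        sum_g += p[1];
        sum_r += p[2];
    }

    // Integer division truncates, so averages round down.
    return Rgb{
        static_cast<std::uint8_t>(sum_r / pixel_count),
        static_cast<std::uint8_t>(sum_g / pixel_count),
        static_cast<std::uint8_t>(sum_b / pixel_count)};
}
```

A plain loop like this touches every pixel once; if it is still too slow, averaging only every Nth pixel is a common trade-off for ambient lighting.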
However, I've heard there's a way to access the GPU's framebuffer directly. If GDI draws into that framebuffer, I could grab the frame, have the GPU compute the average in a massively parallel way via CUDA, and then send the resulting color straight to the Arduino.
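To illustrate what "having the GPU do the averaging in parallel" typically involves, here is a serial C++ sketch of the pairwise (tree) reduction pattern that CUDA reduction kernels commonly follow: in each pass, element i adds element i + stride, and the stride halves until one value remains. This is not CUDA code; on a GPU, all additions within one pass would run concurrently, while here they run in an ordinary loop. The name tree_sum is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums values using the pairwise pattern a GPU reduction uses.
// Each addition within a pass is independent of the others, which is
// what allows a GPU to execute a whole pass in parallel.
std::uint64_t tree_sum(std::vector<std::uint64_t> values) {
    if (values.empty()) return 0;

    // Pad with zeros to a power of two so every pass pairs elements evenly.
    std::size_t n = 1;
    while (n < values.size()) n *= 2;
    values.resize(n, 0);

    // Halve the active range each pass: log2(n) passes in total.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t i = 0; i < stride; ++i) {
            values[i] += values[i + stride];
        }
    }
    return values[0];
}
```

Running this once per color channel and dividing by the pixel count gives the average color; the appeal of a GPU is that each pass over millions of pixels takes roughly constant time instead of growing with the pass size.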
But that depends entirely on whether the Windows GUI actually renders into the framebuffer (and perhaps on how much initialization it would take to access it).