The typical loop in the window thread drains all available input messages with PeekMessage() and then calls SwapBuffers() (I've also seen GetMessage() with a timer used instead of PeekMessage()).
I'm doing all processing with a thread pool and a work-stealing, task-based system, so all my processing is asynchronous.
The problem is that I don't want to wait until SwapBuffers() returns before processing input messages that arrived after SwapBuffers() began and blocked waiting for the vsync. I want those events immediately, so I can minimize latency by processing them and preparing the data for the next frame while the render thread is still waiting for the vsync to unblock SwapBuffers(). As my input is touch, latency is more of an issue than with keyboard/mouse input, because you can visibly see the gap between your finger and the object's position (and position prediction doesn't give good results in practice).
What's especially frustrating is that this problem is analogous to the problem of waiting for both WinAPI events (the kernel objects) and input messages. Yet, while the API solves the latter with MsgWaitForMultipleObjects(), it provides no way to wait for either an input message or a buffer swap. Ideally, it would let one wait on an event that is signalled by the buffer swap.
So, I'm looking for workarounds. Calling SwapBuffers() on another thread, according to what I've found online, worked on XP but performs horribly on Vista and newer (I'm targeting Windows 8.x). According to https://www.opengl.org/discussion_boards/showthread.php/182226-SwapBuffers%28%29-in-another-thread it works if the render thread calls glFinish() before signalling the other thread to swap, but glFinish() itself blocks, so it's a subpar solution (it would be better if glFlush() were sufficient, but I doubt it is).
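To make the handoff concrete, here is a minimal, portable sketch of the "finish on the render thread, then signal a second thread to swap" pattern as I understand it from that thread. It is not real GL code: fake_glFinish() and fake_SwapBuffers() are hypothetical stand-ins for glFinish() and SwapBuffers(hdc), and SwapHandoff, submit_frame() and run_frames() are names I made up for illustration. It only shows the synchronization, and says nothing about how a real driver behaves.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins so the sketch runs anywhere. In a real program these
// would be glFinish() on the render thread and SwapBuffers(hdc) on the swap thread.
static std::vector<std::string> g_log;
static std::mutex g_logMutex;
static void log_event(const std::string& s) {
    std::lock_guard<std::mutex> lock(g_logMutex);
    g_log.push_back(s);
}
static void fake_glFinish(int frame)    { log_event("finish " + std::to_string(frame)); }
static void fake_SwapBuffers(int frame) { log_event("swap " + std::to_string(frame)); }

// Shared state between the render thread and the swap thread (assumed design).
struct SwapHandoff {
    std::mutex m;
    std::condition_variable cv;
    int pendingFrame = -1;  // frame waiting to be swapped, -1 if none
    bool quit = false;
};

// Swap thread: waits for a finished frame, then presents it.
static void swap_thread(SwapHandoff& h) {
    for (;;) {
        int frame;
        {
            std::unique_lock<std::mutex> lock(h.m);
            h.cv.wait(lock, [&] { return h.pendingFrame >= 0 || h.quit; });
            if (h.pendingFrame < 0) return;  // quit requested, nothing left to swap
            frame = h.pendingFrame;
        }
        fake_SwapBuffers(frame);  // the real call would block here on vsync
        {
            std::lock_guard<std::mutex> lock(h.m);
            h.pendingFrame = -1;  // mark the swap as done
        }
        h.cv.notify_all();
    }
}

// Render thread side: wait for the previous swap, finish this frame, hand it off.
static void submit_frame(SwapHandoff& h, int frame) {
    {
        std::unique_lock<std::mutex> lock(h.m);
        h.cv.wait(lock, [&] { return h.pendingFrame < 0; });
    }
    fake_glFinish(frame);  // the blocking call I would like to avoid
    {
        std::lock_guard<std::mutex> lock(h.m);
        h.pendingFrame = frame;
    }
    h.cv.notify_all();
}

// Drives n frames through the handoff and returns the recorded call order.
static std::vector<std::string> run_frames(int n) {
    g_log.clear();
    SwapHandoff h;
    std::thread t(swap_thread, std::ref(h));
    for (int i = 0; i < n; ++i) submit_frame(h, i);
    {
        std::unique_lock<std::mutex> lock(h.m);
        h.cv.wait(lock, [&] { return h.pendingFrame < 0; });  // last swap done
        h.quit = true;
    }
    h.cv.notify_all();
    t.join();
    return g_log;
}
```

Calling run_frames(3) records each frame's finish before its swap, and each swap before the next frame's finish. The window thread is free to keep pumping messages while the swap thread sits in the swap, but the render thread still pays for the glFinish(), which is exactly my objection to this approach.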
It seems that I'd need two windows: one with the GL context doing SwapBuffers(), and another that receives the input events. The question then becomes how the hell I can get all the input to go to the window that is not the one doing the displaying (and I need this to work in both windowed and fullscreen modes). Is there some way to make the input-receiving window invisible yet always active and on top of the displaying one?
I also looked at wglDelayBeforeSwapNV(), but that doesn't seem like a great idea, because the OS might very well preempt the thread after that call but before SwapBuffers(), thus missing the swap on that vsync.
I looked at hooks, but there's no touch equivalent to LowLevelMouseProc(), which hooks those messages as they're about to be posted to the thread's queue. There's also the WH_GETMESSAGE hook, but the documentation specifies that the hook runs when the thread calls GetMessage()/PeekMessage(), not when the message is posted. That might have been a solution if it could be set as a global hook, but a global hook can't be used, since RegisterTouchWindow() has to be called with a specific window's handle...