@Hodgman: L. Spiro seems to have answered the question you asked.
I'll only add that it is possible to synchronise GetMessageTime and QueryPerformanceCounter on a single thread, and I do it by using the MsgWaitForMultipleObjects function inside the message loop.
I don't have the code in front of me now, and I can't remember the exact math behind it, but I currently use MsgWaitForMultipleObjects to time both frames and inputs. First, I compute the dwMilliseconds parameter so that MsgWaitForMultipleObjects waits until one millisecond before the next frame needs to be drawn (or was it 15? IIRC, I use the smallest value that timeBeginPeriod accepts, minus 1). timeBeginPeriod affects the resolution of all the Windows timing APIs except the performance counters - it even affects the resolution of MsgWaitForMultipleObjects.
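To make that concrete, here's a minimal sketch of how that timeout computation could look. The names (g_qpcFreq, g_nextFrameQpc, kSlackMs) are made up for the example, not my actual code:

```cpp
#include <windows.h>

// Illustrative globals, not my actual code:
static LONGLONG g_qpcFreq;       // QueryPerformanceFrequency, counts/second
static LONGLONG g_nextFrameQpc;  // QPC count at which the next frame is due
static const DWORD kSlackMs = 1; // wake ~1 ms early, spin the rest precisely

// Compute the dwMilliseconds argument for MsgWaitForMultipleObjects: the time
// until the next frame is due, minus the slack that will be burned with
// QueryPerformanceCounter after the wait returns.
static DWORD ComputeWaitMs()
{
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    LONGLONG remaining = g_nextFrameQpc - now.QuadPart;
    if (remaining <= 0)
        return 0;                                      // frame already due
    DWORD ms = (DWORD)(remaining * 1000 / g_qpcFreq);
    return ms > kSlackMs ? ms - kSlackMs : 0;
}

// In the message loop: wait on the queue only (no handles), with that timeout.
// DWORD r = MsgWaitForMultipleObjects(0, NULL, FALSE, ComputeWaitMs(), QS_ALLINPUT);
```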
Anyway, when MsgWaitForMultipleObjects returns because it reached this wait-timeout limit, I use QueryPerformanceCounter in a loop to synchronise the thread to the current frame time (which I compute using QueryPerformanceCounter - this is my main game timer) - this consumes the remaining 1 millisecond (or 15) that I subtracted from the call to MsgWaitForMultipleObjects. After that, all I do is Draw and Present the scene using the currently computed game state.
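The frame-sync'ing loop itself could look something like this - again just a sketch, reusing the illustrative globals from above:

```cpp
#include <windows.h>

// After WAIT_TIMEOUT: spin on QueryPerformanceCounter until the exact frame
// deadline, consuming the ~1 ms slack that was subtracted from the wait.
static void SpinUntilFrameTime()
{
    LARGE_INTEGER now;
    do {
        QueryPerformanceCounter(&now);
    } while (now.QuadPart < g_nextFrameQpc);

    // Schedule the next frame; 60 here stands in for the game's frame rate.
    g_nextFrameQpc += g_qpcFreq / 60;
}
// ...then Draw and Present using the current game state.
```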
If, however, MsgWaitForMultipleObjects returns because it detected a message being added to the message queue, then I do the regular GetMessage (or PeekMessage) / TranslateMessage / DispatchMessage stuff. If there are any input messages, I re-compute the game state based on GetMessageTime, and here I also check that GetMessageTime is behind my main QueryPerformanceCounter-based timer - if it's not, I just use the value of QueryPerformanceCounter instead.
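Roughly, that message branch might look like the sketch below. TickMsToQpc and RecomputeGameState are hypothetical helpers - correlating GetMessageTime's timebase (GetTickCount milliseconds) with QPC counts has to be done once at startup and is left out here:

```cpp
#include <windows.h>

// Hypothetical helpers, named only for the example:
LONGLONG TickMsToQpc(DWORD tickMs);          // correlate GetMessageTime's
                                             // timebase with QPC counts
void     RecomputeGameState(LONGLONG atQpc); // re-run input-driven movement

void PumpMessages()
{
    MSG msg;
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
    {
        bool isInput =
            (msg.message >= WM_KEYFIRST   && msg.message <= WM_KEYLAST) ||
            (msg.message >= WM_MOUSEFIRST && msg.message <= WM_MOUSELAST);
        if (isInput)
        {
            LONGLONG inputQpc = TickMsToQpc((DWORD)GetMessageTime());

            LARGE_INTEGER now;
            QueryPerformanceCounter(&now);
            if (inputQpc > now.QuadPart)   // GetMessageTime must stay behind
                inputQpc = now.QuadPart;   // the main QPC-based timer

            RecomputeGameState(inputQpc);
        }
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
}
```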
Now, the reason Microsoft doesn't recommend using GetTickCount (or other GetMessageTime-like APIs) is timeBeginPeriod - once called, it affects all running processes. By default, the timer has a period of ~15 ms, but if another process calls timeBeginPeriod(1), that period is used everywhere in the system, even for all of the thread-synchronisation APIs. If I call timeBeginPeriod myself, though, I can be sure the precision is the one I set - and if another process changes it, it will probably be a video game changing it to the same value I need (the smallest period reported by timeGetDevCaps). To be safe, I could also just call it every time I enter my message loop, before MsgWaitForMultipleObjects, and then call timeEndPeriod after MsgWaitForMultipleObjects, or at the end of the message loop (since Microsoft recommends matching each timeBeginPeriod with a timeEndPeriod) - this keeps the call to MsgWaitForMultipleObjects in high-precision mode, while not affecting the rest of the system (too much - ok, maybe it does affect it, but I don't care).
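Bracketing just the wait in high-precision mode could look like this (a sketch; tc.wPeriodMin is typically 1 ms on desktop Windows):

```cpp
#include <windows.h>
#pragma comment(lib, "winmm.lib") // timeGetDevCaps/timeBeginPeriod/timeEndPeriod

// Raise the timer resolution only for the duration of the wait, matching
// each timeBeginPeriod with a timeEndPeriod as Microsoft recommends.
DWORD WaitForFrameOrMessage(DWORD waitMs)
{
    TIMECAPS tc;
    timeGetDevCaps(&tc, sizeof(tc));   // smallest supported period, usually 1

    timeBeginPeriod(tc.wPeriodMin);
    DWORD r = MsgWaitForMultipleObjects(0, NULL, FALSE, waitMs, QS_ALLINPUT);
    timeEndPeriod(tc.wPeriodMin);
    return r;
}
```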
Now, about using a separate thread for timing input events - even then, you have to implement some kind of synchronisation for accessing that "input queue" that L. Spiro mentioned, and you are going to do it using one of the thread-synchronisation APIs or objects (critical sections, events, mutexes, etc.) - but as I mentioned, unless you use timeBeginPeriod, these APIs will all be in low-precision mode (or whatever precision was set by other processes), so you are still basically affected by the GetMessageTime "delay effect" when switching between the input thread and the rendering thread... And I think the basic message-queue APIs GetMessage/PeekMessage are also affected, so even if you do use QueryPerformanceCounter, your input timer is still being delayed by the GetMessageTime "delay effect".
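For reference, the kind of cross-thread hand-off I mean is something like this - TimedInput and the queue are made up for the example, and g_inputLock has to be initialised with InitializeCriticalSection at startup:

```cpp
#include <windows.h>
#include <deque>

// An input event stamped with a QPC time, shared between two threads.
struct TimedInput { UINT message; WPARAM wParam; LONGLONG qpcTime; };

static CRITICAL_SECTION       g_inputLock;  // InitializeCriticalSection first
static std::deque<TimedInput> g_inputQueue;

void PushInput(const TimedInput& in)   // called from the input thread
{
    EnterCriticalSection(&g_inputLock);
    g_inputQueue.push_back(in);
    LeaveCriticalSection(&g_inputLock);
}

bool PopInput(TimedInput* out)         // called from the game/render thread
{
    EnterCriticalSection(&g_inputLock);
    bool have = !g_inputQueue.empty();
    if (have) { *out = g_inputQueue.front(); g_inputQueue.pop_front(); }
    LeaveCriticalSection(&g_inputLock);
    return have;
}
```

Note that the critical section itself doesn't sleep when uncontended; the precision problem shows up when a thread actually has to block and wait to be rescheduled.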
And of course, if you use DirectInput/XInput with Event objects, the same low precision affects the time when your Event gets signaled (it will only get signaled at a 15 ms boundary). But if you use DirectInput/XInput by polling (maybe in a separate thread), then you're not affected.
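Polling with XInput, for example, is trivial and doesn't involve any waitable object at all (a sketch; the hand-off to the input queue is elided):

```cpp
#include <windows.h>
#include <Xinput.h>
#pragma comment(lib, "xinput.lib")

// Polling reads the pad state directly, so the sample isn't tied to the
// system timer resolution; the timestamp comes straight from QPC.
void PollGamepad()
{
    XINPUT_STATE state = {};
    if (XInputGetState(0, &state) == ERROR_SUCCESS)   // controller 0
    {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);   // timestamp the sample ourselves
        // ...hand (state.Gamepad, now.QuadPart) to the input queue...
    }
}
```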
NOTE: As expected, I still have an issue when using VSYNC with this method (but then again, VSYNC would delay ANY timing method that runs in the same thread), since I'm also doing my own frame-sync'ing and VSYNC interferes with my own timing. I'm currently looking for a way to get the VSYNC time so I can plug it into my method, but if there isn't one, I think I can still use this method by always passing a 1 ms timeout to MsgWaitForMultipleObjects, and moving the scene-Draw part so that I can rely on DirectX's Present method to either "wait for the next vsync and Present" or "discard the current Present if nothing new was Drawn since the last Present". I've already tested this, and it adds at most a 15% CPU-use overhead, whereas the original frame-sync method has near-0% CPU use (as shown in Task Manager). Ideally, the timeout value passed to MsgWaitForMultipleObjects should be a value that evenly divides both the VSYNC (monitor refresh) period and the "minimum period" returned by timeGetDevCaps - i.e. their greatest common divisor.
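That fallback could be approximated at the app level like this - g_sceneDirty, g_swapChain and DrawScene are placeholders, and I'm assuming a DXGI swap chain, where Present with a sync interval of 1 holds the frame to the next vertical blank:

```cpp
#include <dxgi.h>

// Placeholders for the example:
extern IDXGISwapChain* g_swapChain;
extern bool            g_sceneDirty;   // set whenever the game state changes
void DrawScene();                      // hypothetical

// Called on every ~1 ms wake-up: skip the Present entirely when nothing was
// drawn; otherwise let Present(1, 0) synchronise to the next vblank.
void PresentIfDirty()
{
    if (!g_sceneDirty)
        return;
    DrawScene();
    g_swapChain->Present(1, 0);
    g_sceneDirty = false;
}
```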
Note also that by "re-calculating the game state" above, I mean simply re-calculating all of the movement vectors and positions of objects affected by user input, as if the input happened at the time returned by GetMessageTime. My project doesn't have that many user-controlled objects currently, so this can be done without exceeding the frame-time limit, and it doesn't cause problems like spikes in FPS. For objects that move by themselves, without user input, I still calculate their movement every frame (right after MsgWaitForMultipleObjects returns WAIT_TIMEOUT and just before starting the QueryPerformanceCounter frame-sync'ing loop). Calculations for collision detection and other object-to-object interactions are done whenever the state of any type of object changes (optimised, based on what type of object it is), so I can't use PhysX or other physics engines that rely on fixed timesteps - but if I had to, I would probably just plug the PhysX timestep in somewhere right after the QueryPerformanceCounter frame-sync'ing loop.
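To illustrate what I mean by re-calculating movement vectors at the input time, here's a toy version (all names invented for the example):

```cpp
#include <windows.h>

// The idea: integrate an input-driven object up to the input's timestamp,
// apply the input there, and let the normal per-frame update continue on.
extern LONGLONG g_qpcFreq;   // QueryPerformanceFrequency, counts/second

struct Mobile {
    double   x, y;           // position
    double   vx, vy;         // velocity, units/second
    LONGLONG lastUpdate;     // QPC count of the last integration
};

void AdvanceTo(Mobile* m, LONGLONG qpcTime)
{
    double dt = (double)(qpcTime - m->lastUpdate) / (double)g_qpcFreq;
    m->x += m->vx * dt;
    m->y += m->vy * dt;
    m->lastUpdate = qpcTime;
}

void ApplyInputAt(Mobile* m, LONGLONG inputQpc, double newVx, double newVy)
{
    AdvanceTo(m, inputQpc);  // bring the object up to the input time
    m->vx = newVx;           // then apply the input's effect on its motion
    m->vy = newVy;
}
```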