He only calls it once; it is stored in a static.
There is no need to re-query the frequency, since it is guaranteed not to change at runtime.
V-sync and other factors are why actual input in games is not handled the way your test application handles it, so I disregard those results entirely; I would only accept the data from your basic non-OpenGL test.
Your last test showed delays of up to 65 ms; V-sync accounts for only 16 ms of that.
You are still timing the call to new; it just spans multiple calls to WindowProc(). That is, you store the current time after WM_INPUT, then do other work, but all of that work delays the next WM_INPUT, which could already be in the buffer and waiting.
Eliminate the call to new entirely.
Make a static array of 3 RAWINPUT structures. If the first call to GetRawInputData() returns a size greater than sizeof( RAWINPUT ) * 3, print an error, dump the message, and increase the size of the static buffer if you want.
While this may not be what you would do in an actual game, the important point now is to find the bottleneck.
If it improves the timings, you have found at least one of the main culprits. If not, doing it properly won't be a problem in a real game, but you need to keep searching for the answer before you add the allocations back (which are leaking, just so you know).
You should also be prepared to accept that your timings are accurate. Maybe you aren’t hitting 2 keys as closely together as you think.