Hodgman, as you said, it changes performance and behavior drastically, so it's not acceptable.
The point is to measure the current latency, and with what you suggest, not only is performance altered, the latency itself changes drastically, since I would no longer allow queuing of frames.
Here are the options I can think of:
1. Using GPU timestamp queries and somehow learning how to match a CPU timestamp to a GPU timestamp (tricky...).
2. Polling event queries on a dedicated thread, at around 1000 GetData calls per minute.
I still have to check how much of the core it runs on this eats... hopefully not too much, since it's not a full busy-poll.
3. Probably the best method is still waiting on low-level events, the same way GPUView does.
BTW - this code will not run in my own application; it is injected into other applications. But any solution that works in my own application without altering the original latency/performance is acceptable as well.