For what its worth, if anybody knows this software:
I'm the co-author of the opengl rendering together with christophe riccio (http://www.g-truc.net/).
So basically, the opengl preview viewports of Vue has a GUI-thread message producer; and Render-thread message consumer queue system, based on a dual std::vector that swaps during consumption (each frame); and each message "push" takes the lock (boost mutex + lock) and each consumption also before swapping the queue. It just so happens that in common situations, more than 200 000 messages are pushed by second, and it is by no way a bottleneck.
I just mean to say, if you are afraid of 512 locks per frame... there is a serious problem in your "don't do too much premature optimization" thinking process, that needs fixing.
I agree that its total fun to attempt to write a lock free queue, but, if it was production code, frankly not worth the risk; and plainly misplaced focus time.
Now just about the filter thing, one day, just to see, I made a filter to avoid useless duplication of messages, it worked but was slower than the raw, dumb queue. I idon't say its absolutely going to be the same in your case; just try... but in my case being clever was being slower.