Thanks a lot for this reply. I knew about the old, deprecated fixed-function feedback system, but didn't realize there was an official replacement for the shader world. I'll do some more reading before diving in, but it looks like a great solution.
I know the transformation is relatively cheap, but in my current implementation it happens in six render stages per model, with potentially thousands of models. I'm also going to do something similar for label rendering, where I'll need to generate the NDC coordinate buffer and potentially read it back for de-clutter processing on the CPU. Having a shader stage that just populates an NDC coordinate buffer for readback and post-processing would be awesome.
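To be concrete about the values I'd want in that buffer: per vertex it is just the combined matrix multiply followed by the perspective divide. Here is a rough CPU-side sketch in Python of that math, not of any GPU API. It assumes a row-major 4x4 model-view-projection matrix, and the names `to_ndc` and `build_ndc_buffer` are made up for illustration.

```python
def to_ndc(mvp, position):
    """Transform one object-space position to normalized device coordinates.

    mvp is assumed to be a row-major 4x4 matrix (list of four rows).
    """
    x, y, z = position
    v = (x, y, z, 1.0)  # promote to homogeneous coordinates
    # Clip-space position: matrix times column vector.
    clip = [sum(mvp[r][c] * v[c] for c in range(4)) for r in range(4)]
    w = clip[3]
    if w == 0.0:
        raise ValueError("w is zero; the point cannot be projected")
    # Perspective divide takes clip space to NDC.
    return (clip[0] / w, clip[1] / w, clip[2] / w)


def build_ndc_buffer(mvp, positions):
    """Return one NDC triple per input position, in input order."""
    return [to_ndc(mvp, p) for p in positions]


# Hypothetical matrix that only scales w, to make the divide visible.
HALVING = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]

ndc_buffer = build_ndc_buffer(HALVING, [(1.0, 2.0, 3.0), (0.0, -2.0, 4.0)])
```

The de-clutter step on the CPU would then work on `ndc_buffer` directly, so the goal is to have the GPU produce the same list once instead of redoing the multiply in every stage.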
There's also a lot you can do to avoid needing so many stages of redundant vertex transformation. For instance, you can render the geometry once and then write material IDs and properties to another set of buffers and do passes over those (whether or not this is faster will depend on a lot of factors, so as always, measure and see).
Sorry, but I don't quite follow you here. Can you describe this a bit more, or link some reading material?