Sorry for the delayed reply, holiday festivities and family can tie up a lot of time...
Thanks for the ideas. In the end, after playing around with it all, I went with a different approach. I'll describe it here briefly in case anyone's still reading this thread.
The general idea is that I wanted to send as little data to the video card as possible, and the items being drawn are variable-sized glyphs. The ideas above were attempts to expand the data on the fly; eventually I decided to go another route.

I issue enough vertices to cover all the glyphs that need to be drawn, but the vertices themselves contain no actual data. In the vertex shader, with the help of a few constant buffers, I map the vertex id to an instance id and a glyph vertex number. So say the first glyph requires 12 vertices to render and the second glyph 8: then vertex #2 maps to the second vertex of the first glyph, and vertex #14 maps to the second vertex of the second glyph. Once I have the instance data and the glyph vertex number, I combine the two to produce the final vertex data, which is sent on down the pipeline.
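To make the mapping concrete, here's a rough sketch of that vertex id -> (instance id, glyph vertex number) lookup, written on the CPU in Python just for illustration. In the real shader this would read a table out of a constant buffer and index it with the vertex id; the function names and table layout here are my own assumptions, not necessarily what the original poster used.

```python
def build_vertex_table(vertex_counts):
    """One 32-bit entry per instance: the global index of that instance's
    first vertex (a running prefix sum of the per-glyph vertex counts)."""
    table, start = [], 0
    for count in vertex_counts:
        table.append(start)
        start += count
    return table, start  # table, plus the total vertex count to draw

def map_vertex(vertex_id, table):
    """Find which instance this vertex belongs to and its local vertex
    number within that glyph. A linear scan is used here for clarity;
    a shader might binary search the table instead."""
    instance = 0
    while instance + 1 < len(table) and table[instance + 1] <= vertex_id:
        instance += 1
    return instance, vertex_id - table[instance]

# The example above, with 0-based ids: the first glyph needs 12 vertices,
# the second needs 8, so vertex 1 is the second vertex of instance 0 and
# vertex 13 is the second vertex of instance 1.
table, total = build_vertex_table([12, 8])
print(map_vertex(1, table))   # (0, 1)
print(map_vertex(13, table))  # (1, 1)
```

The per-instance scan is fine on the CPU, but in a shader you'd likely prefer a constant-time scheme (a denser per-vertex table, or a binary search over the prefix sums) so every vertex invocation does the same amount of work.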
This way I only need to upload the vertex -> instance mapping (a simple table with one 32-bit entry per instance) and the per-instance data. All of this is a lot less than the normal approach of just uploading the vertices directly.
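To get a feel for the savings, here's a back-of-the-envelope comparison. The byte sizes are assumptions of mine (say 20 bytes per fully expanded vertex and 16 bytes of per-instance data), so treat the exact ratio as illustrative only:

```python
VERTEX_BYTES = 20        # assumed size of one fully expanded vertex
INSTANCE_BYTES = 16      # assumed per-instance (per-glyph) data
TABLE_ENTRY_BYTES = 4    # one 32-bit mapping-table entry per instance

def upload_bytes(vertex_counts):
    """Bytes uploaded per frame: expanded vertices vs. table + instances."""
    n_instances = len(vertex_counts)
    n_vertices = sum(vertex_counts)
    normal = n_vertices * VERTEX_BYTES
    mapped = n_instances * (TABLE_ENTRY_BYTES + INSTANCE_BYTES)
    return normal, mapped

# 1000 glyphs averaging 10 vertices each:
normal, mapped = upload_bytes([10] * 1000)
print(normal, mapped)  # 200000 vs 20000 -- roughly a 10x reduction here
```

The win grows with the vertex count per glyph, since the per-instance cost stays fixed while the expanded-vertex cost scales with every vertex drawn.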