I just do the naive thing for 2D and it seems quite fast. Even on older hardware I could throw tens of thousands of sprites around like it was nothing.
I pre-fill a static 16-bit index buffer and create a dynamic vertex buffer for my sprites. I build the sprite geometry each frame and do some basic sorting and batching. It's pretty easy to batch large numbers of sprites into a single draw call. The buffers are created once up front and destroyed only at the end.
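To make the "basic sorting and batching" step concrete, here is a rough sketch of one way it could look. The post doesn't say what the sort key is, so sorting by texture is my assumption, and `Sprite`, `Batch` and `build_batches` are made-up names for illustration, not anything from a real API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sprite record; the fields are assumptions for illustration.
struct Sprite {
    std::uint32_t texture_id; // which texture the sprite samples
    float x, y, w, h;         // screen-space rectangle
};

// One draw call: a contiguous run of sorted sprites sharing a texture.
struct Batch {
    std::uint32_t texture_id;
    std::size_t first_sprite; // index of the first sprite in the sorted list
    std::size_t sprite_count;
};

// Sorts sprites by texture (assumed sort key), then splits them into
// batches that each hold at most max_per_batch sprites.
std::vector<Batch> build_batches(std::vector<Sprite>& sprites,
                                 std::size_t max_per_batch) {
    // stable_sort keeps the submission order of sprites that share a texture.
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) {
                         return a.texture_id < b.texture_id;
                     });

    std::vector<Batch> batches;
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        const bool start_new =
            batches.empty() ||
            batches.back().texture_id != sprites[i].texture_id ||
            batches.back().sprite_count >= max_per_batch;
        if (start_new) {
            batches.push_back({sprites[i].texture_id, i, 0});
        }
        ++batches.back().sprite_count;
    }
    return batches;
}
```

Each `Batch` then maps to one draw call over a range of the vertex buffer. If draw order matters (layers, alpha blending), the sort key would need to include depth or layer ahead of the texture.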
Since the index buffer was 16-bit, I capped the number of sprites at 10922 per draw call (65536 / 6 rounded down, at six indices per sprite).
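For reference, pre-filling the index buffer could look something like the sketch below. It assumes each sprite is a quad of 4 vertices drawn as two triangles (6 indices) with the corners stored in order around the quad. The post doesn't spell out the winding or vertex order, so treat those and the names (`build_quad_indices`, `kMaxSprites`) as placeholders.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kMaxSprites = 10922;     // per-draw-call cap from the post
constexpr std::size_t kVerticesPerSprite = 4;
constexpr std::size_t kIndicesPerSprite = 6;   // assumption: two triangles per quad

// Every vertex index referenced by the buffer must fit in 16 bits.
static_assert(kMaxSprites * kVerticesPerSprite - 1 <= UINT16_MAX,
              "largest vertex index does not fit in a 16-bit index");

// Builds the repeating quad pattern 0,1,2, 0,2,3 offset by 4 per sprite.
std::vector<std::uint16_t> build_quad_indices(std::size_t sprite_count) {
    std::vector<std::uint16_t> indices;
    indices.reserve(sprite_count * kIndicesPerSprite);
    for (std::size_t i = 0; i < sprite_count; ++i) {
        const std::size_t base = i * kVerticesPerSprite;
        const std::size_t corners[kIndicesPerSprite] = {0, 1, 2, 0, 2, 3};
        for (std::size_t corner : corners) {
            indices.push_back(static_cast<std::uint16_t>(base + corner));
        }
    }
    return indices;
}
```

With the cap of 10922 sprites this produces 65532 indices, and the largest vertex index it references is 43687. Because the pattern never changes, the buffer only has to be uploaded once.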
The vertex buffer is (10922 * 4) * sizeof(Vertex2D), and sizeof(Vertex2D) was 32. So 1398016 bytes for the vertex buffer. About 1.4 MB.
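The size arithmetic can be checked at compile time. The post only gives sizeof(Vertex2D) == 32, so the field layout below (position, UV, RGBA float color) is a guess that happens to add up to 32 bytes, not the actual struct.

```cpp
#include <cstddef>

// Hypothetical layout; only the 32-byte total comes from the post.
struct Vertex2D {
    float position[2]; //  8 bytes
    float uv[2];       //  8 bytes
    float color[4];    // 16 bytes
};
static_assert(sizeof(Vertex2D) == 32, "Vertex2D is expected to be 32 bytes");

constexpr std::size_t kMaxSprites = 10922;    // per-draw-call cap
constexpr std::size_t kVerticesPerSprite = 4; // one quad per sprite

// Bytes needed to hold one full batch of sprite vertices.
constexpr std::size_t kVertexBufferBytes =
    kMaxSprites * kVerticesPerSprite * sizeof(Vertex2D);
static_assert(kVertexBufferBytes == 1398016, "vertex buffer size mismatch");
```

Keeping the numbers as static_asserts means a change to the vertex format or the sprite cap fails the build instead of silently overrunning the buffer.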