I've just written a completely new Direct3D wrapper that I've converted Imp over to use.
My previous wrapper was heavily based on the idea of putting as many of the sprites as possible on a single texture and filling a single vertex buffer with all the sprite quads, in order to make as few calls to DrawPrimitive() et al. as possible.
However, I've been struggling to get the texture co-ordinates working properly. This wasn't apparent until I started trying to draw a triangular mouse cursor and I noticed that the pixels were getting deformed. I guess it is to do with inaccuracies in calculating the texture co-ords on a massive texture.
So the new wrapper loads each image onto its own texture and just takes the (apparently negligible) performance hit of lots of DrawPrimitive() calls. I'm actually using DrawPrimitiveUP(): I see no benefit in a vertex buffer when I'd have to lock and regenerate it every frame, so the data is going to get copied onto the graphics card every frame anyway. Now I can just use a std::vector and I don't have to Lock() before I update it.
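As a rough sketch of the std::vector approach, here is how one sprite quad could be built as two triangles. The Vertex layout and the appendQuad and triangleCount helpers are made up for illustration and are not the wrapper's actual code. The Direct3D call itself is left in a comment so the block compiles without the DirectX SDK.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pre-transformed vertex: screen position plus texture co-ords.
// In Direct3D 9 this layout would roughly correspond to
// D3DFVF_XYZRHW | D3DFVF_TEX1.
struct Vertex {
    float x, y, z, rhw;
    float u, v;
};

// Append one screen-space quad as two triangles (six vertices, triangle list).
// The whole texture is mapped onto the quad.
void appendQuad(std::vector<Vertex>& out, float x, float y, float w, float h)
{
    const Vertex tl = { x,     y,     0.0f, 1.0f, 0.0f, 0.0f };
    const Vertex tr = { x + w, y,     0.0f, 1.0f, 1.0f, 0.0f };
    const Vertex bl = { x,     y + h, 0.0f, 1.0f, 0.0f, 1.0f };
    const Vertex br = { x + w, y + h, 0.0f, 1.0f, 1.0f, 1.0f };

    out.push_back(tl); out.push_back(tr); out.push_back(bl);
    out.push_back(tr); out.push_back(br); out.push_back(bl);
}

// Number of triangles in the list, i.e. the primitive count for the draw call.
std::size_t triangleCount(const std::vector<Vertex>& verts)
{
    return verts.size() / 3;
}

// With a real device, the draw would then look something like:
//   device->DrawPrimitiveUP(D3DPT_TRIANGLELIST,
//                           (UINT)triangleCount(verts),
//                           &verts[0], sizeof(Vertex));
```

The vector is simply cleared and refilled each frame, which is the part that replaces the Lock()/Unlock() dance.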
My god, this wrapper is easier to use though. That'll teach me to optimise prematurely.
I did find the fog caused the framerate to drop, so I added an optional QuadBuffer class that allows you to add multiple quads to a vertex buffer as long as they are all using the same texture. That solved the framerate problem with the fog.
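The batching idea can be sketched roughly as below. This is not the real QuadBuffer: the TextureId typedef, the QuadVertex layout and the method names are all assumptions, and a std::vector stands in for the actual vertex buffer so the block stays self-contained.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a texture handle.
typedef int TextureId;

// Hypothetical vertex: screen position plus texture co-ords.
struct QuadVertex { float x, y, u, v; };

// Collects quads that all share one texture so they can go out in one draw call.
class QuadBuffer {
public:
    explicit QuadBuffer(TextureId texture) : texture_(texture) {}

    // Queue a quad. Returns false (and queues nothing) if the quad uses a
    // different texture from the one this buffer was created for.
    bool add(TextureId texture, float x, float y, float w, float h)
    {
        if (texture != texture_) return false;

        const QuadVertex tl = { x,     y,     0.0f, 0.0f };
        const QuadVertex tr = { x + w, y,     1.0f, 0.0f };
        const QuadVertex bl = { x,     y + h, 0.0f, 1.0f };
        const QuadVertex br = { x + w, y + h, 1.0f, 1.0f };

        verts_.push_back(tl); verts_.push_back(tr); verts_.push_back(bl);
        verts_.push_back(tr); verts_.push_back(br); verts_.push_back(bl);
        return true;
    }

    // Six vertices (two triangles) per quad.
    std::size_t quadCount() const { return verts_.size() / 6; }

    const std::vector<QuadVertex>& vertices() const { return verts_; }

    // Empty the buffer ready for the next frame.
    void clear() { verts_.clear(); }

private:
    TextureId texture_;
    std::vector<QuadVertex> verts_;
};
```

For something like the fog, every tile would be added to one of these and the whole lot submitted in a single call, instead of one draw call per tile.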
And since the wrapper supports drawing partial rectangles from textures, there is no reason you could not use the new wrapper for atlasing. I think if that was done but with a bunch of smaller atlases instead of one huge one, the texture co-ord problem would be avoided.
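Drawing a partial rectangle comes down to turning a pixel rectangle into normalised texture co-ords. A minimal sketch, where UVRect and pixelRectToUV are hypothetical names rather than anything in the wrapper:

```cpp
// Normalised texture co-ords for the top-left and bottom-right corners.
struct UVRect { float u0, v0, u1, v1; };

// Map a pixel rectangle (x, y, w, h) inside a texW x texH texture
// to texture co-ords in the 0..1 range.
UVRect pixelRectToUV(int x, int y, int w, int h, int texW, int texH)
{
    UVRect r;
    r.u0 = static_cast<float>(x)     / static_cast<float>(texW);
    r.v0 = static_cast<float>(y)     / static_cast<float>(texH);
    r.u1 = static_cast<float>(x + w) / static_cast<float>(texW);
    r.v1 = static_cast<float>(y + h) / static_cast<float>(texH);
    return r;
}
```

On a smaller atlas each texel covers a bigger slice of the 0..1 range, which is why I suspect several small atlases would behave better than one huge one.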
Obviously the new wrapper wastes a lot of texture memory since it is forcing everything up to a power of two (my card doesn't support non-power-of-two textures). A 257x257 image will generate a 512x512 texture with all the remaining space wasted.
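The rounding itself is simple enough to sketch. The nextPowerOfTwo and wastedFraction helpers below are illustrative names, not the ones in my CTexture class:

```cpp
// Smallest power of two that is >= n.
// Assumes 1 <= n <= 2^31 so the shift cannot overflow.
unsigned nextPowerOfTwo(unsigned n)
{
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Fraction of the padded texture that holds no image data,
// when each dimension is rounded up independently.
double wastedFraction(unsigned w, unsigned h)
{
    const double used  = static_cast<double>(w) * static_cast<double>(h);
    const double total = static_cast<double>(nextPowerOfTwo(w)) *
                         static_cast<double>(nextPowerOfTwo(h));
    return 1.0 - used / total;
}
```

An image that is one pixel over a power of two is the worst case, since almost three quarters of the padded texture ends up empty.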
But that's okay. I just need to be careful with the sizes of large images that the game uses. For small images, like the letters of a font for example, where I can't control the size, I'm not really bothered about the small amount of wastage.
Interestingly, I loaded a 48x72 texture, which my CTexture converted to 64x128, but the Direct3D Texture Lock method indicated that Direct3D had created a 128x128 texture on my card.
Actually, that COULD imply that a font would generate a 128x128 texture for each letter. Yikes! Or maybe my card is just forcing square textures.
Guess I can atlas fonts as a special case (do the atlas in the generator program rather than dynamically in the game). Weirdly, I never seemed to get any pixel deforming on small font letters under the old system.
And yes, a lot of graphics cards do force square textures.