There are four 256x256 background textures and 12 sprite textures (due to the various sizes). To write a pixel I had to lock each one, plus I had to do the conversion between B5G5R5 (GBA format) and A1R5G5B5 for each write. When (finally) done with the texture I would use the D3DXSprite interface to draw it. I got stuck here when it came to rotation and alpha blending.
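For reference, the per-write conversion mentioned above comes down to swapping the red and blue fields. This is only a minimal sketch, not the code from the emulator: the function name `GbaToA1R5G5B5` is made up for illustration, and it assumes every pixel should be opaque, so the alpha bit is always set.

```cpp
#include <cstdint>

// GBA colour:       bit 15 unused, bits 14-10 blue, bits 9-5 green, bits 4-0 red.
// D3DFMT_A1R5G5B5:  bit 15 alpha,  bits 14-10 red,  bits 9-5 green, bits 4-0 blue.
// Hypothetical helper: swaps red and blue and forces the alpha bit on (assumption).
constexpr std::uint16_t GbaToA1R5G5B5(std::uint16_t gba)
{
    const std::uint16_t red   = gba & 0x1F;
    const std::uint16_t green = (gba >> 5) & 0x1F;
    const std::uint16_t blue  = (gba >> 10) & 0x1F;
    return static_cast<std::uint16_t>(0x8000 | (red << 10) | (green << 5) | blue);
}
```

Doing this on every single pixel write, on top of the texture locks, is where the cost adds up.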
The screen is one big 256x256 byte array and I'm using a D3DFMT_P8 surface. It was great at first and really fast. Then I got to adding the second palette: I had to fit 512 colours onto a P8 surface, which can only index 256, so that didn't work too well.
A big 256x256 array of longs using A8R8G8B8 format. Rather than do the conversions per pixel, I just keep a 512-entry array of converted colours and recalculate it at vblank. So to write a pixel I just do: "ScreenMemory[Y][X]=Palette[Number]". It's not as fast as P8, but multiple palettes and alpha blending shouldn't be an issue.
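Roughly, the lookup-table idea looks like the sketch below. It is an illustration under assumptions, not the actual code: the names `RebuildPalette`, `WritePixel`, `GbaToA8R8G8B8` and `Expand5To8` are made up, the 512 entries are assumed to be the 256 background colours followed by the 256 sprite colours, alpha is assumed fully opaque, and the 5-bit channels are widened to 8 bits by bit replication (a plain shift left by 3 would also work, just slightly darker).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPaletteEntries = 512; // assumed: 256 background + 256 sprite colours
constexpr std::size_t kScreenSize     = 256;

using GbaPalette = std::array<std::uint16_t, kPaletteEntries>;
using Palette    = std::array<std::uint32_t, kPaletteEntries>;
using Screen     = std::array<std::array<std::uint32_t, kScreenSize>, kScreenSize>;

// Widen a 5-bit channel to 8 bits so that 31 maps to 255 (assumed expansion method).
constexpr std::uint32_t Expand5To8(std::uint32_t c)
{
    return (c << 3) | (c >> 2);
}

// GBA colour (blue in bits 14-10, green 9-5, red 4-0) to opaque A8R8G8B8.
constexpr std::uint32_t GbaToA8R8G8B8(std::uint16_t gba)
{
    const std::uint32_t red   = Expand5To8(gba & 0x1F);
    const std::uint32_t green = Expand5To8((gba >> 5) & 0x1F);
    const std::uint32_t blue  = Expand5To8((gba >> 10) & 0x1F);
    return 0xFF000000u | (red << 16) | (green << 8) | blue;
}

// Called once per vblank: convert all 512 colours in one go.
inline void RebuildPalette(const GbaPalette& gbaColours, Palette& out)
{
    for (std::size_t i = 0; i < kPaletteEntries; ++i)
        out[i] = GbaToA8R8G8B8(gbaColours[i]);
}

// The per-pixel write is then a single table lookup, with no conversion.
inline void WritePixel(Screen& screen, const Palette& palette,
                       std::size_t x, std::size_t y, std::size_t number)
{
    screen[y][x] = palette[number];
}
```

The conversion cost drops from once per pixel to 512 times per frame, and the screen buffer is 256 KB, so it is better kept static or on the heap than on the stack.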
And to render the screen now instead of multiple D3DXSprite calls I can just:
(I made my own surface class)
Here's another screenshot to arouse you:
(again this isn't the actual game, just a dump of the video ram)
(the shitty FPS is from D3D stretching the backbuffer; at 1x it's 60FPS)