Reading this, I'm wondering: would the best way to do this be to have one particle and instance it with separate transformation matrices? Sorry if I've hijacked your thread.
I'd keep a list of particles, update their positions in a dynamic vertex buffer each frame, calculate the world matrix once, and then draw them all with a single D3DDevice->DrawIndexedPrimitive (triangle list) call. That said, depending on your target hardware and version of DX, you can do almost all of it on the GPU these days.
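To make the CPU side of that concrete, here's a rough sketch of the per-frame work, with the D3D calls left out so it compiles on its own. The names (Particle, Vertex, updateParticles, buildQuadVertices, buildQuadIndices) are made up for illustration, the quads are flat in the XY plane rather than billboarded, and the Euler step is just a placeholder for whatever motion you actually want.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layouts, for illustration only.
struct Particle { float x, y, z; float vx, vy, vz; };
struct Vertex   { float x, y, z; float u, v; };

// Move every particle by its velocity (simple Euler step).
void updateParticles(std::vector<Particle>& particles, float dt) {
    for (auto& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// Rebuild the vertex data each frame: four corners per particle.
// The positions already include the particle offset, so one shared
// world matrix is enough for the whole batch.
void buildQuadVertices(const std::vector<Particle>& particles,
                       float halfSize,
                       std::vector<Vertex>& out) {
    assert(halfSize > 0.0f);
    out.clear();
    out.reserve(particles.size() * 4);
    for (const auto& p : particles) {
        out.push_back({p.x - halfSize, p.y + halfSize, p.z, 0.0f, 0.0f});
        out.push_back({p.x + halfSize, p.y + halfSize, p.z, 1.0f, 0.0f});
        out.push_back({p.x + halfSize, p.y - halfSize, p.z, 1.0f, 1.0f});
        out.push_back({p.x - halfSize, p.y - halfSize, p.z, 0.0f, 1.0f});
    }
}

// The indices never change, so build them once: two triangles per quad.
// 16-bit indices cap this at 16384 quads per batch.
std::vector<std::uint16_t> buildQuadIndices(std::size_t quadCount) {
    assert(quadCount <= 16384);
    static const std::uint16_t corner[6] = {0, 1, 2, 0, 2, 3};
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t i = 0; i < quadCount; ++i) {
        const std::size_t base = i * 4;
        for (std::uint16_t c : corner) {
            indices.push_back(static_cast<std::uint16_t>(base + c));
        }
    }
    return indices;
}
```

Assuming D3D9, you'd then lock the dynamic vertex buffer (D3DLOCK_DISCARD), copy the vertex array in, unlock, and issue one DrawIndexedPrimitive with D3DPT_TRIANGLELIST and a primitive count of two per particle.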