Writing an easily maintainable, powerful and flexible particle system (new: efficient particle data packing)

Started by
12 comments, last by IntegralKing 12 years, 10 months ago
Alright then, I have transform feedback properly up and working now and I'm trying to figure out the most efficient way to pack my particle data in GPU memory. Here's what I have thus far:

Total: 3 + 4 + 3 = 10 floats = 40 bytes per particle (vec3, vec4, vec3)
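
As a quick check of the arithmetic above, here is a minimal CPU-side sketch in C. It assumes each stream is a tightly packed float array and takes the third stream to be a vec3, as in the stated total. The struct and function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: one element type per stream. */
typedef struct { float x, y, z; }       ParticlePosition; /* stream I:   vec3 */
typedef struct { float f0, f1, f2, f3; } ParticleColors;   /* stream II:  vec4 */
typedef struct { float x, y, z; }       ParticleState;    /* stream III: vec3 */

/* Tightly packed floats: a vec3 element takes 12 bytes, not 16. */
static_assert(sizeof(ParticlePosition) == 12, "vec3 element is 12 bytes");
static_assert(sizeof(ParticleColors)   == 16, "vec4 element is 16 bytes");
static_assert(sizeof(ParticleState)    == 12, "vec3 element is 12 bytes");

/* Bytes needed by the three streams together (vec3 + vec4 + vec3). */
size_t particle_storage_bytes(size_t num_particles)
{
    return num_particles * (sizeof(ParticlePosition)
                          + sizeof(ParticleColors)
                          + sizeof(ParticleState));
}

/* Bytes needed if every stream were widened to vec4. */
size_t particle_storage_bytes_all_vec4(size_t num_particles)
{
    return num_particles * 3 * 4 * sizeof(float);
}
```

For one particle this gives 40 bytes against 48 bytes for three vec4 streams, which is the 8-byte saving being aimed for.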

I. vec3 = unpacked positional information

II. vec4 = packed source and destination colors: float0 = src color, float1 = src alpha, float2 = dst color, float3 = dst alpha

III. Data packing is done as:


// particle data packing into vec4
// float0
//   bits      function
//   0 - 4     texture index in the atlas (0-15)
//   4 - 8     stage
//   8 - 16    persistence (0-255) (essentially decay rate)
//   16 - 20   target scale (0-15)
//   20 - 24   current scale (0-15)
//   24 - 32   rotation (0-255)
// float1
//   stage age, not packed
// float2
//   velocity, not packed
// float3
//   direction vec3 packed into a float

// Rotation speed is deduced from particle stage and persistence.
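
A 32-bit float has only a 24-bit significand, so the six fields of float0 cannot all be stored by arithmetic alone; the 32 bits have to be reinterpreted rather than converted. Below is a minimal CPU-side sketch in C of the layout listed above, with hypothetical type and function names. In GLSL 3.30 and later the same reinterpretation is done with floatBitsToUint and uintBitsToFloat. One caveat to verify on the target hardware: some bit patterns correspond to NaN or denormal floats and may not survive untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the six fields of float0. */
typedef struct {
    uint32_t texture_index;  /* bits  0-3,  range 0..15  */
    uint32_t stage;          /* bits  4-7,  range 0..15  */
    uint32_t persistence;    /* bits  8-15, range 0..255 */
    uint32_t target_scale;   /* bits 16-19, range 0..15  */
    uint32_t current_scale;  /* bits 20-23, range 0..15  */
    uint32_t rotation;       /* bits 24-31, range 0..255 */
} ParticleFields;

/* Shift each field into its slot and OR the slots together. */
uint32_t pack_fields(ParticleFields p)
{
    assert(p.texture_index < 16 && p.stage < 16 && p.persistence < 256);
    assert(p.target_scale < 16 && p.current_scale < 16 && p.rotation < 256);
    return  p.texture_index
         | (p.stage         << 4)
         | (p.persistence   << 8)
         | (p.target_scale  << 16)
         | (p.current_scale << 20)
         | (p.rotation      << 24);
}

/* Shift each slot back down and mask off the neighbouring fields. */
ParticleFields unpack_fields(uint32_t bits)
{
    ParticleFields p;
    p.texture_index =  bits        & 0xFu;
    p.stage         = (bits >> 4)  & 0xFu;
    p.persistence   = (bits >> 8)  & 0xFFu;
    p.target_scale  = (bits >> 16) & 0xFu;
    p.current_scale = (bits >> 20) & 0xFu;
    p.rotation      = (bits >> 24) & 0xFFu;
    return p;
}

/* Reinterpret the 32 bits as a float (and back) without converting them. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On the CPU side, pack_fields followed by bits_to_float would produce the value to upload as float0; the shader would then do the shifts and masks after floatBitsToUint.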

The vec3->float->vec3 packing functions are:



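// Unpacks three 8-bit channels from a float in [0, 1).
vec3 Float2Vec3(in float f)
{
    vec3 color;
    f *= 256.0;
    color.x = floor(f);          // most significant byte
    f -= color.x;
    f *= 256.0;
    color.y = floor(f);          // middle byte
    f -= color.y;
    color.z = floor(f * 256.0);  // least significant byte
    return color * 0.00390625;   // color / 256
}


// Packs a vec3 with components in [0, 1] into one float in [0, 1).
// Each component is first quantized to a multiple of 1/256 (at most 255/256),
// otherwise its low bits spill into the next channel when unpacking.
float Vec32Float(vec3 color)
{
    const vec3 byte_to_float = vec3(1.0, 1.0 / 256.0, 1.0 / (256.0 * 256.0));
    color = floor(clamp(color, 0.0, 1.0) * 255.0) / 256.0;
    return dot(color, byte_to_float);
}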
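
For checking the round trip off the GPU, here is a minimal C sketch of the same 8-bits-per-channel idea. It is a variation rather than a copy of the functions above: it rounds when quantizing and divides by 255 when unpacking so that 1.0 survives the round trip, and it remaps direction components from [-1, 1] to [0, 1] first, which is an assumption about how the direction is stored. All names are hypothetical. The three bytes add up to 24 bits, which is exactly what a 32-bit float significand holds, so precision beyond 8 bits per component would need a second float.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Clamp to [0, 1] and round to the nearest byte value 0..255. */
static float quantize(float v)
{
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return (float)(int)(v * 255.0f + 0.5f);
}

/* Pack three [0, 1] channels into one float in [0, 1), 8 bits each. */
float vec3_to_float(Vec3 c)
{
    return quantize(c.x) / 256.0f
         + quantize(c.y) / 65536.0f
         + quantize(c.z) / 16777216.0f;
}

/* Peel the bytes off again, most significant first. */
Vec3 float_to_vec3(float f)
{
    Vec3 c;
    f *= 256.0f;  c.x = (float)(int)f;  f -= c.x;
    f *= 256.0f;  c.y = (float)(int)f;  f -= c.y;
    c.z = (float)(int)(f * 256.0f);
    c.x /= 255.0f;  c.y /= 255.0f;  c.z /= 255.0f;
    return c;
}

/* Directions live in [-1, 1], so shift them into [0, 1] before packing. */
float pack_direction(Vec3 d)
{
    Vec3 c = { d.x * 0.5f + 0.5f, d.y * 0.5f + 0.5f, d.z * 0.5f + 0.5f };
    return vec3_to_float(c);
}

Vec3 unpack_direction(float f)
{
    Vec3 c = float_to_vec3(f);
    Vec3 d = { c.x * 2.0f - 1.0f, c.y * 2.0f - 1.0f, c.z * 2.0f - 1.0f };
    return d;
}
```

Each unpacked direction component lands within one quantization step (2/255) of the original, so the vector is worth renormalizing after unpacking.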



If anyone has any suggestions on how to do this even better or how to increase precision, by all means please post some feedback! :)
GPUs only have 4-element data types, so you can use vec4 instead of your vec3 for free.


Oh, thanks for mentioning that! This shouldn't matter for storage, though, as I'm defining a fixed-length stream of N floats (that is, numParticles * sizeof(particle) bytes, counted in floats). In other words, by using two streams of vec3s and one stream of vec4s, I can still save 8 bytes of storage per particle (which is my aim), even if the readback from each of the streams is done 4 floats at a time. Right?
Xbox 360 games are usually GPU bound, not CPU bound.

Indie developers who are forced to use XNA to develop on the Xbox 360 are typically bound by the CPU, as they have something like 1/2 to 1/10th of the CPU power that one might have with a C++ devkit.


Microsoft jumps through hoops to make C# run in an environment where modification of code is prohibited for security reasons, and C#, being a managed language, gets some of its perf benefits from self-modification.

The Xbox uses the compact .NET CLR, which is a piece of crap, but adapting the more robust desktop interpreter to the Xbox OS is a big job.

XNA does not provide access to the Xbox's floating point unit.



That said, Halo: Reach has particles (computed on the GPU) that collide with each other and the environment, and their stated upper limit on these particles is something like 25000. There's no way that could be done on the CPU.
