Writing an easily maintainable, powerful and flexible particle system (new: efficient particle data packing)

Max Optional · 2011-06-04T21:17:10

I already have a particle system, but some of its functionality doesn't perform as well as I'd like, and there are a number of open questions whose answers would take a lot of research time. That's why I think a public discussion might benefit more people than just myself.

What I need is:

- dedicated particle systems whose parameters can be set individually and programmatically
- emitter parameters throughout a system need to be accessible from a script, which requires them to be named
- independence of the rendering code from the iteration/stepping code
- post-fact extensibility (adding new particle system classes after the code is compiled)

Right now I'm doing everything in a simple way (e.g. I have a particle system class that manages a list of particles), so I need to extend this a fair bit. My idea is to implement the particle system as a self-referential emitter class that accepts an emitter configuration class as a parameter.

	//this is loaded from an XML file
	class EmitterConfiguration
	{
		float args[256];
		char* names[256];
		int iNumArgs;
		Shader* updateShader;
	};

	class ParticleEmitter
	{
		std::vector<ParticleEmitter> particles;
		EmitterConfiguration* cfg;
	};

	//pseudocode
	void ParticleEmitter::Update()
	{
		cfg->EnableUpdateShader(true);
		for all active particles p
			cfg->UpdateParticle(p);
		cfg->EnableUpdateShader(false);
	}

	ParticleEmitter* i_am_a_particle_system;

This is all fine and dandy and should cover the extensibility and flexibility parts. However, I can see a number of speed bottlenecks here, which lead me to the following questions:

1) Is moving particle updating off the CPU a good idea in a general sense? I mean, CPU cores are a dime a dozen on many newer systems and can be expected to only become more widespread.

2) How much of the iteration/updating should I move to the GPU? Everything? Everything except spawning?

3) How should I go about writing the GPU side of the system? The only solution I can see for storage is textures, but do they justify giving up the streaming cost?

4) How should I go about syncing between the CPU and the GPU? If most work is done on the GPU, I still need pretty detailed information about the system on the CPU, and texture read-backs are probably the worst idea to opt for.

5) I'm getting no perceptible speed increase from using geometry shader billboarding. Is it worth it to tie up additional GPU resources with it?

6) In systems with several particle textures, which would you recommend:

- sort the particles into individual lists (might require sorting in all cases as the particles are no longer drawn sequentially);
- suck it up and draw the particles individually (uh oh);
- store them in a single array, but parse the array once for all textures (same problem as in the first case);
- something else? (Actually, come to think of it, the easiest and cheapest way is probably to instead build a texture atlas for each system when it is first created.)

There are probably a number of other questions that I can't think of right off the bat, but I'm really most curious about how people have managed to pull off the updating bit. Right now I'm getting a drop of 5-20 FPS in debug mode for a handful of particles (a few hundred to a few thousand). I'm not sure if I'm fill-rate limited (as the particles are relatively large) or if the bottleneck is the fact that I'm rendering them from a linked list instead of a fixed array.
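For the requirement that emitter parameters be addressable by name from a script, one possible shape is sketched below. The class `EmitterParams` and its methods `Set`, `Get`, `At` and `Count` are hypothetical names, not part of the code above; the only idea being illustrated is resolving a name to an index once so that per-frame update code can read from a flat array without string lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: named float parameters backed by a flat array.
class EmitterParams {
public:
    // Creates the parameter if it does not exist yet, otherwise overwrites
    // its value. Returns the parameter's index in the flat array.
    std::size_t Set(const std::string& name, float value) {
        auto it = index_.find(name);
        if (it == index_.end()) {
            it = index_.emplace(name, values_.size()).first;
            values_.push_back(value);
        } else {
            values_[it->second] = value;
        }
        return it->second;
    }

    // Slow path for scripts: look up by name, return fallback if unknown.
    float Get(const std::string& name, float fallback = 0.0f) const {
        auto it = index_.find(name);
        return it == index_.end() ? fallback : values_[it->second];
    }

    // Fast path for update code: the index was resolved once up front.
    float At(std::size_t i) const {
        assert(i < values_.size());
        return values_[i];
    }

    std::size_t Count() const { return values_.size(); }

private:
    std::unordered_map<std::string, std::size_t> index_;
    std::vector<float> values_;
};
```

A script binding would call `Set`/`Get` by name, while the stepping code would cache the index returned by `Set` and use `At` in its inner loop.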


Started by irreversible May 26, 2011 02:56 PM

12 comments, last by IntegralKing 12 years, 10 months ago

irreversible

2,900

Author

June 02, 2011 07:49 AM

Alright then, I have transform feedback properly up and working now and I'm trying to figure out the most efficient way to pack my particle data in GPU memory. Here's what I have thus far:

Total: 3 + 4 + 3 = 10 floats = 40 bytes per particle (vec3, vec4, vec3)

I. vec3 = unpacked positional information

II. vec4 = packed source and destination colors: float0 = src color, float1 = src alpha, float2 = dst color, float3 = dst alpha

III. data packing is done as:



		//particle data packing into vec4
		//float0
		//	bits			function
		//	0  - 4			texture index in the atlas (0-15)
		//	4  - 8			stage
		//	8  - 16			persistence (0-255) (essentially decay rate)
		//	16 - 20			target scale (0-15)
		//	20 - 24			current scale (0-15)
		//	24 - 32			rotation (0-255)
		//float1
		//	stage age		not packed
		//float2
		//	velocity		not packed
		//float3
		//	packed direction	vec3 packed into a float



//Rotation speed is deduced from particle stage and persistence.
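To make the float0 bit layout concrete, here is a minimal CPU-side sketch in C++. It assumes the bit ranges in the table are half-open (so "0 - 4" means bits 0 to 3), and that the 32-bit word is reinterpreted as a float rather than converted numerically, since a float cannot hold 32 bits of integer data as a value. `PackedFields`, `PackFields`, `UnpackFields`, `BitsToFloat` and `FloatToBits` are hypothetical names. On the shader side, GLSL 3.30 and later offer `floatBitsToUint` and `uintBitsToFloat` for the same reinterpretation; whether bit patterns that happen to encode NaNs or denormals survive the pipeline unchanged is something to verify on the target hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical container for the six fields stored in float0.
struct PackedFields {
    std::uint32_t textureIndex; // bits 0-3,   range 0-15
    std::uint32_t stage;        // bits 4-7,   range 0-15
    std::uint32_t persistence;  // bits 8-15,  range 0-255
    std::uint32_t targetScale;  // bits 16-19, range 0-15
    std::uint32_t currentScale; // bits 20-23, range 0-15
    std::uint32_t rotation;     // bits 24-31, range 0-255
};

// Combine the fields into one 32-bit word using shifts.
std::uint32_t PackFields(const PackedFields& p) {
    assert(p.textureIndex < 16u && p.stage < 16u && p.persistence < 256u);
    assert(p.targetScale < 16u && p.currentScale < 16u && p.rotation < 256u);
    return p.textureIndex
         | (p.stage        << 4)
         | (p.persistence  << 8)
         | (p.targetScale  << 16)
         | (p.currentScale << 20)
         | (p.rotation     << 24);
}

// Split a 32-bit word back into its fields using shifts and masks.
PackedFields UnpackFields(std::uint32_t bits) {
    PackedFields p;
    p.textureIndex =  bits        & 0xFu;
    p.stage        = (bits >> 4)  & 0xFu;
    p.persistence  = (bits >> 8)  & 0xFFu;
    p.targetScale  = (bits >> 16) & 0xFu;
    p.currentScale = (bits >> 20) & 0xFu;
    p.rotation     = (bits >> 24) & 0xFFu;
    return p;
}

// Reinterpret the word as a float for upload (bit copy, no conversion).
float BitsToFloat(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Inverse of BitsToFloat.
std::uint32_t FloatToBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

If the reinterpretation turns out to be fragile, an integer vertex attribute for this one word would avoid the question entirely, at the cost of a separate stream.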

The vec3->float->vec3 packing functions are:





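// Unpacks three 8-bit values from one float in [0, 1).
// Each returned component is a multiple of 1/256 in [0, 255/256].
vec3 Float2Vec3(in float f)
{
	vec3 color;
	f *= 256.0;
	color.x = floor(f);
	f -= color.x;
	f *= 256.0;
	color.y = floor(f);
	f -= color.y;
	color.z = floor(f * 256.0);
	return color * 0.00390625; // color / 256
}

// Packs a vec3 into one float. The round trip can only be exact if each
// component is already a multiple of 1/256 in [0, 1).
float Vec32Float(vec3 color)
{
	const vec3 byte_to_float = vec3(1.0, 1.0 / 256.0, 1.0 / (256.0 * 256.0));
	return dot(color, byte_to_float);
}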

If anyone has any suggestions on how to do this even better or how to increase precision, by all means please post some feedback!
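On the precision point: as written, the two functions only round-trip exactly when every component is already a multiple of 1/256 in [0, 1), and a direction vector has components in [-1, 1]. The C++ sketch below is one way to check the idea on the CPU. It quantizes each component to 8 bits before packing and remaps directions into [0, 1] first. `Quantize8`, `PackVec3`, `UnpackVec3`, `EncodeDirection` and `DecodeDirection` are hypothetical helpers, and rounding to 255 steps is just one reasonable choice, not necessarily what the shader should do.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Clamp to [0, 1] and round to one of 256 levels, stored as k/256
// with k in 0..255 so that the packed value stays below 1.0.
float Quantize8(float v) {
    float c = std::fmin(std::fmax(v, 0.0f), 1.0f);
    return std::floor(c * 255.0f + 0.5f) / 256.0f;
}

// Same arithmetic as the dot product in Vec32Float, after quantizing.
float PackVec3(Vec3 c) {
    return Quantize8(c.x)
         + Quantize8(c.y) / 256.0f
         + Quantize8(c.z) / 65536.0f;
}

// Same arithmetic as Float2Vec3: peel off 8 bits at a time.
Vec3 UnpackVec3(float f) {
    f *= 256.0f;
    float x = std::floor(f);
    f -= x;
    f *= 256.0f;
    float y = std::floor(f);
    f -= y;
    float z = std::floor(f * 256.0f);
    return { x / 256.0f, y / 256.0f, z / 256.0f };
}

// Map a direction with components in [-1, 1] to [0, 1] and pack it.
float EncodeDirection(Vec3 d) {
    assert(std::fabs(d.x) <= 1.0f && std::fabs(d.y) <= 1.0f && std::fabs(d.z) <= 1.0f);
    return PackVec3({ d.x * 0.5f + 0.5f, d.y * 0.5f + 0.5f, d.z * 0.5f + 0.5f });
}

// Undo EncodeDirection. Stored values are k/256, so rescale by 256/255
// to recover k/255 before mapping back to [-1, 1].
Vec3 DecodeDirection(float f) {
    Vec3 u = UnpackVec3(f);
    const float s = 256.0f / 255.0f;
    return { u.x * s * 2.0f - 1.0f, u.y * s * 2.0f - 1.0f, u.z * s * 2.0f - 1.0f };
}
```

With 8 bits per component the decoded direction is accurate to roughly 1/255 per axis, so it would likely need renormalizing before use. All 24 bits fit in a single-precision mantissa, which is why the packed value itself loses nothing further.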

Danny02

279

June 03, 2011 12:39 AM

GPUs only have 4-element data types, so you can use a vec4 instead of your vec3 for free.

irreversible

2,900

Author

June 03, 2011 01:17 PM

Danny02 wrote: "GPUs only have 4-element data types, so you can use a vec4 instead of your vec3 for free."

Oh, thanks for mentioning that! This shouldn't matter for storage, though, as I'm defining a fixed-length stream of N floats (that is, numParticles * sizeof(particle) bytes' worth of floats). That is to say, by using two streams of vec3s and one stream of vec4s, I can still save 8 bytes of storage per particle (which was my aim), even if the readback from each of the streams is done 4 floats at a time. Right?
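The storage arithmetic can be checked with a small C++ sketch. It assumes the streams are plain, tightly packed vertex buffers (not a std140 uniform block, where a vec3 is padded to 16 bytes). `TightParticle`, `PaddedParticle` and `BufferBytes` are hypothetical names, and the interleaved structs are used only to count bytes, not to suggest a particular stream layout.

```cpp
#include <cassert>
#include <cstddef>

// vec3 + vec4 + vec3, tightly packed: 10 floats.
struct TightParticle {
    float position[3];
    float colors[4];
    float packed[3];
};

// Everything widened to vec4: 12 floats.
struct PaddedParticle {
    float position[4];
    float colors[4];
    float packed[4];
};

static_assert(sizeof(TightParticle) == 40, "10 floats per particle");
static_assert(sizeof(PaddedParticle) == 48, "12 floats per particle");

// Total buffer size for a fixed-length particle stream.
std::size_t BufferBytes(std::size_t numParticles, std::size_t bytesPerParticle) {
    assert(bytesPerParticle % sizeof(float) == 0);
    return numParticles * bytesPerParticle;
}
```

The 8-byte difference per particle only concerns buffer size; how many floats the hardware fetches per attribute read is a separate matter.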

IntegralKing

105

June 04, 2011 09:17 PM

Xbox360 games are usually GPU bound, not CPU bound.

Indie developers that are forced to use XNA to develop on the Xbox 360 are typically bound by the CPU, as they have something like 1/2 - 1/10th the CPU power that one might have with a C++ devkit.

Microsoft jumps through hoops to make C# run in an environment where modification of code is prohibited for security reasons, and C#, being a managed language, gets some of its perf benefits from self-modification.

The Xbox uses the compact .NET CLR, which is a piece of crap, but adapting the more robust desktop interpreter to the Xbox OS is a big job.

XNA does not provide access to the Xbox's floating point unit.

That said, Halo: Reach has particles (computed on the GPU) that collide with each other and the environment, and their stated upper limit on these particles is something like 25000. There's no way that could be done on the CPU.


This topic is closed to new replies.
