Kamil

GPU slower than CPU?


Recently I started playing with the Cg language and wrote a vertex shader that animates my particles. The code looks like this:
void main(float4 position : POSITION, float3 color : COLOR0, float3 normal : NORMAL,
	 out float4 oPosition : POSITION, out float3 oColor : COLOR0,
	 uniform float3 startColor,uniform float3 endColor)
{
	float3 grav = {0,-10,0};	
	
	float4x4 ModelViewProj = glstate.matrix.mvp;
	
	grav.y *= color.z;
	
	position.xyz += normal * color.y + grav;	

	oPosition = mul(ModelViewProj, position );
	oColor = startColor + color.x * (endColor - startColor);
}
The input parameters are:
position - position of the particle at the time of its birth
normal - initial velocity
color.x - normalized age of the particle (a value between 0 and 1)
color.y - age of the particle (time since its birth)
color.z - squared age of the particle
startColor - color of the particle at the time of its birth
endColor - color of the particle at the time of its death

The best frame rate with the shader version is about 28 fps; the CPU-based version runs at 36 fps. I'm not asking why the frame rates are that low (I know why). What I'm asking is: is it normal for the shader version to be slower than the CPU version, or am I doing something wrong here? My card is a GeForce 6600 GT.

In general, it would be abnormal for the CPU to do this task slower than the GPU, but there are so many complex factors that go into program optimization that it is impossible for anybody to know what's going on without looking at your entire source code and knowing something about your system setup. We know you have a 6600 GT, but we still don't know anything about your CPU. Given that the 6600 is an older card, you might be vertex bound, but you can't really tell without profiling. Get a good profiler, or at least use some sort of timing mechanism to get some good data about your workload, and you'll probably figure out what's going on.
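As a rough illustration of the "timing mechanism" suggestion, here is a minimal sketch built on System.Diagnostics.Stopwatch. FrameTimer and AverageMilliseconds are hypothetical names, not part of the original code; the idea is simply to wrap each render path in a delegate and compare the averages.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not from the original post.
public static class FrameTimer
{
    // Calls `work` `iterations` times and returns the mean
    // wall-clock time per call in milliseconds.
    public static double AverageMilliseconds(Action work, int iterations)
    {
        if (work == null) throw new ArgumentNullException("work");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        work(); // warm-up call so JIT compilation is not included in the timing

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Something like FrameTimer.AverageMilliseconds(emitter.Render, 100) could then be run once per code path, where emitter stands for whatever object owns the render method. Because OpenGL queues commands asynchronously, a CPU-side timer like this mostly measures submission cost unless the timed call also waits for the GPU, for instance by calling glFinish.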

You don't pass the information to the vertex shader in immediate mode, do you?

Vertex arrays are the way to go. Otherwise it's really abnormal, since the 6600 is an SM3.0-compliant card.

Well... ahem ;) OK, I rewrote it to use vertex arrays. Now the difference between CPU and GPU is not as large, but the CPU version still tends to be about 2-3 fps faster. My machine is a 64-bit Sempron 3000+ with 1 GB of RAM.

cwhite:
Abnormal for the CPU to do it slower? I thought the GPU does vector arithmetic faster than the CPU, so an equation such as pos = startPos + vel * time + acc * time^2 * 0.5, where pos, startPos, vel and acc are vectors, should be faster on the GPU.

How many particles do you have?

Fill-rate issues aside, I'd expect the vertex shader running on the GPU to be tens of times faster.

Y.

Quote:
Original post by Kamil
I have 10000 particles drawn twice, once for left eye and once for right eye (stereo vision).


This might explain it. On the CPU, you could be performing the animation code once ahead of time and then re-transforming in the VS, but when you move it to the GPU, you're executing the animation code twice.
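The "animate once, draw per eye" idea could be sketched roughly as follows. Everything here is an assumption made for illustration: Vec3, ParticleState, StereoParticleCache and the draw callback are hypothetical stand-ins, and the gravity constant is copied from the shader above.

```csharp
using System;

// Hypothetical minimal vector type for this sketch.
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

// Hypothetical stand-in for the per-particle data described in the thread.
public sealed class ParticleState
{
    public Vec3 InitialPosition;
    public Vec3 InitialVelocity;
    public float TimeOfLife;
}

public sealed class StereoParticleCache
{
    private static readonly Vec3 Gravity = new Vec3(0f, -10f, 0f);
    private float[] positions = new float[0];

    // Number of times the animation has been evaluated.
    public int UpdateCount { get; private set; }

    // Call once per frame: evaluates p0 + v0*t + g*t^2/2 for every particle.
    public void Update(ParticleState[] particles)
    {
        int needed = particles.Length * 3;
        if (positions.Length != needed) positions = new float[needed];

        for (int i = 0; i < particles.Length; i++)
        {
            ParticleState p = particles[i];
            float t = p.TimeOfLife;
            float h = 0.5f * t * t;
            positions[i * 3]     = p.InitialPosition.X + p.InitialVelocity.X * t + Gravity.X * h;
            positions[i * 3 + 1] = p.InitialPosition.Y + p.InitialVelocity.Y * t + Gravity.Y * h;
            positions[i * 3 + 2] = p.InitialPosition.Z + p.InitialVelocity.Z * t + Gravity.Z * h;
        }
        UpdateCount++;
    }

    // Call once per eye: hands the cached positions to a draw callback
    // (for example, a method that submits them as a vertex array).
    public void DrawEye(Action<float[]> draw)
    {
        draw(positions);
    }
}
```

With this layout the per-particle arithmetic runs once per frame and only the view transform differs between the left-eye and right-eye draws.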

This probably wouldn't happen on a beefier GPU.

JB

Well, not exactly. Here is my rendering code (C#):


public void Render()
{
    if (emiterProgram.Usable == false)
    {
        // CPU path: animate each particle and draw it in immediate mode.
        Vector3 gravity = new Vector3(0, -10, 0);
        for (int i = 0; i < particles.Count; i++)
        {
            Particle p = particles[i];
            Vector3 pos = p.InitialPosition + p.InitialVelocity * p.TimeOfLife + (p.TimeOfLife * p.TimeOfLife * gravity * 0.5f);
            float lifeTimeNormalized = p.TimeOfLife / p.TimeOfDeath;
            Vector3 color = startColor.Lerp(endColor, lifeTimeNormalized);

            Renderer.Instance.DrawPoint(pos, color, pointSize);
        }
    }
    else
    {
        RenderCg();
    }
}

private void RenderCg()
{
    emiterProgram.StartProgram();

    // GPU path: pack the per-particle inputs into arrays for the vertex shader.
    Vector3 timeOfLife;
    float[] positions = new float[particles.Count * 3];
    float[] timeOfLifes = new float[particles.Count * 3];
    float[] velocities = new float[particles.Count * 3];
    for (int i = 0; i < particles.Count; i++)
    {
        Particle p = particles[i];
        float lifeTimeNormalized = p.TimeOfLife / p.TimeOfDeath;
        timeOfLife.x = lifeTimeNormalized;
        timeOfLife.y = p.TimeOfLife;
        timeOfLife.z = p.TimeOfLife * p.TimeOfLife * 0.5f;

        positions[i * 3] = p.InitialPosition.x;
        positions[i * 3 + 1] = p.InitialPosition.y;
        positions[i * 3 + 2] = p.InitialPosition.z;

        velocities[i * 3] = p.InitialVelocity.x;
        velocities[i * 3 + 1] = p.InitialVelocity.y;
        velocities[i * 3 + 2] = p.InitialVelocity.z;

        timeOfLifes[i * 3] = timeOfLife.x;
        timeOfLifes[i * 3 + 1] = timeOfLife.y;
        timeOfLifes[i * 3 + 2] = timeOfLife.z;
    }

    Renderer.Instance.DrawPoints(positions, timeOfLifes, velocities, pointSize);
    emiterProgram.EndProgram();
}



Render is called twice per frame. If the Cg program is loaded and compiled properly, emiterProgram.Usable returns true and RenderCg is called. As you can see, neither version stores any calculations. What's more, the CPU version doesn't use vertex arrays, just glVertex3f.


As for emiterProgram.StartProgram and EndProgram:

public void StartProgram()
{
    if (usable == true)
    {
        if (vertex) Cg.Instance.EnableVertexProfile();
        else Cg.Instance.EnableFragmentProfile();

        //Cg.Instance.LoadProgram(cgProgram);
        Cg.Instance.BindProgram(cgProgram);
        // check for no error
        if (Cg.Instance.CheckCgError() == true)
        {
            usable = false;
        }
    }
}

public void EndProgram()
{
    if (usable == true)
    {
        if (vertex) Cg.Instance.DisableVectexProfile();
        else Cg.Instance.DisableFragmentProfile();
    }
}

Quote:
Original post by Kamil
cwhite:
Abnormal for the CPU to do it slower? I thought the GPU does vector arithmetic faster than the CPU, so an equation such as pos = startPos + vel * time + acc * time^2 * 0.5, where pos, startPos, vel and acc are vectors, should be faster on the GPU.


My bad, I wrote that on little sleep. I meant it would be abnormal for the GPU to do it slower.

Hmm, do I have an "abnormal" GPU? I'm using C# and the Tao Cg binding; could that be the bottleneck? Can anyone point me to a good free C# profiler so I can check where the problem is?

I am almost certain that you are CPU bound in both cases, so you will see almost no benefit by shipping some arithmetic onto the GPU.

Firstly, your render loop allocates a bunch of huge arrays on each RenderCg call. This will fill up your memory until the garbage collector comes in and cleans things up. It may mean that your app is swapping out to disk, not to mention the overhead of allocating and de-allocating.

Secondly, the CPU code does the iteration in one pass through your data, whereas your Cg version copies the data into a bunch of non-interleaved temporary buffers, which probably have to be copied several more times before they are in the right form for the GPU to read.

This is WAY more work than the little bit of maths required to move your particles a single step.
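One way to address both points might look like the sketch below: a single interleaved buffer that is allocated once and reused every frame. InterleavedParticleBuffer and its members are hypothetical names, and the nine-float layout (position, velocity, then the three time values from RenderCg) is an assumption about what the shader needs.

```csharp
using System;

// Hypothetical reusable buffer, not part of the original code.
public sealed class InterleavedParticleBuffer
{
    // Layout per particle: position xyz, velocity xyz, time values xyz.
    public const int FloatsPerParticle = 9;

    private float[] data = new float[0];

    // How many times the backing array has been (re)allocated.
    public int AllocationCount { get; private set; }

    public float[] Data { get { return data; } }

    // Grows the backing array only when the particle count exceeds capacity,
    // so a steady-state frame performs no allocation at all.
    public void EnsureCapacity(int particleCount)
    {
        int needed = particleCount * FloatsPerParticle;
        if (data.Length < needed)
        {
            data = new float[needed];
            AllocationCount++;
        }
    }

    // Writes one particle's attributes next to each other in memory.
    public void Write(int index,
                      float px, float py, float pz,
                      float vx, float vy, float vz,
                      float timeOfLife, float timeOfDeath)
    {
        int o = index * FloatsPerParticle;
        data[o]     = px;
        data[o + 1] = py;
        data[o + 2] = pz;
        data[o + 3] = vx;
        data[o + 4] = vy;
        data[o + 5] = vz;
        data[o + 6] = timeOfLife / timeOfDeath;        // normalized age
        data[o + 7] = timeOfLife;                      // age
        data[o + 8] = timeOfLife * timeOfLife * 0.5f;  // t^2 / 2
    }
}
```

The OpenGL vertex array pointer functions take a stride argument, so an interleaved array like this can in principle be described with a stride of 9 * sizeof(float) = 36 bytes. How that maps onto the Tao binding and the Renderer wrapper used here is not shown in the thread.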

If you want to do particles on the GPU efficiently, you will want to use the pixel shaders (since these traditionally have more computing power), and get them to render to a "vertex buffer" - from which you can render your point sprites.
