GPU slower than CPU?

Started by Kamil
9 comments, last by crowley9 17 years, 4 months ago
Recently I started playing with the Cg language and wrote a vertex shader that animates my particles. The code looks like this:

void main(float4 position : POSITION, float3 color : COLOR0, float3 normal : NORMAL,
	 out float4 oPosition : POSITION, out float3 oColor : COLOR0,
	 uniform float3 startColor,uniform float3 endColor)
{
	float3 grav = {0,-10,0};	
	
	float4x4 ModelViewProj = glstate.matrix.mvp;
	
	grav.y *= color.z;
	
	position.xyz += normal * color.y + grav;	

	oPosition = mul(ModelViewProj, position );
	oColor = startColor + color.x * (endColor - startColor);
}
The in parameters are:

position : position of the particle at the time of its birth
normal : initial velocity
color.x : normalized time of the particle's life (a value between 0 and 1)
color.y : time of the particle's life
color.z : squared time of the particle's life
startColor : color of the particle at its birth
endColor : color of the particle at its death

Best FPS with the shader version is about 28 fps; the CPU-based version runs at 36 fps. I'm not asking why the FPS is that low (I know why). What I'm asking is: is it normal for the shader version to be slower than the CPU version, or am I doing something wrong here? My card is a GeForce 6600 GT.
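In other words (reading the parameter list above together with the C# code posted later in the thread), the shader is evaluating the standard ballistic update with the time-dependent factors precomputed per particle and packed into the color attribute:

pos(t) = position + normal * t + g * (0.5 * t^2), with g = (0, -10, 0)

Here color.y carries t and color.z carries the quadratic term (the CPU code folds the 0.5 into it before upload), so the per-vertex work is just two multiply-adds plus the model-view-projection transform.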
In general, it would be abnormal for the CPU to do this task slower than the GPU, but there are so many complex factors that go into program optimization that it is impossible for anybody to know what's going on without looking at your entire source code and knowing something about your system setup. We know you have a 6600 GT, but we still don't know anything about your CPU. Given that the 6600 is an older card, you might be vertex bound, but you can't really tell without profiling. Get a good profiler, or at least use some sort of timing mechanism to get some good data about your workload, and you'll probably figure out what's going on.
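For example, a rough sketch of a per-frame timing check using System.Diagnostics.Stopwatch (the ParticleEmitter type name is a placeholder for whatever class owns the render method posted later in the thread; the glFinish call goes through the Tao binding):

using System;
using System.Diagnostics;
using Tao.OpenGl;

// Rough sketch, not a full profiler. The Stopwatch measures one frame's
// particle submission, and glFinish() blocks until the GPU has actually
// finished, so the number isn't just the cost of queuing commands.
static void TimeParticlePass(ParticleEmitter emiter)
{
    Stopwatch watch = Stopwatch.StartNew();

    emiter.Render();    // runs either the CPU loop or the Cg path
    Gl.glFinish();      // wait for the GPU before stopping the clock

    watch.Stop();
    Console.WriteLine("Particle pass: {0:F2} ms", watch.Elapsed.TotalMilliseconds);
}

Comparing the two paths in milliseconds per frame, rather than FPS, makes it easier to see where the time actually goes.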
You don't pass the information to the vertex shader in immediate mode, do you?

Vertex arrays are the way to go; otherwise it's really abnormal, since the 6600 is an SM3.0-compliant card.
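For reference, a minimal sketch of what submitting the points through client-side vertex arrays could look like with the Tao.OpenGl binding (the array names and their mapping onto the shader's POSITION/NORMAL/COLOR0 inputs are assumptions, since the actual Renderer code isn't shown here):

using Tao.OpenGl;

// Sketch only: draw count points from managed float[count * 3] arrays with a
// single glDrawArrays call instead of count glVertex3f calls. The mapping
// assumes positions feed POSITION, velocities feed NORMAL and the packed time
// values feed COLOR0, as in the shader above.
static void DrawPointsWithArrays(float[] positions, float[] velocities,
                                 float[] timeOfLifes, int count)
{
    Gl.glEnableClientState(Gl.GL_VERTEX_ARRAY);
    Gl.glEnableClientState(Gl.GL_NORMAL_ARRAY);
    Gl.glEnableClientState(Gl.GL_COLOR_ARRAY);

    Gl.glVertexPointer(3, Gl.GL_FLOAT, 0, positions);
    Gl.glNormalPointer(Gl.GL_FLOAT, 0, velocities);
    Gl.glColorPointer(3, Gl.GL_FLOAT, 0, timeOfLifes);

    Gl.glDrawArrays(Gl.GL_POINTS, 0, count);

    Gl.glDisableClientState(Gl.GL_COLOR_ARRAY);
    Gl.glDisableClientState(Gl.GL_NORMAL_ARRAY);
    Gl.glDisableClientState(Gl.GL_VERTEX_ARRAY);
}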
http://www.8ung.at/basiror/theironcross.html
Well... ekhm ;) OK, I rewrote it to use vertex arrays. Now the difference between CPU and GPU is not that high, but the CPU still tends to be about 2-3 fps faster. My machine is a 64-bit Sempron 3000+ with 1 GB of RAM.

cwhite:
Abnormal for the CPU to do it slower? I thought the GPU does vector-based arithmetic faster than the CPU, and an equation such as pos = startPos + vel * time + acc * time^2 * 0.5, where pos, startPos, vel and acc are vectors, should be faster on the GPU.
How many particles do you have?

Fillrate issues excepted, I'd expect the vertex shader run on the GPU to be tens of times faster.

Y.
I have 10000 particles drawn twice, once for the left eye and once for the right eye (stereo vision).
Quote:Original post by Kamil
I have 10000 particles drawn twice, once for the left eye and once for the right eye (stereo vision).


This might explain it. On the CPU, you could be performing the animation code once ahead of time and then re-transforming in the VS, but when you move it to the GPU, you're executing the animation code twice.

This probably wouldn't happen on a beefier GPU.
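A rough sketch of that idea on the CPU side, using the same Particle/Vector3 types as the code later in this thread (the cached arrays and the per-eye helper methods here are hypothetical):

// Sketch only: run the animation math once per frame into cached arrays, then
// draw those arrays twice (left eye, right eye) so only the view transform
// differs between the two passes. SetLeftEyeView/SetRightEyeView/DrawCached
// are hypothetical helpers.
Vector3[] cachedPositions;
Vector3[] cachedColors;

void AnimateOncePerFrame()
{
    Vector3 gravity = new Vector3(0, -10, 0);
    for (int i = 0; i < particles.Count; i++)
    {
        Particle p = particles[i];
        cachedPositions[i] = p.InitialPosition
                           + p.InitialVelocity * p.TimeOfLife
                           + gravity * (0.5f * p.TimeOfLife * p.TimeOfLife);
        cachedColors[i] = startColor.Lerp(endColor, p.TimeOfLife / p.TimeOfDeath);
    }
}

void RenderStereoFrame()
{
    AnimateOncePerFrame();   // animation cost paid once per frame

    SetLeftEyeView();        // hypothetical: set the left-eye view matrix
    DrawCached();            // hypothetical: submit cachedPositions/cachedColors
    SetRightEyeView();
    DrawCached();            // same data again, no re-animation
}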

JB
Joshua Barczak, 3D Application Research Group, AMD
Well, not exactly... here is my rendering code (C#):
public void Render()
{
    if (emiterProgram.Usable == false)
    {
        Vector3 gravity = new Vector3(0, -10, 0);
        for (int i = 0; i < particles.Count; i++)
        {
            Particle p = particles[i];
            Vector3 pos = p.InitialPosition + p.InitialVelocity * p.TimeOfLife
                        + (p.TimeOfLife * p.TimeOfLife * gravity * 0.5f);
            float lifeTimeNormalized = p.TimeOfLife / p.TimeOfDeath;
            Vector3 color = startColor.Lerp(endColor, lifeTimeNormalized);
            Renderer.Instance.DrawPoint(pos, color, pointSize);
        }
    }
    else
    {
        RenderCg();
    }
}

private void RenderCg()
{
    emiterProgram.StartProgram();

    Vector3 timeOfLife;
    float[] positions = new float[particles.Count * 3];
    float[] timeOfLifes = new float[particles.Count * 3];
    float[] velocities = new float[particles.Count * 3];

    for (int i = 0; i < particles.Count; i++)
    {
        Particle p = particles[i];
        float lifeTimeNormalized = p.TimeOfLife / p.TimeOfDeath;
        timeOfLife.x = lifeTimeNormalized;
        timeOfLife.y = p.TimeOfLife;
        timeOfLife.z = p.TimeOfLife * p.TimeOfLife * 0.5f;

        positions[i * 3]     = p.InitialPosition.x;
        positions[i * 3 + 1] = p.InitialPosition.y;
        positions[i * 3 + 2] = p.InitialPosition.z;

        velocities[i * 3]     = p.InitialVelocity.x;
        velocities[i * 3 + 1] = p.InitialVelocity.y;
        velocities[i * 3 + 2] = p.InitialVelocity.z;

        timeOfLifes[i * 3]     = timeOfLife.x;
        timeOfLifes[i * 3 + 1] = timeOfLife.y;
        timeOfLifes[i * 3 + 2] = timeOfLife.z;
    }

    Renderer.Instance.DrawPoints(positions, timeOfLifes, velocities, pointSize);
    emiterProgram.EndProgram();
}



Render is called twice per rendered frame. If the Cg program is loaded and compiled properly, emiterProgram.Usable returns true and RenderCg is called. As you can see, neither version stores any of the calculations. More than that, the CPU version doesn't use vertex arrays, just glVertex3f.


As for emiterProgram.StartProgram and EndProgram:
public void StartProgram()
{
    if (usable == true)
    {
        if (vertex) Cg.Instance.EnableVertexProfile();
        else Cg.Instance.EnableFragmentProfile();

        //Cg.Instance.LoadProgram(cgProgram);
        Cg.Instance.BindProgram(cgProgram);

        // check for no error
        if (Cg.Instance.CheckCgError() == true)
        {
            usable = false;
        }
    }
}

public void EndProgram()
{
    if (usable == true)
    {
        if (vertex) Cg.Instance.DisableVectexProfile();
        else Cg.Instance.DisableFragmentProfile();
    }
}
Quote:Original post by Kamil
cwhite:
Abnormal for the CPU to do it slower? I thought the GPU does vector-based arithmetic faster than the CPU, and an equation such as pos = startPos + vel * time + acc * time^2 * 0.5, where pos, startPos, vel and acc are vectors, should be faster on the GPU.


My bad, I wrote that on little sleep. I meant it would be abnormal for the GPU to do it slower.
Hmm, do I have an "abnormal" GPU? I'm using C# and the Tao Cg binding; could that be the bottleneck? Can anyone point me to a good free C# profiler so I can check where the problem is?

This topic is closed to new replies.
