I need help proving this

My friend and I disagree on which is more strenuous on the graphics card for 3D graphics: rasterization, or T&L? Please voice your opinion. Mine is that T&L/vertex shaders are WAY more intense than rasterization. This disagreement may result in something, as I'm trying to convince my "friend" to add hardware-accelerated T&L to his engine.

void Signature(void* Pointer)
{
PObject(Pointer)->ShowMessage("Why do we need so many pointers?");
};

In most cases, rasterization is the bottleneck. This is especially true if you're using complex fragment shaders, or if you have so many textures that you start thrashing the cache.


"Sneftel is correct, if rather vulgar." --Flarelocke

There is absolutely no reason not to use hardware T&L if you can.


"Sneftel is correct, if rather vulgar." --Flarelocke

I stand corrected, but I still think that my "friend" should use hardware T&L if possible. It's the difference between running really, really slowly because T&L is done in software, and going fast.

void Signature(void* Pointer)
{
PObject(Pointer)->ShowMessage("Why do we need so many pointers?");
};

Depends... if your CPU isn't doing ANYTHING, and you have, let's say, an AMD64 FX, you probably want to do ProcessVertices() (always on the CPU) and then feed the transformed verts to the GPU. That way your pixel shaders and "vertex shaders" are running in parallel. Parallel is always good =]. But in a game that's CPU-intensive already, I'd definitely push as much work as I can to the GPU.

Basically you want to find out what you're not using: if your GPU is dying because of all the pixel shaders, maybe you should do T&L on the CPU. But if your CPU is dying, throw everything you can at the card.
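
For what it's worth, here is a minimal sketch (plain C++, independent of any particular API) of what "doing T&L on the CPU" boils down to: transform each vertex by the combined world-view-projection matrix and do the perspective divide yourself, then hand the pre-transformed vertices to the card. The struct and function names are made up for illustration.

// CPU-side transform: the GPU then only has to rasterize the results.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major, row-vector convention

// Transform one position (x, y, z, 1) by a 4x4 matrix.
Vec4 Transform(const Mat4& m, float x, float y, float z)
{
    Vec4 out;
    out.x = x*m.m[0][0] + y*m.m[1][0] + z*m.m[2][0] + m.m[3][0];
    out.y = x*m.m[0][1] + y*m.m[1][1] + z*m.m[2][1] + m.m[3][1];
    out.z = x*m.m[0][2] + y*m.m[1][2] + z*m.m[2][2] + m.m[3][2];
    out.w = x*m.m[0][3] + y*m.m[1][3] + z*m.m[2][3] + m.m[3][3];
    return out;
}

// Fill a buffer of already-projected vertices each frame; this is the work
// that hardware T&L would otherwise do for free on the card.
void TransformVerticesOnCPU(const Mat4& worldViewProj,
                            const float* positions, int count,
                            Vec4* outScreenVerts)
{
    for (int i = 0; i < count; ++i)
    {
        Vec4 v = Transform(worldViewProj,
                           positions[i*3+0], positions[i*3+1], positions[i*3+2]);
        float invW = 1.0f / v.w;   // perspective divide
        outScreenVerts[i] = Vec4{ v.x*invW, v.y*invW, v.z*invW, invW };
    }
}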

--Navreet

Guest Anonymous Poster
If you're using deferred lighting (http://www.beyond3d.com/articles/deflight/), then you can get away with transforming vertices only once (not counting shadow passes) and writing the needed data to multiple render targets (positions, normals, etc.). The subsequent lighting passes then basically only involve rendering a quad where the pixel shader does all the magic. I think this method has great advantages and will be used more in the future, but it is heavily fill-rate limited.
Screen-space post-processing is another increasingly popular technique that can increase image quality significantly and is practically fill-rate only.
As others mentioned, your friend should of course use hardware T&L if available.
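
To make that concrete, here is a rough sketch in plain C++ (not actual shader code) of what the lighting pass conceptually does for each screen pixel: read what the geometry pass wrote into the G-buffer and accumulate lighting, with no vertex work at all. The types and names are illustrative only, not any particular API.

#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  Sub(Vec3 a, Vec3 b) { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
static float Dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  Normalize(Vec3 v)   { float l = std::sqrt(Dot(v, v)); return { v.x/l, v.y/l, v.z/l }; }

struct GBufferSample            // what the geometry pass wrote for this pixel
{
    Vec3 worldPos;              // render target 0
    Vec3 normal;                // render target 1
    Vec3 albedo;                // render target 2
};

struct PointLight { Vec3 position; Vec3 colour; };

// One pixel of one lighting pass: pure per-pixel math, no geometry involved.
Vec3 ShadePixel(const GBufferSample& g, const PointLight& light)
{
    Vec3  toLight = Normalize(Sub(light.position, g.worldPos));
    float nDotL   = std::max(0.0f, Dot(g.normal, toLight));
    return { g.albedo.x * light.colour.x * nDotL,
             g.albedo.y * light.colour.y * nDotL,
             g.albedo.z * light.colour.z * nDotL };
}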

But isn't it shocking to see everybody come up with new techniques that eat fill rate like chips, while everybody is worried about submitting geometry at maximum speed when it just doesn't matter? It's like buying the most expensive and fastest car when you already know you will spend all your time in one huge traffic jam. So the one and only benefit I see is that the CPU doesn't need to feed all the vertices and can just say "hey, draw that stuff over there" and go do something else.

Also, as we're talking about a pipeline: is it even remotely useful to do vertex work on the CPU because your GPU is having a hard time with fragment processing? I was somehow under the impression that they work in parallel, the next vertices being prepared while the last triangle is rasterized. In that case it would be pointless to do vertex processing on the CPU, as it's "free" on the GPU whenever fragment processing for a triangle takes longer than vertex processing (so, more or less: always).

Someday I really need someone to give me an in-depth explanation of what's going on in those cards.

It is much faster, per frame, to do T&L on the graphics card instead of the CPU. However, if lighting or other things are static, then you should do that on the CPU so the card doesn't repeatedly recalculate the same things each frame. So vertex lighting and lightmaps can be calculated on the CPU (or in the level editor) for static parts of the level, but anything dynamic should be done on the 3D card.
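
To illustrate the "calculate it once" part, here is a small sketch of baking diffuse vertex lighting for static geometry at load time (or in the level editor), so the card never re-evaluates those lights per frame. All types and names are made up for illustration; the real math would match whatever lighting model the engine uses.

#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static float Dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  Dir(Vec3 from, Vec3 to)
{
    Vec3 d = { to.x-from.x, to.y-from.y, to.z-from.z };
    float len = std::sqrt(Dot(d, d));
    return { d.x/len, d.y/len, d.z/len };
}

struct StaticVertex { Vec3 position, normal; Vec3 bakedColour; };
struct StaticLight  { Vec3 position, colour; };

// Run once at load time; the result lives in the vertex buffer, so the
// per-frame lighting cost for this geometry is zero.
void BakeStaticVertexLighting(std::vector<StaticVertex>& verts,
                              const std::vector<StaticLight>& lights)
{
    for (StaticVertex& v : verts)
    {
        Vec3 sum = { 0, 0, 0 };
        for (const StaticLight& l : lights)
        {
            float nDotL = std::max(0.0f, Dot(v.normal, Dir(v.position, l.position)));
            sum.x += l.colour.x * nDotL;
            sum.y += l.colour.y * nDotL;
            sum.z += l.colour.z * nDotL;
        }
        v.bakedColour = sum;
    }
}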

~CGameProgrammer( );

Screenshots of your games or desktop captures -- Post screenshots of your projects. There's already 134 screenshot posts.

You'd be amazed, but 90% of games out there (give or take) are still CPU-limited. That means the CPU is working harder than the graphics board.

However, if you set up your render loop correctly, the CPU can be doing stuff while the pixels are drawing. Something like:


ClearBuffer
RenderAllTriangles
DO CALCULATION
FlipBuffers

What this does is set up the render (by passing all the vertex data to the card); then, while the pixels are all rendering, it does all the calculations (which need to be done anyway). Then it flips the buffer to the screen.

FlipBuffers (or whatever it is that your API uses) has to block until the operations are done. When you call DrawPrimitive (or whatever), that queues up the data to the card, but calling Flip forces the card to finish all rendering before it proceeds. Since fill rate has been identified as the problem, doing most of your CPU work (sound, AI, physics, etc.) after all the geometry is queued up is the best place to put it.
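
Here is that ordering as a rough C++ sketch. The function names are stand-ins for whatever your API or engine actually provides (DrawPrimitive, Present, and so on); only the order of the calls is the point.

// --- stand-in subsystems (placeholders, not a real API) -------------------
void ClearBuffer()        { /* clear colour/depth buffers                    */ }
void RenderAllTriangles() { /* queue all draw calls; returns quickly         */ }
void UpdateGame()         { /* AI, physics, sound: runs while the GPU draws  */ }
void FlipBuffers()        { /* blocks here until the card has finished       */ }

bool GameIsRunning()      { static int frames = 3; return frames-- > 0; }

int main()
{
    while (GameIsRunning())
    {
        ClearBuffer();
        RenderAllTriangles();   // hand the geometry to the card and return

        UpdateGame();           // CPU works in parallel with the GPU here

        FlipBuffers();          // only now do we wait for the card
    }
}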

On that note, if you use hardware T&L (vs. do-it-yourself transforms), that gives you more time for the "DO CALCULATION" part of the game, meaning you can add more physics detail, better AI, more complex sound processing, whatever. If you're not using hardware for the T&L, your "RenderAllTriangles" code takes longer on the CPU, because it has to do all those transformations.

Point is, the more you can offload from the CPU to other devices, the more stuff you can do on the CPU that can't be done anywhere else.

Finally, since the vertex processor and pixel processor are generally two separate units on the graphics board, yes, they can work in parallel.

Another interesting thing to note (from an ATI optimization paper) is that you can determine which part of the program is the bottleneck, and if you can't decrease the amount of time the bottleneck is active, you can add more work to the other stages (as long as they run in parallel). For instance, if your application really IS fill-rate limited, you can generally add more vertices to each object at no cost (up to a point). This stops being true when:

1. The number of vertices increases enough to become a bottleneck itself (this is rare; on the GPU, vertex processing is, I believe, the least common problem).

2. The number of DrawPrimitive calls increases. Each DrawPrimitive call is CPU-intensive, so the fewer DrawPrimitive calls, the better. If you increase the number of calls in order to add polygons (vertices), then you've increased the CPU load, which DOES become a problem very quickly (a rough sketch of the batching idea follows below).
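
As a small illustration of why fewer DrawPrimitive calls matter, here is a sketch of merging many small static meshes that share a material into one big vertex/index buffer at load time, so they can all be submitted with a single call. The types are placeholders, not a real API.

#include <cstdint>
#include <vector>

struct Vertex { float x, y, z, nx, ny, nz, u, v; };

struct Mesh
{
    std::vector<Vertex>        vertices;
    std::vector<std::uint32_t> indices;
};

// Merge many small meshes into one batch; done once at load time.
Mesh BuildBatch(const std::vector<Mesh>& objects)
{
    Mesh batch;
    for (const Mesh& m : objects)
    {
        std::uint32_t base = static_cast<std::uint32_t>(batch.vertices.size());
        batch.vertices.insert(batch.vertices.end(), m.vertices.begin(), m.vertices.end());
        for (std::uint32_t i : m.indices)
            batch.indices.push_back(base + i);   // re-base indices into the big buffer
    }
    return batch;   // at draw time: ONE DrawPrimitive-style call instead of objects.size()
}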


Hope that clears stuff up, and I hope I typed that all correctly. It's really very late.

One thing people are failing to mention is that the GPU is highly pipelined. It is set up to process the vertices of the next batch of triangles while previous triangles are being filled to the screen. The GPU doesn't simply stop processing vertices to do some pixel filling and vice versa; it does both at the same time, like items moving down a conveyor belt. What this means is that certain parts of the pipeline can sit idle waiting for bottlenecks to clear upstream.

I can't imagine not using T&L if it's there, and I wouldn't offload vertex processing to the CPU, ever. I'd look for why I'm processing more vertices than the fragment shader can keep up with, and try to simplify meshes or cull more geometry. If I'm more fill-bound, I might look to increase the resolution of meshes, since the vertex processor is begging for more work. Either that, or I'd look to fill fewer pixels by reducing overdraw where possible or using simpler pixel shaders.
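
As one example of filling fewer pixels, here is a sketch of sorting opaque objects roughly front-to-back each frame so the depth test can reject hidden fragments before the pixel shader ever runs. The object and camera types are made up for illustration.

#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

struct DrawItem
{
    Vec3 centre;   // rough object position, good enough for sorting
    int  meshId;   // whatever the engine uses to issue the draw call
};

static float DistanceSq(Vec3 a, Vec3 b)
{
    float dx = a.x-b.x, dy = a.y-b.y, dz = a.z-b.z;
    return dx*dx + dy*dy + dz*dz;
}

// Sort once per frame before submitting the opaque pass, nearest first.
void SortFrontToBack(std::vector<DrawItem>& items, Vec3 cameraPos)
{
    std::sort(items.begin(), items.end(),
              [cameraPos](const DrawItem& a, const DrawItem& b)
              {
                  return DistanceSq(a.centre, cameraPos) < DistanceSq(b.centre, cameraPos);
              });
}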
