Prozak

3D Engine Acceleration Techniques

Prozak    898
Hi, I'm trying to compile a list of ways in which a programmer can speed up the frame rate of their engine. Currently I have:
- Triangle Strips
- Display Lists (OpenGL)
- Binary Space Partitioning
- Frustum Culling (?)
- Octrees (?)
- Quadtrees (?)
There must be a few more ways, both in OpenGL and DirectX. Could you guys help me make this list a bit more comprehensive?

[Hugo Ferreira][Positronic Dreams]
Need [3D Artist] & [Sound Designer]
for small project. Contact me plz...

duhroach    225
Rather than "acceleration techniques", I consider most of the things you mentioned quite standard.
For any rendering engine, you really need to break the scene handling down into three distinct functions.
1. Spatial subdivision (octree, quadtree, BSP, ABT, etc.): divides the scene space into sections that can be easily managed and tested against.
2. Culling (frustum; occlusion: HOM, PVS, PLP, cPLP, etc.): checks whether geometry is visible to the viewer. If it's not visible, don't render it. The visibility check costs far less than actually rendering the object.
3. LOD (CLOD, ROAM, GeoMipMap, etc.): distance-based tessellation of elements. The farther you move from something, the less detailed it needs to look.
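
A minimal sketch of the frustum part of item 2, written in Python only to keep it short and free of any graphics API. It assumes each object has a bounding sphere and that the frustum is given as planes whose unit normals point inward; the `Plane` class, `sphere_visible`, and the box-shaped stand-in "frustum" are illustrative names and data, not taken from any engine mentioned in this thread.

```python
from dataclasses import dataclass

@dataclass
class Plane:
    # Plane n.p + d = 0 with a unit normal pointing into the visible volume.
    nx: float
    ny: float
    nz: float
    d: float

    def signed_distance(self, x, y, z):
        # Positive inside the volume, negative outside.
        return self.nx * x + self.ny * y + self.nz * z + self.d

def sphere_visible(planes, center, radius):
    """Conservative test: False only if the sphere lies fully behind some plane."""
    for plane in planes:
        if plane.signed_distance(*center) < -radius:
            return False
    return True

# Assumed stand-in for a real frustum: a box from -10 to +10 on every axis.
box = [
    Plane(1, 0, 0, 10), Plane(-1, 0, 0, 10),
    Plane(0, 1, 0, 10), Plane(0, -1, 0, 10),
    Plane(0, 0, 1, 10), Plane(0, 0, -1, 10),
]

print(sphere_visible(box, (0.0, 0.0, 0.0), 1.0))   # inside
print(sphere_visible(box, (12.0, 0.0, 0.0), 1.0))  # fully outside
print(sphere_visible(box, (10.5, 0.0, 0.0), 1.0))  # straddles a plane
```

The test costs six dot products per object, which is the sense in which the check is far cheaper than rendering. Combined with the spatial subdivision of item 1, whole tree nodes can be rejected with the same test before any of their objects are examined.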

Those three I consider standard for an engine. They are called "acceleration algorithms", but that's only in contrast to what I suppose would be poly-soup rendering.

As for the other things you listed (vertex arrays, display lists, tri-strips, VAR, VAO, CVA, etc.), those I do consider acceleration algorithms, mainly because they directly change the way you render your geometry, whereas the three systems above reduce the amount of geometry rendered. There are practically thousands of ways to speed up an engine; to get a detailed list, you have to sit down and pick a specific area to speed up.

~Main

==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

Raduprv    997
From what I've noticed, the slowest things in OpenGL are alpha testing and [alpha] blending, so try to avoid them as much as possible.
I once implemented frustum culling in my engine and got a drastic 0% increase in the frame rate (even though the polygon count decreased by 60%) :D
Also, I tried some display lists once, but they actually slowed down(!!) the engine...


Height Map Editor | Eternal Lands | Fast User Directory

billybob    134
I've found that with HW T&L, LOD doesn't really help much, except for the simplest types such as geomipmapping, or LOD applied to a LOT of geometry (terrains). Using LOD meshes on little things seems to be slower than just drawing them.

Of course, it might have been my crappy mesh programming as well.

Ingenu    1629
- View frustum culling via spatial subdivision (any kind of tree, portals...)

- Occlusion culling (remove invisible objects that are inside the frustum)

- Indexed primitives (triangles or strips); they are very cache friendly.

- Front-to-back sorting: up to a 600% increase in performance compared to back-to-front, while random order gives only about a 180% increase.

- Alpha testing: way faster than alpha blending, but not always useful.

- Some form of geometry reduction, depending on your poly count
(with a mid-to-low poly count it won't be needed much).


Prefer removing what's invisible over reducing poly count to compensate for failing to remove invisible objects.
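
The front-to-back point in the list above can be sketched as follows. This is an illustration under simple assumptions, not a measurement: objects are sorted by squared distance to the camera, and `fragments_written` models a single pixel with a "less than" depth test to show why draw order changes the amount of work. The function names and the scene data are made up for the example, and how much a real card saves depends on its early depth rejection.

```python
def sort_front_to_back(objects, camera_pos):
    """Return objects ordered nearest-first (squared distance avoids a sqrt)."""
    def dist_sq(obj):
        return sum((p - c) ** 2 for p, c in zip(obj["position"], camera_pos))
    return sorted(objects, key=dist_sq)

def fragments_written(depths_in_draw_order):
    """Count fragments that pass a 'less than' depth test at a single pixel."""
    nearest = float("inf")
    written = 0
    for depth in depths_in_draw_order:
        if depth < nearest:
            nearest = depth
            written += 1
    return written

# Hypothetical scene: three objects stacked along the view direction.
scene = [
    {"name": "far wall", "position": (0.0, 0.0, 30.0)},
    {"name": "crate", "position": (0.0, 0.0, 5.0)},
    {"name": "pillar", "position": (0.0, 0.0, 12.0)},
]
ordered = sort_front_to_back(scene, camera_pos=(0.0, 0.0, 0.0))
print([obj["name"] for obj in ordered])

print(fragments_written([5, 12, 30]))  # front to back
print(fragments_written([30, 12, 5]))  # back to front
```

Drawn nearest-first, only the first fragment survives the depth test at that pixel; drawn farthest-first, every fragment is written and then overwritten. A rough sort per object is usually enough, since the goal is fewer written fragments rather than a perfect ordering.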

-* So many things to do, so little time to spend. *-

Prozak    898
Could you give a brief description of each of these methods, for those of us now trying to further develop our engines with more advanced methods?

Thanks,

[Hugo Ferreira][Positronic Dreams]
Need [3D Artist] & [Sound Designer]
for small project. Contact me plz...

technobot    238
quote:
Original post by Raduprv
I once implemented frustum culling in my engine and got a drastic 0% increase in the frame rate (even though the polygon count decreased by 60%) :D
Also, I tried some display lists once, but they actually slowed down(!!) the engine...


It's likely that your app was fillrate-limited (i.e. fillrate was the bottleneck, rather than geometry processing). If an app is fillrate-limited, then two rules apply (among others):
1. Reducing the amount of rendered geometry won't speed things up (and increasing it won't slow things down, until the app becomes geometry-limited).
2. Since display lists tend to require quite a bit of fillrate (or something like that), they're likely to decrease performance (this depends on the actual situation, e.g. on the amount of memory needed to store the display list).

As a general rule, you should always identify your bottleneck(s) first, and then deal with them accordingly. There are whole papers on this on NVIDIA's and ATI's developer pages.
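
One rough way to act on "identify your bottleneck first" is to time the same scene three ways: as-is, in a much smaller window, and with part of the geometry disabled. The sketch below is an assumed heuristic, not a method from the vendor papers: `classify_bottleneck`, its parameter names, and the 15% threshold are all invented for illustration, and the timings are supplied by hand.

```python
def classify_bottleneck(base_ms, small_window_ms, less_geometry_ms, threshold=0.15):
    """Guess the limiting stage from three frame times measured in milliseconds.

    base_ms          -- normal frame time
    small_window_ms  -- same scene rendered in a much smaller window
    less_geometry_ms -- same scene with a large share of the geometry skipped
    """
    fill_gain = (base_ms - small_window_ms) / base_ms
    geometry_gain = (base_ms - less_geometry_ms) / base_ms
    if max(fill_gain, geometry_gain) < threshold:
        # Neither experiment helped: look at the CPU, state changes, or VSYNC.
        return "cpu or other"
    return "fill" if fill_gain >= geometry_gain else "geometry"

print(classify_bottleneck(20.0, 11.0, 19.5))
print(classify_bottleneck(20.0, 19.8, 12.0))
print(classify_bottleneck(20.0, 19.9, 19.8))
```

With these made-up numbers the three calls report "fill", "geometry", and "cpu or other" respectively. A result like the first one matches the situation described above, where removing 60% of the polygons changes nothing.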

A few techniques off the top of my head:
1. If possible, make sure all your geometry is made of indexed, vertex-cache-friendly triangle strips (triangle lists will also be fine, if AGP bus bandwidth is not much of a concern). See this paper.
2. Silhouette clipping is a great way to reduce geometry with almost no loss of visual quality.
3. Whenever possible, make sure the relevant geometry and textures are in video memory.
4. Don't use any textures that are bigger than the minimum necessary to achieve the desired visual quality.
5. Don't use more texture units than you need. In particular, you may want to turn off detail maps where they're too far away to contribute much (although you shouldn't bother doing this if you're not fillrate-limited).
6. Make sure all irrelevant fragments get discarded as early as possible in the pipeline. Use depth testing (with rough front-to-back sorting), stencil testing, and whatever else is relevant to achieve this.
7. Minimize expensive state changes.
8. Strive for maximum GPU/CPU concurrency.
9. Put as much work as possible on the GPU, and as little as possible on the CPU (you'll need the CPU for physics etc.). However, if your CPU is "underemployed" after all the non-graphical work is done, consider moving some of the load onto the CPU, to free the GPU for more geometry/fragment processing.
10. Don't simulate cloth, particle systems, and similar things if they're not visible.
11. If you can pipeline some expensive CPU-bound process, do so, and use a separate thread for each stage.
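
To make "vertex-cache-friendly" in item 1 concrete, here is a small simulation that counts how many vertices would have to be transformed for a given index order. It assumes a simple FIFO post-transform cache; real hardware differs in size and replacement policy, and `count_cache_misses` and the four-triangle mesh are illustrative only.

```python
from collections import deque

def count_cache_misses(indices, cache_size=16):
    """Count vertex transforms needed with a FIFO cache of the given size."""
    cache = deque(maxlen=cache_size)  # appending to a full deque drops the oldest entry
    misses = 0
    for index in indices:
        if index not in cache:
            misses += 1
            cache.append(index)
    return misses

# Four triangles forming a short strip: A=(0,1,2) B=(2,1,3) C=(2,3,4) D=(4,3,5)
strip_order = [0, 1, 2,  2, 1, 3,  2, 3, 4,  4, 3, 5]   # A, B, C, D
jumpy_order = [0, 1, 2,  4, 3, 5,  2, 1, 3,  2, 3, 4]   # A, D, B, C

print(count_cache_misses(strip_order, cache_size=4))
print(count_cache_misses(jumpy_order, cache_size=4))
```

With a four-entry cache, the strip-like order transforms each of the six vertices once, while the jumpy order needs nine transforms for the same four triangles. The triangles are identical in both cases; only their order in the index stream changes.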

Yann, I think you can contribute a lot to this thread...


Michael K.,
Designer and Graphics Programmer of "The Keepers"



We come in peace... surrender or die!

[edited by - technobot on March 6, 2003 4:01:43 PM]

Raduprv    997
My problem is that my engine is isometric, so I don't need most of the culling techniques, since I more or less know what is visible and what is not, what is supposed to be in front of something else, etc...
But still, after trying all the optimizations I can think of, I am unhappy with the frame rate. I mean, I have seen some other 1st/3rd person engines (which are supposed to be slower), but they are actually faster and more detailed... That, or people lie about their FPS :D

Height Map Editor | Eternal Lands | Fast User Directory

duhroach    225
Well, most acceleration techniques are based upon the assumption that there's stuff you don't want to render because it costs too much on the card. For iso engines etc., the checks to determine visibility cost MORE than actually rendering the geometry. In those cases, the best place to speed up your algorithms is in the design of the engine. Go through and reconsider your structures: how many checks are you doing? Rewrite your collision detection, spread different operations across frames rather than running them every frame, etc.

FPS games push a lot of polys, so those visibility checks save TONS compared to rendering all those extra textured, lit, shaded polygons.

Now, if a given FPS scene is rendering the same number of polys as one of your iso scenes, and you're still slower, well, rework the engine my friend, rework the engine.

~Main

==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

Raduprv    997
It is not the engine design, because the limitation is on the GPU, not on the CPU (that is, there is no difference between 750 MHz and 2 GHz computers if they both have the same video board; well, in fact there is a difference, but a VERY small one).
And I don't have too much overdraw either (except in some cases, such as when the player is near water); in general, the overdraw is around 20% or so...

Height Map Editor | Eternal Lands | Fast User Directory

Ysaneya    1383
quote:

2. Since display lists tend to require quite a bit of fillrate (or something like that), they're likely to decrease performance (depends on the actual situation, e.g. it may depend on the amount of memory needed to store the display list).



*Cry* Please explain how display lists could have any impact on the fillrate, since the rendered geometry is exactly the same as if you just used IM or standard VAs.

DLs are not likely to decrease your performance; they're likely to increase it. It can happen that they slow things down a bit, but I'd blame that on the drivers, or on misuse (like trying to put 200 MB of data in display lists, or rebuilding them every frame).

Y.

technobot    238
quote:
Original post by Ysaneya
*Cry* Please explain how display lists could have any impact on the fillrate, since the rendered geometry is exactly the same as if you just used IM or standard VAs.

DLs are not likely to decrease your performance; they're likely to increase it


Yes, they will increase performance, if they are used correctly. For example, if you're using vertex arrays, encapsulating the glDrawElements calls in display lists makes little or no sense. I don't remember all the theory exactly, since I haven't used display lists in quite a while...

You are correct that the geometry is exactly the same in display lists as in IM or VA (assuming you use your DLs for geometry; you might just as well use them to encapsulate state changes and such), but that geometry is stored in a different way, which may affect performance in varying ways, depending on the implementation of the DLs, the information (and the amount of it) that the DL contains, and your bottlenecks. I'm not sure fillrate is the problem here; perhaps it's memory bandwidth or some other related thing... I don't remember.

Btw, the OpenGL FAQ has some good pointers regarding this issue under "Display Lists and Vertex Arrays".


Michael K.,
Designer and Graphics Programmer of "The Keepers"



We come in peace... surrender or die!

GameCat    292
So you really have no idea, right? Well, I have an idea. There is no way in hell a DL could decrease fill performance. The only (far-fetched) scenario that could possibly impact framerate in the way you describe is if the DL eats up that final piece of video memory, forcing you to swap textures in and out every frame. Putting drawElements in a display list will probably help, at least with good drivers, since it will transfer the vertices to more optimal memory (like AGP or video memory). Whether there is a difference between immediate mode and vertex arrays inside a DL is not as clear, but it's usually simpler to have a single way to render everything, i.e. vertex arrays.

DLs are a simple way to dramatically increase geometry performance on most implementations, but they might not be feasible since they're immutable. Of course, this only matters *if you're geometry limited*, which 99% of the time you aren't. If you're interested in geometry performance, I suggest you check out Ysaneya's excellent OpenGL geometry benchmark here.

Here are a few practical acceleration tips for the original poster:

* Use vertex arrays, specifically glDrawRangeElements.
* Use display lists or the extensions NV_vertex_array_range or ATI_vertex_array_object.
* Make sure your triangles are roughly in triangle strip order, i.e. tris that are close in the index stream are close in the mesh. NB: You don't have to use actual strips, just send regular tris in rough strip order.
* Decrease resolution; you're probably fill limited.
* Send large batches of stuff to the graphics card. Don't tinker around with individual triangles, just render them.
* Avoid setting redundant state and avoid changing state if you can. Don't go overboard though, like sorting *triangles* by texture (see previous point).
* Don't render what you can't see. Google words: spatial partitioning, oct-tree, quad tree, BSP, occlusion culling, dPVS manual, frustum culling.
* Avoid blending; it eats memory bandwidth. If you have to use it, enable alpha test as well and use it to discard fragments that have little impact on the framebuffer.
* Avoid using esoteric OpenGL functions that are often slow, e.g. two-sided lighting, edge flags, selection, feedback, and polygon antialiasing.
* Use mip mapping.
* Finally, make it work first, then make it fast.
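
The tips on batches and redundant state can be sketched as below. The example groups whole draw calls (not individual triangles) by texture and counts how many texture switches each submission order would cause. It is API-neutral: `count_texture_binds`, `sort_by_state`, and the texture and mesh names are hypothetical, and a real renderer would compare more state than just the texture.

```python
def count_texture_binds(draws):
    """Count texture switches when (texture, mesh) draws are issued in order."""
    binds = 0
    current = None
    for texture, _mesh in draws:
        if texture != current:   # skip the bind when the state is already set
            binds += 1
            current = texture
    return binds

def sort_by_state(draws):
    """Group draws by texture; sorted() is stable, so order within a group is kept."""
    return sorted(draws, key=lambda draw: draw[0])

# Hypothetical frame: four batches alternating between two textures.
frame = [
    ("grass", "terrain_patch_1"),
    ("rock", "cliff_1"),
    ("grass", "terrain_patch_2"),
    ("rock", "cliff_2"),
]

print(count_texture_binds(frame))
print(count_texture_binds(sort_by_state(frame)))
```

Issued as listed, the frame switches texture four times; after grouping, twice. The sort is applied to batches, which keeps its cost small compared to the per-triangle sorting the list above warns against.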

[edited by - GameCat on March 7, 2003 9:33:34 AM]

GreatOne    122
I'm writing a 3D terrain engine and was getting 570 fps on a GeForce4 Ti4600 with 32768 textured polys (tri-stripped, in locked vertex arrays). I then decided to restructure my engine, just modifying it a little so it was a tad speedier at loading the terrain. Lo and behold, once I was done, I got 620 fps. All I did was move a few index arrays to a static class, make sure they were only set up once, and other things like that which happen at load time and which I would never have thought would help my fps. Sometimes a simple restructuring, just very simple code changes, can speed things up.

-------------------
Realm Games Company

walkingcarcass    116
There's a load of dirty detail, like the way the angle of the texture to the camera can affect the speed of texel retrieval, the overhead of re-transforming shared vertices, etc.

Visibility tests for objects can be sped up several times over by subdividing space. Once the world is divided into areas, you can do a dead fast "if object x is not where it was last time, search" and then, for each visible area, consider only its own objects.
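
A minimal sketch of that idea, assuming a 2D world split into a uniform grid of square areas. The `Grid` class and its method names are invented for the example; the point is that an object is only re-filed when it leaves the cell it was in last time, and that a query touches only the objects of the visible cells.

```python
class Grid:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = {}   # (cx, cy) -> set of object ids in that area
        self.where = {}   # object id -> cell it was in last time

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y):
        """Re-file the object only if it left its previous cell. Returns True if it moved."""
        new_cell = self._cell(x, y)
        old_cell = self.where.get(obj_id)
        if old_cell == new_cell:
            return False                       # the fast path: nothing to do
        if old_cell is not None:
            self.cells[old_cell].discard(obj_id)
        self.cells.setdefault(new_cell, set()).add(obj_id)
        self.where[obj_id] = new_cell
        return True

    def objects_in(self, visible_cells):
        """Collect only the objects that belong to the given visible cells."""
        found = set()
        for cell in visible_cells:
            found |= self.cells.get(cell, set())
        return found

grid = Grid(cell_size=10)
grid.update("player", 1, 1)      # first insert
grid.update("player", 2, 3)      # same cell, fast path
grid.update("player", 15, 1)     # crossed into cell (1, 0)
print(grid.objects_in([(1, 0)]))
```

Which cells count as visible is left to the caller here; in practice that list would come from a frustum or occlusion test run on the cells rather than on the individual objects.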

A* pathfinding can be sped up a lot with pooled memory, partial paths, etc. There's a great article by Dan Higgins.

If you can afford the memory, have plenty of variable levels of detail for models.

Tell Windows to knock your threads' priorities up a notch; you can get a big speedup if other applications are running in the background.

One of the biggest bottlenecks is memory. Someone's written an AMD MMX memcpy() which is about 3x faster than movsd. It accommodates arbitrary alignment and sizes, so you could write an even faster blit version which expects a cache-aligned start address and a length which is an integer multiple of 4 or 32 or whatever.

If you use a lot of inefficient standard library functions (especially math and string manipulation), it may be worth writing your own.

A little software rendering, e.g. for the HUD, can reduce e.g. Direct calls. Your own blit is likely to be faster when used on many small areas.

Sometimes loops and recursive functions can be eliminated, e.g. the sum of a series.

Write performance-monitoring code. I have a function definition macro that counts the number of times a function is called in the debug build.
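
The original is a C/C++ macro that is not shown in the post; the same idea can be sketched in Python with a decorator that counts calls only when a debug switch is on. `DEBUG`, `counted`, `call_counts`, and `transform_vertex` are illustrative names, not the poster's code.

```python
import functools

DEBUG = True          # assumed stand-in for "debug build"
call_counts = {}      # function name -> number of calls so far

def counted(func):
    """Wrap func so every call is tallied; in release mode return it untouched."""
    if not DEBUG:
        return func
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] = call_counts.get(func.__name__, 0) + 1
        return func(*args, **kwargs)
    return wrapper

@counted
def transform_vertex(x, y, z):
    # Placeholder for real per-vertex work.
    return (x * 2.0, y * 2.0, z * 2.0)

for i in range(3):
    transform_vertex(float(i), 0.0, 0.0)

print(call_counts)
```

Dumping `call_counts` once per frame shows which functions are called far more often than expected, which is often a better first clue than timing alone.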

Critically examine your data structures for cache-thrashing.

my brain hurts...

********


A Problem Worthy of Attack
Proves Its Worth by Fighting Back

63616C68h    122
quote:
Original post by Raduprv
From what I've noticed, the slowest things in OpenGL are alpha testing and [alpha] blending, so try to avoid them as much as possible.
I once implemented frustum culling in my engine and got a drastic 0% increase in the frame rate (even though the polygon count decreased by 60%) :D



Raduprv, your frame rate can't exceed the refresh rate of your monitor while VSYNC is on (you can adjust the refresh rate in your display settings, but it's always going to be something like 60, 75, or maybe 80 Hz...). Sound familiar? If you notice your frame rate stuck at 60 FPS no matter what, then you have VSYNC enabled. This has been talked about a lot in the forums, especially when public testers report "60 FPS". That number is misleading, but it's reported without knowing any better. No, I don't know how to disable it so that I could see the actual engine performance, but many people here do. More than likely, your frustum culling did ease the strain on your CPU and GPU, but you just couldn't notice it because of the mechanics of VSYNC, which I can't explain at the GPU pipeline level (or maybe driver level?), because I don't know anything about it. But, yep, that's probably your problem.
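
The effect can be illustrated with a simple model. It assumes double buffering, where a finished frame waits for the next vertical blank, so the displayed rate is the refresh rate divided by a whole number. `displayed_fps` and the timings are made up for the illustration; real drivers (for example with triple buffering) can behave differently.

```python
import math

def displayed_fps(render_ms, refresh_hz=60.0, vsync=True):
    """Frame rate seen on screen for a given per-frame render time."""
    if not vsync:
        return 1000.0 / render_ms
    interval_ms = 1000.0 / refresh_hz
    # Each frame occupies a whole number of refresh intervals, at least one.
    intervals = max(1, math.ceil(render_ms / interval_ms))
    return refresh_hz / intervals

print(displayed_fps(12.0))               # before culling
print(displayed_fps(6.0))                # after culling: render time halved
print(displayed_fps(20.0))               # just over one interval
print(displayed_fps(6.0, vsync=False))   # what the engine could actually do
```

In this model, halving the render time from 12 ms to 6 ms leaves the displayed rate at 60 FPS, which looks exactly like a "0% increase", while a frame that takes 20 ms drops straight to 30 FPS. Benchmarking with VSYNC off, or measuring milliseconds per frame instead of FPS, shows the real change.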
