Archived

This topic is now archived and is closed to further replies.

SimmerD

Modern Graphics Engine Design

If anyone has questions about the slideset, I'd like to answer them here, so others can benefit as well. BTW, I'm speaking at the Indie Game Con tomorrow. The subject will be "Practical Per-Pixel Lighting". I will discuss averaged L bump mapping, diffuse & specular bump mapping techniques and tradeoffs, including bumpiness, an alternate way to do shininess, and specular color shifts.

Just finished reading through it. It was great!

The slides discussed AABB trees a lot for collision detection. But what about rendering and frustum culling? Are they any good for that, compared to octrees, quadtrees, etc.?

I'm glad you enjoyed it.

AABB trees certainly can be used for all of those purposes, but for efficiency you need to break out the rendering and collision at different levels of the tree.

The culling & rendering need hundreds of triangles per call to be efficient, and the collision needs about a dozen triangles or less to be efficient.

So, you could do rendering & culling at a higher-level node of the tree with more triangles, and the collision at the leaves or a low-count node.

What I did was use a grid for the high-level, where I do culling and rendering, and an AABBTree in each grid cell just for collisions and raycasts.

In retrospect, I think you could just use an AABBTree for the whole thing and still be just about as efficient ( and take up less memory ).
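
A minimal sketch of using one tree at two granularities, written in Python for brevity. The `Node` layout, the helper names, and the 400-triangle and 12-triangle thresholds are assumptions for illustration only; they are not taken from the slides or the engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One AABB tree node; tri_count covers the whole subtree (hypothetical layout)."""
    tri_count: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build(tri_count: int, leaf_size: int) -> Node:
    """Build a balanced toy tree by halving triangle counts down to leaf_size."""
    if tri_count <= leaf_size:
        return Node(tri_count)
    half = tri_count // 2
    return Node(tri_count, build(half, leaf_size), build(tri_count - half, leaf_size))


def render_nodes(node: Node, max_tris: int) -> List[Node]:
    """Return the highest nodes whose subtree fits in one draw call of max_tris."""
    if node.is_leaf() or node.tri_count <= max_tris:
        return [node]
    out: List[Node] = []
    for child in (node.left, node.right):
        if child is not None:
            out.extend(render_nodes(child, max_tris))
    return out


def collision_leaves(node: Node) -> List[Node]:
    """Return the leaves, which hold the small triangle sets used for collision."""
    if node.is_leaf():
        return [node]
    out: List[Node] = []
    for child in (node.left, node.right):
        if child is not None:
            out.extend(collision_leaves(child))
    return out


root = build(1600, leaf_size=12)           # leaves hold about a dozen triangles
batches = render_nodes(root, max_tris=400)  # cull and draw at this level
leaves = collision_leaves(root)             # collide and raycast at this level
```

In a real engine each node would also carry its bounding box and an index range into a shared vertex buffer, so a render node can be frustum-culled and drawn in one call.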

Nice slides, I've been looking forward to these since I first read the XGDX report. Is there anywhere we can download your test engine? Also, the screenshots are all overhead and show very gridlike geometry. Does this mean your grid structure only supports 3D tile engines or is it flexible enough to be used as a first person or third person engine?

One thing you didn't seem to go into is the issue of overdraw and occlusion culling. What would you recommend for that?

[edited by - impossible on October 10, 2003 5:13:26 PM]

Hi, I really enjoyed it too, and I think the idea of answering questions for those who didn't hear the talk is really good.

My question is this: I understand the idea of culling polygons efficiently and minimizing the number of draw calls, but I want to know more about render state switching and mesh sorting for rendering.
Do you suggest sorting the meshes by shader first, so shader switches are minimal? What about second-order sorting? And should I send the maximum number of triangles possible in each call? I mean, is the most efficient number of triangles per batch the maximum number of primitives the API allows (around 65K), or is it less?

Oh, and is it possible to download your per-pixel lighting talk somewhere?

Thanks.

[edited by - pickups on October 10, 2003 5:18:56 PM]

I'm interested in moving away from the old style BSPs (I'm using Q3 maps right now) and doing some sort of new format based on AABB. This is a little off topic, but where should I start in terms of editors, resources, etc.?

Also, how did you create the demo level?

P.S. What's the name of that flipcode tutorial you mentioned?

[edited by - Promit on October 10, 2003 5:55:59 PM]

OK, to answer each question :

The engine can support any type of 3D geometry ( see the statue in the water ). For simplicity, I just use a text file to generate the walls procedurally, and right now they are very gridlike. It's sort of like laying out an old D&D map in a text file. Each letter or symbol represents a wall, floor, etc.
I have an option to add 'noise' to the geometry, and eventually I will add options to make things look old and cracked, etc.
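
As a rough illustration of generating walls from a text map, here is a small Python sketch. The actual file format is not shown in this thread, so the `#` and `.` symbols, the function names, and the face representation are all hypothetical.

```python
from typing import List, Tuple

WALL, FLOOR = "#", "."  # hypothetical symbols; the real format is not given

# Offsets to the four neighbours of a cell, keyed by the side they sit on.
DIRECTIONS = {"N": (0, -1), "S": (0, 1), "W": (-1, 0), "E": (1, 0)}


def parse_level(text: str) -> List[str]:
    """Split the map text into rows, dropping empty lines."""
    return [line for line in text.splitlines() if line]


def wall_faces(rows: List[str]) -> List[Tuple[int, int, str]]:
    """For each floor cell, emit one wall face per side that borders
    a wall cell or the edge of the map."""
    faces = []
    for y, row in enumerate(rows):
        for x, cell in enumerate(row):
            if cell != FLOOR:
                continue
            for side, (dx, dy) in DIRECTIONS.items():
                nx, ny = x + dx, y + dy
                inside = 0 <= ny < len(rows) and 0 <= nx < len(rows[ny])
                if not inside or rows[ny][nx] == WALL:
                    faces.append((x, y, side))
    return faces


LEVEL = """
#####
#..##
#.#.#
#####
"""

rows = parse_level(LEVEL)
faces = wall_faces(rows)  # each entry is (cell x, cell y, side facing the wall)
```

Each face would then be turned into a quad (or a noisier, more detailed mesh) at build time; only faces that a floor cell can actually see are generated.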

The engine is full 3d, but I take advantage of the top-down view. Because of this, there is no occlusion culling needed or implemented. Because of the far camera, I can have simple characters with just vertex lighting. Because of the simple characters, I can have many on-screen. etc. etc.

If you need occlusion culling, I imagine you could generate an approximate PVS from one AABB node to the others fairly easily: render with occlusion queries at a high resolution from many points in one AABB node, looking at each of the other nodes.

Making levels for your own engine is hard. I have always sucked at making tools myself. I think the best approach is to make the simplest tool that matches the gameplay style you are going for. For instance, in my engine, a 2D tile-based level layout system would be OK ( with a 3D preview ). For an FPS, use an existing Q3 or UT level editor, and import things into your engine.

The engine is not available at this time; I'm not finished yet. Next is to finish the particle system and the collision detection & response.

Guest Anonymous Poster
OK, I just briefly read through the article (I don't really have time to look at it in depth, since I'm supposed to be working). I'm looking into writing an engine as well, but I'm trying to gear it more towards "lower end" machines, with graphics cards without pixel shaders. Do you believe that much of this still applies?

Also, why didn't you choose to leave your collision data at bounding spheres/boxes/cylinders? For most games I would imagine that would be enough. Maybe you have something else in mind? Maybe I'm way off.

Anyway, I was planning on forcing a top-down-ish view as well, but using loose quadtrees for both collision and rendering. Assuming that coarser collision data would be good enough for me, do you see anything wrong with going with those?

Regarding lower-end machines :

In many engines, the #1 thing you can do to improve performance and quality on lower end machines is to use vertex shaders ( even in software ).

By trying to use fixed-function T&L, you can waste more CPU with low batch sizes ( due to constantly switching world matrices, and turning lights on & off ) than you spend in the SW vertex shader routines.

By using vertex shaders, you can also do a better job setting up the fixed function texture blending pipeline than the fixed T&L pipe could. For instance, you could never do averaged L bump mapping in the fixed pipe, but it's easy in vertex shaders.

I plan to make low-end shaders for my engine, perhaps by dropping a couple of features. I'll probably do diffuse bump mapping in one pass, then emissive, then specular in pass three. Perhaps I'll kill the specular bump mapping, do just glossy specular instead, and combine passes 2 & 3 into one.

Alternately, I could break out emissive objects as a special case ( since they are so rare ), and do two passes by default.

I'll have to kill some of the subtleties in the lighting ( like averaging n.l and n.l^2 ), but I believe it's possible to get a decent version of the lighting done in 2 texture stages over 2 or 3 passes.

The idea of using the same tree for rendering, collision, etc., at different levels of the tree and therefore different granularities, is one I'd been thinking about. My plan was to have dynamic nodes, i.e. you can change the rules for splitting down to lower levels at each node. Once you've got to the required granularity for rendering operations, you can change the rules on lower nodes to split on different parameters, producing leaves that contain data more suitable for collision testing. You can flag the lower collision nodes so that the rendering system never knows that they exist and totally ignores them.

Guest Anonymous Poster
quote:
Original post by SimmerD
Regarding lower-end machines :

In many engines, the #1 thing you can do to improve performance and quality on lower end machines is to use vertex shaders ( even in software ).




Interesting, I was unaware of this. Which version of vertex shaders? Basically, should I go for DX9 or would DX8 be enough?

Guest Anonymous Poster
quote:
Original post by SimmerD
Regarding lower-end machines :

By trying to use fixed-function T&L, you can waste more CPU with low batch sizes ( due to constantly switching world matrices, and turning lights on & off ) than you spend in the SW vertex shader routines.



I'm confused: how can vertex shaders eliminate world matrix switches? I switch for every major mesh object in my engine (and transform billboards in software). Can this be done smarter using vertex shaders?

Re: Vertex Shaders - DX8 vs.1.1 is enough to get most of the benefits. That's what I use today.

One reason why vertex shaders can help is that by recoding things to batch well using vertex shaders, you can save CPU during primitive submission.

Re : Vertex Shader Batching.

Yes, treething is right.

What you do is burn memory to save CPU. Instead of having one VB for each tree and drawing each one separately, you write an 'instancing shader'.

You make a copy of each tree's vertices, and add an index to each vertex of each tree. So, for two billboards, you might have vertices that look like this:

xyz, uv, Index 0
xyz, uv, Index 0
xyz, uv, Index 0
xyz, uv, Index 0

xyz, uv, Index 1
xyz, uv, Index 1
xyz, uv, Index 1
xyz, uv, Index 1

Before drawing each clump of trees, you can set up a bunch of matrices in the vertex shader constants.

In the vertex shader, you use the ability to do indexed constant reads to use the index to choose the proper world matrix for that tree or billboard, etc.

Another approach is just to put many trees in a clump, and cull and draw them as a unit. This way you only change world matrix every dozen trees, rather than every tree.

You could combine the two techniques as well, so you have a dozen trees in a clump, and each clump is stored in 'clump space', and each vertex in the clump has a matrix index. The vertex shader applies the appropriate matrix based on the index. This way you can frustum cull on a larger scale.
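
To make the indexed-constant idea concrete, here is a CPU-side sketch in Python. It only mimics what the vertex shader would do; the function names, the 3x4 row-major matrix layout, and the sample positions are assumptions for illustration, not the engine's actual code.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
# Row-major 3x4 world matrix: three rows of (x, y, z, translation).
Mat3x4 = Tuple[Tuple[float, float, float, float], ...]


def translation(tx: float, ty: float, tz: float) -> Mat3x4:
    """World matrix that only translates (enough for this sketch)."""
    return ((1.0, 0.0, 0.0, tx), (0.0, 1.0, 0.0, ty), (0.0, 0.0, 1.0, tz))


def build_batch(mesh: List[Vec3], instance_count: int) -> List[Tuple[Vec3, int]]:
    """Copy the mesh once per instance and tag every vertex with its instance index."""
    return [(v, i) for i in range(instance_count) for v in mesh]


def vertex_shader(position: Vec3, index: int, constants: List[Mat3x4]) -> Vec3:
    """CPU stand-in for the shader: the per-vertex index selects the world
    matrix from the constant table, then the position is transformed."""
    matrix = constants[index]
    x, y, z = position
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)


# One billboard quad, stored once in model space.
quad = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (-1.0, 2.0, 0.0)]

# One world matrix per tree, uploaded as shader constants before the draw call.
world = [translation(10.0, 0.0, 5.0), translation(-4.0, 0.0, 8.0)]

batch = build_batch(quad, len(world))  # a single vertex buffer for all trees
transformed = [vertex_shader(p, i, world) for p, i in batch]
```

The memory cost is one copy of the mesh per instance in the batch; the payoff is a single draw call for the whole clump instead of one per tree. The number of instances per batch is bounded by how many matrices fit in the shader's constant registers.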



Will there be any pdf/ppt focusing only on "Averaged L bump mapping"? That would be really nice.

You should never let your fears become the boundaries of your dreams.

[edited by - _DarkWIng_ on October 13, 2003 1:29:31 PM]

Heya, great presentation. I've been rolling the entire thing around in my head for a few days now, trying to work through bottlenecks and redesigns in my own engines. A few questions:

1. How do you determine what light is lighting what polygon? It seems as though, per frame, in a dynamic world, this would require a full scene traversal down to polygon level.

2. Is your shadow algorithm based on a current "mainstream" algorithm (shadow maps/shadow volumes), or are you able to pull off your method because of the overhead view of the engine you show?

3. You take some time to discuss the AABB tree. Although I understand what you're saying, I'm a bit lost as to whether you're anti-KD-tree. From the tests we've seen (even here at the forums with Yann's ABT), the KD-tree setup offers some great chances for proper scene division. Also, in your engine, do you ever split a polygon?

Thanks again, I'll have more questions later, but gotta jet off to class.
~Main

==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

hmm..

"Stencil shadow (on CPU or GPU). Comment: Limited to 3 lights per surface."

Wtf? Since when? Shadow volumes just mean you need another rendering pass per light. Where'd you pluck this limitation from?

The lighting I use is semi-dynamic. Occlusion is precalculated for world geometry, so lights can't move around. They can decrease in intensity, and change color for free at runtime.

The occlusion is calculated in a preprocess and stored per vertex in the world geometry, via 5-17 raycasts per vertex.

For moving entities ( which are fairly small ), 9 raycasts are done per light for occlusion, and the vertex lighting is done in the normal way.

Re: Lights per polygon. Only 7 lights can be touching each 3d grid cell ( plus ambient ). So each cell has a list of which 7 lights are touching it.
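
A sketch of how such per-cell light lists could be built, in Python. The sphere-shaped light range, the uniform grid layout, and treating an overflow as a build-time error are assumptions for illustration; the post only states the limit of 7 lights per cell.

```python
MAX_LIGHTS_PER_CELL = 7  # limit quoted in the post (ambient is extra)


def sphere_touches_box(center, radius, box_min, box_max) -> bool:
    """Sphere/AABB overlap: clamp the center to the box, then compare distances."""
    dist_sq = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        nearest = min(max(c, lo), hi)
        dist_sq += (c - nearest) ** 2
    return dist_sq <= radius * radius


def lights_per_cell(lights, grid_dims, cell_size):
    """Map each (ix, iy, iz) cell to the indices of the lights touching it.

    lights is a list of (position, radius) pairs. Raising on overflow is an
    assumed policy; an engine might instead drop the weakest lights.
    """
    table = {}
    nx, ny, nz = grid_dims
    for ix in range(nx):
        for iy in range(ny):
            for iz in range(nz):
                lo = (ix * cell_size, iy * cell_size, iz * cell_size)
                hi = tuple(v + cell_size for v in lo)
                hits = [i for i, (pos, radius) in enumerate(lights)
                        if sphere_touches_box(pos, radius, lo, hi)]
                if len(hits) > MAX_LIGHTS_PER_CELL:
                    raise ValueError(
                        f"cell {(ix, iy, iz)} is touched by {len(hits)} lights")
                table[(ix, iy, iz)] = hits
    return table


lights = [((1.0, 1.0, 1.0), 0.5), ((10.0, 1.0, 1.0), 3.0)]
table = lights_per_cell(lights, grid_dims=(2, 1, 1), cell_size=8.0)
```

Because the lights don't move, this table can be computed once in the preprocess, so no per-frame scene traversal is needed to find which lights affect which geometry.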

Re : KD Trees. I actually coded these up, and there were so many splits generated that it became infeasible for collision purposes. Because I wanted to use the same tessellation for collision and rendering, KD trees wouldn't be the best approach. I started to change it to a loose KD tree, and then realized this is essentially an AABB tree.

The engine only splits to the 3d grid cells, and the AABB Tree never splits.


Re: "Stencil shadow (on CPU or GPU). Comment: Limited to 3 lights per surface."

This is where the difference between hearing a talk and reading the .ppt comes in. I meant for performance reasons, you need to limit the # of shadowed lights that touch each surface. My understanding is that many stencil shadow engines in development today go to considerable trouble to limit the # of light/surface interactions for speed reasons.




Hey thanks for the previous answers. Been thinking over some concepts, and have another question (open to everyone really)

How do you design your engine to handle the multiple types of materials that exist? Say, the extra processing required by a BRDF, vs. an object with a reflective property requiring a cube map, vs. your standard single-texture data. I've been trying to develop a generalized answer for this problem, but the only thing I can come up with is "You have to tailor-make the engine to the game", which basically comes down to reducing the types of materials that can exist in the engine. I'm wondering if there's another / better way around this?

~Main

==
Colt "MainRoach" McAnlis
Programmer
www.badheat.com/sinewave

You have to decide if you're making an engine or a game.

It's fun on an engineering level to create a general engine, but it can become hard later on to ship the game with good perf if there are too many material switches, etc.

One way to look at it is that you get about 250 draw calls per frame, depending on your CPU and frame time budget. By switching materials or shaders in ways that can't be batched, you have to spend one of your draw calls for each object with that material or shader.

That's fine, if the material or shader really adds something to the scene.

It's not fine if it could have been reasonably collapsed to a similar material and done all in one go.
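
The draw call arithmetic can be sketched as follows, in Python. The object list and material names are made up, and the sketch assumes that objects sharing a material can be merged into one vertex buffer in a common space and drawn together.

```python
from collections import defaultdict

DRAW_CALL_BUDGET = 250  # rough per-frame figure quoted in the post


def batch_by_material(objects):
    """Group (name, material, tri_count) tuples so that each material
    costs one draw call instead of one call per object."""
    batches = defaultdict(list)
    for name, material, tris in objects:
        batches[material].append((name, tris))
    return dict(batches)


# Hypothetical scene contents, for illustration only.
objects = [
    ("wall_a", "stone", 300),
    ("wall_b", "stone", 280),
    ("floor", "stone", 450),
    ("torch", "emissive", 40),
    ("door", "wood", 120),
    ("crate", "wood", 60),
]

batches = batch_by_material(objects)
unbatched_calls = len(objects)  # one draw call per object
batched_calls = len(batches)    # one draw call per material
stone_tris = sum(tris for _, tris in batches["stone"])
```

Here six objects collapse to three draw calls, which leaves more of the per-frame budget for things that genuinely need their own material or shader.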

My test engine is designed around a game concept, so that I can make the tradeoffs I need for good speed. Currently it gets ~160 fps on a GeForce4 4400, and 100 fps on a GeForce FX 5200 at 800x800 resolution.

What I do is to separate the things to be drawn into groups : opaque world geometry, entities ( characters, etc. ), particles, layers ( fog & mist & water ), and decals ( shadows and blood and scorchmarks ).

The opaque world is drawn by material in world grid cells ( ~250-500 polys per batch ). These do the averaged L and H bump mapping, emissive, gloss, vertex lighting and diffuse texturing in one pass. Water or fog density is stored in DestAlpha.

The alpha layers are drawn as 2 or 4 texture layers moving against one another, blended into dest alpha from 1st pass.

The characters are vertex lit only, but similarly to the way the world is lit, to make them match. They are skinned on the CPU into a dynamicVB.

The particles will be drawn a system at a time. All particles in the system will use the same texture page or texture cubemap.

The decals will be drawn all at once from a dynamicVB, using a cubemap to store 6 different decal textures.

By doing things in few batches, you can have more things in your scene for the same or better performance.

So, I am recommending you create your engine more tailored for your game. You really pay for generality in terms of performance and scene complexity. Just be sure the tradeoff is worth it.

SimmerD: there is a discussion going on in this thread http://www.gamedev.net/community/forums/topic.asp?topic_id=185226 about using genetic algorithms to generate an efficient poly grouping hierarchy (specifically for culling, but I don't see why it couldn't be extended to collision and AI). It is being compared specifically to your approach with AABB trees. I would love to hear your thoughts on this approach (either in this thread or the other one).
