Objects render themselves or does something render them?

This topic is 4861 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

I'm (still) working on my game engine and I've come to rendering. Right now I'm working with 2D surfaces that can have as many vertices as they want. Should each surface render itself, or should I somehow combine their vertices into a vertex buffer? I'm pretty sure that having each object render itself (about 100 objects) would be VERY slow, but I don't know how to do it any other way. Any help would be appreciated! Thanks. PS: I'm using DirectX 9.

I use OpenGL, but this shouldn't matter in the design sense.

Each object loads its vertex info and sends it to a "renderer".
The renderer manages the data internally and returns a
unique id to the object (so the data can be addressed from outside
the renderer).

Here is my Renderer.h file
(with some closed-source stuff removed):

namespace render {
using namespace std;

// One triangle: presumably three indices of type `type` into a vertex buffer.
template<class type>
struct Polygon {
    struct {
        type a, b, c;
    } poly, tangentSpace;

    types::word shader;
    types::word ownerBufferId;
};

struct VertexBuffer {
    vector< math::Vector3f > verts;
    vector< math::Vector3f > norms;

    math::Vector2f tc[ constants::render::MaxMultiTextureUnits ];
};

struct PolygonBuffer {
    vector< Polygon<types::uint32> > polygons;
    VertexBuffer *vb;
};

class Renderer : public IRenderDevice
{
public:
    // The return types and the Polygon template argument were missing in the
    // original post; void and types::uint32 are assumed here.
    void pushPolygons( Polygon<types::uint32> *p, int count );
    void pushVerticies( int pbId, int count, int tcOnly, math::Vector3f *v = NULL, math::Vector3f *n = NULL, math::Vector2f *tc = NULL );

    int createPolybuffer(); // presumably returns the id of the new buffer

    // called in this order...
    void prepareRender(); // sorts shader state etc.
    void flushBuffers();  // does rendering to screen
    void finish();        // clean-up
}; // Renderer
} // namespace render




Obviously that's a heavily cut-down version, but you should
be able to see what I'm doing.

Something along the lines of what silvermace said. It's always better to batch things up and render them all at once than to render many small things individually. Instead of each object rendering itself, you can give each object a pointer to your renderer. Your renderer can have a render queue, and each object can put itself on this queue. A queueable object would need to give the render queue certain information about itself: the texture ID it uses, the vertex/pixel shader IDs it uses, its material ID, and any other information you need to render any object in your world. The render queue can then sort the objects by texture ID, vertex shader ID, pixel shader ID, or whatever else, and render the objects in batches.
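
A minimal sketch of such a render queue, in plain C++ with no graphics API involved. The names RenderItem, RenderQueue, submit, sortByState and countBatches are made up for illustration, and the IDs are plain integers standing in for whatever handles a real engine uses:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical description of one queued object; field names are illustrative.
struct RenderItem {
    int textureId;
    int shaderId;
    int materialId;
    int objectId;  // stands in for whatever identifies the geometry to draw
};

class RenderQueue {
public:
    // Objects call this instead of drawing themselves.
    void submit(const RenderItem& item) { items_.push_back(item); }

    // Sort so that items sharing texture/shader/material end up adjacent.
    void sortByState() {
        std::sort(items_.begin(), items_.end(),
                  [](const RenderItem& a, const RenderItem& b) {
                      return std::tie(a.textureId, a.shaderId, a.materialId) <
                             std::tie(b.textureId, b.shaderId, b.materialId);
                  });
    }

    // One batch per run of consecutive items with identical state.
    std::size_t countBatches() const {
        std::size_t batches = 0;
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (i == 0 || !sameState(items_[i - 1], items_[i])) ++batches;
        }
        return batches;
    }

    const std::vector<RenderItem>& items() const { return items_; }

private:
    static bool sameState(const RenderItem& a, const RenderItem& b) {
        return a.textureId == b.textureId && a.shaderId == b.shaderId &&
               a.materialId == b.materialId;
    }

    std::vector<RenderItem> items_;
};
```

With four items whose textures alternate between two IDs, countBatches() reports four state changes before sortByState() and two afterwards, which is the whole point of sorting before drawing.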

Okay, I'm not quite understanding what is going on, so I'm going to try and make up a situation.

I have 2 quads
One is at (0,0)
Two is at (64,0)
Both are 32x32
8 vertices (using tri-strips)

One and Two are instances of the Quad class.

I have a Renderer class, which contains a list of Quads.
When I call the Renderer's DrawAllQuads() function, what do I do?

Do I combine all of the Quads' vertices into a single vertex buffer, then make calls to DrawPrimitive(vertexBuffer, StartingVertex), increasing StartingVertex by 4 each time?

Or do I make a vertex-buffer for each quad?

Sorry if I'm being a pain, and thanks!

Typically you create vertex buffers only at init, not during the frame, because creating one can take a while.

Data which remains constant, and only changes as a result of World/View/Proj transforms or shader work, should go into static vertex buffers. This uploads the data to the card once, and refers to the data on the card thereafter. This is fast.

Data which changes often via CPU calculation should go into a dynamic vertex buffer. This uploads the data as it's needed, each frame. This is slower, but is necessary if the data is changing.

Draw calls are slow. Drawing fewer than 1000 polygons at a time is wasteful. Sometimes you need to, other times you can work around it. For example, if I load a tree mesh and it's 100 polygons, I put 10 copies into the vertex buffer. When I render, I can render up to 10 trees at once, using a vertex shader to select a different transform matrix for each.

Quads are special. Even if you batch them with a shader, you're still not rendering many polys. Often it's better to world transform them on the CPU, set the D3DTS_WORLD to identity, copy them all (all that share the same texture, etc.) into a dynamic VB, then render that. This allows you to render hundreds or thousands of quads without killing your framerate.
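
A rough sketch of the CPU side of that approach, with the Direct3D calls left out. The Vertex and Quad structs and the appendQuadBatch function are hypothetical; the resulting array is what would be copied into the dynamic VB and drawn as a triangle list with the world matrix set to identity:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pre-transformed vertex: world-space position plus texcoord.
struct Vertex {
    float x, y, z;
    float u, v;
};

// Hypothetical axis-aligned quad: top-left corner, size and texture.
struct Quad {
    float x, y, z;
    float width, height;
    int textureId;
};

// Appends every quad that uses `textureId` to `out` as two triangles
// (six vertices each), already positioned in world space.
// Returns the number of triangles appended.
std::size_t appendQuadBatch(const std::vector<Quad>& quads, int textureId,
                            std::vector<Vertex>& out) {
    std::size_t triangles = 0;
    for (const Quad& q : quads) {
        if (q.textureId != textureId) continue;  // belongs to another batch
        const float x0 = q.x, y0 = q.y;
        const float x1 = q.x + q.width, y1 = q.y + q.height;
        const Vertex tl{x0, y0, q.z, 0.0f, 0.0f};
        const Vertex tr{x1, y0, q.z, 1.0f, 0.0f};
        const Vertex bl{x0, y1, q.z, 0.0f, 1.0f};
        const Vertex br{x1, y1, q.z, 1.0f, 1.0f};
        out.insert(out.end(), {tl, tr, bl, tr, br, bl});
        triangles += 2;
    }
    return triangles;
}
```

For the two 32x32 quads from the earlier example, at (0,0) and (64,0), this yields 12 vertices and 4 triangles that can go out in a single draw call. Rotation or scaling would be applied to the four corners at the same point in the loop.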

Objects can set up their render properties (textures, texture stages, render states). The actual draw calls need either something higher up to handle them, or an interface in your objects that allows optimal batching. Either way will work.

Ok, I've figured out that I don't want to use tri-strips for anything and that they're what's causing me headaches.

@Namethatnobodyelsetook: "Quads are special. Even if you batch them with a shader, you're still not rendering many polys. Often it's better to world transform them on the CPU, set the D3DTS_WORLD to identity, copy them all (all that share the same texture, etc.) into a dynamic VB, then render that. This allows you to render hundreds or thousands of quads without killing your framerate."

By this, do you mean render them in 3D? If so, then don't I have to worry about my other objects walking "through" my quads?

Thanks for all the help guys!!!

Bad advice:

I figure any sort of buffer and rendering organization I do is going to suck. As my current project is my first engine, and even my first use of DirectX, I'm doing it the most naive way possible that works. So far, everything is going perfectly fine, if terribly inefficiently.

One quad drawing class has a static vertex buffer per quad and switches textures per frame per quad. [because, naively, switching between textures that are already loaded in memory should be fast, and creating buffers should be slower than switching them]

Another has a dynamic vertex buffer [created/destroyed per frame] but has a static texture and many, many quads.

So far the 2nd variant gets me around 20,000 polys before becoming noticeably slow. I've not seen the first become noticeably slow. More than enough for my turn-based 2D game. And when it stops being enough, I'll look into fixing some things, now that I know better.

You've still got your Z buffer handling depth issues for solid objects. All solid quads can be rendered "whenever".

For semi-transparent objects you need to maintain back to front ordering, as with non-quads. If you're just rendering quads that all have the same texture, just ensure you've Z sorted them before going into the dynamic buffer. If you have a mix of different objects, or objects with varying textures, you might not be able to batch them efficiently. Whether it's worth the trouble of trying to batch at all, while maintaining strict back to front Z ordering, really depends on your app and your art.
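
As a small illustration of that Z sort, here is a sketch that orders transparent quads back to front before they are copied into the dynamic buffer. The TransparentQuad struct is hypothetical, and it assumes that a larger z means farther from the camera:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical transparent quad; only the fields needed for sorting.
struct TransparentQuad {
    float z;  // assumption: larger z is farther from the camera
    int id;
};

// Orders quads back to front so that blending composites correctly.
// stable_sort keeps the submission order for quads at the same depth.
void sortBackToFront(std::vector<TransparentQuad>& quads) {
    std::stable_sort(quads.begin(), quads.end(),
                     [](const TransparentQuad& a, const TransparentQuad& b) {
                         return a.z > b.z;
                     });
}
```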

I'm just pointing out that one-size-fits-all doesn't work... and batching is critical if you're dealing with many small objects, like quads. How you batch, and how you render really depends on what you're doing. An engine doing a tiled 2D sidescroller will be designed "a bit" different than an engine doing Doom 3. If you're 2D, whether you're doing lots of alphablending, only alphatest, or just solid squares changes things somewhat. What's your art like? Do you have texture sheets, or a texture per frame/tile? Texture sheets really help you build a good batching system.

What I can tell you is that each object having its own VB isn't good. There is overhead in making them. There is overhead in switching between them. The choice of static and dynamic buffers really depends on the app. The choice of how to do batching depends on the app. We don't have enough information to recommend handling things in any specific way... but with what we've said you can hopefully see how your choices will affect things.

Okay, here's some info before I ask any more questions:

I plan on using the engine for simple 3D models (including levels), a particle engine, and a GUI.
The GUI and particle engine are going to allow for transparency.
The particle engine will use a single texture for every particle, while the GUI is going to use a texture-sheet.

I'm just having a little problem understanding the logic behind some of this. Using what you said earlier (about dynamic and static VBs) if I had 4 characters on the screen and 1 left (dies and is never coming back), would I have to recreate the entire vertex-buffer (thus making it a dynamic VB)?

Do I sort all of the same type of objects into a VB (i.e. all quads in one VB, all character models in another, the level in its own, etc)?

To learn more about this, should I buy one of the following books:
3D Game Engine Programming
3D Game Engine Design
Real-Time Rendering
Or is there another book that would help?

Thanks for your patience and for helping!

You shouldn't create vertex buffers on a per-object basis (like one for cars, one for trees, one for the skybox...) but on a per-state basis (one for objects with texture coords, one for objects without normals, one for objects with lighting...). If all your objects have the same data type then use only one vertex buffer (it doesn't have to be large enough to contain all the objects at the same time; take a look at the PointSprite D3D sample).

My engine does dynamic buffer filling because it makes heavy use
of multi-pass shaders.

1. buffers all empty
2. push in data on a visibility basis
3. sort (my engine sorts as data is inserted, but I use a "free" sort, i.e. one that takes no extra time, that is very, very memory intensive)
4. render in as few calls and with as few state changes as possible
5. clear buffers and hand control back to the engine

Rinse and repeat.

You don't have to recreate your VB if a player leaves, or even make it dynamic because of it. A VB is just a collection of vertices. You don't have to render all of them, and you can render any part of it multiple times if you'd like.

So, let's say each player has a unique mesh, and you put all 4 meshes in a single VB. Player 2 quits. Ok, so we render players 1, 3, and 4, and ignore the data for player 2's mesh. We don't delete it.

Let's say we have 12 enemies of a single type. Its data only needs to be in a VB once. When we render, we ask the GPU to use the same vertices in each draw call.
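
A CPU-side sketch of that bookkeeping, with a std::vector standing in for the vertex buffer. SharedVertexBuffer, MeshRange and rangesToDraw are invented names; in Direct3D 9 the firstVertex of a range is what would be passed as the StartVertex argument of DrawPrimitive:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Where one mesh lives inside the shared buffer.
struct MeshRange {
    std::size_t firstVertex;
    std::size_t vertexCount;
};

// Stand-in for a vertex buffer shared by several meshes.
class SharedVertexBuffer {
public:
    // Copies the mesh to the end of the buffer and reports where it landed.
    MeshRange append(const std::vector<Vertex>& mesh) {
        MeshRange range{vertices_.size(), mesh.size()};
        vertices_.insert(vertices_.end(), mesh.begin(), mesh.end());
        return range;
    }

    std::size_t size() const { return vertices_.size(); }

private:
    std::vector<Vertex> vertices_;
};

// Picks the ranges to draw this frame. Inactive meshes are simply skipped;
// their vertices stay in the buffer untouched.
std::vector<MeshRange> rangesToDraw(const std::vector<MeshRange>& meshes,
                                    const std::vector<bool>& active) {
    assert(meshes.size() == active.size());
    std::vector<MeshRange> draws;
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        if (active[i]) draws.push_back(meshes[i]);
    }
    return draws;
}
```

The 12 enemies of one type would share a single MeshRange, drawn 12 times with a different world transform each time.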

That's the basic stuff. The more advanced part is trying to batch many render calls together to improve performance. If you're not at this point yet, you can ignore this, but keep it somewhere in the back of your mind. Your design needs to take this into account somehow.

The options for batching are:

1. Use a 6800 and the 9.0c instancing API, and ignore the 99% of the market where this won't work.
2. Use fixed pipeline indexed skinning and ignore the 75% of the market where this won't work.
3. Use a vertex shader to apply various transforms based on an ID, exactly like a boned character. Software emulation of shaders means this works on 100% of all cards.
4. Manually transform each object and use a dynamic VB. Works on 100% of all cards.

You can use #1, or #2 in hobby code. You can even use #1 as a high end feature, like FarCry. You'll need to support #3, and/or #4 though, if you want a decent sized market.

Now, let's assume vs.1.1 capable hardware. We have 96 constants. Let's assume we're not lighting, fogging, etc., just trying to get world, view, and proj transforms done. We need 4 registers for view*proj, then 3 registers per world matrix, so we can handle a maximum of 30 matrices at once. So, if you're going to render quads, the maximum batch size is still only 60 polygons. Batching 30 will certainly help, but if you have many to render, the overhead of DrawIndexedPrimitive takes its toll. It's better to use #4. That's why I say quads are special.
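
The register arithmetic above, written out as a compile-time check. The function and constant names are made up; the numbers are the ones quoted in the post:

```cpp
#include <cassert>

// World matrices that fit once the reserved registers are taken out.
// Integer division rounds down, because a partial matrix is useless.
constexpr int maxWorldMatrices(int totalRegisters, int reservedRegisters,
                               int registersPerMatrix) {
    return (totalRegisters - reservedRegisters) / registersPerMatrix;
}

constexpr int kVs11ConstantRegisters = 96;  // vs.1.1 figure from the post
constexpr int kViewProjRegisters = 4;       // one full 4x4 view*proj matrix
constexpr int kWorldMatrixRegisters = 3;    // 3 registers per world matrix

constexpr int kMaxWorldMatrices = maxWorldMatrices(
    kVs11ConstantRegisters, kViewProjRegisters, kWorldMatrixRegisters);
static_assert(kMaxWorldMatrices == 30, "(96 - 4) / 3 rounds down to 30");

// A quad is two triangles, so 30 quads per batch is only 60 polygons.
constexpr int kMaxQuadPolygonsPerBatch = kMaxWorldMatrices * 2;
static_assert(kMaxQuadPolygonsPerBatch == 60, "30 quads * 2 triangles");
```

Any registers spent on lighting, fog or other constants come out of the same budget and shrink the batch further.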

Let's say each of those 12 enemies is only 100 polygons. Rendering 100 polygons at a time isn't optimal because each DrawPrimitive has a *huge* overhead. We *could* put 30 copies of the mesh into the VB, but now we're using lots of video RAM. Depending on your game this might be okay. After all, we're only putting in extra copies of small meshes. When we render, we render the 12 enemies at once. Great. Now let's assume we don't have as much video RAM to spare, and we put in 5 copies. We render 5, render the next 5, then render the last 2. It took 3 draw calls, but that's better than 12. Remember, nVidia advises a maximum of 300-500 draw calls for your entire scene. Once you start adding in all the subtle details, you'll need any performance boost you can get.
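
The "render 5, render 5, render 2" split is just a chunking loop. A sketch, with splitIntoBatches as a made-up helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Splits `objectCount` identical objects into draw calls, given how many
// copies of the mesh are stored in the vertex buffer. Each entry is the
// number of objects drawn by one call.
std::vector<int> splitIntoBatches(int objectCount, int copiesInBuffer) {
    assert(copiesInBuffer > 0);
    std::vector<int> batches;
    for (int remaining = objectCount; remaining > 0;
         remaining -= copiesInBuffer) {
        batches.push_back(std::min(remaining, copiesInBuffer));
    }
    return batches;
}
```

splitIntoBatches(12, 5) gives three calls of 5, 5 and 2 objects, while splitIntoBatches(12, 30) gives a single call of 12.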

You're probably not ready for that level of performance tuning just yet, BUT it does show that objects shouldn't just blindly render themselves one at a time. I agree with Krun. Make one VB per vertex type. If you're dynamically loading objects and don't know how big of a VB to make, just create it larger than you need. As you load more meshes, load and place the new mesh at the end of the previous data. If you fill the VB, make a new one, and start filling that. Try to share your VBs among many objects.
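
One possible shape for the "fill a VB, then start a new one" bookkeeping, again without any Direct3D calls. VertexBufferPool and Allocation are hypothetical names, and a real engine would keep one such pool per vertex type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where a mesh ended up: which buffer, and which vertices inside it.
struct Allocation {
    std::size_t bufferIndex;
    std::size_t firstVertex;
    std::size_t vertexCount;
};

// Tracks how full each fixed-capacity vertex buffer is.
class VertexBufferPool {
public:
    explicit VertexBufferPool(std::size_t capacityPerBuffer)
        : capacity_(capacityPerBuffer) {
        assert(capacity_ > 0);
    }

    // Places the mesh after the previous data in the newest buffer,
    // or opens a new buffer when the mesh does not fit.
    Allocation allocate(std::size_t vertexCount) {
        assert(vertexCount <= capacity_);  // a mesh must fit in one buffer
        if (used_.empty() || used_.back() + vertexCount > capacity_) {
            used_.push_back(0);  // this is where a new VB would be created
        }
        Allocation result{used_.size() - 1, used_.back(), vertexCount};
        used_.back() += vertexCount;
        return result;
    }

    std::size_t bufferCount() const { return used_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::size_t> used_;  // vertices used in each buffer
};
```

This version never goes back to fill gaps in earlier buffers, which keeps it simple at the cost of some wasted space.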

At what point does the performance loss from DrawPrimitive or VB/texture swapping become more than the performance loss of doing all the batching logic though?

Meh, perhaps I'm being too contrary; but as a beginner programmer it's difficult enough to just get things working, let alone worry about performance issues.

Don't worry about performance until you're comfortable with D3D. Until you start doing something approaching a commercial game you won't need to optimize very much.

There's a difference between trying something, doing test apps, etc, and writing an engine. Programmer16 started off saying he was working on his engine. That implies he's got a fair grip on what he's doing, and wants to make a chunk of solid reusable code. If you're at that stage, then even if you're not going to put batching in yet, it's still good to be aware of it, how it will affect things, and how it can affect your design.

It would be fairly difficult to make your batching logic slower than doing individual draw calls. What I do is give each object that can batch a unique number, i.e. Enemy1=1, rock=2, tree=3. My sorting routine first sorts by transparency, then, for solid objects, by this batch id, then material, then Z. When I go to render, I just need to check how many objects in a row have identical ids and identical material pointers. I just program a bunch of world matrices into the constant registers (and opacity values in my case), and then do a draw. The overhead of batching like this is almost 0. The sort routine takes 0.1% of my CPU time (a merge sort of an Amiga-style doubly linked list).
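
A sketch of that sort order and run detection, using std::sort on a vector rather than the merge sort of a linked list described above. DrawItem, drawsBefore and batchRuns are invented names, an integer materialId stands in for the material pointer, and two details are assumptions: solid objects are drawn before transparent ones, and transparent objects are ordered back to front by Z:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

struct DrawItem {
    bool transparent;
    int batchId;     // e.g. Enemy1 = 1, rock = 2, tree = 3
    int materialId;  // stands in for the material pointer
    float z;
};

// Comparison for sorting: transparency first, then (for solid objects)
// batch id, material and Z.
bool drawsBefore(const DrawItem& a, const DrawItem& b) {
    if (a.transparent != b.transparent) return !a.transparent;  // assumption
    if (a.transparent) return a.z > b.z;  // assumption: back to front
    return std::tie(a.batchId, a.materialId, a.z) <
           std::tie(b.batchId, b.materialId, b.z);
}

// Lengths of runs of neighbours that share batch id and material;
// each run could go out as one draw call.
std::vector<std::size_t> batchRuns(const std::vector<DrawItem>& sorted) {
    std::vector<std::size_t> runs;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        const bool continues =
            i > 0 && sorted[i].transparent == sorted[i - 1].transparent &&
            sorted[i].batchId == sorted[i - 1].batchId &&
            sorted[i].materialId == sorted[i - 1].materialId;
        if (continues) {
            ++runs.back();
        } else {
            runs.push_back(1);
        }
    }
    return runs;
}
```

A real renderer would also cap each run at the number of world matrices that fit in the constant registers.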
