JorenJoestar

Rendering design thoughts


Hello guys, I'm here to share some thoughts about rendering design! This is a really wide topic, with many different ways of doing it, but I want to try and explore some concepts that are new to me. Maybe you will find something interesting here! N.B. This can be considered a sort of brainstorming that I want to share!

The key idea behind my will to redesign my renderer is that, even if it is successful from a shader-management point of view (effects are easy to implement, using Cg), it lacks multithreading and scene management, even if there is some sort of culling.

This is a list of key goals:
- Multithreading
- Multiplatform (API-independent)
- Data oriented
- Flexible but fast

The first thought is about multithreading, and I found that command buffers (as described in the Emergent paper "Practical Parallel Rendering with DirectX 9 and 10", in posts on the RealtimeCollisionDetection blog, and at http://c0de517e.blogspot.com/2009/03/my-little-rendering-engine.html) are really a good way of handling it. The big question is: at which abstraction level is it good to create commands? Gamebryo's solution is to create a command recorder, which is a modified d3ddevice, but I don't think it is the only solution. I think it can also be good to provide many abstract low-level methods in your renderer, and then use them to provide API-independent multithreaded rendering. So basically design your "render device" with many low-level methods that can be called as commands. Commands cannot create or destroy resources, but only set resources as current and draw.

Going up in abstraction, there must be someone who submits commands (multiple queues per thread, then merged, can be a solution). Who is responsible for submitting commands? The one who knows about the information to be sent. Another GREAT question: what information can be submitted? And what is the relationship between commands and that information? How can you access it?
This part is rather tricky for me. Emergent creates a command buffer that really duplicates the calls to the d3ddevice, so basically each command contains the stack of all the parameters to be sent to the real call. Another way of doing this is to create internal pools and use indexes (the IDs?) into them. (More posts to follow...) [Edited by - JorenJoestar on February 9, 2010 5:15:59 AM]
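To make the "multiple queues per thread, then merged" idea concrete, here is a minimal sketch (all names hypothetical): each worker thread records small commands into its own queue with no locking, and the queues are concatenated and stably sorted by key before submission. Resources are referenced by pool index rather than raw API pointers, following the internal-pool idea above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One lightweight recorded command: a sort key plus a pool index.
struct Command {
    uint64_t sortKey;    // encodes stage / material / depth, etc.
    uint32_t resourceId; // index into an internal resource pool
};

using CommandQueue = std::vector<Command>;

// Merge the per-thread queues and sort; a stable sort preserves
// recording order between commands that share the same key.
CommandQueue MergeAndSort(const std::vector<CommandQueue>& perThread) {
    CommandQueue merged;
    for (const CommandQueue& q : perThread)
        merged.insert(merged.end(), q.begin(), q.end());
    std::stable_sort(merged.begin(), merged.end(),
                     [](const Command& a, const Command& b) { return a.sortKey < b.sortKey; });
    return merged;
}
```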

I've always loved the idea of what I call 'draw atoms': really small structures that contain only the data needed for rendering. This includes a shader effect, shader parameters, a mesh (vertex/index buffers) and the world matrix (taken from the current mesh instance).
Rendering then reduces to drawing a list of sorted draw atoms.
Conceptually it is different, but it is close to command buffers and the like.
The context in which they are rendered, and how, is handled by rendering STAGES: each STAGE can be a render to texture, but not necessarily; a stage is a way of rendering draw atoms.
Each stage may have its own camera.
Each stage can access a list of visible objects to be rendered.
Thinking about shadow mapping, I found it really useful to apply the same effect to many different meshes, rendering to a texture.
BUT draw atoms don't cover this case: they carry data only for material-based rendering.
So far, a list of stages lives inside a pipeline, which is the TOTAL render of the scene. Each stage that renders to a texture can be accessed by later stages in the pipeline, so order is really important.
With this really simple rendering design, I successfully created both a light pre-pass and a deferred renderer with very few problems.
The problem here is the lack of multithreading.

An interesting point here is that every complex scene render can be described in a simple way (rendering of visible objects, so after culling).
Basically you can render objects on a per-material basis or apply one shader to all of them. You can render a fullscreen quad as a postprocess.
Try to figure out all the complex rendering pipelines and how they use these concepts.

(I'm trying to abstract the render process and then find a new way to handle different situations.)
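As a sketch of the Stage/Pipeline idea above (names are illustrative, not from any real engine): each stage optionally produces a named texture that later stages may read, so the order of stages inside the pipeline expresses the dependencies. A tiny validator can then check that every stage only reads textures produced by earlier stages.

```cpp
#include <map>
#include <string>
#include <vector>

// One stage of the pipeline: it may render to a named texture, and it
// may read textures produced by earlier stages.
struct Stage {
    std::string name;
    std::string outputTexture; // empty if the stage targets the backbuffer
    std::vector<std::string> inputTextures;
};

// Returns true if every stage only reads textures produced earlier,
// i.e. the stage order is a valid evaluation order for the pipeline.
bool ValidatePipeline(const std::vector<Stage>& stages) {
    std::map<std::string, bool> produced;
    for (const Stage& s : stages) {
        for (const std::string& in : s.inputTextures)
            if (!produced.count(in)) return false;
        if (!s.outputTexture.empty()) produced[s.outputTexture] = true;
    }
    return true;
}
```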

I moved to an even simpler abstraction for my current project. The render queue contains only a sort parameter and a functor - when the functor is called, it performs whatever drawing is needed (bind textures, shaders, render geometry, etc.).

I build a render queue by traversing the scene graph, performing culling, etc. as I go. The finished queue is sorted and handed to a renderer, which deals with all the low-level details.

I haven't expended the effort to multithread this process, because it doesn't show up significantly in the profiler. If it does show up at some point I might consider it, but I imagine physics will continue to overshadow scene graph/render queue management for some time to come.
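A minimal sketch of the sort-parameter-plus-functor queue described above (assuming C++11; names are made up): each entry pairs a key with a callable, the queue is stably sorted, then every functor is invoked in order.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// One queue entry: just a sort key and a functor that does the drawing.
struct QueueEntry {
    uint64_t key;
    std::function<void()> draw;
};

// Sort by key (stable, so equal keys keep submission order), then
// execute each functor; the functor binds textures/shaders and draws.
void FlushQueue(std::vector<QueueEntry>& queue) {
    std::stable_sort(queue.begin(), queue.end(),
                     [](const QueueEntry& a, const QueueEntry& b) { return a.key < b.key; });
    for (QueueEntry& e : queue)
        e.draw();
    queue.clear();
}
```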

Hi,

First of all excuse my English.

You are almost there. With the concepts of Draw Atoms and Stages you describe, you could easily implement a deferred renderer.

You just need the command buffer abstraction. If I'm not misinterpreting you, the stages are the ones submitting render orders. So think of those orders as just commands to a local command buffer... all your stages could then be rendered in parallel.

So for example:

DrawAtom
{
    BoundingVolume volume;

    KeyInt key;
    Shader* shader;
    ShaderParamsBuffer* shaderParamsBuffer;
    VertexStreams vertexStreams;
    PrimitiveAssembler primitiveAssembler;
}

DrawCmdData
{
    KeyInt key;
    Shader* shader;
    ShaderParamsBuffer* shaderParamsBuffer;
    VertexStreams vertexStreams;
    PrimitiveAssembler primitiveAssembler;
}

NormalStage
{
    Frustum viewFrustum;
    FrameBuffer fb;
    Int stageKey;
    vector<DrawAtom> visibleAtoms;
    vector<DrawCmdData> visibleCmds;

    void CollectVisibleDrawAtoms(SpatialOrganizer<DrawAtom> scene)
    {
        visibleAtoms.clear();
        scene.Cull(viewFrustum, visibleAtoms);

        visibleCmds.clear();
        foreach (DrawAtom da in visibleAtoms)
        {
            DrawCmdData data(da);
            data.key = AlterStageKey(data.key, stageKey);
            visibleCmds.insert(data);
        }
    }

    void Render(CommandBuffer& cb)
    {
        Int currentShaderKey = InvalidKey;
        Int currentParamsKey = InvalidKey;

        sort(visibleCmds);

        cb.SetFrameBuffer(fb);
        cb.ClearFrameBuffer(...);

        foreach (DrawCmdData cmd in visibleCmds)
        {
            Int shaderKey = GetShaderKey(cmd.key);
            Int paramsKey = GetParamsKey(cmd.key);

            // Only emit a state command when the key actually changes.
            if (shaderKey != currentShaderKey)
            {
                cb.SetShader(cmd.shader);
                currentShaderKey = shaderKey;
            }

            if (paramsKey != currentParamsKey)
            {
                cb.SetShaderParams(cmd.shaderParamsBuffer);
                currentParamsKey = paramsKey;
            }
            cb.Draw(cmd.primitiveAssembler);
        }
    }
}

ShadowMapStage
{
    ...
    Int shadowMapShaderKey;
    Int shadowMapParamsKey;
    Shader* shadowMapShader;
    ShaderParamsBuffer* shadowMapShaderParamsBuffer;

    void CollectVisibleDrawAtoms(SpatialOrganizer<DrawAtom> scene)
    {
        visibleAtoms.clear();
        scene.Cull(viewFrustum, visibleAtoms);

        visibleCmds.clear();
        foreach (DrawAtom da in visibleAtoms)
        {
            DrawCmdData data(da);
            data.key = AlterStageKey(data.key, stageKey);
            data.key = AlterShaderKey(data.key, shadowMapShaderKey);
            data.key = AlterShaderParamsKey(data.key, shadowMapParamsKey);
            // Every shadow caster is drawn with the same depth-only shader.
            data.shader = shadowMapShader;
            data.shaderParamsBuffer = shadowMapShaderParamsBuffer;
            visibleCmds.insert(data);
        }
    }

    ...
}



The key is basically an indexing mechanism (not fully exploited in the example) used to sort by material, transparency, etc. It could be exploited further so that all the info is encoded in the key, and the real data (Shader, ShaderParamsBuffer, etc.) is accessed by indexing data tables (like in the c0de517e link).
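A minimal sketch of such a key (the field widths and layout here are made up, purely for illustration): stage and transparency occupy the most significant bits, then shader and params, so sorting the integer key groups draws by stage first and then batches by shader and material.

```cpp
#include <cstdint>

// Pack a sort key: | stage (16) | transparent (1) | shader (23) | params (24) |
// High bits sort first, so stage dominates, then transparency, then shader.
inline uint64_t MakeKey(uint64_t stage, uint64_t transparent,
                        uint64_t shader, uint64_t params) {
    return (stage << 48) | (transparent << 47) | (shader << 24) | params;
}

// Extract the per-field sub-keys back out of the packed integer.
inline uint64_t GetShaderKey(uint64_t key) { return (key >> 24) & 0x7FFFFF; }
inline uint64_t GetParamsKey(uint64_t key) { return key & 0xFFFFFF; }
```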

The command buffer just stores the data to be executed later in a format ready to use by the API.


CommandBuffer
{
    Command
    {
        Int type;
        union
        {
            ...
            SetFrameBufferCmd setFrameBuffer;
            SetShaderCmd setShader;
            SetShaderConstantCmd setShaderConstant;
            SetSamplerStateCmd setSamplerState;
            DrawPrimitiveCmd drawPrimitive;
            ...
        }
    }
    vector<Command> commands;

    ...

    void SetShader(Shader* shader)
    {
        Command cmd;
        cmd.type = SetShaderCmdType;
        cmd.setShader.shader = shader;

        commands.insert(cmd);
    }

    void SetShaderParamsBuffer(ShaderParamsBuffer* paramsBuffer)
    {
        foreach (ShaderParam param in paramsBuffer->params)
        {
            Command cmd;
            cmd.type = SetShaderConstantCmdType;
            cmd.setShaderConstant.data = param.data;
            commands.insert(cmd);
        }

        foreach (SamplerState sampler in paramsBuffer->samplers)
        {
            Command cmd;
            cmd.type = SetSamplerStateCmdType;
            cmd.setSamplerState.texture = sampler.texture;
            cmd.setSamplerState.filter = sampler.filter;
            ...
            commands.insert(cmd);
        }
    }

    void DrawPrimitive(PrimitiveAssembler pa)
    {
        Command cmd;
        cmd.type = DrawPrimitiveCmdType;
        cmd.drawPrimitive.primitive = pa.primitive;
        cmd.drawPrimitive.numPrimitives = pa.numPrimitives;

        commands.insert(cmd);
    }

    ...
}



One thing to take into account is that the commands generated inside the command buffer are ready to use by the underlying API (OpenGL, Direct3D 9, Direct3D 10, the consoles' own command/push-buffer formats, etc.). So, for example, in D3D10 SetShaderParamsBuffer could be translated into setting a constant buffer, while in D3D9 a command setting a shader constant is submitted per param.

The command buffer could also be pregenerated to render a fixed scene, so no time is wasted culling, ordering calls, etc.

Finally, the renderer interprets the command buffer and sends the commands to the API.


Renderer
{
    void Render(CommandBuffer& cb)
    {
        foreach (Command cmd in cb.commands)
        {
            switch (cmd.type)
            {
                ...
                case SetFrameBufferCmdType:
                {
                    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, cmd.setFrameBuffer.bufferId);
                    break;
                }
                case DrawPrimitiveCmdType:
                {
                    // count is in indices (3 per triangle here), type is the index format
                    glDrawElements(cmd.drawPrimitive.primitive,
                                   cmd.drawPrimitive.numPrimitives * 3,
                                   GL_UNSIGNED_SHORT, ...);
                    break;
                }
                ...
            }
        }
    }
}

Hola alagtriste! :)

Yes, actually stages call (abstracted) engine methods that perform the rendering.
Do you think I should also store the information needed to render in the command buffer? Have you tried an approach like that (deferred function calls)?
I'm thinking of using function pointers to renderer commands, indexed by the command id...

Thanks!
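The "function pointers indexed by the command id" idea mentioned above could look like this minimal sketch (all names hypothetical, with two toy commands and global state just for illustration): the interpreter becomes a table lookup instead of a switch.

```cpp
#include <cstdint>
#include <vector>

// A tiny command: an id that indexes the dispatch table, plus one argument.
struct Cmd { uint32_t id; uint32_t arg; };

// Toy "device" state touched by the command handlers.
static uint32_t g_lastShader = 0;
static uint32_t g_drawCount  = 0;

void ExecSetShader(const Cmd& c) { g_lastShader = c.arg; }
void ExecDraw(const Cmd& c)      { g_drawCount += c.arg; }

// The command id is an index into this table of function pointers.
using CmdFn = void (*)(const Cmd&);
static const CmdFn g_dispatch[] = { ExecSetShader, ExecDraw };

void Execute(const std::vector<Cmd>& cmds) {
    for (const Cmd& c : cmds)
        g_dispatch[c.id](c); // table lookup replaces the switch
}
```

Note that, as discussed later in the thread, the indirect call per command is exactly the kind of thing that can hurt branch prediction, so this is a trade-off rather than a free win.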

Another aspect I want to explore is the culling process.
As you already know, culling is the process of cutting away uninteresting objects (based on some matching criteria).

Thinking about complex scenes, what is rendered on screen?
Do you always render only a single list of objects?

Actually, different lists of objects can be created by different culling processes.
E.g. shadow-casting objects, lights.
So, potentially, there are n possible "views" of the same scene.
You cull out the lights you don't need based on the current frustum.
Consider a scene with an object like a security camera. There will be a scene rendered by that camera with completely different objects: okay, maybe you can consider (as for reflections) skipping objects that don't cast shadows, so you need to fill only a single list, but there is still another list to render.

In a multithreaded environment... how could you manage culling? Is a simple double-buffered list a simple yet effective solution? You work on next-frame visibility... can that be a good solution?
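The double-buffered visibility list asked about above could look like this minimal sketch (names are made up): culling writes next-frame visibility into one buffer while rendering consumes the other, and the two are swapped at the frame boundary.

```cpp
#include <vector>

// Two visibility buffers: the culling task fills the "write" side while
// the render task reads last frame's results from the "read" side.
struct VisibilityLists {
    std::vector<int> buffers[2]; // visible object ids, double buffered
    int readIndex = 0;

    std::vector<int>& Write() { return buffers[1 - readIndex]; }       // culling side
    const std::vector<int>& Read() const { return buffers[readIndex]; } // render side

    // Called at the frame boundary, once both tasks are done.
    void Swap() {
        readIndex = 1 - readIndex;
        buffers[1 - readIndex].clear(); // recycle the old read buffer
    }
};
```

The cost, as discussed later in the thread, is one frame of latency on visibility, which may or may not be acceptable depending on the game's frame rate.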

Also... there can be different bounding hierarchies based on the type of objects you want to draw (static meshes, terrain, dynamic...), hence the need for an abstract way of thinking about it.
A mesh, model or whatever could contain information for visibility.

And skinning?

There is one part of the c0de517e post that gave me something to think about... the feature manager.
Basically it is a component-centric way of viewing objects.
The model to be rendered has different components, each with a different update (visibility, skinning); THEN the model can submit render commands.
Even if the responsibilities for rendering are different (here the object itself sends commands, not the stage (if I understood correctly :P)), the concept of giving different "views" of the object to different rendering managers (classes that handle different components...) is really good for me.
And it is really data oriented.
In parallel you can update all the skinned objects, and in parallel again all the visibility.


Try to describe a complex scene (take any game with good graphics you like) in terms of stages, components, commands.
I'll do it to understand whether this design offers good flexibility, but also power.

I'll continue writing down ideas and thoughts.
I invite EVERYONE to share their opinion.


I'll return to the rendering-only part (not culling, skinning...).
If you take a PIX capture, you can see what a real command list is!
The command types you find are:

- Set of resources
  - Render target
  - Texture
  - Vertex buffer
  - Index buffer
  - Vertex/geometry/pixel/whatever shader
  - Shader params (ok, this is not a resource)
- Set of render states
- Lock/Unlock
- Draw

The only problem I found in this list of command types (there are more commands, like begin/end scene, clear buffers...) is the lock/unlock commands, or any command that needs to know information about other resources.
The solution I've thought of is something related to double buffering, even if it can be quite memory-heavy (I don't know for sure, I have to profile it).
Removing Lock/Unlock leads to a stateless and thread-safe way of handling rendering.
Think of a skinned mesh with two dynamic buffers: one that is read-only, and another that is updated by a task in parallel.

This view is really atomic; these are the lowest-level commands you can issue (near 1:1 with an API device, near to Emergent's implementation).
Starting bottom-up, it's easy to describe a complex scene with primitives like that.
Abstracting these commands can lead to something like a "draw atom" method, or methods, that use different rendering buckets, each method specialized for a different kind of draw atom.

With this view in mind, stages can submit different draw atoms, even if it can be slow to choose different methods based on the type of draw atom.
Maybe some bits in the command can be dedicated to handling this:

- DrawStaticMeshAtom
- DrawSkinnedMeshAtom
- DrawShadowMeshAtom

each casting the draw atom to a specialized version and performing all the needed operations.


Still, the abstraction level chosen for commands leads to very different results.

Most of the time you don't want to record DirectX command buffers directly, because then you can't sort render calls, meaning that each thread has to work on parts of the scene that are independent of the others. Often that's not the case.

If you want to go for DirectX command recording (or anyway, recording native render calls), then at least you probably want worker threads that prepare all the rendering data in parallel; the data then becomes read-only, and the command-recording threads access the read-only data to create the different scene segments.

Also, when you are thinking about such an architecture, pay attention to cache misses and branch mispredictions. Some source I've seen in this thread is naive in that respect.

-deadc0de @ c0de517e

Quote:
Original post by kenpex
Most of the time you don't want to record DirectX command buffers directly, because then you can't sort render calls, meaning that each thread has to work on parts of the scene that are independent of the others. Often that's not the case.


This is OK, I don't want to... but I do want to let different threads add commands to (maybe) per-thread queues, which are then merged and sorted!
At what level of granularity are you creating commands?
Say a very basic design that contains an abstract renderer and some implementations (DirectX 9, OpenGL...): do you provide atomic operations that become the commands?

Quote:
Original post by kenpex
If you want to go for DirectX command recording (or anyway, recording native render calls), then at least you probably want worker threads that prepare all the rendering data in parallel; the data then becomes read-only, and the command-recording threads access the read-only data to create the different scene segments.

This is exactly the implementation made by Gamebryo (complete with source code, examples and a paper), but it is something I don't like very much.
API abstraction is essential for me; I want to create something that can handle different APIs!

Quote:
Original post by kenpex
Also, when you are thinking about such an architecture, pay attention to cache misses and branch mispredictions. Some source I've seen in this thread is naive in that respect.


This is a REALLY GOOD point.
The source posted was only to give the idea, but what would you suggest paying attention to regarding cache misses and branch mispredictions?
I feel that using function pointers to speed up command execution (command id = index into an array of function pointers...) could be a nuclear bomb for branches.

What are your thoughts about code execution? Would you like to explain your design further?

Thanks!

Quote:
Original post by JorenJoestar
In a multithreaded environment... how could you manage culling? Is a simple double-buffered list a simple yet effective solution? You work on next-frame visibility... can that be a good solution?

Also... there can be different bounding hierarchies based on the type of objects you want to draw (static meshes, terrain, dynamic...), hence the need for an abstract way of thinking about it.
A mesh, model or whatever could contain information for visibility.


You don't need to abstract or centralize those concepts. I'd say keep your architecture simple and flexible; features will come in easily.

For example, having visibility information in a mesh is usually not great for cache locality, as you don't need the mesh while computing visibility. Also, a mesh is too abstract a thing to base visibility on. As you pointed out, you might use meshes for many things.

I like to have rendering entities that manage the entire rendering of a given feature, for example the players in a soccer game. The entity knows what data to load and how to perform culling and so on.

Then, if that functionality is needed across entities, you'll abstract it into a shared service. Working in an MT environment is not complicated this way. You could simply run the "prerender" pass of each entity in parallel. If you have shared services, you update them in parallel before their users (the entities again).

We tend to look at the bigger picture too much when designing engines. For most games, it's not worth it. You want the ability to run some concepts in parallel, but not to formalize what those things are, because each game requires different technologies. Only if you're making an FPS or a similarly artist-driven game will you go for something generic. And anyway, I always like to be able to opt out and code my specific rendering stuff when needed.

Quote:
Original post by JorenJoestar
The only problem I found in this list of command types (there are more commands, like begin/end scene, clear buffers...) is the lock/unlock commands, or any command that needs to know information about other resources.


Eh. In practice you'll face many more problems :)

First of all, DirectX and OpenGL are state machines. So recording a command buffer means you have to fix an assumption about the state of the machine at the beginning of the recording, and keep it valid when you play the recorded stuff back. That can lead to unnecessary state setting, in order to reset the state to its defaults each time before playback. Also, choosing the defaults is not easy.

And then, when you start doing it in practice, you'll notice even more problems: depending on the platform you're targeting, not all commands can be recorded.

In the end it's not impossible, far from it. But it is not so convenient.
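The "reset to a default state before playback" cost mentioned above can be sketched in a few lines (a toy model with two hypothetical states): if every recorded buffer assumes default state on entry, the player has to restore defaults between buffers, emitting a state command for every field that drifted.

```cpp
// A toy slice of pipeline state; real state blocks have dozens of fields.
struct RenderState {
    bool depthTest = true;  // hypothetical default
    bool blending  = false; // hypothetical default
};

// Restore defaults before playing a recorded buffer, and return how many
// (possibly redundant) state commands that restoration had to emit.
int ResetToDefault(RenderState& current) {
    RenderState def;
    int emitted = 0;
    if (current.depthTest != def.depthTest) { current.depthTest = def.depthTest; ++emitted; }
    if (current.blending  != def.blending)  { current.blending  = def.blending;  ++emitted; }
    return emitted;
}
```

The sketch also shows why choosing the defaults matters: the closer the defaults are to the states your buffers actually leave behind, the fewer reset commands get emitted.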

Quote:
Original post by JorenJoestar
Quote:
Original post by kenpex
Most of the time you don't want to record DirectX command buffers directly, because then you can't sort render calls, meaning that each thread has to work on parts of the scene that are independent of the others. Often that's not the case.


This is OK, I don't want to... but I do want to let different threads add commands to (maybe) per-thread queues, which are then merged and sorted!
At what level of granularity are you creating commands?
Say a very basic design that contains an abstract renderer and some implementations (DirectX 9, OpenGL...): do you provide atomic operations that become the commands?

Quote:
Original post by kenpex
If you want to go for DirectX command recording (or anyway, recording native render calls), then at least you probably want worker threads that prepare all the rendering data in parallel; the data then becomes read-only, and the command-recording threads access the read-only data to create the different scene segments.

This is exactly the implementation made by Gamebryo (complete with source code, examples and a paper), but it is something I don't like very much.
API abstraction is essential for me; I want to create something that can handle different APIs!

Quote:
Original post by kenpex
Also, when you are thinking about such an architecture, pay attention to cache misses and branch mispredictions. Some source I've seen in this thread is naive in that respect.


This is a REALLY GOOD point.
The source posted was only to give the idea, but what would you suggest paying attention to regarding cache misses and branch mispredictions?
I feel that using function pointers to speed up command execution (command id = index into an array of function pointers...) could be a nuclear bomb for branches.

What are your thoughts about code execution? Would you like to explain your design further?

Thanks!


Eh, I can't really say much because it depends on your application. You have multiple choices and no "right way"; there are always trade-offs. In general it's not true that using command buffers to record native commands is API-dependent: you could simply abstract the recording API and issue the native commands from a device abstraction layer that you probably already have. So even if you're recording native stuff, you can still be platform-independent.

To me the major drawback of using command buffers directly is that you can't sort them. So I prefer to have another layer: record some drawing-primitive information, then sort it, then generate command buffers in parallel, then play them.

I've already explained one scheme to do that on my blog. From what I can see it's good; the only drawback I see is that you are basically working with handles all the time, so you pay for it in the cache when going from the handles to the resources...

An alternative to avoid that is to record abstracted commands that embed pointers to the native resources. In my engine test, a draw command is a short bit string made of handles, which is both the command and its sorting key.

I.e. a command is, for example:
framebuffer handle... texture handle... mesh handle

An alternative is to record commands/pointers plus a sort key for each of them. That takes more space, but avoids the indirection. To do the same draw, you'd record something like:

settexture...pointer + sortkey
setmesh...pointer + sortkey

If the sort is stable, then you can rearrange your recorded abstracted commands (something you can't do with the native ones) and not pay any cache hits. The downside is that your record buffer can be longer (more misses!), and the whole thing is less abstracted (which could be good!).

Notice that in this scheme all the sort keys can be stored in a separate array, as they're only used in the sorting pass. Also, you could still cull redundant commands while recording, making sure the recorded stream doesn't get too big. Deriving the right sort keys can be a bit of a problem, though.

Quote:
Original post by JorenJoestar
A simple double-buffered list is a simple yet effective solution? You work on next-frame visibility... can that be a good solution?


How much latency can you afford? I'd say that if your game runs at 30 Hz you probably don't want to add much; if it's at 60, it can be just fine. In general, this choice should depend on the game and not be fixed in the engine. The engine should only provide a way of doing work in MT that is not just using threads directly. The engine should provide services; the rendering should depend on the specific game.

Quote:
Original post by kenpex
Eh. In practice you'll face many more problems :)

First of all, DirectX and OpenGL are state machines. So recording a command buffer means you have to fix an assumption about the state of the machine at the beginning of the recording, and keep it valid when you play the recorded stuff back. That can lead to unnecessary state setting, in order to reset the state to its defaults each time before playback. Also, choosing the defaults is not easy.

And then, when you start doing it in practice, you'll notice even more problems: depending on the platform you're targeting, not all commands can be recorded.

In the end it's not impossible, far from it. But it is not so convenient.


Yes, this is an option I don't like very much. I prefer (as you wrote on your blog) an abstraction layer with sortable drawing commands, where the commands then call API-dependent code.
Looking at Emergent's presentation, it's clear that methods that retrieve information, and locks, are not possible in low-level commands (the d3ddevice recorder), but I think that in an abstract command configuration this is possible.
I definitely have to try it out! Testing is the best test!



Quote:
Original post by kenpex
Eh, I can't really say much because it depends on your application. You have multiple choices and no "right way"; there are always trade-offs. In general it's not true that using command buffers to record native commands is API-dependent: you could simply abstract the recording API and issue the native commands from a device abstraction layer that you probably already have. So even if you're recording native stuff, you can still be platform-independent.

To me the major drawback of using command buffers directly is that you can't sort them. So I prefer to have another layer: record some drawing-primitive information, then sort it, then generate command buffers in parallel, then play them.

I've already explained one scheme to do that on my blog. From what I can see it's good; the only drawback I see is that you are basically working with handles all the time, so you pay for it in the cache when going from the handles to the resources...

An alternative to avoid that is to record abstracted commands that embed pointers to the native resources. In my engine test, a draw command is a short bit string made of handles, which is both the command and its sorting key.

I.e. a command is, for example:
framebuffer handle... texture handle... mesh handle

An alternative is to record commands/pointers plus a sort key for each of them. That takes more space, but avoids the indirection. To do the same draw, you'd record something like:

settexture...pointer + sortkey
setmesh...pointer + sortkey

If the sort is stable, then you can rearrange your recorded abstracted commands (something you can't do with the native ones) and not pay any cache hits. The downside is that your record buffer can be longer (more misses!), and the whole thing is less abstracted (which could be good!).

Notice that in this scheme all the sort keys can be stored in a separate array, as they're only used in the sorting pass. Also, you could still cull redundant commands while recording, making sure the recorded stream doesn't get too big. Deriving the right sort keys can be a bit of a problem, though.


Maybe you can wrap the different API-dependent resources and allocate the wrappers with a custom memory manager, so you can play with pointer logic (base + id) to get the direct pointer to the wrapper.


Quote:
Original post by kenpex
How much latency can you afford? I'd say that if your game runs at 30 Hz you probably don't want to add much; if it's at 60, it can be just fine. In general, this choice should depend on the game and not be fixed in the engine. The engine should only provide a way of doing work in MT that is not just using threads directly. The engine should provide services; the rendering should depend on the specific game.


Yeah, my feeling about it is to have different worker threads with tasks assigned to them by a scheduler, just like Bad Company 2 or Capcom's MT Framework (and many others). Flexibility is fundamental to adapt to different situations!

(I'll continue with the brainstorming...)

What commands are possible?

How can you describe a scene?


Questions are important because your brain will always find an answer. (NLP)

I'll begin with the description of a scene: this is a top-down approach.
Take a scene in which you have:
- Shadow mapping (VSM, CSM...)
- Refraction effects (water, ice, glass?)
- Reflection effects
- Opaque and transparent geometries
- Particles
- Static meshes, skinned meshes, morphed meshes
- PostProcess effects (HDR, Motion Blur, DOF, Bloom, SSAO, SSGI)
- Dynamic lights and shadows
- Static lightmaps
- Radiosity Normal Mapping
- More ???

The scene, depending on light complexity and geometry complexity, can be handled in the following ways:
- Forward Rendering
- Deferred Rendering
- Light Pre-Pass Rendering
- Deferred Light Rendering (as in S.T.A.L.K.E.R.: deferred rendering for opaque geometry and a forward pass for transparent...) (I don't know if this is the correct name)

Basically all the effects above are a combination of:
- Render to texture;
- Render geometry;
- Set geometry;
- Set texture;
- Set shader and params;
- Set render states;

The possible rendering situations are:
- Render a list of objects with the same shader (like shadows...)
- Render a list of objects with their material

The real matter of all these effects is the DEPENDENCIES between the intermediate steps.
You can imagine the rendering as a Petri's net in which you need some resource created by previous steps.
Consider for example an in-game scene from Gears Of War 2.
With a FORWARD APPROACH, it is only a matter of:
- Creating shadow maps, rendering to texture a list of objects with the same shader;
- Render the opaque geometries with their materials;
- Render the transparent geometries with their materials;
- Render post-process effect;

If I remember well Unreal Engine 3 uses a Lightbuffer, that accumulates shadows and light contribution, so it is more a
deferred approach.
With a DEFERRED APPROACH, using for shadows a technique like Deferred Shadow Maps:
- Render the G-Buffer;
- Render the light/shadow buffer, adding each contribute;
- Render the post-process fx;

Maybe it can seem to simple, but in practice the rendering became only a combination of those bricks; the important thing is
the timing, but if you have dependencies in mind you can easly understand which rendering step came before each other.

Consider that transparent objects (like water, glass, distortions) need to distort the opaque geometry behind them (and other
transparent objects; sorting is always necessary), so first you render the opaque objects and then the transparent ones, in order.
Shadows need to apply the same shader to all geometries; post-processes need at least one fullscreen quad and a shader, and maybe
some other inputs (like motion blur and eye adaptation data).
Rendering becomes a dependency graph based on steps and textures.
This is what I thought about rendering, and with that in mind I developed the Stage/Pipeline model: really simple, but extensible
and really EXPRESSIVE.
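To make the dependency-graph idea concrete, here is a small sketch (not the actual Stage/Pipeline code): each stage declares which textures it reads and writes, and a valid order falls out of the read-after-write dependencies. Stage and texture names are illustrative.

```cpp
#include <set>
#include <string>
#include <vector>

// A stage declares the textures it consumes and produces.
struct Stage {
    std::string name;
    std::vector<std::string> reads;   // textures consumed
    std::vector<std::string> writes;  // textures produced
};

// Tiny topological sort: a stage can run once every texture it reads
// has been written by an earlier stage.
std::vector<std::string> orderStages(std::vector<Stage> stages) {
    std::set<std::string> available;
    std::vector<std::string> order;
    bool progress = true;
    while (progress && !stages.empty()) {
        progress = false;
        for (auto it = stages.begin(); it != stages.end(); ++it) {
            bool ready = true;
            for (const auto& r : it->reads)
                if (!available.count(r)) { ready = false; break; }
            if (ready) {
                for (const auto& w : it->writes) available.insert(w);
                order.push_back(it->name);
                stages.erase(it);
                progress = true;
                break;
            }
        }
    }
    return order;  // incomplete if the graph has a cycle
}
```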

I think there are many other, better ways to render, and I know that hard-coding is the fastest way to render, but I think
the real power of this approach is understanding the MENTALITY and the DEPENDENCIES behind any rendering.


What do you think about it?

Quote:
Original post by JorenJoestar

Maybe you can wrap the different API-dependent resources in wrappers allocated with a custom memory manager, so you can play with pointer logic (base + id) to obtain the direct pointer to the wrapper.


Good idea, I thought about that, but it won't work. Textures and meshes are big, so even if you redirect their allocation to a linear pool and subtract the base address, you'll still have a large gap between them, and in the end I suspect you won't save many bits (compared to storing a full 32-bit pointer).
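A common alternative to the base-plus-offset idea (my illustration, not something either poster proposed): store a small array index instead of an offset. Since the wrappers live in a contiguous table, a 16-bit handle addresses up to 65,536 resources regardless of how big each wrapper is.

```cpp
#include <cstdint>
#include <vector>

// Wrapper around an API-dependent resource; the member is a placeholder
// (e.g. it could hold an IDirect3DTexture9* on a D3D9 backend).
struct TextureWrapper {
    void* apiResource;
};

// Contiguous table of wrappers; handles are plain indices.
struct ResourceTable {
    std::vector<TextureWrapper> wrappers;

    uint16_t add(void* apiResource) {
        wrappers.push_back({ apiResource });
        return static_cast<uint16_t>(wrappers.size() - 1);
    }
    TextureWrapper& get(uint16_t handle) { return wrappers[handle]; }
};
```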

Other thoughts on the subject.
I'm thinking about a more abstract architecture, like command + command data.
Inside the command there is information like the render target, material/z-order bits, and an index into a data structure allocated in a pool.

The commands are not that many; the main distinction is between set and draw commands, and the set commands have different types, like set geometry (which can include vertex buffer and index buffer), set material (shader + params), and set render target.
The key point is that rendering really can be described by simple commands like these.
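A possible shape for such a command key, packing the fields just mentioned into 64 bits (the exact bit widths are my assumption):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit key: render target in the high bits, material/depth
// bits in the middle, index into a per-frame data pool in the low bits.
struct CommandKey {
    uint64_t bits;

    static CommandKey make(uint8_t target, uint32_t materialZ, uint32_t dataIndex) {
        CommandKey k;
        k.bits = (uint64_t(target) << 56)
               | (uint64_t(materialZ & 0xFFFFFF) << 32)
               | uint64_t(dataIndex);
        return k;
    }
    uint8_t  renderTarget() const { return uint8_t(bits >> 56); }
    uint32_t dataIndex()    const { return uint32_t(bits); }
};

// Sorting the raw 64-bit keys groups commands by render target first,
// then by material/depth, which is the sorting flow described here.
void sortCommands(std::vector<CommandKey>& keys) {
    std::sort(keys.begin(), keys.end(),
              [](const CommandKey& a, const CommandKey& b) { return a.bits < b.bits; });
}
```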

[Shadow map example]
- SetRenderTarget (render target: shadow)
- For each shadow-casting object:
  - set geometry info
    - set vertex buffer
    - set index buffer
  - set material info
    - set shader
    - set shader params
  - draw
- SetRenderTarget (render target: main)
- For each visible object:
  - set geometry info
  - set material info
  - draw
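The outline above, written as a tiny recording function. The command and object types are placeholders for illustration; a "command" here is just a (type, resource) pair.

```cpp
#include <utility>
#include <vector>

enum class Cmd { SetRenderTarget, SetGeometry, SetMaterial, Draw };

// Placeholder scene object: handles to its geometry and material.
struct Object { int geometry; int material; };

std::vector<std::pair<Cmd, int>> recordShadowPass(
        int shadowTarget, int mainTarget,
        const std::vector<Object>& casters,
        const std::vector<Object>& visible) {
    std::vector<std::pair<Cmd, int>> cmds;
    cmds.push_back({ Cmd::SetRenderTarget, shadowTarget });
    for (const Object& o : casters) {
        cmds.push_back({ Cmd::SetGeometry, o.geometry });
        cmds.push_back({ Cmd::SetMaterial, o.material });  // shadow shader + params
        cmds.push_back({ Cmd::Draw,        o.geometry });
    }
    cmds.push_back({ Cmd::SetRenderTarget, mainTarget });
    for (const Object& o : visible) {
        cmds.push_back({ Cmd::SetGeometry, o.geometry });
        cmds.push_back({ Cmd::SetMaterial, o.material });
        cmds.push_back({ Cmd::Draw,        o.geometry });
    }
    return cmds;
}
```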

Geometry information can be held in a structure containing pointers to the REAL resources (D3D buffers, shaders), and the commands are handled by API-specific devices.
Obviously, all this flow is generated by sorting the commands based on render target and other parameters.

Another fundamental point is that each command has its own "do" method, which uses API-specific code to set resources or draw.
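A sketch of that "do" method idea: the command stores only handles, and an API-specific device executes it. The device interface and the recording backend are illustrative stand-ins, not a real API.

```cpp
#include <string>
#include <vector>

// Abstract device; concrete subclasses would wrap D3D, OpenGL, etc.
struct RenderDevice {
    virtual ~RenderDevice() = default;
    virtual void setRenderTarget(int id) = 0;
    virtual void draw(int geometryId) = 0;
};

// Each command knows how to execute itself against a device.
struct SetRenderTargetCmd {
    int target;
    void doIt(RenderDevice& dev) const { dev.setRenderTarget(target); }
};

struct DrawCmd {
    int geometry;
    void doIt(RenderDevice& dev) const { dev.draw(geometry); }
};

// A fake device that records calls, standing in for a real backend.
struct RecordingDevice : RenderDevice {
    std::vector<std::string> log;
    void setRenderTarget(int id) override { log.push_back("rt:" + std::to_string(id)); }
    void draw(int g) override { log.push_back("draw:" + std::to_string(g)); }
};
```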


About parallelism.
Based on the design of the Stages I've written about before, each stage can be considered a macro task. This macro task can then be subdivided into finer-grained tasks.
E.g. ShadowStageTask: to draw to a shadow map, we set the current render target,
then for each visible, shadow-casting object we set the shadow shader and draw all the geometries using that shader.
There must be someone who knows enough to create a command key in the proper manner: someone who knows the stage, for example.
Also consider using a parallel for to subdivide the loop over the shadow-casting objects.
Then you create one "setstage" command, one "setshader" command and many "draw" commands, with the key bits that are known by the current stage.

The rendering itself can be considered:
for each stage
    draw stage

Here we can use a parallel for as well, and apply the same approach within each stage.
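A minimal sketch of the "multiple queues per thread, then merged" idea from the start of the thread, assuming one worker thread per stage and integers standing in for command keys:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Each worker fills its own queue (no locking needed), then the main
// thread concatenates the queues in stage order.
std::vector<int> buildCommandsParallel(const std::vector<std::vector<int>>& stages) {
    std::vector<std::vector<int>> queues(stages.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < stages.size(); ++i) {
        workers.emplace_back([&, i] {
            // Each thread writes only to queues[i], so no contention.
            for (int cmd : stages[i]) queues[i].push_back(cmd);
        });
    }
    for (auto& w : workers) w.join();

    std::vector<int> merged;
    for (const auto& q : queues)
        merged.insert(merged.end(), q.begin(), q.end());
    return merged;
}
```

In a real renderer the merged list would then be sorted by command key before execution; here the merge alone preserves stage order.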


What do you think about it?

[Edited by - JorenJoestar on February 26, 2010 6:12:36 AM]
