Rendering design thoughts

Hello guys, I'm here to share some thoughts about rendering design! This is a really wide topic, with many different ways of doing it, but I want to try and explore some concepts that are new to me. Maybe you will find something interesting here! N.B. This can be considered a sort of brainstorming that I want to share!

The key idea behind the will to redesign my renderer is that, even if it is successful from a shader-management point of view (the effects are easy to implement, using Cg), it lacks multithreading and scene management, even if there is some sort of culling.

This is a list of key goals:

- Multithreading
- Multiplatform (API-independent)
- Data oriented
- Flexible but fast

The first thought is about multithreading, and I found command buffers (as read in the Emergent paper "Practical Parallel Rendering with DirectX 9 and 10", in posts on the RealtimeCollisionDetection blog, and at http://c0de517e.blogspot.com/2009/03/my-little-rendering-engine.html) to be a really good way of handling it. The big question is: at which abstraction level is it good to create commands? Gamebryo's solution is to create a command recorder, which is a modified IDirect3DDevice, but I don't think it is the only solution. I think it can also be good to provide many abstract low-level methods in your renderer, and then use them to provide API-independent multithreaded rendering. So basically, design your "render device" with many low-level methods that can be called as commands. Commands cannot create or destroy resources, but only set resources as current and draw.

Going up in abstraction, there must be someone that submits commands (multiple queues, one per thread, then merged, can be a solution). Who is responsible for submitting commands? The one who knows about the information to be sent.

Another GREAT question: what information can be submitted? And what is the relationship between commands and that information? How can you access it? This part is rather tricky for me. Emergent creates a command buffer that really duplicates the calls to the IDirect3DDevice, so basically the command contains the stack of all the parameters to be sent to the real call. Another way of doing this is to create internal pools and use indices (the ID???) inside them.

(More posts to follow...)

[Edited by - JorenJoestar on February 9, 2010 5:15:59 AM]
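To sketch what I mean by "low-level methods that can be called as commands" (the names here are hypothetical, just to give the idea): an API-independent device whose methods only bind already-created resources and draw.

typedef unsigned int ResourceHandle;  // opaque id into internal pools

struct RenderDevice
{
    // Each low-level method doubles as a recordable command.
    virtual void SetRenderTarget(ResourceHandle rt) = 0;
    virtual void SetVertexBuffer(ResourceHandle vb) = 0;
    virtual void SetIndexBuffer(ResourceHandle ib) = 0;
    virtual void SetShader(ResourceHandle shader) = 0;
    virtual void SetTexture(int slot, ResourceHandle texture) = 0;
    virtual void Draw(int primitiveCount) = 0;
    virtual ~RenderDevice() {}
};

// A DirectX or OpenGL implementation resolves the handles into real
// API objects; resource creation/destruction never goes through here.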
---------------------------------------http://badfoolprototype.blogspot.com/
I've always loved the idea of what I call 'draw atoms': really small structures that contain only the data needed for rendering. This includes a shader effect, shader parameters, a mesh (for vertex/index buffers) and the world matrix (taken from the current mesh instance).
Rendering then reduces to drawing a list of sorted draw atoms.
Conceptually it is different, but it is close to command buffers and similar approaches.
The context in which they are rendered, and how, is handled by rendering STAGES: each STAGE can be a render to texture, but it doesn't have to be; a stage is a way of rendering draw atoms.
Each stage may have its own camera.
Each stage can access a list of visible objects to be rendered.
Thinking about shadow mapping, I found it really useful to apply the same effect to many different meshes while rendering to a texture.
BUT this is where draw atoms fall short: they are data only for material-based rendering.
So far, a list of stages lives inside a pipeline, which is the TOTAL render of the scene. Each stage that renders to a texture can be read by later stages inside the pipeline, so order is really important.
With this really simple rendering design, I successfully created both a light-prepass and a deferred renderer with very few problems.
The problem here is the lack of multithreading.
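To make the design above concrete, a minimal sketch (the names and fields are illustrative, not my actual code):

#include <cstdint>
#include <vector>

struct Effect;  struct ParamBlock;  struct Mesh;   // opaque engine types
struct Camera;  struct RenderTarget;

// A draw atom: the minimum data needed to render something.
struct DrawAtom
{
    Effect*     effect;      // shader effect
    ParamBlock* params;      // shader parameters
    Mesh*       mesh;        // vertex/index buffers
    float       world[16];   // world matrix from the mesh instance
    uint64_t    sortKey;     // used to order atoms before drawing
};

// A stage: one way of rendering a list of visible draw atoms,
// optionally into a texture, with its own camera.
struct Stage
{
    Camera*               camera;
    RenderTarget*         target;        // null = backbuffer
    std::vector<DrawAtom> visibleAtoms;  // filled by culling
    void Render();  // sort by sortKey, then submit the atoms
};

// The pipeline is the TOTAL render: an ordered list of stages.
// A stage that rendered to a texture can be read by later stages.
struct Pipeline
{
    std::vector<Stage*> stages;
    void Render() { for (Stage* s : stages) s->Render(); }
};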

An interesting point here is that every complex scene can be described in a simple way as the rendering of visible objects (i.e. after culling).
Basically, you can render objects on a per-material basis, or apply one shader to all of them. You can render a fullscreen quad as a postprocess.
Try to figure out how the various complex rendering pipelines map onto these concepts.

(I'm trying to abstract the render process and then find a new way to handle different situations.)
---------------------------------------http://badfoolprototype.blogspot.com/
I moved to an even simpler abstraction for my current project. The render queue contains only a sort parameter and a functor - when the functor is called, it performs whatever drawing is needed (bind textures, shaders, render geometry, etc.).

I build a render queue by traversing the scene graph, performing culling, etc. as I go. The finished queue is sorted and handed to a renderer, which deals with all the low-level details.
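A minimal sketch of that kind of queue, assuming C++11 (the names are illustrative, not the actual implementation):

#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// One queue entry: a sort key plus a functor that performs the draw.
struct RenderItem
{
    uint64_t              sortKey;  // encodes pass/material/depth, etc.
    std::function<void()> draw;     // binds state and renders geometry
};

struct RenderQueue
{
    std::vector<RenderItem> items;

    void Submit(uint64_t key, std::function<void()> fn)
    {
        items.push_back({key, std::move(fn)});
    }

    void Flush()
    {
        std::sort(items.begin(), items.end(),
                  [](const RenderItem& a, const RenderItem& b)
                  { return a.sortKey < b.sortKey; });
        for (RenderItem& item : items) item.draw();
        items.clear();
    }
};

// usage (hypothetical): queue.Submit(key, [=]{ BindAndDraw(mesh); });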

I haven't expended the effort to multi-thread this process, because it doesn't have a significant presence in the profiler. If it does show up at some point I might consider it, but I imagine physics will continue to overshadow scene graph/render queue management for some time to come.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Hi,

First of all excuse my English.

You are almost there. With the concepts of draw atoms and stages you describe, you could easily implement a deferred renderer.

You just need the command buffer abstraction. If I'm not misinterpreting you, the stages are the ones submitting render orders. So think of those orders as just commands to a local command buffer... all your stages could be rendered in parallel.

So for example:
DrawAtom
{
    BoundingVolume volume;
    KeyInt key;
    Shader* shader;
    ShaderParamsBuffer* shaderParamsBuffer;
    VertexStreams vertexStreams;
    PrimitiveAssembler primitiveAssembler;
}

DrawCmdData
{
    KeyInt key;
    Shader* shader;
    ShaderParamsBuffer* shaderParamsBuffer;
    VertexStreams vertexStreams;
    PrimitiveAssembler primitiveAssembler;
}

NormalStage
{
    Frustum viewFrustum;
    FrameBuffer fb;
    Int stageKey;
    vector<DrawAtom> visibleAtoms;
    vector<DrawCmdData> visibleCmds;

    void CollectVisibleDrawAtoms(SpatialOrganizer<DrawAtom> scene)
    {
        visibleAtoms.clear();
        scene.Cull(viewFrustum, visibleAtoms);

        visibleCmds.clear();
        foreach (DrawAtom da in visibleAtoms)
        {
            DrawCmdData data(da);
            data.key = AlterStageKey(data.key, stageKey);
            visibleCmds.insert(data);
        }
    }

    void Render(CommandBuffer& cb)
    {
        Int currentShaderKey = InvalidKey;
        Int currentParamsKey = InvalidKey;

        // sort the command data, not the raw atoms: stages may have
        // overridden shader/params in the cmds (see ShadowMapStage)
        sort(visibleCmds);

        cb.SetFrameBuffer(fb);
        cb.ClearFrameBuffer(...);

        foreach (DrawCmdData cmd in visibleCmds)
        {
            Int shaderKey = GetShaderKey(cmd.key);
            Int paramsKey = GetParamsKey(cmd.key);

            // only emit state changes when the key actually changes
            if (shaderKey != currentShaderKey)
            {
                cb.SetShader(cmd.shader);
                currentShaderKey = shaderKey;
            }

            if (paramsKey != currentParamsKey)
            {
                cb.SetShaderParamsBuffer(cmd.shaderParamsBuffer);
                currentParamsKey = paramsKey;
            }

            cb.Draw(cmd.primitiveAssembler);
        }
    }
}

ShadowMapStage
{
    ...
    Int shadowMapShaderKey;
    Int shadowMapParamsKey;
    Shader* shadowMapShader;
    ShaderParamsBuffer* shadowMapShaderParamsBuffer;

    void CollectVisibleDrawAtoms(SpatialOrganizer<DrawAtom> scene)
    {
        visibleAtoms.clear();
        scene.Cull(viewFrustum, visibleAtoms);

        visibleCmds.clear();
        foreach (DrawAtom da in visibleAtoms)
        {
            DrawCmdData data(da);
            data.key = AlterStageKey(data.key, stageKey);
            // every atom is drawn with the shadow-map shader, so the
            // shader/params parts of the key are overridden too
            data.key = AlterShaderKey(data.key, shadowMapShaderKey);
            data.key = AlterShaderParamsKey(data.key, shadowMapParamsKey);
            data.shader = shadowMapShader;
            data.shaderParamsBuffer = shadowMapShaderParamsBuffer;
            visibleCmds.insert(data);
        }
    }

    ...
}


The key is essentially an indexing mechanism (not fully exploited in the example) used to sort by material, transparency, etc. It could be exploited further, so that all the info is encoded in the key and the real data (Shader, ShaderParamsBuffer, etc.) is accessed by indexing into data tables (like in the c0de517e link).
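For example, a key could pack the table indices into bit fields (this layout and the field widths are invented for illustration):

#include <cstdint>

// Hypothetical 64-bit key layout; higher bits sort first.
// | stage (8) | transparency (1) | shader (16) | params (16) | depth (23) |
inline uint64_t MakeKey(uint64_t stage, uint64_t transparent,
                        uint64_t shaderIdx, uint64_t paramsIdx,
                        uint64_t depth)
{
    return (stage       << 56) |
           (transparent << 55) |
           (shaderIdx   << 39) |
           (paramsIdx   << 23) |
           (depth & 0x7FFFFF);
}

// The decoded indices look up the real data in flat tables:
//   Shader*             shaderTable[...];
//   ShaderParamsBuffer* paramsTable[...];
inline unsigned ShaderIndex(uint64_t key) { return (key >> 39) & 0xFFFF; }
inline unsigned ParamsIndex(uint64_t key) { return (key >> 23) & 0xFFFF; }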

The command buffer just stores the data to be executed later, in a format ready for the API to use.

CommandBuffer
{
    Command
    {
        Int type;
        union
        {
            ...
            SetFrameBufferCmd setFrameBuffer;
            SetShaderCmd setShader;
            SetShaderConstantCmd setShaderConstant;
            SetSamplerStateCmd setSamplerState;
            DrawPrimitiveCmd drawPrimitive;
            ...
        }
    }

    vector<Command> commands;

    ...

    void SetShader(Shader* shader)
    {
        Command cmd;
        cmd.type = SetShaderCmdType;
        cmd.setShader.shader = shader;

        commands.insert(cmd);
    }

    void SetShaderParamsBuffer(ShaderParamsBuffer* paramsBuffer)
    {
        // one constant-setting command per parameter
        foreach (ShaderParam param in paramsBuffer->params)
        {
            Command cmd;
            cmd.type = SetShaderConstantCmdType;
            cmd.setShaderConstant.data = param.data;

            commands.insert(cmd);
        }

        // plus one sampler-state command per sampler
        foreach (SamplerState sampler in paramsBuffer->samplers)
        {
            Command cmd;
            cmd.type = SetSamplerStateCmdType;
            cmd.setSamplerState.texture = sampler.texture;
            cmd.setSamplerState.filter = sampler.filter;
            ...
            commands.insert(cmd);
        }
    }

    void DrawPrimitive(PrimitiveAssembler pa)
    {
        Command cmd;
        cmd.type = DrawPrimitiveType;
        cmd.drawPrimitive.primitive = pa.primitive;
        cmd.drawPrimitive.numPrimitives = pa.numPrimitives;

        commands.insert(cmd);
    }

    ...
}


One thing to take into account is that the commands generated inside the command buffer are ready for use by the underlying API (OpenGL, Direct3D 9, Direct3D 10, the consoles' own command/push buffer formats, etc.). So, for example, in D3D10 SetShaderParamsBuffer could be translated into a single constant buffer binding, while in D3D9 a command setting a shader constant is submitted per param.
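To illustrate (a rough sketch in comments; the device pointers, cbuffer and slot are assumed to exist on the backend side):

// On a D3D9 backend, each parameter becomes its own command,
// executed later as one constant upload per param:
//   device9->SetVertexShaderConstantF(param.registerIndex,
//                                     param.data, param.vec4Count);
//
// On a D3D10 backend, the whole params buffer becomes one command,
// executed as a single constant buffer update plus a bind:
//   device10->UpdateSubresource(cbuffer, 0, NULL,
//                               paramsBuffer->rawData, 0, 0);
//   device10->VSSetConstantBuffers(slot, 1, &cbuffer);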

The command buffer could be pregenerated to render a fixed scene, so no time is wasted culling, ordering calls, etc.

Finally, the renderer interprets that command buffer and sends the commands to the API.

Renderer
{
    void Render(CommandBuffer& cb)
    {
        foreach (Command cmd in cb.commands)
        {
            switch (cmd.type)
            {
                ...
                case SetFrameBufferCmdType:
                {
                    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, cmd.setFrameBuffer.bufferId);
                    break;
                }
                case DrawPrimitiveType:
                {
                    // glDrawElements takes mode, index count, index type
                    // and the offset into the bound index buffer
                    glDrawElements(cmd.drawPrimitive.primitive,
                                   cmd.drawPrimitive.numPrimitives * 3, // triangle list
                                   GL_UNSIGNED_SHORT,
                                   0);
                    break;
                }
                ...
            }
        }
    }
}
Good, good. What about the data? Is it contained in some sort of structure?
Can you handle complex scenes, like one with multiple shadow maps, reflections, refractions, opaque and transparent objects, particles?

Thanks for sharing!
---------------------------------------http://badfoolprototype.blogspot.com/
Hola alagtriste! :)

Yes, actually the stages call (abstracted) engine methods that perform the rendering.
Do you think I should also store the information needed to render in the command buffer? Have you tried an approach like that (deferred function calls)?
I'm thinking of using function pointers to renderer commands, indexed by the command id...
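Something like this (a hypothetical sketch of the function-pointer dispatch idea, not working code from my engine):

#include <cstdint>

struct Renderer;  // the concrete API backend (D3D9, OpenGL, ...)

// A recorded command: an id plus whatever parameter payload it needs.
struct Command { uint32_t id; /* parameters follow */ };

// One executor function per command id, indexed directly by that id.
typedef void (*CommandFn)(Renderer&, const Command&);

void ExecSetShader(Renderer& r, const Command& c) { /* ... */ }
void ExecDraw(Renderer& r, const Command& c)      { /* ... */ }

CommandFn s_dispatch[] =
{
    ExecSetShader,  // command id 0
    ExecDraw,       // command id 1
    // ...
};

inline void Execute(Renderer& r, const Command& c)
{
    s_dispatch[c.id](r, c);  // no switch: the id is the table index
}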

Thanks!
---------------------------------------http://badfoolprototype.blogspot.com/
Another aspect I want to explore is the culling process.
As you already know, culling is the process of cutting away uninteresting objects (based on some matching criteria).

Thinking about complex scenes, what is rendered on screen?
Do you always render only a single list of objects?

Actually, there are different lists of objects that can be created by different culling processes,
e.g. shadow-casting objects, lights.
So, potentially, there are n possible "views" of the same scene.
You cull out the lights you don't need based on the current frustum.
Consider a scene with an object like a security camera: there will be a scene rendered by that camera containing completely different objects. Okay, maybe (as with reflections) you can skip objects that don't cast shadows, so you only need to fill a single list, but there is still another list to render.

In a multithreaded environment... how could you manage culling? Is a simple double-buffered list a simple yet effective solution? You work on the next frame's visibility... could that be a good way to go?
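A minimal sketch of the double-buffered idea (names are mine): the render thread reads the list culled last frame while a worker fills the list for the next frame.

#include <vector>

struct Renderable;  // whatever object type the engine culls

// Double-buffered visibility: the render thread consumes the list
// culled last frame while a worker fills the list for the next frame.
struct VisibilityBuffers
{
    std::vector<Renderable*> lists[2];
    int readIndex = 0;

    std::vector<Renderable*>& ReadList()  { return lists[readIndex]; }
    std::vector<Renderable*>& WriteList() { return lists[readIndex ^ 1]; }

    // Swap at the frame boundary, when neither thread touches the lists.
    void Swap() { readIndex ^= 1; WriteList().clear(); }
};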

Also... there can be different bounding hierarchies based on the type of objects you want to draw (static meshes, terrain, dynamic objects...), hence the need for an abstract way of thinking about it.
A mesh, model or whatever could contain the information for visibility.

And what about skinning?

There is one part of the c0de517e post that gave me something to think about... the feature manager.
Basically it is a component-centric way of viewing objects.
The model to be rendered has different components, each one with a different update (visibility, skinning); THEN the model can submit render commands.
Even if the responsibilities for rendering are assigned differently (here the object itself sends commands, not the stage, if I understood well :P), the concept of giving different "views" of the object to different rendering managers (classes that handle the different components...) is really good for me.
And it is really data oriented.
In parallel you can update all the skinned objects, and then in parallel again all the visibility.
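A rough sketch of that component split (all names hypothetical): each manager owns its component data contiguously, so a whole batch can be updated as one parallel task.

#include <vector>

// Each manager owns one "view" of the objects, stored contiguously.
struct SkinningComponent   { /* palette, bind pose, ... */ };
struct VisibilityComponent { /* bounding volume, flags, ... */ };

struct SkinningManager
{
    std::vector<SkinningComponent> components;
    void UpdateAll() { /* skin every entry; parallel_for-friendly */ }
};

struct VisibilityManager
{
    std::vector<VisibilityComponent> components;
    void UpdateAll() { /* recompute bounds / cull flags in parallel */ }
};

// Frame update: the two batches are independent, so they can run on
// worker threads; only afterwards do models submit render commands.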


Try to describe a complex scene (take whatever game with good graphics you like) in terms of stages, components and commands.
I'll do it to understand whether this design offers good flexibility, but also power.
---------------------------------------http://badfoolprototype.blogspot.com/
I'll continue writing down ideas and thoughts.
I invite EVERYONE to share their opinion.


I'll return to the rendering-only part (not culling, skinning...).
If you look at a PIX capture, you can see what a real command list is!
The command types you find are:

- Set of Resources:
  - Render Target
  - Texture
  - Vertex Buffer
  - Index Buffer
  - Vertex/Geometry/Pixel (whatever) shader
  - Shader params (ok, this is not a resource)
- Set of Render States
- Lock/Unlock
- Draw

The only problem I found in this list of command types (there are more commands, like Begin/EndScene, clear buffers...) is with the Lock/Unlock commands, or more generally with commands that need to know information about other resources.
The solution I've thought of is something related to double buffering, even if it can be quite memory-heavy (I don't know for sure, I have to profile it).
Removing Lock/Unlock leads to a stateless and thread-safe way of handling rendering.
Think of a skinned mesh with two dynamic buffers: one that is read-only, and another that is updated by a task in parallel.
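A sketch of what I mean (hypothetical names): the draw commands only ever reference the read buffer while the skinning task writes the other one, and a swap replaces the Lock/Unlock pair.

typedef unsigned int VertexBufferHandle;  // opaque handle into the buffer pool

// Double-buffered dynamic vertex buffer for skinning. No Lock/Unlock
// ever enters the command stream: draw commands only reference the
// read buffer, which nobody is writing this frame.
struct DoubleBufferedVB
{
    VertexBufferHandle buffers[2];
    int readIndex = 0;

    VertexBufferHandle ReadBuffer() const  { return buffers[readIndex]; }
    VertexBufferHandle WriteBuffer() const { return buffers[readIndex ^ 1]; }

    // Called once per frame, after the parallel skinning task finished
    // and before new draw commands referencing the buffer are recorded.
    void Swap() { readIndex ^= 1; }
};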

This view is really atomic; these are the lowest possible commands you can give (nearly 1:1 with an API device, close to Emergent's implementation).
Starting bottom-up, it's easy to describe a complex scene with primitives like that.
Abstracting these commands can lead to something like a "draw atom" method, or methods, that use different rendering buckets, each method specialized for a different kind of draw atom.

With this view in mind, stages can submit different draw atoms, even if it can be slow to choose a method based on the type of draw atom.
Maybe some bits in the command can be dedicated to handling this:

- DrawStaticMeshAtom
- DrawSkinnedMeshAtom
- DrawShadowMeshAtom

Each of these casts the draw atom to a specialized version and performs all the needed operations.
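For instance, a few bits of the key could select the specialized draw function (bit layout and names invented for the example):

#include <cstdint>

struct DrawAtom;  // base atom; specialized versions derive from it

enum AtomType : uint32_t
{
    StaticMeshAtom  = 0,
    SkinnedMeshAtom = 1,
    ShadowMeshAtom  = 2,
};

// Hypothetical layout: the top 3 bits of a 64-bit key hold the atom type.
inline AtomType GetAtomType(uint64_t key)
{
    return static_cast<AtomType>(key >> 61);
}

void DrawStaticMeshAtom(const DrawAtom& atom);
void DrawSkinnedMeshAtom(const DrawAtom& atom);
void DrawShadowMeshAtom(const DrawAtom& atom);

inline void Dispatch(uint64_t key, const DrawAtom& atom)
{
    switch (GetAtomType(key))
    {
        case StaticMeshAtom:  DrawStaticMeshAtom(atom);  break;
        case SkinnedMeshAtom: DrawSkinnedMeshAtom(atom); break;
        case ShadowMeshAtom:  DrawShadowMeshAtom(atom);  break;
    }
}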


Still, the abstraction level chosen for commands can lead to very different results.

---------------------------------------http://badfoolprototype.blogspot.com/
Most of the time you don't want to record DirectX command buffers directly, because then you can't sort render calls, meaning that each thread has to work on parts of the scene that are independent of the others. Often that's not the case.

If you want to go for DirectX command recording (or recording native render calls in general), then you probably at least want worker threads that prepare all the rendering data in parallel; then the data becomes read-only, and the command-recording threads access the read-only data to create the different scene segments.

Also, when you are thinking about such an architecture, pay attention to cache misses and branch mispredictions. Some source I've seen in this thread is naive in that respect.

-deadc0de @ c0de517e
Quote:Original post by kenpex
Most of the time you don't want to record DirectX command buffers directly, because then you can't sort render calls, meaning that each thread has to work on parts of the scene that are independent of the others. Often that's not the case.


That's ok, I don't want to... but I do want to let different threads add commands to (maybe) per-thread queues, which are then merged and sorted!
At what level of granularity are you creating commands?
Take a very basic design that contains an abstract renderer and some implementations (DirectX9, OpenGL...): do you provide atomic operations that become the commands?

Quote:Original post by kenpex
If you want to go for DirectX command recording (or recording native render calls in general), then you probably at least want worker threads that prepare all the rendering data in parallel; then the data becomes read-only, and the command-recording threads access the read-only data to create the different scene segments.

This is the exact implementation made by Gamebryo (with plenty of source code, examples and papers), but it is something I don't like very much.
API abstraction is essential for me; I do want to create something that can handle different APIs!

Quote:Original post by kenpex
Also, when you are thinking about such an architecture, pay attention to cache misses and branch mispredictions. Some source I've seen in this thread is naive in that respect.


This is a REALLY GOOD point.
The source posted was only to give the idea, but do you have any suggestions for dealing with cache misses and branch mispredictions?
I feel that using function pointers to speed up command execution (command id = index into an array of function pointers...) could be a nuclear bomb for branches.

What are your thoughts about code execution? Would you like to explain your design further?

Thanks!
---------------------------------------http://badfoolprototype.blogspot.com/

