Modern Renderer Design


Hi,

in the current state of my engine, I have an interface called "IRenderable" and a bunch of classes implementing this interface.

The interface (being a pure virtual class) lets every derived class define its own rendering logic. However, I don't like the fact that those classes know how to render themselves. Just imagine porting to another backend.

I am thinking of another approach with cache locality in mind. I've created structures like the following:


struct Light {
	enum class LightType {
		SPOT,
		DIRECTIONAL,
		POINT
	} lightType;
	union {
		// ...
	} lightData;
};

struct Mesh{
	glm::mat4	worldMatrix;
	int		vertexBuffer;
	int 		indexBuffer;
};

I want to feed the renderer with this data (possibly stored as a chunk of contiguous memory).
But I have no idea how to do that. My naive approach is/was to have methods which add those structs to a vector, and a render method which iterates over them.
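Roughly sketched, the naive version would look like this (simplified):

#include <vector>

// Sketch of the naive approach: the renderer owns flat arrays of the structs above.
class Renderer {
public:
	void submit( const Mesh& mesh )   { m_meshes.push_back( mesh ); }
	void submit( const Light& light ) { m_lights.push_back( light ); }
	void render();  // iterates over m_meshes/m_lights and issues the actual draw calls

private:
	std::vector<Mesh>  m_meshes;  // contiguous storage
	std::vector<Light> m_lights;
};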

TL;DR, the question:

I have no idea how a modern renderer is designed in C++. I've never touched a C++ engine (yet).

All my projects involving large-scale rendering were done in Unity (drag'n'drop the mesh, done). My 3D demos were all written in C# using the "IRenderable" approach.
Are there any papers, blogs, or articles (or anything else) that can give me an impression of how to design a renderer?

LG Julien
P.S.: Before the "Don't-Make-Engines-Make-Games" people declare jihad against me, I'd like to mention that I am simultaneously working on an RTS game (does anyone remember the "Jungle Troll Mod" for WC3?).



Game Engine Architecture, Second Edition
by Jason Gregory
Link: http://amzn.com/1466560010

This is most likely your best resource for doing it yourself. I don't know of any thorough online resource.

--Edit--

You may also find this helpful: While we wait – Approaching Zero Driver Overhead http://cg.alexandra.dk/?p=3778


#1
To address the elephant in the room, “modern rendering” excludes the possibility of using OpenGL. Step 1 is to use Metal, Vulkan, or Direct3D 12.
Although you can design a modern workflow with almost any API, it certainly helps to use a modern one as a guideline so that you properly create your renderer around the use of command buffers.

#2
The next issue to address is the use of pure virtual interfaces in certain places (here I am talking about index buffers, the render device, textures, etc., not about models/meshes/other high-level cases, which I address in #4). They achieve nothing unless you need to change the API at run-time, which you never will. On any given platform you either must use only 1 API, or you will end up using only 1 anyway because OpenGL is just a bad idea on Windows®.
A better way is to have a base class, an API-specific class in the middle, and the actual class on top:

CIndexBufferBase
    └─ CDirect3D12IndexBuffer / CVulkanIndexBuffer / CMetalIndexBuffer / COpenGlEs2IndexBuffer
        └─ CIndexBuffer


CIndexBufferBase contains all data common to all forms of index buffers (such as how many indices there are, how many bytes per index, and optionally a CPU copy of the indices).
Each API class inherits from CIndexBufferBase and handles API-specific functionality, such as creating the index buffer and drawing with it.
CIndexBuffer inherits from one of the API classes depending on which macro is set.



bool CIndexBuffer::CreateIndexBuffer( const void * _pvIndices, size_t _sSizeOfIndices, size_t _sTotalIndices ) {
    // Error-checking (bad pointers, bad index sizes, etc.)

    // Copy data into members provided by CIndexBufferBase.
    m_sIndexSize = _sSizeOfIndices;
    m_sTotalIndices = _sTotalIndices;
    // Etc.

    // Call API-specific creation function (no need to pass data; it can access m_sIndexSize, m_sTotalIndices, etc.)
    if ( !CreateIndexBufferApi() ) { return false; }

    // Anything else.  Clean-up, etc.
    return true;
}

Each of the API-specific classes implements CreateIndexBufferApi(), and there is no need for virtual interfaces at all.
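As a sketch, the compile-time selection could look like this (the macro names here are just examples, not from any specific engine):

#if defined( GFX_API_D3D12 )
	typedef CDirect3D12IndexBuffer CIndexBufferApi;		// parent chosen at compile time
#elif defined( GFX_API_VULKAN )
	typedef CVulkanIndexBuffer CIndexBufferApi;
#elif defined( GFX_API_METAL )
	typedef CMetalIndexBuffer CIndexBufferApi;
#else
	typedef COpenGlEs2IndexBuffer CIndexBufferApi;
#endif

class CIndexBuffer : public CIndexBufferApi {
public:
	bool CreateIndexBuffer( const void * _pvIndices, size_t _sSizeOfIndices, size_t _sTotalIndices );
	// No virtual calls anywhere; the API-specific parent is fixed when the engine is compiled.
};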

#3
As for what the rendering module does, it provides these types of classes (index buffers, textures, vertex buffers, shaders, samplers, render-queues, etc.) and a wrapper interface for performing draw commands (set culling, set render targets, draw, etc.).
The last thing you want to do is make your renderer aware of models, terrain, etc. Models, terrain, foliage, water, procedural clouds, 2D sprites, etc. all use the renderer to draw themselves.

That means they create the index buffers, vertex buffers, shaders, textures, and any other resources they need by themselves. They manage, update, and destroy these resources by themselves. They activate textures when they know they are needed.

This only makes sense. Having a centralized location (a renderer module) trying to manage how all of these types of objects render is a gross violation of the single-responsibility principle and invariably leads to monolithic spaghetti code.

The renderer module is low-level. Everything can access it and do what it wants. Its only job is to provide a universal interface so that models, terrain, etc. don't have to worry about which API is being used.
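As a rough sketch of what such an interface could provide (class and type names here are illustrative):

class CGfx {
public:
	// Resource creation.
	static CIndexBuffer *	CreateIndexBuffer( const void * _pvIndices, size_t _sIndexSize, size_t _sTotalIndices );
	static CTexture2d *	CreateTexture2d( const void * _pvTexels, uint32_t _uiWidth, uint32_t _uiHeight );

	// State and draw commands; translated to whichever API was compiled in.
	static void		SetCullMode( CULL_MODE _cmMode );
	static void		SetRenderTarget( CRenderTarget * _prtTarget );
	static void		DrawIndexed( uint32_t _uiIndexCount, uint32_t _uiStartIndex );
};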

#4

Finally, there is the high-level flow of the engine, which necessarily involves other modules besides the rendering module.

All objects in the scene are in an array inside the scene manager. The scene manager does exactly what it says. It manages objects and passes data around to different modules so that physics can run, rendering can happen, etc. It lives inside the engine module itself (the highest-level module in an engine).

When it is time to draw, the scene manager may do different things for different types of objects (terrain culling and rendering is vastly different from that of models and foliage, for example), but for now we will only focus on rendering models.

It gathers a list of meshes (a model is made of multiple meshes) by traversing your world's spatial partitioning scheme (typically an octree) with the camera frustum. The objects in this list may get a "pre-draw" command to allow them to prepare for rendering. This could be executed on a separate thread while the scene manager continues preparing.

Each mesh may require multiple draw calls to render (different materials on a single mesh, multiple layers, etc.). The scene manager goes over each mesh and passes it 2 render-queues (one for opaque, one for translucent). Each mesh knows how many passes it takes to render itself, so it adds as many render-queue items to the queues as needed. Each item has a shader ID, base-texture ID, distance from camera, etc. (anything useful for sorting). There are many topics on this site about how to use render-queues.

The render-queues are sorted, and then the scene manager goes over the opaque queue first and the translucent one second. The meshes are then told to render each submission they made to the render-queue, in the now-sorted order. This means the meshes set their own vertex/index buffers, textures, shaders, etc. This ensures that you can have objects rendering in wildly different ways (water, terrain, clouds, meshes, impostors, foliage, volumetric fog, etc.).
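A condensed sketch of the flow just described (all names are illustrative):

void CSceneManager::Render( const CCamera & _cCam ) {
	// Frustum vs. spatial partition (typically an octree).
	m_vVisible.clear();
	m_octree.GatherVisible( _cCam.Frustum(), m_vVisible );

	CRenderQueue rqOpaque, rqTranslucent;
	for ( CMesh * pmMesh : m_vVisible ) {
		pmMesh->PreDraw();					// Optional; could run on another thread.
		pmMesh->Submit( rqOpaque, rqTranslucent, _cCam );	// One queue item per pass.
	}

	rqOpaque.Sort();		// Sorted by shader/texture/depth, etc.
	rqTranslucent.Sort();		// Typically back-to-front.

	for ( const auto & riItem : rqOpaque )      { riItem.pmMesh->RenderPass( riItem.uiPass ); }
	for ( const auto & riItem : rqTranslucent ) { riItem.pmMesh->RenderPass( riItem.uiPass ); }
}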

In this case, meshes etc. are using virtual functions (to address your specific usage of virtual functions).

Don’t break your design by focusing too much on cache locality and the like. The mesh structure you proposed, even at its small size, already has a flaw: the world matrix of an object is not related to the mesh. A mesh is for rendering. The world matrix is only borrowed for rendering; it is also used for physics, etc. Objects can have world matrices and not be renderable items. Your proposal implies that in order to exist in the world an object must also have a vertex and an index buffer, which is simply not the case.
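In other words, a sketch of the separation, using the structures from the original post:

// The world matrix belongs to the object, not to the mesh.
struct Mesh {			// Rendering data only.
	int vertexBuffer;
	int indexBuffer;
};

struct SceneObject {
	glm::mat4	worldMatrix;	// Always present; shared with physics, AI, etc.
	Mesh *		pMesh;		// Optional; nullptr for non-renderable objects.
};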

It is much more important for objects to have good logical design and connections with each other than to have better cache utilization.

L. Spiro



I have an interface called "IRenderable" and a bunch of classes implementing this interface.
The interface (being a pure virtual class) lets every derived class define its own rendering logic. However, I don't like the fact that those classes know how to render themselves. Just imagine porting to another backend.
Yeah I hate that design. Different types of "renderables" should not have to write backend-specific code.

In my engine, I've made a base "DrawItem" structure (which is ported to every backend). Different types of "renderables" can then be composed of DrawItems (not inherit from them).

I've made lots of posts about this so I'll just link one :P http://www.gamedev.net/topic/666419-what-are-your-opinions-on-dx12vulkanmantle/#entry5215127
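i.e. composition rather than inheritance, something like this sketch (the Transform type is a stand-in for whatever scene-side data you have):

// A "renderable" owns DrawItems instead of inheriting a virtual Render() method.
struct Model
{
	Transform             transform;   // stand-in for scene-side data
	std::vector<DrawItem> drawItems;   // pre-built, backend-portable draw commands (one per sub-mesh/pass)
};
// Drawing: for each visible Model, submit its DrawItems; no backend-specific code in Model.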



But in my case my "DrawItem" isn't in API (D3D/GL/etc.) "native form". (Not sure if this is better or worse.)

So every time I execute a DrawCall (or actually every type of call: clear a buffer, execute a compute program, etc.), I translate that structure to the native API and then execute it.


struct DrawCall {
    // ResourceProxy is just a pointer...

    ResourceProxy<ShadingProgram> m_shadingProg; // I "link" all the programs into one "shading program" in order to reduce the size of this structure.
    ResourceProxy<BufferResource> m_vertexBuffers[GraphicsCaps::NUM_VERTEX_BUFFER_SLOTS];
    uint32 m_vbOffsets[GraphicsCaps::NUM_VERTEX_BUFFER_SLOTS];
    uint32 m_vbStrides[GraphicsCaps::NUM_VERTEX_BUFFER_SLOTS];
    boost_small_vector<VertexDecl, 3> m_vertDecl;
    PrimitiveTopology::Enum m_primTopology;
    ResourceProxy<BufferResource> m_indexBuffer;
    UniformType::Enum m_indexBufferFormat;
    uint32 m_indexBufferByteOffset;

    // These here are basically std::vectors. I really want to avoid all those dynamic allocations; because of that I use boost::small_vector.
    // But I'm not sure if this is the right thing to do? What is your solution?
    BoundCBuffersContainer m_boundCbuffers;
    BoundTexturesContainer m_boundTextures;
    BoundSamplersContainer m_boundSamplers;

    ResourceProxy<FrameTarget> m_frameTarget; // render targets + depth stencil
    Viewport m_viewport; // Currently I support only one viewport...

    // I'm considering combining those 3 into 1 object in order to shrink this structure a bit.
    ResourceProxy<RasterizerState> m_rasterState;
    ResourceProxy<DepthStencilState> m_depthStencilState;
    ResourceProxy<BlendState> m_blendState;

    DrawExecDesc m_drawExec; // aka the Draw/DrawIndexed/DrawIndexedInstanced/etc.
};


They achieve nothing unless you need to change the API at run-time, which you never will.

Well, actually I wanted to. Let's say a target machine doesn't support OpenGL 4 due to missing drivers.

I'd fall back to OpenGL 2.


Having a centralized location (a renderer module) trying to manage how all of these types of objects render is a gross violation of the single-responsibility principal and invariably leads to monolithic spaghetti code.

That's the problem I am trying to avoid at all costs.


The renderer module is low-level. Everything can access it and do what they want. It’s only job is to provide a universal interface so that models, terrain, etc. don’t have to worry about which API is being used.


So it's better to abstract the rendering API (perhaps into a stateless rendering API; I just stumbled upon this, and it's kind of similar to Hodgman's approach, isn't it)?


About the scene manager: is it just a bunch of classes holding every mesh etc. in a std::vector (or something comparable)?
And is the scene manager submitting the draw calls?


So, finally: thanks to all who replied. I've got some awesome content to think about (@L. Spiro, @Hodgman).
LG Julien


in my case my "DrawItem" isn't in API (D3D/GL/etc.) "native form". (Not sure if this is better or worse.)
So every time I execute a DrawCall (or actually every type of call: clear a buffer, execute a compute program, etc.), I translate that structure to the native API and then execute it.

If CPU usage becomes an issue for you, you'll be able to optimize that later by pre-converting from agnostic (platform-independent) data into platform-specific data once in advance, instead of on every draw. Performing this optimization will make dynamic renderables a bit more cumbersome though -- e.g. UI code, debug visualisations, and some special effects often have their DrawItems recreated every frame, which is likely easier in your system.

// These here are basically std::vectors. I really want to avoid all those dynamic allocations; because of that I use boost::small_vector.
// But I'm not sure if this is the right thing to do? What is your solution?

I often use in-place, variable-length arrays, via ugly C-style code, which requires the size of your array to be immutable. IMHO pre-compiled DrawItems should be largely immutable anyway:


struct Widget
{
  uint8_t fooCount;
  uint8_t barCount;
  Foo* FooArray() { return (Foo*)(this+1); }                // Foo array starts immediately after the struct
  Bar* BarArray() { return (Bar*)(FooArray()+fooCount); }   // Bar array starts immediately after the Foos
  size_t SizeOf() const { return sizeof(Widget) + sizeof(Foo)*fooCount + sizeof(Bar)*barCount; }
};
//assume that it's safe to allocate the arrays end-to-end like this...
static_assert( alignof(Foo)%alignof(Widget) == 0 || alignof(Widget)%alignof(Foo) == 0, "Foo/Widget alignment mismatch" );
static_assert( alignof(Bar)%alignof(Foo) == 0 || alignof(Foo)%alignof(Bar) == 0, "Bar/Foo alignment mismatch" );

//Create a nice compact Widget from two std::vectors
Widget* MallocWidget( const std::vector<Foo>& inFoo, const std::vector<Bar>& inBar )
{
  Widget temp = { (uint8_t)inFoo.size(), (uint8_t)inBar.size() };//init counts
  //aligned_malloc stands in for your platform's aligned allocator (e.g. _aligned_malloc on MSVC)
  Widget* out = (Widget*)aligned_malloc( temp.SizeOf(), alignof(Widget) );//compute full size
  *out = temp;//copy count members
  Foo* outFoo = out->FooArray();
  Bar* outBar = out->BarArray();
  for( size_t i=0, end=inFoo.size(); i!=end; ++i )
    outFoo[i] = inFoo[i];
  for( size_t i=0, end=inBar.size(); i!=end; ++i )
    outBar[i] = inBar[i];
  return out;
}
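Usage would then be something like this (sketch; pair the free call with whatever aligned allocator you actually used):

std::vector<Foo> foos = /*...*/;
std::vector<Bar> bars = /*...*/;
Widget* w = MallocWidget( foos, bars );	// one allocation: counts + both arrays
// ... read via w->FooArray() / w->BarArray(); treat the blob as immutable ...
aligned_free( w );			// matching free for the aligned_malloc above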

So it's better to abstract the rendering API (perhaps into a stateless rendering API; I just stumbled upon this, and it's kind of similar to Hodgman's approach, isn't it)?

Yep, after using a few stateless rendering APIs, they're now the only choice for me :)

IMHO it's also a very good idea to have a simple rendering API as the "base level", which doesn't know anything about scenes/etc... all it does is act like D3D/GL, but easier to use, and cross-platform. Your scene-manager(s) are then the next layer that is built upon this simple base API.

So, I am doing a prototype (a draft) based on the information you gave me.

I have the following classes:


struct Mesh; // Data only
struct Light; // Data only

class  MeshRenderer;
class  LightRenderer;

class  Scene;
class  RenderQueue;
class  Renderer;

The idea is that each renderable entity has its own renderer. The structures are submitted to the "Scene"; the "*Renderer" classes process the data and submit draw calls to the "RenderQueue" using DrawItems. The RenderQueue sorts the draw calls (opacity, hey ho!) and, if it finds identical draw calls (e.g. using the same vertex/index buffer handles), batches them into an "InstancedDrawCall".
Finally, the "Renderer" processes these DrawCalls.

Is this the way to go? I am still not sure how much I should abstract things. Should the RenderQueue be aware of updating/creating resources (loading vertices into a VertexBuffer)?

EDIT:
Should the render queue be like the "deferred context" as known from DirectX 11, yet stateless?


Should the RenderQueue be aware of updating/creating resources (loading vertices into a VertexBuffer)?

Absolutely not. The render-queue is nothing but a very small set of integers (or a single 64-bit integer if possible) which contains the data needed for sorting. That means a shader ID, a texture ID, any small ID numbers that you want to include for sorting, and the fractional bits of a normalized (0-1) float for depth.
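For example, packing those fields into a single 64-bit key might look like this (the field widths here are illustrative):

// [ shader ID : 16 bits | texture ID : 24 bits | depth : 24 bits ]
inline uint64_t MakeSortKey( uint32_t _uiShaderId, uint32_t _uiTextureId, float _fNormDepth ) {
	// Fractional bits of a normalized (0-1) depth value, so depth sorts as an integer.
	uint64_t uiDepthBits = (uint64_t)(_fNormDepth * 16777215.0f) & 0xFFFFFF;
	return ( (uint64_t)(_uiShaderId  & 0xFFFF)   << 48 ) |
	       ( (uint64_t)(_uiTextureId & 0xFFFFFF) << 24 ) |
	       uiDepthBits;
}
// Sorting the render-queue is then just sorting an array of integers.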

Should the render queue be like the "deferred context" as known from DirectX 11, yet stateless?

See above. A render-queue has no relationship to contexts. It simply sorts draw calls. There is no reason it needs to know about resource creation or contexts or literally anything else but integers.


L. Spiro


I submit render queues into "GpuContexts", and a "GpuContext" is a wrapper around an Immediate Context / Deferred Context / OpenGL Context / Command List/etc...

Any thread can build a render-queue without even having a pointer to a GpuDevice/GpuContext, as it's just application data. After that, you can submit the queue into a context that is owned by the current thread.
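In sketch form (names are illustrative):

RenderQueue queue;				// plain application data; build on any worker thread
BuildOpaquePass( queue, visibleObjects );	// hypothetical helper; no GPU calls in here

GpuContext& ctx = gpuDevice.GetContextForThisThread();	// wraps a deferred context / command list
ctx.Submit( queue );				// translation into native API commands happens here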

