# OpenGL Cross-Platform Graphics Interface Design


## Recommended Posts

I am working on my own project of a cross-platform graphics engine for Windows and Android using DirectX 11 and OpenGL/OpenGL ES 2. Well, I do not have much experience in graphics programming, so I use Unreal Engine 4 as my main source of reference. It has a cross-platform graphics interface called the Rendering Hardware Interface (RHI), so my initial naive attempt is to make a basic rendering interface that creates low-level graphics objects such as vertex shaders and vertex buffers on each target platform.

So far I have managed to draw some polygons on Windows with DirectX 11/OpenGL and Android with OpenGL ES2. Those low-level graphics objects I have been working on are as follows (textures haven't been added yet):

You may wonder what ShaderBond is for. Well, its DX implementation holds a D3D11InputLayout and its GL one holds an OpenGL linked program. Although OpenGL 3 introduces separate shader objects, since I aim to support OpenGL ES2 I have to avoid the use of those new features.

As I continue working on this project and examining the source code of UE4's RHI, I can't help wondering whether this is a good design, because I feel there are a lot of overheads and quirks due to the differences between OpenGL and DirectX. For example, DirectX is more object-oriented while OpenGL is more like a machine with a bunch of switches, so in DirectX you can create a constant buffer and change its contents directly, while in OpenGL ES 2 I need to keep track of the attributes of uniform variables and make the linked program current before changing their values.

Besides, although low-level rendering interfaces look quite flexible, they have some quirks where you have to call certain functions before some other functions due to limitations on certain platforms. For instance, in UE4, before changing the content of a uniform buffer, you have to set the associated shader as the current one in use.

In addition, my current design is based on DirectX 11 and OpenGL ES 2 and bound to them. I am afraid that if I want to support different versions of DirectX and OpenGL or even other graphics APIs later on, I may need to change my interface more or less. That does not sound good.

Hence, another idea I have come up with is to hide those low-level graphics objects entirely. In the new design, I would have high-level geometry objects containing all the source data needed to generate the low-level objects, and each target API would convert them into its own graphics objects. That way, I should be able to avoid the overhead of a consistent cross-platform interface, since there would no longer be one at the low-level object layer. Losing control over the low-level graphics objects is a huge disadvantage in some cases though, I guess.

Would anyone who has experience in cross-platform graphics interface design give me some opinions or suggestions over those two designs? Or even other designs?

BTW, in OpenGL ES 2 there are no such things as vertex declarations or uniform buffers. They are just custom bookkeeping objects in my design.

Edited by isatin


The design I use to hide the native APIs is a low-level stateless renderer: http://tiny.cc/gpuinterface

> Hence, another idea I have come up with is to hide those low-level graphics objects entirely. In the new design, I would have high-level geometry objects containing all the source data needed to generate the low-level objects, and each target API would convert them into its own graphics objects. That way, I should be able to avoid the overhead of a consistent cross-platform interface, since there would no longer be one at the low-level object layer. Losing control over the low-level graphics objects is a huge disadvantage in some cases though, I guess.
Yeah, this ensures the best performance on each platform, but greatly increases porting cost in the long run - every new graphics feature must be rewritten per platform.

> You may wonder what ShaderBond is for. Well, its DX implementation holds a D3D11InputLayout and its GL one holds an OpenGL linked program. Although OpenGL 3 introduces separate shader objects, since I aim to support OpenGL ES2 I have to avoid the use of those new features.
That doesn't seem right. A D3D IL links the shader program to the IA stage (vertex attribute formats), similarly to a GL VAO. I'm not up to date with GLES, but the VAO state isn't part of a linked program, is it?

On that note, GLES has no UBO support?? :o

> Besides, although low-level rendering interfaces look quite flexible, they have some quirks where you have to call certain functions before some other functions due to limitations on certain platforms. For instance, in UE4, before changing the content of a uniform buffer, you have to set the associated shader as the current one in use.
That seems more like a failing of UE4 than something that's true in general.

The only "leaky abstraction" should be that GL requires projection matrices to be constructed differently due to its stupid symmetrical NDC definition.


Disclaimer: I'm coming across as quite anti-high-level platform abstractions, which isn't intentional. It's definitely still something to consider!

> The implementation of these interfaces can then use direct API calls or use multi-API helpers where appropriate to minimize code duplication.

In my case of a having a low-level cross-platform API, I still also have a high-level cross-platform API too -- it's just that it's 100% using the above-mentioned multi-API helpers (being the low-level API-as-helper).
So there's kind of a continuous spectrum here between how much you abstract the low level -- e.g. from one end of the spectrum to the other:
* High-level renderer per platform, no code sharing between different implementations.
* High-level renderer per platform, but some common features are written using cross-platform helpers.
* High-level renderer is platform-agnostic, because it's entirely written using cross-platform helpers.

However, even though I'm using the bottom choice (where I completely hide the native API at the low-level and then write the high-level renderer using this low-level cross-platform API), there still is some platform-specific code in the high-level renderer.
Like you said, some algorithms might perform well on a PS4, but not on an Xbone, or vice versa -- these kinds of algorithm choices can still be made in the high-level renderer, but all the algorithms are portable by default. There's also some platform specific hints littered about the place -- e.g. the user can hint at which resources they would like to be present in ESRAM, which has an effect on Xbone but is ignored by PS4, or the user can "discard" a resource, which has an effect on Xb360 and mobile, but is ignored by D3D11 :)

There's also the case where I'm pulling features out of the low-level API and moving them to the high-level one! e.g. something like glGenerateMipmap doesn't map to hardware, so doesn't exist on many of the lower-level APIs. Currently I implement this feature as a low-level API feature, but we're currently deprecating it and moving it to a high-level API feature, built entirely using our portable low-level API. The reason we're doing this is to reduce code bloat (lots of similar code in each platform back-end), and to provide consistent performance/quality/control across every platform. This is actually quite similar to your suggestion of moving common high-level code (in a per-platform high-level design) into shared helpers, except that it's upside down!

> This can also on _some_ platforms (not the common x86 ones so much, which is pretty much all of them now) have huge perf benefits; you work with things like shaders and buffers a _lot_, and having an extra layer of indirection and virtual calls to manipulate them really hurts in-order processors. Moving the interface to higher levels means that you're incurring those costs only on the far less frequently called renderer commands.
> ...
> The easiest example is your projection matrix on D3D vs OpenGL; they are required to be different because they use different depth ranges in NDC space. Abstracting at the renderer level implicitly handles this problem (since your GLRenderer would just use a different matrix than your D3DRenderer) while abstracting on the resource level means that you have to additionally call different matrix construction routines depending on whether you're using a D3DShader or a GLShader.

There should be zero "interface cost" in a rendering abstraction, because the API to use should be a compile-time decision, not a runtime decision. Having your PS3 constantly call virtual methods in case it might want to use Windows' D3D instead of Sony's GCM would just be silly (and yes, would be a performance disaster)  :wink:
Use compile-time polymorphism instead of runtime polymorphism and this isn't an issue.

There's other abstraction overheads that you can eliminate at compile time too -- e.g. often you want to create your own platform-agnostic enums that mirror the native enums, such as the fixed function blend equation, so that the user's code that configures such states can be cross-platform. If you don't want to pay the overhead of converting between your own enum values and the platform-specific ones, you can conditionally define them based on the current build type:

```cpp
// blah.h
namespace BlendEquation { enum Type
{
#if defined(BUILD_D3D11) || defined(BUILD_D3D12) || defined(BUILD_D3D9)
    Add    = 1,
    Sub    = 2,
    RevSub = 3,
    Min    = 4,
    Max    = 5,
#elif defined(BUILD_OPENGL)
    ...
#endif
}; }

// blah_d3d11.cpp
// At compile time, make sure that our platform-agnostic enum matches the
// native enum values so that it's safe to cast between them:
STATIC_ASSERT( BlendEquation::Add     == D3D11_BLEND_OP_ADD );
STATIC_ASSERT( BlendEquation::Sub     == D3D11_BLEND_OP_SUBTRACT );
STATIC_ASSERT( BlendEquation::RevSub  == D3D11_BLEND_OP_REV_SUBTRACT );
STATIC_ASSERT( BlendEquation::Min     == D3D11_BLEND_OP_MIN );
STATIC_ASSERT( BlendEquation::Max     == D3D11_BLEND_OP_MAX );
```

With the GL/D3D projection matrix difference, again you can solve this with an ifdef inside your functions that create projection matrices. There's no need to query the shader/device/etc at runtime to decide which format to use, because it's a compile-time decision.

Finally, there's just a ****load of complexity to abstracting certain resources (e.g. shaders, pipeline state object, command lists, etc.) and it can just result in a cleaner, easier-to-read codebase with less abstraction and duplication if you abstract at the renderer level instead of the resource level. Your D3D12Renderer can directly make use of D3D12 command lists while your GLRenderer and D3D9Renderer can either do their own thing for multi-threaded rendering or just not even pretend to support the feature.

You don't have to abstract things at the exact same level as the underlying API, e.g.
* GL has linked programs, covering all stages, but D3D9/11 allows each stage to be set individually without a linking step -- my abstraction copies GL in having a shader "program" covering all stages.

* GL2 and D3D9 don't have UBOs, but I emulate them anyway because it's a nicer abstraction than having uniforms being tied to a particular "shader instance".

** On this note -- The actual UBO implementations vary quite a bit - meaning this "low level" abstraction is still actually quite high-level above the hardware! :)

*** On a particular console from this GL2 era (where uniforms didn't exist in hardware), I've actually got to constantly create new copies of each shader program and patch in "copy literal value to register" instructions into those shaders from the user's UBOs!

*** On another platform, the user's UBOs are just plain old memory from malloc, and when they bind a UBO, I memcpy it into a per-frame ring-buffer of constant data, containing a tightly packed array of all the constants used in this frame, and then bind a pointer into this buffer as the actual native UBO. On another console it's the same, but the memory is allocated out of the actual native command buffer itself!
* GL has fine-grained state, D3D11 has coarse state, D3D12 has PSOs -- I expose a coarse-grained (D11 style) state setting abstraction on every platform, and resolve it at draw-item creation time, keeping draw-item submission as fast as possible.
* Command lists - GL/D3D9 don't really have them (without extensions or emulation) - a low-level abstraction can still have optional features. Some of these can be known at compile time (if building for D3D9 or PS3, I don't have geometry shaders), and/or queryable at runtime so that the high-level renderer can choose different algorithms. In D3D9/GL, I implement my own command lists, but in my "capability querying" API, I inform the user (the high level renderer) that these are emulated so should not be preferred unlike D3D12/vulkan command lists.

> I just work with graphics professionals, pick their brains a lot, and tinker with graphics occasionally in my free time.

I'd be keen to hear what your coworkers think about a low-level stateless renderer as in my link above :D
The DrawItem concept scales perfectly between fine-grained state APIs (D3D9/GL), coarse-grained state APIs (D3D11) and PSO APIs (D3D12/Vulkan), and also hides the complexity of PSO management from the user in a very performant manner (PSO lookup is done once when preparing a draw-item, and then submission is cheap). It's also pretty performant across the board - I'm getting something like 3k draws per ms on D3D11 at the moment :)

Edited by Hodgman


> The design I use to hide the native APIs is a low-level stateless renderer: http://tiny.cc/gpuinterface
>
> > Hence, another idea I have come up with is to hide those low-level graphics objects entirely. In the new design, I would have high-level geometry objects containing all the source data needed to generate the low-level objects, and each target API would convert them into its own graphics objects. That way, I should be able to avoid the overhead of a consistent cross-platform interface, since there would no longer be one at the low-level object layer. Losing control over the low-level graphics objects is a huge disadvantage in some cases though, I guess.
>
> Yeah, this ensures the best performance on each platform, but greatly increases porting cost in the long run - every new graphics feature must be rewritten per platform.
>
> > You may wonder what ShaderBond is for. Well, its DX implementation holds a D3D11InputLayout and its GL one holds an OpenGL linked program. Although OpenGL 3 introduces separate shader objects, since I aim to support OpenGL ES2 I have to avoid the use of those new features.
>
> That doesn't seem right. A D3D IL links the shader program to the IA stage (vertex attribute formats), similarly to a GL VAO. I'm not up to date with GLES, but the VAO state isn't part of a linked program, is it?
>
> On that note, GLES has no UBO support?? :o
>
> > Besides, although low-level rendering interfaces look quite flexible, they have some quirks where you have to call certain functions before some other functions due to limitations on certain platforms. For instance, in UE4, before changing the content of a uniform buffer, you have to set the associated shader as the current one in use.
>
> That seems more like a failing of UE4 than something that's true in general.
>
> The only "leaky abstraction" should be that GL requires projection matrices to be constructed differently due to its stupid symmetrical NDC definition.

Thanks for the slides. I don't have the knowledge to understand most of them though.  :)

I have some questions about your draw items. Are they stateless because they set all render states except render targets? Doesn't that produce redundant graphics function calls? Or do you sort them by render states?

I am not sure what you meant by "That doesn't seem right." ShaderBond is an object I originally made up for OpenGL linked programs, and I also needed an object to put the InputLayout in, so I chose it. By the way, I do not use VAOs because my phone is too old, so I have to aim for OpenGL ES 2. OpenGL ES 3 seems to support UBOs, but I am not 100% sure. I think I will move to OpenGL ES 3 and give up ES 2 support after changing my phone. Updating uniform variables requires many more function calls than uniform buffers.

Edited by isatin


> D3D12 has PSOs -- I expose a coarse-grained (D11 style) state setting abstraction on every platform, and resolve it at draw-item creation time, keeping draw-item submission as fast as possible

Be careful with this, PSO generation can be expensive! We did this as well as a first step for D3D12, but it had the tendency to introduce some pretty nasty frame-time spikes whenever a specific PSO configuration was encountered for the first time (so no cached PSO was available). Encountering a single new PSO per frame is not that big of a deal, but encountering multiple can cause trouble. We ended up treating PSO descriptors as data so we could generate them up front at load time, as was recommended to us. Additionally, the recommendation seems to be to cache any compiled PSOs on the user's machine so you don't have to recompile them on subsequent runs of the game.

I've been looking into designing a stateless cross-platform rendering API as well with D3D12 as a first-class citizen, but the whole root signature and PSO aspect of it is making this pretty challenging.

Ideally you'd want a system which can figure out exactly which PSOs will be required from within your content pipeline so you can generate metadata for them up front. I've been playing around with the idea of building a data-driven rendering pipeline which allows you to have knowledge about specific render passes and systems up front (i.e. they're defined in data, not code).

With a system like that you can already solve certain parts of the PSO puzzle, as it can specify RTV/DSV formats, Rasterizer/Depth/Blend state override settings, shader overrides, multi-sampling options, etc. In addition to this your material library can provide you with other required data such as shader programs, root signatures, and rasterizer/depth/blend state descriptors. You can assign material objects to be compatible with specific render passes to make sure that you only generate the PSOs you specifically need.

For the last piece of the puzzle you can look at any geometry data referencing the materials in your material library to determine all required input layouts, index buffer formats/conventions and primitive topology types. For those of you wondering what I do about stream-output: I actually don't have a proper solution for it, and I often just like to ignore the fact that stream-output functionality exists. Most of the stuff we used to do using stream-out has been moved to compute based solutions anyway.

Ideally something like this should get you 99.9% of the way towards eliminating runtime PSO generation. There's definitely going to be some exceptions like procedurally generated geometry in code which requires a specific PSO, PSOs for debug rendering functionality which won't end up in your shipping build, and other stuff like that. Ideally you'll know about those up front so you can build them at load time, but if that's not the case it shouldn't be a huge deal to generate them at draw time.


> Be careful with this, PSO generation can be expensive!
>
> Ideally you'd want a system which can figure out exactly which PSOs will be required from within your content pipeline so you can generate metadata for them up front. I've been playing around with the idea of building a data-driven rendering pipeline which allows you to have knowledge about specific render passes and systems up front (i.e. they're defined in data, not code). With a system like that you can already solve certain parts of the PSO puzzle, as it can specify RTV/DSV formats, Rasterizer/Depth/Blend state override settings, shader overrides, multi-sampling options, etc. In addition to this your material library can provide you with other required data such as shader programs, root signatures, and rasterizer/depth/blend state descriptors. You can assign material objects to be compatible with specific render passes to make sure that you only generate the PSOs you specifically need.

Yeah I warm up the PSO cache immediately after loading the game's shader archive from disk - on the initial loading screen.

There's already other platforms that require the fixed-function blend state, input assembler layout, and depth-stencil/render-target formats to be known at shader-compilation time, so these were kind of a precursor to the PSO data pipeline problem. I deal with this by forcing shader authors to annotate which fixed function states, which vertex-buffer layouts, and which render-target formats it's valid to use their shader with.
On platforms with no input-assembler, this allows you to compile permutations of the VS with the vertex-buffer decoding hard-coded in the VS.
On platforms with no fixed-function blend, this allows you to compile permutations of the PS with the blend logic appended.
On platforms with limited fixed-function render-target format conversion, this allows you to compile permutations of the PS with format conversion logic appended.
On PSO platforms, this data also lets you warm up your PSO cache pessimistically :)
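Those annotations effectively act as small whitelists that multiply out into a permutation set. A hedged sketch of that expansion (the annotation categories and string names are invented for illustration):

```cpp
#include <string>
#include <vector>

// Hypothetical per-shader annotation block: the author whitelists which
// vertex layouts, blend states, and render-target formats the shader may
// legally be used with.
struct ShaderAnnotations {
    std::vector<std::string> vertexLayouts;
    std::vector<std::string> blendStates;
    std::vector<std::string> rtFormats;
};

struct Permutation {
    std::string layout, blend, format;
};

// The full permutation set is the cross product of the whitelists. Because
// authors annotate only the combinations that are actually valid, the count
// stays tractable - and on PSO platforms this is exactly the list you'd
// compile pessimistically to warm the cache.
std::vector<Permutation> BuildPermutations(const ShaderAnnotations& a) {
    std::vector<Permutation> out;
    for (const auto& l : a.vertexLayouts)
        for (const auto& b : a.blendStates)
            for (const auto& f : a.rtFormats)
                out.push_back({l, b, f});
    return out;
}
```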

Alternatively, you can log the PSO's that get used in a play-through (or the combination of Dx11-style coarse states that were used with each shader), and use this logged information to construct a PSO cache on the user's machine the first time they start the game. That kind of system is always prone to accidentally missing a particular combination in your logged play-through though, so you'd have to make it gracefully deal with cache-misses and the associated framerate hitch :(
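The logging half of that scheme is essentially just recording the distinct state-combination keys seen during a play-through, serializing them, and replaying the list on the user's first run. A small sketch, assuming a flat string key per PSO (the key format here is hypothetical):

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical: accumulate the distinct PSO keys observed during a
// play-through so they can be written out and replayed later to warm
// the cache; any key missing from the log becomes a runtime miss that
// the renderer must handle gracefully (with the associated hitch).
class PsoUsageLog {
public:
    void Record(const std::string& key) { seen_.insert(key); }

    // One key per line; std::set keeps the output deterministic across runs.
    std::string Serialize() const {
        std::ostringstream os;
        for (const auto& k : seen_) os << k << '\n';
        return os.str();
    }

    static std::set<std::string> Parse(const std::string& text) {
        std::set<std::string> keys;
        std::istringstream is(text);
        for (std::string line; std::getline(is, line);)
            if (!line.empty()) keys.insert(line);
        return keys;
    }

private:
    std::set<std::string> seen_;
};
```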


Alternatively, you can log the PSO's that get used in a play-through (or the combination of Dx11-style coarse states that were used with each shader), and use this logged information to construct a PSO cache on the user's machine the first time they start the game. That kind of system is always prone to accidentally missing a particular combination in your logged play-through though, so you'd have to make it gracefully deal with cache-misses and the associated framerate hitch

Sadly enough this is what we had to resort to in the end :(. I would've loved to have done a proper implementation, but you know how these kinds of things go when trying to meet a deadline. This particular title was not written with D3D12 in mind, and we didn't have the time or resources to re-architect it to be D3D12-friendly.

Most of my work is on PC titles and the occasional current-gen console title, so I generally don't have to deal with the cases you mentioned above. Having some of these tougher restrictions forced on you up-front actually does work out nicely in this situation!

<thread_derail>

On platforms with no input-assembler, this allows you to compile permutations of the VS with the vertex-buffer decoding hard-coded in the VS.

Recently I've been seeing more and more implementations which bypass the input assembler (and input layouts) entirely, instead opting to use a structured buffer to provide vertex data to the vertex shader. Adopting this approach globally would definitely simplify PSO generation. You could take geometry data out of the equation by defining some conventions on topology and index buffers. I do remember reading about some architectures already doing this under the hood to emulate the input assembler stage, but I'm afraid I don't remember specifics. I wonder whether there'd be any major downsides to taking this approach globally.
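In HLSL terms, bypassing the input assembler just means indexing a `StructuredBuffer` of vertices with `SV_VertexID` (or an index read from a second buffer) instead of declaring an input layout. The equivalent fetch logic, emulated on the CPU in C++ purely for illustration:

```cpp
#include <cstdint>
#include <vector>

// The vertex format now lives in shader code, not in an input layout
// object - changing it no longer changes the PSO key.
struct Vertex {
    float position[3];
    float uv[2];
};

// Emulates what the VS does on a platform with no input assembler:
// SV_VertexID-style index -> index buffer -> structured vertex buffer.
// In HLSL this would be: vertices[indices[vertexId]] on two StructuredBuffers.
Vertex FetchVertex(const std::vector<Vertex>& vertexBuffer,
                   const std::vector<uint32_t>& indexBuffer,
                   uint32_t vertexId) {
    return vertexBuffer[indexBuffer[vertexId]];
}
```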

</thread_derail>


So there's kind of a continuous spectrum here between how much you abstract the low level

Agreed. I lean towards the high-level. For common low-level code, I've preferred the helper approach - inverted dependencies.

There should be zero "interface cost" in a rendering abstraction, because the API to use should be a compile-time decision, not a runtime decision.

Perhaps, in release builds, and even then mainly on non-PC platforms. Certainly on certain old consoles the cost of virtual calls was high, but everything these days pays very little for virtual functions (unless you're doing something dumb that thrashes caches).

In development on PC, it's particularly handy IMO being able to toggle between rendering backends at runtime. It makes A/B testing easy, avoids needing to recompile to do headless tests, etc. :)

Release builds and platforms with specific implementation can - via a little careful design and some macros - use static polymorphism.
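One common shape for that: engine code is written once against a single `Renderer` alias whose concrete backend is chosen at compile time, so release builds pay no virtual-dispatch cost, while a development build can still wrap the same types behind a virtual interface. A hedged sketch (backend names and the `USE_VULKAN` macro are illustrative):

```cpp
#include <string>

// Two concrete backends with identical (non-virtual) interfaces.
struct D3D12Renderer  { std::string Name() const { return "d3d12"; } };
struct VulkanRenderer { std::string Name() const { return "vulkan"; } };

// In release, the backend is a compile-time decision: pick one with a
// macro and every call resolves statically (and can inline).
#if defined(USE_VULKAN)
using Renderer = VulkanRenderer;
#else
using Renderer = D3D12Renderer;
#endif

// Engine code is written once against whichever type the alias names;
// templates keep shared helpers backend-agnostic without virtual calls.
template <typename R>
std::string DescribeBackend(const R& r) { return "backend: " + r.Name(); }
```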

In my "day job engine" we have a very large quantity of interfaces. Way way too many, IMO. And largely for reasons that have almost nothing to do with polymorphism (e.g. the vast majority have only one implementation). The folks with PPC console experience scream to high heaven about it all the time, though it's really not a big deal on x86 hardware. Apparently it just hasn't been a big problem on the older consoles, either; the engine's been in use for 3-4 generations.

I'd be keen to hear what your coworkers think about a low-level stateless renderer as in my link above

A good deal of what you said echoes them. :)

I oversimplified a lot, but then a forum post can't do a good job of boiling down a graphics architecture; if it could, we'd all be competing for game engineering jobs with 13 year olds. :P

As far as command lists go, I do know for sure that we use a high-level abstraction there running over the low-level APIs. Outside of the graphics engine itself, everything is abstracted at the level I mentioned earlier: meshes, materials, cameras, post-process effects, etc. Hardware abstractions are present, but used to the exact extent that they simplify the renderer, and they don't "leak" outside our render library.

I imagine the same is true for your architecture; we might be talking slightly past each other and concentrating on minute details?

