isatin

OpenGL Cross-Platform Graphics Interface Design


I am working on my own cross-platform graphics engine for Windows and Android, using DirectX 11 and OpenGL/OpenGL ES 2. I do not have much experience in graphics programming, so I use Unreal Engine 4 as my main source of reference. It has a cross-platform graphics interface called the Rendering Hardware Interface (RHI), so my initial, naive attempt was to make a basic rendering interface that creates low-level graphics objects such as vertex shaders and vertex buffers on each target platform.
engine.png

 

So far I have managed to draw some polygons on Windows with DirectX 11/OpenGL and on Android with OpenGL ES 2. The low-level graphics objects I have been working on are as follows (textures haven't been added yet):

graphics-objects.png

 

You may wonder what ShaderBond is for. Its DirectX implementation holds a D3D11 input layout, and its GL implementation holds a linked OpenGL program. Although newer desktop OpenGL introduced separate shader objects (ARB_separate_shader_objects, core in OpenGL 4.1), I aim to support OpenGL ES 2, so I have to avoid those newer features.

 

As I continue working on this project and examining the source code of UE4's RHI, I can't help wondering whether this is a good design, because there seem to be a lot of overheads and quirks caused by the differences between OpenGL and DirectX. For example, DirectX is more object-oriented while OpenGL is more like a machine with a bunch of switches: in DirectX you can create a constant buffer and change its contents directly, whereas in OpenGL ES 2 I need to keep track of the attributes of the uniform variables myself and make the linked program current before changing their values.
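To illustrate the difference, here is a minimal sketch of updating per-draw constants on each API (ctx, cbuffer, program, and the uniform name are made up for this example):

// D3D11: a constant buffer is a free-standing object; its contents
// can be updated directly, regardless of which shaders are bound.
ctx->UpdateSubresource(cbuffer, 0, nullptr, worldViewProj, 0, 0);

// OpenGL ES 2: uniforms live inside the linked program, so the
// program must be made current before its uniforms can be written.
glUseProgram(program);
GLint loc = glGetUniformLocation(program, "uWorldViewProj"); // usually cached at link time
glUniformMatrix4fv(loc, 1, GL_FALSE, worldViewProj);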

 

Besides, although low-level rendering interfaces look quite flexible, they have quirks: you have to call certain functions before certain other functions due to limitations on particular platforms. For instance, in UE4, before changing the contents of a uniform buffer, you have to make the associated shader the current one in use.

 

In addition, my current design is based on, and bound to, DirectX 11 and OpenGL ES 2. I am afraid that if I later want to support other versions of DirectX and OpenGL, or even other graphics APIs, I may need to change my interface to some degree. That does not sound good.

 

Hence, another idea I have come up with is to hide those low-level graphics objects. In the new design, I will have high-level geometry objects containing all the source data needed to generate the low-level objects, and these will be converted into different graphics objects for each target API. That way, I should be able to minimize the overhead of keeping the interface consistent across platforms, since there is no longer any cross-platform interface at the level of low-level graphics objects. Having no control over the low-level graphics objects is a huge disadvantage in some cases, though, I guess.

 

Would anyone with experience in cross-platform graphics interface design give me some opinions or suggestions on these two designs? Or even other designs?

 

new-design.png

 

 

BTW, OpenGL ES 2 has no such things as vertex declarations or uniform buffers; they are just custom bookkeeping objects in my design.
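For example, the vertex declaration could be a bookkeeping object along these lines (a sketch, not my actual code; the names are made up), which the GL backend replays as glVertexAttribPointer calls at draw time:

#include <vector>
#include <GLES2/gl2.h>

struct VertexElement
{
    GLuint index;       // attribute location from glGetAttribLocation
    GLint size;         // component count, e.g. 3 for a vec3
    GLenum type;        // e.g. GL_FLOAT
    GLsizei stride;     // byte stride of one vertex
    const void* offset; // byte offset into the bound vertex buffer
};

struct VertexDeclaration
{
    std::vector<VertexElement> elements;

    void Apply() const // call before glDrawArrays/glDrawElements
    {
        for (const VertexElement& e : elements)
        {
            glEnableVertexAttribArray(e.index);
            glVertexAttribPointer(e.index, e.size, e.type, GL_FALSE, e.stride, e.offset);
        }
    }
};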

Edited by isatin


The design I use to hide the native APIs is a low-level stateless renderer: http://tiny.cc/gpuinterface

Hence, another idea I have come up with is to hide those low-level graphics objects. In the new design, I will have high-level geometry objects containing all the source data needed to generate the low-level objects, and these will be converted into different graphics objects for each target API. That way, I should be able to minimize the overhead of keeping the interface consistent across platforms, since there is no longer any cross-platform interface at the level of low-level graphics objects. Having no control over the low-level graphics objects is a huge disadvantage in some cases, though, I guess.
 Yeah, this ensures the best performance on each platform, but greatly increases porting cost in the long run - every new graphics feature must be rewritten per platform.

You may wonder what ShaderBond is for. Its DirectX implementation holds a D3D11 input layout, and its GL implementation holds a linked OpenGL program. Although newer desktop OpenGL introduced separate shader objects (ARB_separate_shader_objects, core in OpenGL 4.1), I aim to support OpenGL ES 2, so I have to avoid those newer features.
That doesn't seem right. A D3D IL links the shader program to the IA stage (vertex attribute formats), similarly to a GL VAO. I'm not up to date with GLES, but the VAO state isn't part of a linked program, is it?

On that note, GLES has no UBO support?? :o

 

Besides, although low-level rendering interfaces look quite flexible, they have quirks: you have to call certain functions before certain other functions due to limitations on particular platforms. For instance, in UE4, before changing the contents of a uniform buffer, you have to make the associated shader the current one in use.
That seems more like a failing of UE4 than something that's true in general.

The only "leaky abstraction" should be that GL requires projection matrices to be constructed differently due to its stupid symmetrical NDC definition.


Disclaimer: I'm coming across as quite anti-high-level platform abstractions, which isn't intentional. It's definitely still something to consider!

The implementation of these interfaces can then use direct API calls or use multi-API helpers where appropriate to minimize code duplication.

In my case, having a low-level cross-platform API, I still have a high-level cross-platform API as well -- it's just that the high-level one is written 100% using the above-mentioned multi-API helpers (the low-level API being the helper).
So there's kind of a continuous spectrum here between how much you abstract the low level -- e.g. from one end of the spectrum to the other:
* High-level renderer per platform, no code sharing between different implementations.
* High-level renderer per platform, but some common features are written using cross-platform helpers.
* High-level renderer is platform-agnostic, because it's entirely written using cross-platform helpers.
 
However, even though I'm using the bottom choice (where I completely hide the native API at the low level and then write the high-level renderer using this low-level cross-platform API), there is still some platform-specific code in the high-level renderer.
Like you said, some algorithms might perform well on a PS4 but not on an Xbone, or vice versa -- these kinds of algorithm choices can still be made in the high-level renderer, but all the algorithms are portable by default. There are also some platform-specific hints littered about the place -- e.g. the user can hint at which resources they would like to be resident in ESRAM, which has an effect on Xbone but is ignored by PS4, or the user can "discard" a resource, which has an effect on Xb360 and mobile but is ignored by D3D11 :)
 

There's also the case where I'm pulling features out of the low-level API and moving them into the high-level one! e.g. something like glGenerateMipmap doesn't map to hardware, so it doesn't exist in many of the lower-level APIs. I currently implement this as a low-level API feature, but we're deprecating it and moving it to a high-level API feature, built entirely on top of our portable low-level API. The reason we're doing this is to reduce code bloat (lots of similar code in each platform back-end) and to provide consistent performance/quality/control across every platform. This is actually quite similar to your suggestion of moving common high-level code (in a per-platform high-level design) into shared helpers, except that it's upside down!

This can also, on _some_ platforms (not so much the common x86 ones, which is pretty much all of them now), have huge perf benefits; you work with things like shaders and buffers a _lot_, and having an extra layer of indirection and virtual calls to manipulate them really hurts in-order processors. Moving the interface to a higher level means you incur those costs on the far less frequently called renderer commands.
...
The easiest example is your projection matrix on D3D vs OpenGL; they are required to be different because the two APIs use different depth ranges in NDC space. Abstracting at the renderer level implicitly handles this problem (since your GLRenderer would just use a different matrix than your D3DRenderer), while abstracting at the resource level means that you additionally have to call different matrix-construction routines depending on whether you're using a D3DShader or a GLShader.

There should be zero "interface cost" in a rendering abstraction, because the API to use should be a compile-time decision, not a runtime decision. Having your PS3 constantly call virtual methods in case it might want to use Windows' D3D instead of Sony's GCM would just be silly (and yes, would be a performance disaster)  :wink:
Use compile-time polymorphism instead of runtime polymorphism and this isn't an issue.
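For example, a minimal sketch of compile-time backend selection (the renderer type names here are hypothetical) -- each build defines exactly one BUILD_* macro, so every call through Renderer is a direct call with no virtual dispatch:

#if defined(BUILD_D3D11)
    #include "renderer_d3d11.h"
    typedef D3D11Renderer Renderer;
#elif defined(BUILD_OPENGL)
    #include "renderer_gl.h"
    typedef GLRenderer Renderer;
#endif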

There are other abstraction overheads that you can eliminate at compile time too -- e.g. often you want to create your own platform-agnostic enums that mirror the native enums, such as the fixed-function blend equation, so that the user's code that configures such states can be cross-platform. If you don't want to pay the overhead of converting between your own enum values and the platform-specific ones, you can conditionally define them based on the current build type:

//blah.h
namespace BlendEquation{ enum Type
{
#if defined(BUILD_D3D11) || defined(BUILD_D3D12) || defined(BUILD_D3D9)
	Add = 1,
	Sub = 2,
	RevSub = 3,
	Min = 4,
	Max = 5,
#elif defined(BUILD_OPENGL)
...
#endif
};}

// blah_d3d11.cpp
// At compile time, make sure that our platform-agnostic enum matches the native enum values so that it's safe to cast between them:
STATIC_ASSERT( BlendEquation::Add     == D3D11_BLEND_OP_ADD );
STATIC_ASSERT( BlendEquation::Sub     == D3D11_BLEND_OP_SUBTRACT );
STATIC_ASSERT( BlendEquation::RevSub  == D3D11_BLEND_OP_REV_SUBTRACT );
STATIC_ASSERT( BlendEquation::Min     == D3D11_BLEND_OP_MIN );
STATIC_ASSERT( BlendEquation::Max     == D3D11_BLEND_OP_MAX );

With the GL/D3D projection matrix difference, again, you can solve this with an ifdef inside your functions that create projection matrices. There's no need to query the shader/device/etc. at runtime to decide which format to use, because it's a compile-time decision.
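Something like this (a sketch, assuming a right-handed view space and column-vector convention; only the two depth-related entries differ between the APIs):

#include <cmath>
#include <cstring>

void PerspectiveMatrix(float m[4][4], float fovY, float aspect, float n, float f)
{
    float y = 1.0f / tanf(fovY * 0.5f);
    memset(m, 0, 16 * sizeof(float));
    m[0][0] = y / aspect;
    m[1][1] = y;
    m[3][2] = -1.0f;
#if defined(BUILD_D3D11) || defined(BUILD_D3D12) || defined(BUILD_D3D9)
    m[2][2] = f / (n - f);           // D3D: NDC z in [0, 1]
    m[2][3] = (n * f) / (n - f);
#elif defined(BUILD_OPENGL)
    m[2][2] = -(f + n) / (f - n);    // GL: NDC z in [-1, 1]
    m[2][3] = -(2.0f * f * n) / (f - n);
#endif
}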

 

Finally, there's just a ****load of complexity to abstracting certain resources (e.g. shaders, pipeline state objects, command lists, etc.), and it can result in a cleaner, easier-to-read codebase with less abstraction and duplication if you abstract at the renderer level instead of the resource level. Your D3D12Renderer can directly make use of D3D12 command lists, while your GLRenderer and D3D9Renderer can either do their own thing for multi-threaded rendering or just not even pretend to support the feature.

You don't have to abstract things at the exact same level as the underlying API, e.g.
* GL has linked programs, covering all stages, but D3D9/11 allows each stage to be set individually without a linking step -- my abstraction copies GL in having a shader "program" covering all stages.

* GL2 and D3D9 don't have UBOs, but I emulate them anyway because it's a nicer abstraction than having uniforms tied to a particular "shader instance".

** On this note -- the actual UBO implementations vary quite a bit, meaning this "low level" abstraction is actually still quite a way above the hardware! :)

*** On a particular console from this GL2 era (where uniforms didn't exist in hardware), I actually have to constantly create new copies of each shader program and patch "copy literal value to register" instructions into those shaders from the user's UBOs!

*** On another platform, the user's UBOs are just plain old memory from malloc, and when they bind a UBO, I memcpy it into a per-frame ring buffer of constant data, containing a tightly packed array of all the constants used in this frame, and then bind a pointer into this buffer as the actual native UBO (see the sketch after this list). On another console it's the same, but the memory is allocated out of the actual native command buffer itself!
* GL has fine-grained state, D3D11 has coarse state, D3D12 has PSOs -- I expose a coarse-grained (D3D11-style) state-setting abstraction on every platform, and resolve it at draw-item creation time, keeping draw-item submission as fast as possible.
* Command lists - GL/D3D9 don't really have them (without extensions or emulation) - a low-level abstraction can still have optional features. Some of these can be known at compile time (if building for D3D9 or PS3, I don't have geometry shaders), and/or be queryable at runtime so that the high-level renderer can choose different algorithms. In D3D9/GL I implement my own command lists, but in my "capability querying" API I inform the user (the high-level renderer) that these are emulated, so they should not be preferred, unlike D3D12/Vulkan command lists.
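The bind-time copy mentioned above might look roughly like this (a sketch, assuming persistently mapped, CPU-visible memory; the names are made up):

#include <cassert>
#include <cstdint>
#include <cstring>

struct ConstantRing
{
    uint8_t* base;     // mapped, GPU-visible memory for this frame
    size_t   capacity;
    size_t   head = 0; // reset once the GPU has finished the frame

    // Copies the user's "UBO" bytes in; the returned offset is what
    // the backend binds as the actual native constant buffer.
    size_t Push(const void* data, size_t size, size_t align = 256)
    {
        head = (head + align - 1) & ~(align - 1);
        assert(head + size <= capacity); // real code would wrap or stall
        memcpy(base + head, data, size);
        size_t offset = head;
        head += size;
        return offset;
    }
};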

I just work with graphics professionals, pick their brains a lot, and tinker with graphics occasionally in my free time.

I'd be keen to hear what your coworkers think about a low-level stateless renderer as in my link above :D
The DrawItem concept scales perfectly between fine-grained state APIs (D3D9/GL), coarse-grained state APIs (D3D11) and PSO APIs (D3D12/Vulkan), and also hides the complexity of PSO management from the user in a very performant manner (PSO lookup is done once when preparing a draw-item, and then submission is cheap). It's also pretty performant across the board - I'm getting something like 3k draws per ms on D3D11 at the moment :)
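To make that concrete, a draw item might carry something like the following (fields and handle types invented for illustration, not the actual layout from the slides): everything is resolved when the item is created, so submission is just a tight loop over pre-baked state.

#include <cstdint>

typedef uint32_t PipelineStateHandle; // opaque handles, invented for this sketch
typedef uint32_t BufferHandle;
typedef uint32_t TextureHandle;

struct DrawItem
{
    PipelineStateHandle pso;                // PSO/state block resolved at creation time
    BufferHandle        vertexBuffer;
    BufferHandle        indexBuffer;
    BufferHandle        constantBuffers[4];
    TextureHandle       textures[8];
    uint32_t            indexCount;
    uint32_t            firstIndex;
};

// Submission does no state resolution, hashing, or PSO lookup.
void Submit(const DrawItem* items, size_t count);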

Edited by Hodgman


Thanks for the slides. I don't have the knowledge to understand most of them though.  :)

I have some questions about your draw items. Are they stateless because they set all render states except render targets? Doesn't that produce redundant graphics function calls? Or do you sort them by render state?

 

I am not sure what you meant by "That doesn't seem right." ShaderBond is an object I originally made up for OpenGL linked programs, and I also needed an object to put the input layout in, so I chose it. By the way, I do not use VAOs because my phone is too old, so I have to target OpenGL ES 2. OpenGL ES 3 seems to support UBOs, but I am not 100% sure. I think I will move to OpenGL ES 3 and give up ES 2 support after changing my phone. Updating uniform variables requires many more function calls than uniform buffers do.

Edited by isatin


D3D12 has PSOs -- I expose a coarse-grained (D3D11-style) state-setting abstraction on every platform, and resolve it at draw-item creation time, keeping draw-item submission as fast as possible
 

 

Be careful with this, PSO generation can be expensive! We did this as well for D3D12 as a first step, but it had a tendency to introduce some pretty nasty frame-time spikes whenever a specific PSO configuration was encountered for the first time (so no cached PSO was available). Encountering a single new PSO per frame is not that big of a deal, but encountering multiple can cause trouble. We ended up treating PSO descriptors as data so we could generate them up front at load time, as was recommended to us. Additionally, the recommendation seems to be to cache any compiled PSOs on the user's machine so you don't have to recompile them on subsequent runs of the game.
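In outline, the load-time path could look like this (a sketch, not our actual code; PsoRecord and BuildDesc are hypothetical stand-ins for the descriptor data produced by the content pipeline):

#include <cstdint>
#include <unordered_map>
#include <vector>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

struct PsoRecord { uint64_t hash; /* ...shader/state references... */ }; // hypothetical
D3D12_GRAPHICS_PIPELINE_STATE_DESC BuildDesc(const PsoRecord&);          // hypothetical

std::unordered_map<uint64_t, ComPtr<ID3D12PipelineState>> g_psoCache;

void WarmPsoCache(ID3D12Device* device, const std::vector<PsoRecord>& records)
{
    for (const PsoRecord& r : records) // one record per state combination
    {
        D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = BuildDesc(r);
        ComPtr<ID3D12PipelineState> pso;
        device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
        g_psoCache.emplace(r.hash, pso);
    }
}

// At draw time: a hash lookup instead of a compile. On a miss you'd
// still compile and insert, accepting the one-off hitch.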

 

I've been looking into designing a stateless cross-platform rendering API as well, with D3D12 as a first-class citizen, but the whole root signature and PSO aspect of it is making this pretty challenging.

 

Ideally you'd want a system which can figure out exactly which PSOs will be required from within your content pipeline so you can generate metadata for them up front. I've been playing around with the idea of building a data-driven rendering pipeline which allows you to have knowledge about specific render passes and systems up front (i.e. they're defined in data, not code).

With a system like that you can already solve certain parts of the PSO puzzle, as it can specify RTV/DSV formats, Rasterizer/Depth/Blend state override settings, shader overrides, multi-sampling options, etc. In addition to this your material library can provide you with other required data such as shader programs, root signatures, and rasterizer/depth/blend state descriptors. You can assign material objects to be compatible with specific render passes to make sure that you only generate the PSOs you specifically need.

For the last piece of the puzzle, you can look at any geometry data referencing the materials in your material library to determine all required input layouts, index buffer formats/conventions, and primitive topology types. For those of you wondering what I do about stream-output: I don't actually have a proper solution for it, and I often just like to ignore the fact that stream-output functionality exists. Most of the stuff we used to do using stream-out has been moved to compute-based solutions anyway.
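For illustration, a data-driven render pass description might carry fields along these lines (all names invented for this sketch; a real system would have more):

#include <cstdint>
#include <dxgiformat.h>

struct BlendDesc { /* invented placeholder for blend-state settings */ };
struct DepthDesc { /* invented placeholder for depth-state settings */ };

struct RenderPassDesc
{
    DXGI_FORMAT rtvFormats[8];  // render-target formats for PSO creation
    uint32_t    rtvCount;
    DXGI_FORMAT dsvFormat;
    uint32_t    sampleCount;    // multi-sampling options
    BlendDesc   blendOverride;  // optional per-pass state overrides
    DepthDesc   depthOverride;
    const char* shaderOverride; // null = use the material's shader
};

Cross-referencing these passes with the materials assigned to them (shaders, root signatures, state blocks) and the geometry that uses those materials (input layouts, topology) yields the full set of PSO descriptors to build up front.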

 

Ideally, something like this should get you 99.9% of the way toward eliminating runtime PSO generation. There are definitely going to be some exceptions, like procedurally generated geometry which requires a specific PSO, PSOs for debug-rendering functionality which won't end up in your shipping build, and other stuff like that. Ideally you'll know about those up front so you can build them at load time, but if that's not the case, it shouldn't be a huge deal to generate them at draw time.


Be careful with this, PSO generation can be expensive!
Ideally you'd want a system which can figure out exactly which PSOs will be required from within your content pipeline so you can generate metadata for them up front. I've been playing around with the idea of building a data-driven rendering pipeline which allows you to have knowledge about specific render passes and systems up front (i.e. they're defined in data, not code). With a system like that you can already solve certain parts of the PSO puzzle, as it can specify RTV/DSV formats, Rasterizer/Depth/Blend state override settings, shader overrides, multi-sampling options, etc. In addition to this your material library can provide you with other required data such as shader programs, root signatures, and rasterizer/depth/blend state descriptors. You can assign material objects to be compatible with specific render passes to make sure that you only generate the PSOs you specifically need.

Yeah, I warm up the PSO cache immediately after loading the game's shader archive from disk - on the initial loading screen.

There are already other platforms that require the fixed-function blend state, input-assembler layout, and depth-stencil/render-target formats to be known at shader-compilation time, so these were kind of a precursor to the PSO data-pipeline problem. I deal with this by forcing shader authors to annotate which fixed-function states, which vertex-buffer layouts, and which render-target formats it's valid to use their shader with.
On platforms with no input-assembler, this allows you to compile permutations of the VS with the vertex-buffer decoding hard-coded in the VS.
On platforms with no fixed-function blend, this allows you to compile permutations of the PS with the blend logic appended.
On platforms with limited fixed-function render-target format conversion, this allows you to compile permutations of the PS with format-conversion logic appended.
On PSO platforms, this data also lets you warm up your PSO cache pessimistically :)

 

Alternatively, you can log the PSOs that get used in a play-through (or the combinations of D3D11-style coarse states that were used with each shader), and use this logged information to construct a PSO cache on the user's machine the first time they start the game. That kind of system is always prone to accidentally missing a particular combination in your logged play-through though, so you'd have to make it gracefully deal with cache misses and the associated framerate hitch :(


Alternatively, you can log the PSOs that get used in a play-through (or the combinations of D3D11-style coarse states that were used with each shader), and use this logged information to construct a PSO cache on the user's machine the first time they start the game. That kind of system is always prone to accidentally missing a particular combination in your logged play-through though, so you'd have to make it gracefully deal with cache misses and the associated framerate hitch

 

Sadly enough, this is what we had to resort to in the end :(. I would've loved to do a proper implementation, but you know how these kinds of things go when trying to meet a deadline. This particular title was not written with D3D12 in mind, and we didn't have the time or resources to re-architect it to be D3D12-friendly.

 

Most of my work is on PC titles and the occasional current-gen console title, so I generally don't have to deal with the cases you mentioned above. Having some of these tougher restrictions forced on you up front actually does work out nicely in this situation!

 

<thread_derail>

On platforms with no input-assembler, this allows you to compile permutations of the VS with the vertex-buffer decoding hard-coded in the VS.
 

Recently I've been seeing more and more implementations which bypass the input assembler (and input layouts) entirely, instead opting to use a structured buffer to provide vertex data to the vertex shader. Adopting this approach globally would definitely simplify PSO generation. You could take geometry data out of the equation by defining some conventions on topology and index buffers. I remember reading about some architectures already doing this under the hood to emulate the input-assembler stage, but I'm afraid I don't remember the specifics. I wonder whether there'd be any major downsides to taking this approach globally.
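On D3D11 the binding side of that approach is tiny (a sketch; ctx, vertexSRV, vertexCount, and the shader are assumed to exist, and the vertex buffer must have been created with D3D11_BIND_SHADER_RESOURCE as a structured buffer):

// HLSL side, for reference:
//   StructuredBuffer<Vertex> g_vertices : register(t0);
//   VSOut main(uint id : SV_VertexID) { Vertex v = g_vertices[id]; ... }

ctx->IASetInputLayout(nullptr);              // no IA-stage vertex fetch at all
ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
ctx->VSSetShaderResources(0, 1, &vertexSRV); // vertex data bound as a plain SRV
ctx->Draw(vertexCount, 0);                   // the VS indexes by SV_VertexID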

</thread_derail>


So there's kind of a continuous spectrum here between how much you abstract the low level


Agreed. I lean towards the high-level. For common low-level code, I've preferred the helper approach - inverted dependencies.

There should be zero "interface cost" in a rendering abstraction, because the API to use should be a compile-time decision, not a runtime decision.


Perhaps. In release builds, and then only on non-PC platforms. Certainly on certain old consoles the cost of virtual functions was high, but everything these days pays very little for virtual calls (unless you're doing something dumb that thrashes caches).

In development on PC, it's particularly handy, IMO, to be able to swap rendering backends at runtime. It makes A/B testing easy, avoids needing to recompile to do headless tests, etc. :)

Release builds, and platforms with a single specific implementation, can - via a little careful design and some macros - use static polymorphism.

In my "day job" engine we have a very large number of interfaces. Way, way too many, IMO, and largely for reasons that have almost nothing to do with polymorphism (e.g. the vast majority have only one implementation). The folks with PPC console experience scream to high heaven about it all the time, though it's really not a big deal on x86 hardware. Apparently it hasn't been a big problem on the older consoles either; the engine has been in use for 3-4 generations.

I'd be keen to hear what your coworkers think about a low-level stateless renderer as in my link above


A good deal of what you said echoes them. :)

I oversimplified a lot, but then a forum post can't do a good job of boiling down a graphics architecture; if it could, we'd all be competing for game engineering jobs with 13-year-olds. :P

As far as command lists go, I do know for sure that we use a high-level abstraction there, running over the low-level APIs. Outside of the graphics engine itself, everything is abstracted to the points I mentioned earlier: meshes, materials, cameras, post-process effects, etc. Hardware abstractions are present, but used to exactly the extent that they simplify the renderer, and they don't "leak" outside our render library.

I imagine the same is true for your architecture; we might be talking slightly past each other and concentrating on minute details?
