OpenGL and Video Card Support


Hi there,

Due to how different video card manufacturers handle OpenGL (especially with regards to the presence of multiple video cards in the system), I am having to write some code that handles OpenGL based on the particular type of video card(s) that it will make use of in my game engine.

Generally speaking, am I correct that nVidia and ATI/AMD are the only manufacturers of "true" video cards? I know motherboards, laptops and such often have their own built-in "fake" video cards (I'm assuming those won't give me any problems, since there is usually only one of them anyway), so if nVidia and ATI/AMD are all I need to worry about aside from that, then this should be relatively easy.

I can't think of any other video card manufacturers than nVidia and ATI/AMD off the top of my head.

Is there a list of possible values for glGetString GL_RENDERER and GL_VENDOR, or does OpenGL retrieve this information from somewhere else?
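As far as I know there is no official registry of GL_VENDOR / GL_RENDERER strings — each driver returns whatever it likes — so engines typically classify the vendor by substring matching. A minimal sketch of the idea (the enum and function names here are my own invention, not any standard API):

```c
#include <string.h>

/* Hypothetical helper: classify a GL_VENDOR string by substring match.
   Vendor strings vary between driver versions, so this is heuristic by nature. */
typedef enum { VENDOR_NVIDIA, VENDOR_AMD, VENDOR_INTEL, VENDOR_OTHER } GpuVendor;

GpuVendor classify_vendor(const char *vendor)
{
    if (vendor == NULL)
        return VENDOR_OTHER;
    if (strstr(vendor, "NVIDIA"))
        return VENDOR_NVIDIA;
    if (strstr(vendor, "ATI") || strstr(vendor, "AMD"))
        return VENDOR_AMD;
    if (strstr(vendor, "Intel"))
        return VENDOR_INTEL;
    return VENDOR_OTHER;
}
```

In an engine this would be called once after context creation, e.g. `classify_vendor((const char *)glGetString(GL_VENDOR))`.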

I have three ATI/AMD video cards hooked up to this system. Initialization of my game engine currently logs the following:

0056 Querying OpenGL version information...
0057 Renderer: ATI Radeon HD 4550
0058 Vendor: ATI Technologies Inc.
0059 Version: 3.3.10317 Compatibility Profile Context

When I initialize my game engine on my laptop, I get this:

0048 Querying OpenGL version information...
0049 Renderer: Intel Cantiga
0050 Vendor: Intel
0051 Version: 2.0.0 - Build

This is what I mean by "fake" video cards (I'm sure there is a technically correct term for this).

So basically, I need to know whether there are any other considerations to take into account aside from nVidia and ATI/AMD's differing OpenGL support, under the assumption that there are no other manufacturers of "true" video cards besides them.

Also, is it correct that you can't mix multiple video cards from nVidia and ATI/AMD in a system? If so, does this also extend to the video cards' model, series or GPU?

Any feedback would be appreciated.

- Sig

I believe the term you are looking for is "integrated". I know nVidia makes integrated cards as well (and I believe they are decent), and I think ATI might too. Why are you discounting them? That output shows your laptop is OpenGL 2.0 capable (unless I'm missing something).

I think some high-end motherboards allow you to combine two cards from different manufacturers. I don't know much about this though.

Original post by PATrainwreck
I believe the term you are looking for is "integrated". I know nVidia makes integrated cards as well (and I believe they are decent), and I think ATI might too. Why are you discounting them?

I'm not discounting them.

OpenGL capability isn't the problem I'm trying to tackle, but rather differences in OpenGL implementation in general.

If you restrict yourself to only functionality available in OpenGL 2.0, then the same program should work on every computer capable of at least OpenGL 2.0. If you want to use the features in OpenGL 3.3, then only 3.3-capable systems can run it. The basic OpenGL functions in gl.h/opengl32.lib will probably run on anything from the last 10 years or more. OpenGL 2.0 should run on anything reasonably modern. 3.x will run on NVidia/AMD cards from recent years if they have updated drivers.
So choose the version you want that runs on all your target hardware.
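Picking that target version at runtime comes down to parsing the GL_VERSION string, which always begins with "major.minor". A small sketch of the idea (the function names are mine, not a standard API), handling both version strings quoted earlier in the thread:

```c
#include <stdio.h>

/* Parse the leading "major.minor" out of a GL_VERSION string such as
   "3.3.10317 Compatibility Profile Context" or "2.0.0 - Build".
   Returns 1 on success, 0 on failure. */
int parse_gl_version(const char *version, int *major, int *minor)
{
    if (version == NULL)
        return 0;
    return sscanf(version, "%d.%d", major, minor) == 2;
}

/* Gate features on the parsed version rather than on the vendor name. */
int supports_gl(const char *version, int need_major, int need_minor)
{
    int major = 0, minor = 0;
    if (!parse_gl_version(version, &major, &minor))
        return 0;
    return major > need_major ||
           (major == need_major && minor >= need_minor);
}
```

In the engine this would be fed `glGetString(GL_VERSION)`; the laptop's "2.0.0 - Build" string would then fail a check for 3.3 but pass a check for 2.0.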

As for differences in the implementations of the same GL version, it's hard to say. You just have to test your program on both NVidia and AMD if you want to be sure.

It is possible to have several graphics cards from different vendors, but I believe it can have some problems and I don't know anyone who actually does it. It's not something you need to worry about when making a game.

Original post by Sigvatr
OpenGL capability isn't the problem I'm trying to tackle, but rather differences in OpenGL implementation in general.
What sort of difference are we talking about?

Each card should have a baseline OpenGL version (i.e. 2.0), and all functionality present in that version should just work. Further functionality is provided by extensions, and if the card publishes an extension, that functionality should also just work.
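Checking for a published extension follows the same pattern: on GL 2.x the extension list is a single space-separated string from glGetString(GL_EXTENSIONS), and a whole-token match avoids false positives from names that are prefixes of other names. A sketch (the helper name is my own):

```c
#include <string.h>

/* Whole-token search in a space-separated extension string, to avoid
   matching e.g. "GL_EXT_texture" inside "GL_EXT_texture3D". */
int has_extension(const char *ext_string, const char *name)
{
    size_t len;
    const char *p;
    if (ext_string == NULL || name == NULL || name[0] == '\0')
        return 0;
    len = strlen(name);
    p = ext_string;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_string) || (p[-1] == ' ');
        int ends_token   = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len;
    }
    return 0;
}
```

Note this applies to the classic single-string form; on 3.x core contexts the list is instead queried entry by entry with glGetStringi.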

If you are finding that a published version/extension has different behaviour between 2 different cards, then you should read the OpenGL specification carefully to see if you are actually using the feature correctly, and check the ATI/NVidia forums for known bugs, etc.

Off the top of my head, NVidia's shader compiler allows a lot of illegal/invalid shaders to compile, whereas ATI and Intel tend to enforce the standard a lot more thoroughly - things like this are worth looking into.

Basically, one of the issues I am trying to handle is that nVidia and ATI/AMD video cards share data between one another differently (especially with regards to Windows and its crappy OpenGL support). When using multiple video cards, I believe nVidia cards keep a copy of texture data and such on each card. However, ATI/AMD video cards don't do this, and you need to generate textures etc. for each video card; otherwise, trying to draw a texture that is stored on a different video card will not work.

So basically I need to be able to figure out if the multiple video cards OpenGL is using are capable of sharing data between one another.

Original post by Sigvatr
Basically, one of the issues I am trying to handle is that nVidia and ATI/AMD video cards share data between one another differently (especially with regards to Windows and its crappy OpenGL support). When using multiple video cards, I believe nVidia cards keep a copy of texture data and such on each card. However, ATI/AMD video cards don't do this, and you need to generate textures etc. for each video card; otherwise, trying to draw a texture that is stored on a different video card will not work.

So basically I need to be able to figure out if the multiple video cards OpenGL is using are capable of sharing data between one another.

There is no problem with OpenGL support on Windows, and if there is it has nothing to do with Windows as the OpenGL support is in the driver. Also, what you mention about textures is not an issue.

Iff you do something fancy that targets high-end systems with multiple workstation GPUs for tens of thousands of dollars and your only concern is speed, then and only then might you want to investigate multiple graphics cards working together in perfect harmony. Normally you just create an OpenGL context and it just works, you don't need to care about where the texture is stored.

If you are specifically concerned about multiple separate graphics cards, and need your game to work on any monitor, then the following page has some information: http://www.equalizergraphics.com/documentation/parallelOpenGLFAQ.html.
Basically, you can't do anything about it at all except create your window centered on the monitor you want to play on and let the driver do what it feels is best, unless you buy the $4000 Quadro model where you can use an extension to control which GPU handles which context. This still doesn't have anything to do with a texture within a context however. More importantly, it has nothing to do with games as no one plays games with those cards.

Original post by Erik Rufelt

I've already read through this document thoroughly and I am not as pessimistic about implementing multiple monitor/video card support as you are. The bulk of my multiple monitor/video card support in my game engine is already mostly functional. The only problem I'm having is providing the engine with the capability of discerning whether or not it needs to manually share data between multiple video cards or if the video cards know how to do this.

Although multiple video cards are relatively uncommon at the moment, I want my game engine to be able to support them in the case that they do become commonplace one day. It already has become not unusual for desktop PCs to have two or more monitors, and some video cards these days contain more than a single GPU (I don't know what kind of implications this has from a programming perspective).

For the record, I have managed to successfully display multiple OpenGL contexts and scenes on all of my desktop's monitors (there are 5, connected to 3 video cards). However, these video cards are manufactured by ATI/AMD, so I do not know what would happen if I ran the engine in its current incarnation on a system with multiple nVidia video cards and monitors.

As far as I am currently aware, the only GPU manufacturers (integrated or not) are Intel, nVidia and ATI/AMD (AMD makes both integrated and discrete GPUs). If there are no other brands of GPUs, then most of my questions have been answered already.

Everything in OpenGL belongs to one context. If enabled with wglShareLists or wglCreateContextAttribsARB, two contexts can share data between them. This has nothing to do with the graphics cards themselves; that part is handled by the driver.

I agree that multi-monitor support is excellent, I just fail to see exactly what you're asking. There is no such thing as manually sharing data, unless you are referring to when sharing between contexts fails. If you use wglShareLists and it returns FALSE, then nothing is shared between the two contexts.

Perhaps I am misunderstanding you. When you say different graphics cards, do you mean different contexts created on different cards?
Nothing will ever be automatically shared between different contexts unless you ask for it. If sharing fails, then you have to manually create the same texture on each context, if the same texture is to be used on both contexts.
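In outline, that share-or-recreate decision can be separated from the WGL call itself. In the sketch below the wglShareLists call is abstracted behind a function pointer (it only exists on Windows, and this keeps the logic testable), and count_unshared_contexts is a hypothetical name; a real engine would also perform the glGenTextures/glTexImage2D upload where the comment indicates:

```c
/* ShareFn stands in for a call like wglShareLists(main_rc, other_rc), which
   returns nonzero on success and 0 (FALSE) when the two contexts cannot
   share -- e.g. when they live on cards from different vendors. */
typedef int (*ShareFn)(void *main_rc, void *other_rc);

/* Try to share the main context's objects with every other context.
   Returns how many contexts will need their own copy of each texture. */
int count_unshared_contexts(ShareFn try_share, void *main_rc,
                            void **other_rcs, int count)
{
    int unshared = 0;
    int i;
    for (i = 0; i < count; ++i) {
        if (!try_share(main_rc, other_rcs[i])) {
            /* Sharing failed: this context needs its own glGenTextures
               and glTexImage2D upload of the same image data. */
            ++unshared;
        }
    }
    return unshared;
}

/* Stubs used for illustration in place of the real WGL call. */
int share_always_succeeds(void *a, void *b) { (void)a; (void)b; return 1; }
int share_always_fails(void *a, void *b)    { (void)a; (void)b; return 0; }
```

The point of the split is that the engine never needs to guess from vendor strings whether sharing works: it just attempts the share and falls back to per-context uploads when the call reports failure.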

If you have one NVidia and one ATI card and have one context on each, then sharing between them will always fail according to the documentation for wglShareLists. That page, along with the documentation for pixel formats and http://www.opengl.org/registry/specs/ARB/wgl_create_context.txt has some more information on that.

As for support, I have tested on NVidia and Intel integrated graphics with the following results:

I have two NVidia cards and three monitors. Unless you have a Quadro then NVidia does not support choosing which card is used for a context (perhaps all are used together). The same OpenGL window can be dragged across all monitors and it works just the same. If the window is dragged to the monitor which does not belong to the same card as the primary monitor, then there is an extra overhead of a few ms, I guess from a transfer of the image across cards or similar. It never matters what HDC or window is used to create the context.

I also have a laptop with integrated Intel graphics. The primary monitor has good OpenGL performance, but a secondary monitor connected to the laptop has a bit worse performance. However, if the secondary monitor is made 'primary' in the monitor control panel then that has good performance instead, while the builtin display has worse performance. It never matters what HDC or window is used to create the context.

Because of these limitations, you can't really do more than create your game window on the monitor your user wishes to play on. It sounds however like you want to run your game on multiple monitors at the same time. Is that correct?
If you are designing such a game then I understand your concern, but according to what I have read you can't really do anything other than creating one window on each monitor.
You shouldn't need to handle this any differently on different graphics cards however, as that is taken care of by the driver. You just need to check the return value if you try to use sharing between contexts.

It is also possible to use the same context and SwapBuffers to multiple windows. The documentation for the wgl* functions has information on this. In particular, you need to select a compatible pixel format in the window you swap to. This is also likely to not work if you have two graphics cards of different vendors. Again, check the return values and display an error or create a separate context for a window if it fails.

Original post by Erik Rufelt
Unless you have a Quadro then NVidia does not support choosing which card is used for a context (perhaps all are used together).

Actually, this is a relatively simple problem to solve: you specify the HDC of the window your OpenGL context will be associated with, and the video card used for that context is determined by the position of your window (it works for me, although I might be technically incorrect).

It is also possible to use the same context and SwapBuffers to multiple windows. The documentation for the wgl* functions has information on this. In particular, you need to select a compatible pixel format in the window you swap to. This is also likely to not work if you have two graphics cards of different vendors.

I have considered using the above method when I first encountered difficulties sharing data across contexts/video cards. I have not tried it yet, although I am dubious about a few things, specifically which video cards are storing/handling what, the case you pointed out where the video cards are from different vendors and also how to handle different scenes on each monitor/context (ie, if one context does not need to perform a render when another does).

Any other thoughts?

If you are interested in my findings and successes so far, then I can provide you with information on how I achieved it, because you seem to be interested in this sort of thing.

I just wanted to notify this topic's followers that I have successfully managed to display a single OpenGL context across all 5 of my monitors (using 3 video cards) without error. I haven't performed any extensive testing or benchmarks yet, but I am still hoping that I can configure my game engine to work with any combination of video cards.

I am beginning now to implement a video card object in my code that should help with this process.

you asked what other vendors there are.
There are SiS, 3Dlabs, 3dfx, Diamond Multimedia, XGI, ELSA, Imagination Technologies, and Matrox.

I have seen XGI on server boards.

So basically I need to be able to figure out if the multiple video cards OpenGL is using are capable of sharing data between one another.

Call wglShareLists or wglCreateContextAttribsARB. If it succeeds, sharing is enabled and works. If it fails, no sharing exists between the two involved contexts.

Original post by Sigvatr
Actually, this is a relatively simple problem to solve: you specify the HDC of the window your OpenGL context will be associated with, and the video card used for that context is determined by the position of your window (it works for me, although I might be technically incorrect).

No. This is not the case for Nvidia and Intel, only for ATI. Did you read my last reply?
It does not matter what window or HDC is used in creation of the context on an NVidia system. The OpenGL context always works on all monitors on the system, and this is handled automatically. You can never control which GPU is actually used in the rendering unless you have a Quadro driver with an extension for it.

If you are interested in my findings and successes so far, then I can provide you with information on how I achieved it, because you seem to be interested in this sort of thing.

I just wanted to notify this topic's followers that I have successfully managed to display a single OpenGL context across all 5 of my monitors (using 3 video cards) without error.

Yes I would be interested to hear. I've tried this on many different systems, and I've never had any errors. OpenGL windows can generally be dragged or stretched across multiple monitors, and for me it has always just worked, though sometimes with a performance penalty.
I've never tried it on a system with two separate ATI graphics cards though, and I would be interested to hear what problems there could be and how you solved them.

I'm still a bit uncertain what is the main issue we are discussing. Here are some different issues described below. Which of these are you interested in?
Or is there another problem that I have missed, that you could describe?

Issue 1. SwapBuffers a context to several different monitors.
We want an image from a single OpenGL context displayed on any monitor we choose.
If this fails, the window will just stay black on one or more monitors, but show the correct image only on some monitors.
I've never seen this fail.

Issue 2. Control which physical GPU is used to do the calculations when drawing on a context, which GPU actually runs the vertex and pixel shaders for example.
So if we have two 500 MHz GPUs, we may want to use one of them to render a billion triangles on one context, and the other to render a post-processing effect on a billion pixels on another context, to achieve good parallelism.
As you have noticed you can control this on ATI by creating windows on different monitors and using the correct HDC. On NVidia this is impossible to control unless you have the Quadro extension.

Issue 3. Share the same texture object (or other things) on multiple contexts (different HGLRCs).
The point of this is if we do glGenTextures(1, &textureId) on one context, then we want to use the exact same textureId on a separate context without calling glGenTextures again.
This can be achieved only by calling wglShareLists or wglCreateContextAttribsARB. Those functions return success or failure, so it can be programmatically determined whether or not this works.
if(wglShareLists(..)) { shareTextures(); } else { recreateTextures(); }.

Issue 4. Performance.
When presenting a scene on different monitors, performance could suffer on some monitors. Is this what you are concerned about?
This basically comes down to Issue 2, and is not controllable unless you're on ATI or have the Quadro extension. It is however not really an issue if you give the user an option to choose which screen to render on, and default to the primary monitor which will always(?) have the best performance.

I have some new information on my findings so far.

Rendering with a device context that spans across several screens and multiple video cards is extremely slow. It does solve the problem of sharing OpenGL data across screens/cards, but the results are less than thrilling.

I tried installing a much more powerful video card in place of one of the others and the improvement was very slight, perhaps 10 more frames per second or so. However, it seems that either:

a) OpenGL is making use of only a single video card to render to all of the monitors (perhaps using the other video cards as proxies to connect to the monitors).


b) The video cards are not optimized to work in parallel with one another.

So now the problem I am facing is either accepting an enormous loss in performance (a very significant amount indeed) or figuring out how to share data across contexts in the fashion that I have discussed in this thread previously.

Primarily, the loss in performance comes when I call SwapBuffers. Previously, a call to this function took less than 1 ms; now that I use one large device context instead of one per monitor, SwapBuffers takes longer than 20 ms.
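For what it's worth, this kind of number is easy to collect around the swap call itself. A minimal timing sketch (the swap is stubbed out here, since the real SwapBuffers needs a Windows HDC; on Windows, QueryPerformanceCounter would give better resolution than the portable clock() used below):

```c
#include <time.h>

/* Time a single call in milliseconds. Note that clock() measures CPU time,
   not wall-clock time, so this is only a rough sketch of the idea. */
double time_call_ms(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    return (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Stub standing in for SwapBuffers(hdc) in this illustration. */
void dummy_swap(void) { }
```

Logging this per frame for each configuration (one context per monitor vs. one spanning context) would make comparisons like the 1 ms vs. 20 ms figure above repeatable.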

So, basically my experiments in using one giant video context have resulted in a performance loss of more than 95%; this is very discouraging.

I suppose on the bright side, support for single-monitor rendering has not changed...

There are still a few solutions I have not checked out, such as the GPU vendor-specific functions catering to multiple-GPU rendering and the wglCreateContextAttribsARB function.

Any feedback would be much appreciated.

- Sig

I need to amend my previous post slightly.

I believe my benchmarks for one device context versus multiple were taken incorrectly. I'm now seeing no particular difference in performance between the two.

I now believe that the capability of using multiple windows and video cards simply requires you to have particularly strong hardware, and my video cards simply aren't cutting it.

Sign in to follow this  

  • Advertisement
  • Advertisement
  • Popular Tags

  • Advertisement
  • Popular Now

  • Similar Content

    • By too_many_stars
      Hello Everyone,
      I have been going over a number of books and examples that deal with GLSL. It's common after viewing the source code to have something like this...
      class Model{ public: Model(); void render(); private: GLSL glsl_program; }; ////// .cpp Model::Model(){ glsl_program.compileAndLinkShaders() } void Model::render(){ glsl_program.use() //render something glsl_program.unUse(); } Is this how a shader program should be used in real time applications? For example, if I have a particle class, for every particle that's created, do I want to compiling and linking a vertex, frag shader? It seems to a noob such as myself this might not be the best approach to real time applications.
      If I am correct, what is the best work around?
      Thanks so much for all the help,
    • By getoutofmycar
      I'm having some difficulty understanding how data would flow or get inserted into a multi-threaded opengl renderer where there is a thread pool and a render thread and an update thread (possibly main). My understanding is that the threadpool will continually execute jobs, assemble these and when done send them off to be rendered where I can further sort these and achieve some cheap form of statelessness. I don't want anything overly complicated or too fine grained,  fibers,  job stealing etc. My end goal is to simply have my renderer isolated in its own thread and only concerned with drawing and swapping buffers. 
      My questions are:
      1. At what point in this pipeline are resources created?
      Say I have a
      class CCommandList { void SetVertexBuffer(...); void SetIndexBuffer(...); void SetVertexShader(...); void SetPixelShader(...); } borrowed from an existing post here. I would need to generate a VAO at some point and call glGenBuffers etc especially if I start with an empty scene. If my context lives on another thread, how do I call these commands if the command list is only supposed to be a collection of state and what command to use. I don't think that the render thread should do this and somehow add a task to the queue or am I wrong?
      Or could I do some variation where I do the loading in a thread with shared context and from there generate a command that has the handle to the resources needed.
      2. How do I know all my jobs are done.
      I'm working with C++, is this as simple as knowing how many objects there are in the scene, for every task that gets added increment a counter and when it matches aforementioned count I signal the renderer that the command list is ready? I was thinking a condition_variable or something would suffice to alert the renderthread that work is ready.
      3. Does all work come from a singular queue that the thread pool constantly cycles over?
      With the notion of jobs, we are basically sending the same work repeatedly right? Do all jobs need to be added to a single persistent queue to be submitted over and over again?
      4. Are resources destroyed with commands?
      Likewise with initializing and assuming #3 is correct, removing an item from the scene would mean removing it from the job queue, no? Would I need to send a onetime command to the renderer to cleanup?
    • By Finalspace
      I am starting to get into linux X11/GLX programming, but from every C example i found - there is this XVisualInfo thing parameter passed to XCreateWindow always.
      Can i control this parameter later on - when the window is already created? What i want it to change my own non GLX window to be a GLX window - without recreating. Is that possible?
      On win32 this works just fine to create a rendering context later on, i simply find and setup the pixel format from a pixel format descriptor and create the context and are ready to go.
      I am asking, because if that doesent work - i need to change a few things to support both worlds (Create a context from a existing window, create a context for a new window).
    • By DiligentDev
      This article uses material originally posted on Diligent Graphics web site.
      Graphics APIs have come a long way from small set of basic commands allowing limited control of configurable stages of early 3D accelerators to very low-level programming interfaces exposing almost every aspect of the underlying graphics hardware. Next-generation APIs, Direct3D12 by Microsoft and Vulkan by Khronos are relatively new and have only started getting widespread adoption and support from hardware vendors, while Direct3D11 and OpenGL are still considered industry standard. New APIs can provide substantial performance and functional improvements, but may not be supported by older hardware. An application targeting wide range of platforms needs to support Direct3D11 and OpenGL. New APIs will not give any advantage when used with old paradigms. It is totally possible to add Direct3D12 support to an existing renderer by implementing Direct3D11 interface through Direct3D12, but this will give zero benefits. Instead, new approaches and rendering architectures that leverage flexibility provided by the next-generation APIs are expected to be developed.
      There are at least four APIs (Direct3D11, Direct3D12, OpenGL/GLES, Vulkan, plus Apple's Metal for iOS and osX platforms) that a cross-platform 3D application may need to support. Writing separate code paths for all APIs is clearly not an option for any real-world application and the need for a cross-platform graphics abstraction layer is evident. The following is the list of requirements that I believe such layer needs to satisfy:
      Lightweight abstractions: the API should be as close to the underlying native APIs as possible to allow an application leverage all available low-level functionality. In many cases this requirement is difficult to achieve because specific features exposed by different APIs may vary considerably. Low performance overhead: the abstraction layer needs to be efficient from performance point of view. If it introduces considerable amount of overhead, there is no point in using it. Convenience: the API needs to be convenient to use. It needs to assist developers in achieving their goals not limiting their control of the graphics hardware. Multithreading: ability to efficiently parallelize work is in the core of Direct3D12 and Vulkan and one of the main selling points of the new APIs. Support for multithreading in a cross-platform layer is a must. Extensibility: no matter how well the API is designed, it still introduces some level of abstraction. In some cases the most efficient way to implement certain functionality is to directly use native API. The abstraction layer needs to provide seamless interoperability with the underlying native APIs to provide a way for the app to add features that may be missing. Diligent Engine is designed to solve these problems. Its main goal is to take advantages of the next-generation APIs such as Direct3D12 and Vulkan, but at the same time provide support for older platforms via Direct3D11, OpenGL and OpenGLES. Diligent Engine exposes common C++ front-end for all supported platforms and provides interoperability with underlying native APIs. It also supports integration with Unity and is designed to be used as graphics subsystem in a standalone game engine, Unity native plugin or any other 3D application. Full source code is available for download at GitHub and is free to use.
      Diligent Engine API takes some features from Direct3D11 and Direct3D12 as well as introduces new concepts to hide certain platform-specific details and make the system easy to use. It contains the following main components:
      Render device (IRenderDevice  interface) is responsible for creating all other objects (textures, buffers, shaders, pipeline states, etc.).
      Device context (IDeviceContext interface) is the main interface for recording rendering commands. Similar to Direct3D11, there are immediate context and deferred contexts (which in Direct3D11 implementation map directly to the corresponding context types). Immediate context combines command queue and command list recording functionality. It records commands and submits the command list for execution when it contains sufficient number of commands. Deferred contexts are designed to only record command lists that can be submitted for execution through the immediate context.
      An alternative way to design the API would be to expose command queue and command lists directly. This approach however does not map well to Direct3D11 and OpenGL. Besides, some functionality (such as dynamic descriptor allocation) can be much more efficiently implemented when it is known that a command list is recorded by a certain deferred context from some thread.
      The approach taken in the engine does not limit scalability as the application is expected to create one deferred context per thread, and internally every deferred context records a command list in lock-free fashion. At the same time this approach maps well to older APIs.
      In current implementation, only one immediate context that uses default graphics command queue is created. To support multiple GPUs or multiple command queue types (compute, copy, etc.), it is natural to have one immediate contexts per queue. Cross-context synchronization utilities will be necessary.
      Swap Chain (ISwapChain interface). Swap chain interface represents a chain of back buffers and is responsible for showing the final rendered image on the screen.
      Render device, device contexts and swap chain are created during the engine initialization.
      Resources (ITexture and IBuffer interfaces). There are two types of resources - textures and buffers. There are many different texture types (2D textures, 3D textures, texture array, cubmepas, etc.) that can all be represented by ITexture interface.
      Resources Views (ITextureView and IBufferView interfaces). While textures and buffers are mere data containers, texture views and buffer views describe how the data should be interpreted. For instance, a 2D texture can be used as a render target for rendering commands or as a shader resource.
      Pipeline State (IPipelineState interface). GPU pipeline contains many configurable stages (depth-stencil, rasterizer and blend states, different shader stage, etc.). Direct3D11 uses coarse-grain objects to set all stage parameters at once (for instance, a rasterizer object encompasses all rasterizer attributes), while OpenGL contains myriad functions to fine-grain control every individual attribute of every stage. Both methods do not map very well to modern graphics hardware that combines all states into one monolithic state under the hood. Direct3D12 directly exposes pipeline state object in the API, and Diligent Engine uses the same approach.
      Shader Resource Binding (IShaderResourceBinding interface). Shaders are programs that run on the GPU. Shaders may access various resources (textures and buffers), and setting the correspondence between shader variables and actual resources is called resource binding. Resource binding implementations vary considerably between different APIs. Diligent Engine introduces a new object called shader resource binding that encompasses all resources needed by all shaders in a certain pipeline state.
      API Basics
      Creating Resources
      Device resources are created by the render device. The two main resource types are buffers, which represent linear memory, and textures, which use memory layouts optimized for fast filtering. Graphics APIs usually have a native object that represents a linear buffer. Diligent Engine uses the IBuffer interface as an abstraction for a native buffer. To create a buffer, one needs to populate the BufferDesc structure and call the IRenderDevice::CreateBuffer() method as in the following example:
      BufferDesc BuffDesc;
      BuffDesc.Name           = "Uniform buffer";
      BuffDesc.BindFlags      = BIND_UNIFORM_BUFFER;
      BuffDesc.Usage          = USAGE_DYNAMIC;
      BuffDesc.uiSizeInBytes  = sizeof(ShaderConstants);
      BuffDesc.CPUAccessFlags = CPU_ACCESS_WRITE;
      m_pDevice->CreateBuffer( BuffDesc, BufferData(), &m_pConstantBuffer );
      While there is usually just one buffer object, different APIs use very different approaches to represent textures. For instance, in Direct3D11 there are ID3D11Texture1D, ID3D11Texture2D, and ID3D11Texture3D objects. In OpenGL, there is an individual object for every texture dimension (1D, 2D, 3D, Cube), each of which may be a texture array and may also be multisampled (e.g. GL_TEXTURE_2D_MULTISAMPLE_ARRAY). As a result, there are nine different GL texture types that Diligent Engine may create under the hood. In Direct3D12, there is only one resource interface. Diligent Engine hides all these details in the ITexture interface. There is only one IRenderDevice::CreateTexture() method that is capable of creating all texture types. Dimension, format, array size and all other parameters are specified by the members of the TextureDesc structure:
      TextureDesc TexDesc;
      TexDesc.Name      = "Sample 2D Texture";
      TexDesc.Type      = TEXTURE_TYPE_2D;
      TexDesc.Width     = 1024;
      TexDesc.Height    = 1024;
      TexDesc.Format    = TEX_FORMAT_RGBA8_UNORM;
      TexDesc.Usage     = USAGE_DEFAULT;
      TexDesc.BindFlags = BIND_SHADER_RESOURCE | BIND_RENDER_TARGET | BIND_UNORDERED_ACCESS;
      m_pRenderDevice->CreateTexture( TexDesc, TextureData(), &m_pTestTex );
      If the native API supports multithreaded resource creation, textures and buffers can be created by multiple threads simultaneously.
      Interoperability with the native API provides access to the native buffer/texture objects and also allows creating Diligent Engine objects from native handles, so applications can seamlessly integrate native API-specific code with Diligent Engine.
      Next-generation APIs allow fine-grained control over how resources are allocated. Diligent Engine does not currently expose this functionality, but it could be added by implementing an IResourceAllocator interface that encapsulates the specifics of resource allocation and providing this interface to the CreateBuffer() or CreateTexture() methods. If null is provided, the default allocator would be used.
      Initializing the Pipeline State
      As mentioned earlier, Diligent Engine follows the next-gen APIs in how it configures the graphics/compute pipeline. One big Pipeline State Object (PSO) encompasses all required states (all shader stages, input layout description, depth-stencil, rasterizer and blend state descriptions, etc.). This approach maps directly to Direct3D12/Vulkan, but is also beneficial for older APIs as it eliminates pipeline misconfiguration errors. With many individual calls tweaking various GPU pipeline settings, it is very easy to forget to set one of the states, or to assume a stage is already properly configured when in fact it is not. Using a pipeline state object helps avoid these problems, as all stages are configured at once.
      Creating Shaders
      While in earlier APIs shaders were bound separately, in the next-generation APIs as well as in Diligent Engine shaders are part of the pipeline state object. The biggest challenge when authoring shaders is that Direct3D and OpenGL/Vulkan use different shader languages (while Apple uses yet another language in its Metal API). Maintaining two versions of every shader is not an option for real applications, so Diligent Engine implements a shader source code converter that allows shaders authored in HLSL to be translated to GLSL. To create a shader, one needs to populate the ShaderCreationAttribs structure. The SourceLanguage member of this structure tells the system which language the shader is authored in:
      SHADER_SOURCE_LANGUAGE_DEFAULT - The shader source language matches the underlying graphics API: HLSL for Direct3D11/Direct3D12 modes, and GLSL for OpenGL and OpenGLES modes.
      SHADER_SOURCE_LANGUAGE_HLSL - The shader source is in HLSL. For OpenGL and OpenGLES modes, the source code will be converted to GLSL.
      SHADER_SOURCE_LANGUAGE_GLSL - The shader source is in GLSL. There is currently no GLSL-to-HLSL converter, so this value should only be used for OpenGL and OpenGLES modes.
      There are two ways to provide the shader source code. The first is to use the Source member. The second is to provide a file path in the FilePath member. Since the engine is entirely decoupled from the platform, and the host file system is platform-dependent, the structure exposes the pShaderSourceStreamFactory member that provides the engine access to the file system. If FilePath is provided, a shader source factory must also be provided. If the shader source contains any #include directives, the source stream factory will also be used to load these files. The engine provides a default implementation for every supported platform that should be sufficient in most cases; a custom implementation can be provided when needed.
      When sampling a texture in a shader, the texture sampler was traditionally specified as a separate object that was bound to the pipeline at run time, or set as part of the texture object itself. However, in most cases it is known beforehand what kind of sampler will be used in the shader. Next-generation APIs expose a new type of sampler called a static sampler that can be initialized directly in the pipeline state. Diligent Engine exposes this functionality: when creating a shader, textures can be assigned static samplers. If a static sampler is assigned, it will always be used instead of the one initialized in the texture shader resource view. To initialize static samplers, prepare an array of StaticSamplerDesc structures and initialize the StaticSamplers and NumStaticSamplers members. Static samplers are more efficient, and it is highly recommended to use them whenever possible. On older APIs, static samplers are emulated via generic sampler objects.
      The following is an example of shader initialization:
      ShaderCreationAttribs Attrs;
      Attrs.Desc.Name         = "MyPixelShader";
      Attrs.FilePath          = "MyShaderFile.fx";
      Attrs.SearchDirectories = "shaders;shaders\\inc;";
      Attrs.EntryPoint        = "MyPixelShader";
      Attrs.Desc.ShaderType   = SHADER_TYPE_PIXEL;
      Attrs.SourceLanguage    = SHADER_SOURCE_LANGUAGE_HLSL;
      BasicShaderSourceStreamFactory BasicSSSFactory(Attrs.SearchDirectories);
      Attrs.pShaderSourceStreamFactory = &BasicSSSFactory;
      ShaderVariableDesc ShaderVars[] =
      {
          {"g_StaticTexture",  SHADER_VARIABLE_TYPE_STATIC},
          {"g_MutableTexture", SHADER_VARIABLE_TYPE_MUTABLE},
          {"g_DynamicTexture", SHADER_VARIABLE_TYPE_DYNAMIC}
      };
      Attrs.Desc.VariableDesc        = ShaderVars;
      Attrs.Desc.NumVariables        = _countof(ShaderVars);
      Attrs.Desc.DefaultVariableType = SHADER_VARIABLE_TYPE_STATIC;
      StaticSamplerDesc StaticSampler;
      StaticSampler.Desc.MinFilter = FILTER_TYPE_LINEAR;
      StaticSampler.Desc.MagFilter = FILTER_TYPE_LINEAR;
      StaticSampler.Desc.MipFilter = FILTER_TYPE_LINEAR;
      StaticSampler.TextureName    = "g_MutableTexture";
      Attrs.Desc.NumStaticSamplers = 1;
      Attrs.Desc.StaticSamplers    = &StaticSampler;
      ShaderMacroHelper Macros;
      Macros.AddShaderMacro("USE_SHADOWS", 1);
      Macros.AddShaderMacro("NUM_SHADOW_SAMPLES", 4);
      Macros.Finalize();
      Attrs.Macros = Macros;
      RefCntAutoPtr<IShader> pShader;
      m_pDevice->CreateShader( Attrs, &pShader );
      Creating the Pipeline State Object
      After all required shaders are created, the rest of the fields of the PipelineStateDesc structure provide the depth-stencil, rasterizer, and blend state descriptions, the number and format of render targets, the input layout format, etc. For instance, the rasterizer state can be described as follows:
      PipelineStateDesc PSODesc;
      RasterizerStateDesc &RasterizerDesc = PSODesc.GraphicsPipeline.RasterizerDesc;
      RasterizerDesc.FillMode              = FILL_MODE_SOLID;
      RasterizerDesc.CullMode              = CULL_MODE_NONE;
      RasterizerDesc.FrontCounterClockwise = True;
      RasterizerDesc.ScissorEnable         = True;
      RasterizerDesc.AntialiasedLineEnable = False;
      Depth-stencil and blend states are defined in a similar fashion.
      Another important thing the pipeline state object encompasses is the input layout description, which defines how inputs to the vertex shader (the very first shader stage) should be read from memory. The input layout may define several vertex streams that contain values of different formats and sizes:
      // Define input layout
      InputLayoutDesc &Layout = PSODesc.GraphicsPipeline.InputLayout;
      LayoutElement TextLayoutElems[] =
      {
          LayoutElement( 0, 0, 3, VT_FLOAT32, False ),
          LayoutElement( 1, 0, 4, VT_UINT8,   True ),
          LayoutElement( 2, 0, 2, VT_FLOAT32, False ),
      };
      Layout.LayoutElements = TextLayoutElems;
      Layout.NumElements    = _countof( TextLayoutElems );
      Finally, the pipeline state defines the primitive topology type. When all required members are initialized, a pipeline state object can be created by the IRenderDevice::CreatePipelineState() method:
      // Define shader and primitive topology
      PSODesc.GraphicsPipeline.PrimitiveTopologyType = PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
      PSODesc.GraphicsPipeline.pVS = pVertexShader;
      PSODesc.GraphicsPipeline.pPS = pPixelShader;
      PSODesc.Name = "My pipeline state";
      m_pDev->CreatePipelineState(PSODesc, &m_pPSO);
      When a PSO object is bound to the pipeline, the engine invokes all API-specific commands to set all states specified by the object. In the case of Direct3D12, this maps directly to setting the D3D12 PSO object. In the case of Direct3D11, this involves setting individual state objects (such as rasterizer and blend states), shaders, input layout, etc. In the case of OpenGL, this requires a number of fine-grain state-tweaking calls. Diligent Engine keeps track of the currently bound states and only calls functions to update the states that have actually changed.
      Binding Shader Resources
      Direct3D11 and OpenGL utilize fine-grain resource binding models, where an application binds individual buffers and textures to certain shader or program resource binding slots. Direct3D12 uses a very different approach, where resource descriptors are grouped into tables, and an application can bind all resources in a table at once by setting the table in the command list. The resource binding model in Diligent Engine is designed to leverage this new method. It introduces a new object called shader resource binding that encapsulates all resource bindings required for all shaders in a certain pipeline state. It also introduces a classification of shader variables based on the frequency of expected change, which helps the engine group them into tables under the hood:
      Static variables (SHADER_VARIABLE_TYPE_STATIC) are variables that are expected to be set only once. They may not be changed once a resource is bound to the variable. Such variables are intended to hold global constants such as camera attributes or global light attribute constant buffers.
      Mutable variables (SHADER_VARIABLE_TYPE_MUTABLE) define resources that are expected to change on a per-material frequency. Examples include diffuse textures, normal maps, etc.
      Dynamic variables (SHADER_VARIABLE_TYPE_DYNAMIC) are expected to change frequently and randomly.
      The shader variable type must be specified during shader creation by populating an array of ShaderVariableDesc structures and initializing the ShaderCreationAttribs::Desc::VariableDesc and ShaderCreationAttribs::Desc::NumVariables members (see the example of shader creation above).
      Static variables cannot be changed once a resource is bound to the variable. They are bound directly to the shader object. For instance, a shadow map texture is not expected to change after it is created, so it can be bound directly to the shader:
      PixelShader->GetShaderVariable( "g_tex2DShadowMap" )->Set( pShadowMapSRV );
      Mutable and dynamic variables are bound via a new Shader Resource Binding object (SRB) that is created by the pipeline state (IPipelineState::CreateShaderResourceBinding()):
      m_pPSO->CreateShaderResourceBinding(&m_pSRB);
      Note that an SRB is only compatible with the pipeline state it was created from. The SRB object inherits all static bindings from the shaders in the pipeline, but is not allowed to change them.
      Mutable resources can only be set once for every instance of a shader resource binding. Such resources are intended to define specific material properties. For instance, a diffuse texture for a specific material is not expected to change once the material is defined and can be set right after the SRB object has been created:
      m_pSRB->GetVariable(SHADER_TYPE_PIXEL, "tex2DDiffuse")->Set(pDiffuseTexSRV);
      In some cases it is necessary to bind a new resource to a variable every time a draw command is invoked. Such variables should be labeled as dynamic, which allows setting them multiple times through the same SRB object:
      m_pSRB->GetVariable(SHADER_TYPE_VERTEX, "cbRandomAttribs")->Set(pRandomAttrsCB);
      Under the hood, the engine pre-allocates descriptor tables for static and mutable resources when an SRB object is created. Space for dynamic resources is dynamically allocated at run time. Static and mutable resources are thus more efficient and should be used whenever possible.
      As you can see, Diligent Engine does not expose the low-level details of how resources are bound to shader variables. One reason is that these details differ greatly between APIs. The other reason is that using low-level binding methods is extremely error-prone: it is very easy to forget to bind some resource, or to bind an incorrect resource, such as binding a buffer to a variable that is in fact a texture, especially during shader development when everything changes fast. Diligent Engine instead relies on a shader reflection system to automatically query the list of all shader variables. Grouping variables into the three types mentioned above allows the engine to create an optimized layout and do the heavy lifting of matching resources to API-specific resource locations, registers, or descriptors in a table.
      This post gives more details about the resource binding model in Diligent Engine.
      Setting the Pipeline State and Committing Shader Resources
      Before any draw or compute command can be invoked, the pipeline state needs to be bound to the context:
      m_pContext->SetPipelineState(m_pPSO);
      Under the hood, the engine sets the internal PSO object in the command list or calls all the required native API functions to properly configure all pipeline stages.
      The next step is to bind all required shader resources to the GPU pipeline, which is accomplished by IDeviceContext::CommitShaderResources() method:
      m_pContext->CommitShaderResources(m_pSRB, COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES);
      The method takes a pointer to the shader resource binding object and makes all resources the object holds available to the shaders. In the case of D3D12, this only requires setting the appropriate descriptor tables in the command list. For older APIs, this typically requires setting all resources individually.
      Next-generation APIs require the application to track the state of every resource and explicitly inform the system about all state transitions. For instance, if a texture was previously used as a render target, while the next draw command is going to use it as a shader resource, a transition barrier needs to be executed. Diligent Engine does the heavy lifting of state tracking. When the CommitShaderResources() method is called with the COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES flag, the engine commits and transitions resources to the correct states at the same time. Note that transitioning resources does introduce some overhead. The engine tracks the state of every resource and will not issue a barrier if the state is already correct, but checking the resource state is itself an overhead that can sometimes be avoided. The engine provides the IDeviceContext::TransitionShaderResources() method that only transitions resources:
      m_pContext->TransitionShaderResources(m_pPSO, m_pSRB);
      In some scenarios it is more efficient to transition resources once and then only commit them.
      Invoking Draw Command
      The final step is to set the states that are not part of the PSO, such as render targets and vertex and index buffers. Diligent Engine uses a Direct3D11-style API that is translated to other native API calls under the hood:
      ITextureView *pRTVs[] = {m_pRTV};
      m_pContext->SetRenderTargets(_countof( pRTVs ), pRTVs, m_pDSV);
      // Clear render target and depth buffer
      const float zero[4] = {0, 0, 0, 0};
      m_pContext->ClearRenderTarget(nullptr, zero);
      m_pContext->ClearDepthStencil(nullptr, CLEAR_DEPTH_FLAG, 1.f);
      // Set vertex and index buffers
      IBuffer *buffer[] = {m_pVertexBuffer};
      Uint32 offsets[] = {0};
      Uint32 strides[] = {sizeof(MyVertex)};
      m_pContext->SetVertexBuffers(0, 1, buffer, strides, offsets, SET_VERTEX_BUFFERS_FLAG_RESET);
      m_pContext->SetIndexBuffer(m_pIndexBuffer, 0);
      Different native APIs use various sets of functions to execute draw commands depending on the command details (whether the command is indexed, instanced, or both; what offsets in the source buffers are used; etc.). For instance, there are 5 draw commands in Direct3D11 and more than 9 commands in OpenGL, with something like glDrawElementsInstancedBaseVertexBaseInstance not uncommon. Diligent Engine hides all these details behind a single IDeviceContext::Draw() method that takes a DrawAttribs structure as an argument. The structure members define all attributes required to perform the command (primitive topology, number of vertices or indices, whether the draw call is indexed, instanced, or indirect, etc.). For example:
      DrawAttribs attrs;
      attrs.IsIndexed = true;
      attrs.IndexType = VT_UINT16;
      attrs.NumIndices = 36;
      attrs.Topology = PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
      pContext->Draw(attrs);
      For compute commands, there is the IDeviceContext::DispatchCompute() method that takes a DispatchComputeAttribs structure defining the compute grid dimensions.
      Source Code
      The full engine source code is available on GitHub and is free to use. The repository contains tutorials, sample applications, an asteroids performance benchmark, and an example Unity project that uses Diligent Engine in a native plugin.
      The atmospheric scattering sample demonstrates how Diligent Engine can be used to implement various rendering tasks: loading textures from files, using complex shaders, rendering to multiple render targets, using compute shaders and unordered access views, etc.

      The asteroids performance benchmark is based on this demo developed by Intel. It renders 50,000 unique textured asteroids and allows comparing the performance of the Direct3D11 and Direct3D12 implementations. Every asteroid is a combination of one of 1000 unique meshes and one of 10 unique textures.

      Finally, there is an example project that shows how Diligent Engine can be integrated with Unity.

      Future Work
      The engine is under active development. It currently supports the Windows desktop, Universal Windows, Linux, Android, MacOS, and iOS platforms. The Direct3D11, Direct3D12, and OpenGL/GLES backends are now feature complete. A Vulkan backend is coming next, and a Metal backend is planned.
    • By LifeArtist
      Good Evening,
      I want to make a 2D game which involves displaying some debug information. Especially for collision, enemy sights and so on ...
      First off, I was thinking about all the shapes I will need for debugging purposes: circles, rectangles, lines, polygons.
      I am really stuck right now on a fundamental question:
      Where do I store my vertex positions for each line (object)? Currently I am not using a model matrix, because I am using an orthographic projection and set the final position within the VBO. That means that if I add a new line, I have to expand the "points" array and re-upload it (recall glBufferData) every time. The other method would be to use a model matrix and a fixed VBO for a line, but it would also be messy to create a line exactly from (0,0) to (100,20), calculating the rotation and scale to make it fit.
      If I proceed with option 1, "updating the array each frame", I was thinking of having 4 draw calls every frame: one for the lines VAO, one for the polygons VAO, and so on.
      In addition to that, I am planning to use some sort of ECS-based architecture. So the other question would be:
      Should I treat those debug objects as entities/components?
      For me it would make sense to treat them as entities, but that creates a new issue with the previous array approach, because each would have, for example, a transform and a render component; a special render component for debug objects (no texture, etc.) ... For me the transform component is also just a matrix, but how would I then define a line?
      Treating them as components wouldn't be a good idea in my eyes, because then I would always need an entity. Well, an entity is just an id!? So maybe it's a component?