Max performance and support

11 comments, last by TheChubu 9 years, 1 month ago

After working on my engine for a while, I finally got to that point where I felt like it was time for a major refactoring!

Cleaning up everything, pointing out the things that work, the things that don't or that should change, and so on.

Well this got me thinking, one major thing I don't have and really don't know how to do is:

How do you check if a user's hardware can actually handle the OpenGL calls you're using?

Do you call them all on start-up and check if you get an OpenGL error back? (I hope it's not something like this.)

Or is it automatically done when you make your OpenGL context and target a specific version? Basically, you get a guarantee that all the calls for that version are usable, e.g. I make a context for OpenGL 3.3 and get guaranteed support for all OpenGL 3.3 and lower calls.

On a side note, one of my main things is a sprite batcher, which means lots of dynamic data is being sent to the GPU per frame.

So I was wondering, what would be a solid performance test? How do I know I'm getting close to that 'magic number' that says "with what you've got now, you really can't do any better"?
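For what it's worth, one common baseline to measure a sprite batcher against is plain buffer orphaning, where the buffer's storage is re-specified each frame before copying the new vertices in. This is only a sketch, not anyone's engine code: it assumes a loader such as GLEW is already initialized, and the SpriteVertex layout and function names are made up.

#include <GL/glew.h>
#include <stddef.h>

typedef struct { float x, y, u, v; } SpriteVertex;   /* hypothetical layout */

/* Upload one frame's worth of sprite vertices into an existing VBO. */
void upload_sprite_batch(GLuint vbo, const SpriteVertex *verts, size_t count)
{
    GLsizeiptr bytes = (GLsizeiptr)(count * sizeof(SpriteVertex));

    glBindBuffer(GL_ARRAY_BUFFER, vbo);

    /* "Orphan" the old storage so the driver does not have to stall waiting
       for the GPU to finish reading last frame's data, then copy the new
       vertices in. */
    glBufferData(GL_ARRAY_BUFFER, bytes, NULL, GL_STREAM_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, verts);
}

Timing how many sprites per frame this sustains, versus a mapped or persistently mapped variant, gives a more concrete ceiling than guessing.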


I'm using a two-level mechanism.

At application start-up, a context is created with the highest version supported by the application. If that fails, the next lower version is tried. If a context was made successfully, the corresponding dynamic library (with the rendering implementation) is loaded. The library was linked statically against all OpenGL functions that could be expected to exist for that version. Hence, if it fails to load, the context is destroyed and the next lower version is tried. This continues until even the lowest supported version has failed, which of course means that the game cannot be run.
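A minimal sketch of that highest-to-lowest fallback, assuming GLFW is used for context creation (the version list and window parameters are purely illustrative):

#include <GLFW/glfw3.h>
#include <stddef.h>

/* Try context versions from highest to lowest; returns the first window the
   driver accepts, or NULL if even the lowest supported version fails.
   Assumes glfwInit() has already succeeded. */
static GLFWwindow *create_best_context(void)
{
    static const int versions[][2] = { {4, 5}, {4, 0}, {3, 3} };

    for (size_t i = 0; i < sizeof(versions) / sizeof(versions[0]); ++i) {
        glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, versions[i][0]);
        glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, versions[i][1]);
        glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);

        GLFWwindow *window = glfwCreateWindow(1280, 720, "game", NULL, NULL);
        if (window)
            return window;   /* this version is supported; load the matching renderer */
    }
    return NULL;             /* no supported version: the game cannot run */
}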

After successfully loading a library, its initialization code creates its own context and looks for extensions that provide better implementations of some features. If an extension is not found, the corresponding basic implementation is used. Graphics rendering is controlled by enqueued rendering jobs, so routine invocation is already data driven; the library's initialization code therefore points some of its job-executing function pointers at the better code.
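Roughly, that extension lookup and function-pointer swap could look like the sketch below. The job table and texture-allocation paths are invented for illustration; only the GL calls are real, and the core-profile extension query (glGetStringi) requires a 3.0+ context.

#include <GL/glew.h>
#include <string.h>

/* Core-profile extension query: GL_EXTENSIONS is enumerated with glGetStringi
   rather than parsed out of one big string. */
static int has_extension(const char *name)
{
    GLint count = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &count);
    for (GLint i = 0; i < count; ++i) {
        const char *ext = (const char *)glGetStringi(GL_EXTENSIONS, (GLuint)i);
        if (ext && strcmp(ext, name) == 0)
            return 1;
    }
    return 0;
}

/* Basic path: mutable texture storage, available everywhere. */
static void alloc_texture_basic(GLsizei w, GLsizei h)
{
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
}

/* Better path: immutable storage, available with ARB_texture_storage / GL 4.2. */
static void alloc_texture_storage(GLsizei w, GLsizei h)
{
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, w, h);
}

/* The "job" function pointer defaults to the basic implementation and is only
   redirected if the better extension is actually present. */
static void (*alloc_texture_job)(GLsizei, GLsizei) = alloc_texture_basic;

void init_render_jobs(void)
{
    if (has_extension("GL_ARB_texture_storage"))
        alloc_texture_job = alloc_texture_storage;
}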

In theory, checking the GL_VERSION supported by your driver and cross-checking that against the appropriate GL specification should tell you what you need.

In practice OpenGL itself makes no guarantee that any given feature is going to be hardware accelerated. It's perfectly legal for a driver to advertise a feature and have full support for it, but to drop you back to a software emulated path if you actually try to use it.

OpenGL gives you absolutely no way of knowing if this is going to happen, and if you drop back to software emulation per-vertex you may not even notice it - it will be slower but it may not be sufficiently slow for you to clearly determine if it's a software fallback or if it's just a more generic performance issue in your own code.

If you drop back to software emulation at a deeper part of the pipeline - per fragment or in the blend stage - you'll almost definitely notice it because you'll be getting about 1 fps.

The only way to satisfactorily know this is to know which features do or don't play well with current hardware and any previous generations you want to support. For example, some of the earliest GL 2.0 hardware (from 10 years ago now, so you almost definitely don't want to support it, but it's useful to cite as an example) advertised support for non-power-of-two textures but only supported them in a software emulated path, so it's hello to 1 fps.

A rule of thumb might be to only rely on a feature being hardware accelerated if it's from a GL_VERSION or two below what the driver advertises, but that's obviously also a (fairly heavy-handed) constraint - most features are actually perfectly OK, but it's those handful that cause trouble that you need to watch out for.
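If you wanted to encode that rule of thumb, a hedged sketch might look like this; the one-version margin is just the heuristic described above, the function names are made up, and the GL_MAJOR_VERSION / GL_MINOR_VERSION queries exist only on 3.0+ contexts.

#include <GL/glew.h>

/* Returns the advertised context version as major*10 + minor, e.g. 33 for 3.3.
   Crude encoding, but it orders versions correctly since minor < 10. */
static int gl_version(void)
{
    GLint major = 0, minor = 0;
    glGetIntegerv(GL_MAJOR_VERSION, &major);
    glGetIntegerv(GL_MINOR_VERSION, &minor);
    return (int)major * 10 + (int)minor;
}

/* Heuristic only: trust a feature introduced in 'required' (e.g. 32 for 3.2)
   if the driver advertises at least one minor version more than that. */
static int probably_hardware_accelerated(int required)
{
    return gl_version() >= required + 1;
}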

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

I think both of you are saying yes, there is an overall check for supported features for that version:
if I can successfully create an OpenGL context using my desired target version, then any function documented to be a part of that version can be used.

Things I'm still a bit hazy on:
Does successfully creating an OpenGL context (e.g. an OpenGL 4.0 context gets created successfully) give me the guarantee that functions from lower versions of OpenGL (e.g. OpenGL 3.3) can be used too? Not functionality in terms of the hardware, like supporting non-power-of-two textures (or is this considered OpenGL functionality?), but in the sense of being able to use glMapBufferRange or glBufferSubData without having separate contexts.

Or is this a given, because a higher version of OpenGL automatically supports the functions from lower versions?
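One quick runtime sanity check of that (assuming GLFW again, and noting that a resolved pointer is not an iron-clad guarantee of support; the version and extension queries remain the real authority) might be:

#include <GLFW/glfw3.h>
#include <stdio.h>

/* With a 4.0 core context current, entry points promoted in earlier core
   versions should still resolve: glBufferSubData is core since 1.5 and
   glMapBufferRange since 3.0. Requires a current context. */
void check_older_entry_points(void)
{
    GLFWglproc map_range = glfwGetProcAddress("glMapBufferRange");
    GLFWglproc sub_data  = glfwGetProcAddress("glBufferSubData");

    printf("glMapBufferRange: %s\n", map_range ? "resolved" : "missing");
    printf("glBufferSubData:  %s\n", sub_data  ? "resolved" : "missing");
}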

The library was linked statically against all OpenGL functions that could be expected to exist for that version


Are you implying that this is automatically done when the context is created?
As in, if I do nothing special, no library, a from-scratch implementation, and I create an OpenGL context targeting my application's highest supported version, I will automatically get support for that version's functions?

Or are you saying that the library that you are using does some additional work to check?

After successfully loading a library, its initialization code creates its own context and looks for extensions that provide better implementations of some features.


Are you talking about the ARB extensions here? Extensions such as GL_ARB_map_buffer_range?
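For what it's worth, GL_ARB_map_buffer_range is an extension rather than a function; the entry point it (and core GL 3.0) exposes is glMapBufferRange. A rough availability check, parsing the version string so it also works on a pre-3.0 context, could look like this illustrative sketch:

#include <GL/glew.h>
#include <stdio.h>
#include <string.h>

/* glMapBufferRange is core in GL 3.0+; on older contexts it may still be
   exposed through GL_ARB_map_buffer_range. */
int map_buffer_range_available(void)
{
    int major = 0;
    const char *version = (const char *)glGetString(GL_VERSION);
    if (version)
        sscanf(version, "%d", &major);

    if (major >= 3)
        return 1;

    /* Legacy extension string (fine on pre-3.0 contexts); a robust check
       would tokenize rather than use strstr. */
    const char *exts = (const char *)glGetString(GL_EXTENSIONS);
    return exts && strstr(exts, "GL_ARB_map_buffer_range") != NULL;
}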

In practice OpenGL itself makes no guarantee that any given feature is going to be hardware accelerated

Can you force OpenGL to only use hardware-accelerated features? If you can, how is this done?

In practice OpenGL itself makes no guarantee that any given feature is going to be hardware accelerated

Can you force OpenGL to only use hardware-accelerated features? If you can, how is this done?

No, there is no such notion of hardware acceleration. OpenGL was created in a time when graphics commands were sent from a client computer to a server running SGI hardware over Ethernet.
The main goal was to get the graphics rendered at all costs.

However, mhagain may be exaggerating the situation given OpenGL's current status; software fallbacks were very common and annoying until 5 or 6 years ago, but I haven't seen a GL implementation that falls back to software rendering in quite a long time.


In practice OpenGL itself makes no guarantee that any given feature is going to be hardware accelerated. It's perfectly legal for a driver to advertise a feature and have full support for it, but to drop you back to a software emulated path if you actually try to use it.
Do you know of any concrete example of an implementation that provides an OpenGL 3.2+ core context but emulates some features in software? I keep hearing this, and while it's true per the spec, it's never followed by examples of situations in which it actually happened.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

In practice OpenGL itself makes no guarantee that any given feature is going to be hardware accelerated. It's perfectly legal for a driver to advertise a feature and have full support for it, but to drop you back to a software emulated path if you actually try to use it.

Do you know of any concrete example of an implementation that provides an OpenGL 3.2+ core context but emulates some features in software? I keep hearing this, and while it's true per the spec, it's never followed by examples of situations in which it actually happened.

Not 3.2, but on 2.1 I used dynamic indexing of an array of uniform variables inside a fragment shader, and my FPS dropped from 60 to 1 -- a sure sign that the driver had reverted to software emulation.
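For anyone wondering what that construct looks like, it is the pattern sketched below, where the uniform array index is not known at compile time (GLSL 1.20-era syntax; the names and array size are made up for illustration):

/* Fragment shader whose uniform array index comes from a varying, so the
   index is only known at run time; this is the kind of construct that can
   push a driver onto a slow path. */
static const char *dynamic_index_fs =
    "#version 120\n"
    "uniform vec4 u_colors[16];\n"
    "varying float v_index;\n"
    "void main() {\n"
    "    gl_FragColor = u_colors[int(v_index)];\n"
    "}\n";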

but I haven't seen a GL implementation that falls back to software rendering in quite a long time.

iOS partially does in certain circumstances, such as when attributes in a vertex buffer are misaligned.
I say “partially” because it isn’t emulating the full rendering pipeline; it just adds a huge CPU cost by manually making an aligned copy of the vertex data on the CPU every frame. That causes a drastic change in performance without giving you a clue that it is happening, which makes it basically the same problem as going into emulation mode (nothing on iOS is fully emulated; it either is hardware accelerated or it fails).
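To make the misalignment concrete, here is the general shape of a tightly packed interleaved layout versus a padded one. This is only an illustration (attribute indices and the color channel count are invented, and desktop GL headers are used for the sketch even though the same idea applies on ES):

#include <GL/glew.h>

/* Tightly packed: 12 bytes of position + 3 bytes of color = 15-byte stride.
   Every vertex after the first starts at an address that is not a multiple
   of 4, which is the kind of layout that triggers the CPU-side copy. */
void setup_unaligned(GLuint vbo)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexAttribPointer(0, 3, GL_FLOAT,         GL_FALSE, 15, (const void *)0);
    glVertexAttribPointer(1, 3, GL_UNSIGNED_BYTE, GL_TRUE,  15, (const void *)12);
}

/* Padding the color to 4 bytes gives a 16-byte stride, so every attribute of
   every vertex stays 4-byte aligned. */
void setup_aligned(GLuint vbo)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexAttribPointer(0, 3, GL_FLOAT,         GL_FALSE, 16, (const void *)0);
    glVertexAttribPointer(1, 4, GL_UNSIGNED_BYTE, GL_TRUE,  16, (const void *)12);
}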

In my own engine for desktop OpenGL 3.3/4.5, I suspect I’ve hit a slow path unknowingly too.

I’ve been very careful with its development, putting in a lot of effort to make sure the OpenGL ports run as close to the performance of Direct3D 11 as they can, and until last week I was at roughly 80-90%.

Suddenly, after getting some new models for play-testing, Direct3D 9 and Direct3D 11 are at around 11,000 and 14,000 FPS respectively, whereas OpenGL dropped to 400 FPS.

I intend to allocate some time to investigate this in detail this weekend, but basically, while full-on emulation is rare these days, there are still a million cases that cause the driver to do unnecessary CPU work.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


Not 3.2, but on 2.1 I used dynamic indexing of an array of uniform variables inside a fragment shader, and my FPS dropped from 60 to 1 -- a sure sign that the driver had reverted to software emulation

Of course, but I asked for OpenGL 3 hardware examples for a reason: to raise the point that this complaint keeps coming up. It seems as if OpenGL users are like the ARB, holding on to the past too much.

In any case, knowing whether feature X is emulated on Y cards is good knowledge to have, which is also why I asked.

iOS partially does in certain circumstances, such as when attributes in a vertex buffer are misaligned.

i.e., not aligned to 16 bytes? ES 2? I've seen it thrown around that explicit 16-byte alignment is good for some desktop hardware too, AMD cards apparently. I'm assuming that if you're using another API (Mantle? ES 3?) you'd have to do proper alignment in any case.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

i.e., not aligned to 16 bytes?

Not aligned according to the guidelines here.

At least that page mentions the extra work that needs to be done. For every other OpenGL implementation you are just guessing.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

