DirectX vs OpenGL?


OpenGL's OS portability is better, yes, but its graphics hardware portability is actually significantly worse.

Plus D3D is faster:
[quote name='YogurtEmperor' timestamp='1313933377' post='4851886']DirectX is also significantly faster than OpenGL on most Windows implementations


I'll second that. I've consistently found the same, and so has this guy: http://aras-p.info/blog/2007/09/23/is-opengl-really-faster-than-d3d9/
[/quote]
I would be very careful with such bold claims. If you take a look at the numbers in that blog (up to 50% slowdown on OpenGL), it becomes painfully obvious that this guy's OpenGL path is doing something horribly wrong. Personally, I have witnessed the balance going both ways depending on the position of the bottleneck in the pipeline (NVidia, modern GL4 code versus comparable DX11 code). However, the differences were rarely significant (and again, they went both ways) and can mostly be attributed to differences in the ASM generated by the GLSL and HLSL shader compilers.

It is very difficult to compare different graphics APIs on a performance level, especially if they use different paradigms. I would guess that in 99% of all cases where a person claims that API x is faster than API y, the difference can be attributed to incorrect or suboptimal usage of API y. A call to function A may be more efficient than a call to function B on one API, but it may be the other way round on the other API. Accommodating such differences may require a substantial change to the rendering logic and possibly even to the rendering algorithms. Most people don't do that and quickly slip into what can be a pathological case for one API or the other.

A clear exception is the D3D9 draw call overhead mentioned in the blog above. Contrary to what the blog claims, this behaviour is well documented and perfectly reproducible (and acknowledged by Microsoft). Fixing it was a major improvement in D3D10.
As for DirectX 11, there is no comparison to any other graphics API in existence. In my most simplistic and optimal cases, the same scene is over 5 times faster in Direct3D 11 than it is in Direct3D 9.
I literally get 400 FPS in Direct3D 9 and 2,000 FPS in Direct3D 11.


As for OpenGL vs. Direct3D 9, I kept reliable records.
My records are based on the construction of my engine from the very start, when the engine was nothing more than activating a vertex buffer, an optional index buffer, a shader, and 0 or 1 textures.
Both the Direct3D and OpenGL pipelines were equally simple, and both were fully equipped to eliminate any redundant states that could occur in such a primitive system.

My results.

Frankly, I was generous in saying that OpenGL was 95% the speed of Direct3D 9 in all cases, because when I upgraded my DirectX SDK to the latest version that number became 75%, but I didn't have the heart to update my blog accordingly.

As of the latest DirectX API, there is no contest with DirectX 11, and unless OpenGL can make a 5-fold improvement in speed without multi-threaded rendering, there never will be.


I am talking about a rendering system that started as primitive as it could be, without even texture support or index buffer support (eliminating most of the potential pitfalls that could slow down my OpenGL implementation), and has grown to be fairly modernized. In all of that time, throughout all stages, OpenGL has never been able to keep up with the speed of DirectX.
But as mentioned in the post, this is on Windows, where OpenGL implementations are not required to meet any certification of quality. Many vendors may just be lazy when it comes to OpenGL on PC, but may be great on Linux or Macintosh.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


As for OpenGL vs. Direct3D 9, I kept reliable records.
My records are based on the construction of my engine from the very start, when the engine was nothing more than activating a vertex buffer, an optional index buffer, a shader, and 0 or 1 textures.
Both the Direct3D and OpenGL pipelines were equally simple, and both were fully equipped to eliminate any redundant states that could occur in such a primitive system.

My results.

I'm sorry to be so blunt, but your results are bogus from a performance analysis point of view.

First of all, you are comparing benchmarks in frames per second, i.e. in non-linear space. Read this article for an explanation of why this approach is flawed.
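To give a quick, hypothetical illustration of the non-linearity (the numbers below are made up to show the effect; they are not taken from your benchmarks):

#include <cstdio>

// Frame time in milliseconds for a given FPS figure.
static double frame_time_ms(double fps) { return 1000.0 / fps; }

int main() {
    // Two "FPS gaps" that look wildly different are comparable in real cost:
    std::printf("2000 -> 1000 FPS adds %.3f ms per frame\n",
                frame_time_ms(1000.0) - frame_time_ms(2000.0)); // 0.500 ms
    std::printf("  60 ->   59 FPS adds %.3f ms per frame\n",
                frame_time_ms(59.0) - frame_time_ms(60.0));     // ~0.282 ms
    return 0;
}

Losing 1000 FPS at the top of the scale costs half a millisecond; losing a single frame per second near 60 already costs more than half of that.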

Second, you are benchmarking at a far too high framerate. As a quick and dirty guideline, all benchmarks that include FPS rates above 1000 are useless. The frame time at 1000 fps is 1 ms. No API is designed to operate at this framerate. You will run into all kinds of tiny constant overheads that may affect performance in unpredictable ways. You don't leave the GPU any time to amortize overheads. For really solid benchmarking results, you must do much heavier work. Not necessarily more complex, just more. Get your frame times up and remove constant driver and API overheads. Benchmark your API within the range it is supposed to operate in. And be sure you know what part of the pipeline you're actually benchmarking, which leads us to the next point.
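To put a rough number on it (a purely hypothetical figure, just to show the scale): a constant 0.5 ms of per-frame driver/API overhead is 50% of a 1 ms frame at 1000 fps, but only about 3% of a 16.6 ms frame at 60 fps. The same cost that dominates a toy benchmark disappears into the noise at a realistic workload.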

Third, benchmarking a graphics subsystem is much more complex than just producing a single number. What part of the pipeline are you benchmarking? Do you actually know? In your case, you essentially always benchmark the API overhead - function calls, internal state management, maybe command buffer transfer. In other words, you are benchmarking the CPU, not even the GPU! Read this document. Although a bit dated (and it doesn't represent modern pipelines very well anymore), it gives a good basic overview of how to measure performance in a graphics system. A real engine is going to be bottlenecked within the GPU, so that is what you should actually measure.

In conclusion, it is important to learn how to properly conduct meaningful benchmarks before making claims based on flawed data.


As of the latest DirectX API, there is no contest with DirectX 11, and unless OpenGL can make a 5-fold improvement in speed without multi-threaded rendering, there never will be.

This does not make any sense. Again, read up on bottlenecks. Even if an API were to reduce its overhead to zero (which is partially possible by removing it entirely and talking to the hardware directly, as is done on many consoles), the final impact on game performance is often very small. Sometimes it's not even measurable if the engine bottleneck is on the GPU (which is almost always the case in modern engines). The more work is offloaded to the GPU, the less important API overhead becomes.

The much more important question, which can indeed make a large difference between APIs and drivers, is the quality of the optimizer in each API's native shader compiler.
I would be very careful with such bold claims. If you take a look at the numbers in that blog (up to 50% slowdown on OpenGL), it becomes painfully obvious that this guy's OpenGL path is doing something horribly wrong.


As he's one of the guys behind Unity, I'd work on the assumption that he has a reasonable enough idea of what he's talking about. Plus his renderer was originally written for, and optimized around, OpenGL (also mentioned in the blog post) and had D3D shoehorned on after the fact.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

The blog post is also from September 23rd, 2007; that is long enough ago that the data point is no longer relevant. Hell, AMD alone would have put out around 47 driver updates between then and now.

I'm sorry to be so blunt, but your results are bogus from a performance analysis point of view.

First of all, you are comparing benchmarks in frames per second, i.e. in non-linear space. Read this article for an explanation of why this approach is flawed.

I remember that from some time ago but wasn't thinking too hard about it during my tests. At the time I wasn't planning on it being a serious benchmark; I was just using it for my own reference to see what made the system go up and down. What skewed my presentation was the amount of change.
I will go back to my old test cases, change to frame time, and update my blog, but the results will still show Direct3D 9 as the clear winner in actual speed.


Second, you are benchmarking at a far too high framerate. As a quick and dirty guideline, all benchmarks that include FPS rates above 1000 are useless. The frame time at 1000 fps is 1 ms. No API is designed to operate at this framerate. You will run into all kinds of tiny constant overheads that may affect performance in unpredictable ways. You don't leave the GPU any time to amortize overheads. For really solid benchmarking results, you must do much heavier work. Not necessarily more complex, just more. Get your frame times up and remove constant driver and API overheads. Benchmark your API within the range it is supposed to operate in. And be sure you know what part of the pipeline you're actually benchmarking, which leads us to the next point.

I have run many more tests than just the ones I posted. I have tried complex models and small ones, on many types of computers with various graphics cards.
The numbers go up and down per machine, but they are never disproportional. The result, under all conditions, on all the Windows x86 and x64 machines with various ATI and GeForce cards I tried, is that OpenGL loses in speed.
After upgrading to the latest DirectX SDK, OpenGL doesn't even win under fluke conditions. The gap is just too high.




This does not make any sense. Again, read up on bottlenecks. Even if an API were to reduce its overhead to zero (which is partially possible by removing it entirely and talking to the hardware directly, as is done on many consoles), the final impact on game performance is often very small. Sometimes it's not even measurable if the engine bottleneck is on the GPU (which is almost always the case in modern engines). The more work is offloaded to the GPU, the less important API overhead becomes.

The much more important question, which can indeed make a large difference between APIs and drivers, is the quality of the optimizer in each API's native shader compiler.

It is true that I have yet to use extensions specific to vendors.
But until now I have had limited chances to do so.
The reason I say my results are reliable (regardless of my flawed presentation of them) is that they represent the most primitive set of API calls possible.
What happens on each frame:
#1: Activate vertex buffers.
#2: Activate shaders.
#3: Update uniforms in shaders.
#4: Render with or without an index buffer.

I am not even assigning textures to slots in those benchmarks, nor changing any non-essential states such as culling or lighting. All matrix operations are from my own library and have exactly the same overhead in DirectX.
I have very little room for common OpenGL performance pitfalls. In such a simple system, I am open to ideas of what I may have missed that could help OpenGL come closer to Direct3D.
I tried combinations of VBOs: VBOs for only large buffers, VBOs for all buffers, etc.
Redundancy checks prevent the same shader from being set twice in a row. That did help a lot. But it helped Direct3D just as much.

I have no logic loop, only rendering, so although you say I am benchmarking the CPU, I am only doing so insofar as I am bound to call OpenGL API functions from the CPU, which is how everyone else is bound too.
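For reference, the per-frame path boils down to something like the following (a simplified sketch, not my actual engine code; GL headers, context creation and the one-time buffer/shader setup are omitted, and names such as vbo, ibo, program and mvpLoc are placeholders):

// Per-frame work for one object; all GL objects were created at start-up.
void RenderOneObject(GLuint vbo, GLuint ibo, GLuint program,
                     GLint mvpLoc, const float *mvp, GLsizei indexCount) {
    glBindBuffer(GL_ARRAY_BUFFER, vbo);                      // #1: activate vertex buffer
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);   //     a single position attribute
    glEnableVertexAttribArray(0);

    glUseProgram(program);                                   // #2: activate shader (skipped by the redundancy check if already bound)
    glUniformMatrix4fv(mvpLoc, 1, GL_FALSE, mvp);            // #3: update uniforms

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);              // #4: draw with an index buffer...
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
    // ...or without one: glDrawArrays(GL_TRIANGLES, 0, vertexCount);
}

The Direct3D 9 path does the equivalent SetStreamSource / SetVertexShader / SetPixelShader / constant upload / DrawIndexedPrimitive sequence.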


I said clearly in my blog that OpenGL on Windows suffers compared to OpenGL on other platforms. I will be able to benchmark those soon, but the fact is that OpenGL implementations on Windows are mostly garbage. I am not saying OpenGL itself is to blame directly, but it is indirectly to blame for not forcing quality checks on vendor implementations.


L. Spiro


PS: Those are some good articles. I consider anything worth doing to be worth doing right and fully, so I will be sure to follow the advice of those articles in further testing and re-testing.

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Perhaps a pro/con comparison would help the thread starter a bit more with his decision.
And I'm also interested in a comparison :)

What about completing a list:

DirectX:
+ has multiple "modules" for everything you need (3D, sound, input...)
+ "Managed DirectX" (works with .NET) => many programming languages
+ can load meshes and textures
+ good tools (PIX, debug runtime, visual assist...)
+ some professional programs (Autodesk Inventor)
+ frequently used (mostly: games) => good support
+ msdn documentation

- more difficult to understand
- only works with Windows (~> wine on linux)


OpenGL:
+ easy to get first results (though complex ones are still difficult)
+ runs on different platforms
+ frequently used with professional programs (CAD...), sometimes also in games
+ many open source libs (SDL, GLFW...)

- suffers from bad drivers (mostly on Windows, low-cost graphics cards)
- GLSL: different errors (or no errors) on different machines
- "extension chaos"
- new standard versions take a long time


Any amendments, additions, or mistakes I made?
(Note: I omit the performance issue, because there seem to be many conflicting analyses/benchmarks.)

[EDITS:
* changed "open source" to "many open source libs"
* changed "easier" in "easy to get first results"
* removed closed source from DX
* added "good tools" in DX
* added "some prof progs" in DX
* added GLSL problem
]

OpenGL
+ open source


This misconception needs to be killed stone dead. OpenGL is not open source, and it's not actually even software. The "Open" in its name refers to its status as a standard and has absolutely nothing whatsoever to do with source code. OpenGL is a specification for an interface between your program and the graphics hardware, and vendors implement this interface in their drivers (hence the fact that you see frequent references to your "OpenGL implementation" in the docs).

I'd remove "closed source" as a minus for D3D too, as it's not a relevant comparison (due to the above), and add the fact that D3D has better tools (PIX, the debug runtimes, etc.) for developers. And maybe also add that D3D is now a frequent choice for CAD tools too (Autodesk Inventor being one example that I linked to earlier).

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


[quote name='Gl_Terminator' timestamp='1313512841' post='4849912']
Use directX i was an OpenGL fan but finally i gave up, directX has a lot more support, and complex stuff are handled more easy,

Could you be more specific please?
[/quote]

Heheh, have you even tried to enable full-screen anti-aliasing with OpenGL, or tried to draw 2D content, or use VBOs, or better, have you even tried to write your own GLSL shader? Dude, I am telling you, OpenGL is in the end more difficult than DX, and I found that out after making my own game in OpenGL and then porting it to DX.

The reason I say my results are reliable (regardless of my flawed presentation of them) is that they represent the most primitive set of API calls possible.

As outlined above, unless you completely change your approach to benchmarking the APIs, your results are invalid. It's not a question of presentation; it's your core benchmarking method that is flawed. If you want reliable benchmarking results, then I suggest you first learn how to properly benchmark a graphics system, then build a performance analysis framework around that. You will be surprised by the results - because there will more than likely be zero difference, except for the usual statistical fluctuations.


I am not even assigning textures to slots in those benchmarks, nor changing any non-essential states such as culling or lighting. All matrix operations are from my own library and have exactly the same overhead in DirectX.
I have very little room for common OpenGL performance pitfalls. In such a simple system, I am open to ideas of what I may have missed that could help OpenGL come closer to Direct3D.
I tried combinations of VBOs: VBOs for only large buffers, VBOs for all buffers, etc.
Redundancy checks prevent the same shader from being set twice in a row. That did help a lot. But it helped Direct3D just as much.

I have no logic loop, only rendering, so although you say I am benchmarking the CPU, I am only doing so insofar as I am bound to call OpenGL API functions from the CPU, which is how everyone else is bound too.

This perfectly outlines what I was trying to explain to you in my post above: you don't understand what you are benchmarking. You are analyzing the CPU-bound API overhead. Your numbers may even be meaningful within this particular context. However, and that is the point, these numbers don't say anything about the API's 'performance' (if there even is such a thing)!

I will try to explain this a bit more. What you need to understand is that the GPU is an independent processing unit that largely operates without CPU interference. Assume you have a modern SM4+ graphics card. Assume further a single uber-shader (which may not always be a good design choice, but let's take this as an example), fully atlased/arrayed textures, uniform data blocks, and no blending / single-pass rendering. Rendering a full frame would essentially look like this:

ActivateShader()
ActivateVertexStreams()
UploadUniformDataBlock()
RenderIndexedArray()
Present/SwapBuffers() -> Internal_WaitForGPUFrameFence()

In practice you would use at least a few state changes and possibly multiple passes, but the basic structure could look like this. What happens here? The driver (through the D3D/OpenGL API) sends some very limited data to the GPU (the large data blocks are already in VRAM) - and then waits for the GPU to complete the frame, unless it can defer a new frame or queue up more frames in the command FIFO. Yup, the driver waits. This is a situation that you call GPU-bound. Being fill-rate limited, texture or vertex stream memory bandwidth bound, vertex transform bound - all these are GPU-bound scenarios.

And now comes the interesting part: neither OpenGL nor D3D has anything to do with any of this! Once the data and the commands are on the GPU, the processing will be exactly the same whether you're using OpenGL or D3D. There will be absolutely no performance differences. Zero. Nada.

What you are measuring is only the part where data is manipulated CPU-side. This part is only relevant if you are CPU bound, i.e. if the GPU is waiting for the CPU. And this is a situation that an engine programmer will do anything to avoid. It's a worst case scenario, because it implies you aren't using the GPU to its fullest potential. Yet, this is exactly the situation you currently benchmark!
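If you want to see for yourself which side you are actually bound by, time the two sides separately. Here is a rough sketch of how that could look on the GL side, using timer queries (GL 3.3 / ARB_timer_query); RenderFrame() stands in for whatever your frame submission does, and the query object is assumed to have been created beforehand with glGenQueries:

#include <chrono>

// CPU time: how long it takes to *issue* the frame.
auto cpuStart = std::chrono::high_resolution_clock::now();
glBeginQuery(GL_TIME_ELAPSED, query);
RenderFrame();                                    // all draw calls / state changes of one frame
glEndQuery(GL_TIME_ELAPSED);
auto cpuEnd = std::chrono::high_resolution_clock::now();

// GPU time: how long the GPU actually spends executing that frame.
GLuint64 gpuNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &gpuNs);    // blocks until the GPU has finished

double cpuMs = std::chrono::duration<double, std::milli>(cpuEnd - cpuStart).count();
double gpuMs = gpuNs / 1.0e6;
// If gpuMs is much larger than cpuMs, you are GPU bound and the API call
// overhead you have been measuring is hidden entirely behind the GPU work.

(Reading the query result immediately stalls the pipeline, so for continuous profiling you would fetch the result a frame or two later.)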

If you are CPU bound, then this usually means that you are not correctly batching your geometry or that you are doing too many state changes. This doesn't imply that the API is 'too slow', it usually implies that your engine design, data structures or assets need to be improved. Sometimes this is not possible or a major challenge, especially if you are working on legacy engines designed around a high frequency state changing / FFP paradigm. But often it is, especially on engines designed around a fully unified memory and shader based paradigm, and you can tip the balance back to being GPU bound.

So in conclusion: CPU-side API performance doesn't really matter. If you are GPU bound, even a ten-fold performance difference would have zero impact on framerate, since it would be entirely amortized while waiting for the GPU. Sure, you can measure these differences - but they are meaningless in a real world scenario. It is much more important to optimize your algorithms, which will have orders of magnitude more effect on framerate than the CPU call overhead.


* changed "open source" to "many open source extensions"

There are no "open source extensions". OpenGL has absolutely nothing to do with open source.
